Analysing the DNA of thousands of patients can help to uncover the genetic roots of diseases and shed light on the underlying biological mechanisms. This can reveal targets for drug development.

A new and very different type of genetic research has emerged this millennium – the genome-wide association study (GWAS, pronounced “gee-was”). By probing small genetic differences between people, such studies can help to uncover the biological roots of disease and have already helped to guide drug development. Researchers including Professor Chris Ponting, an expert in biomedical genomics, are asking the main UK research funders to finance a large GWAS for ME/CFS.

The genome is our complete set of genetic information – all our DNA – and is made up of more than three billion DNA nucleotide base pairs. These nucleotides, usually referred to by their initial letters of A, T, G and C, spell out our genetic content. That includes all our genes, which are the bits of our genome that tell the body how to make all its proteins.

There are roughly ten million places along the human genome where these individual letters can vary from person to person. For example, one person might have a G while another has an A. Each site where the letters vary is called a single nucleotide polymorphism, or SNP (pronounced “snip”), and a GWAS reads out around a million of these SNPs.

Example of an SNP where one person has an A and another person has a G.

The DNA in our cells is divided into 23 large chunks, or chromosomes, which come in pairs. Just as there is an SNP on one chromosome, so there is an SNP at the same position on its partner in the pair. The exception to this is the 23rd pair in males, the X-Y sex chromosomes, but sex chromosomes are usually excluded from GWAS.

So, each person will have two results for each SNP, one for each chromosome. In the example above, the possibilities are an A in both chromosomes (AA), a G in both (GG), or one in each (AG).
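For readers who like to see this as code, here is a minimal Python sketch of the idea. The function names are invented for illustration, and the A/G letters are simply the example above; this is not how any particular genetics package stores its data.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """Return every unordered pair of letters a person could carry at one SNP."""
    return ["".join(pair) for pair in combinations_with_replacement(sorted(alleles), 2)]


def allele_count(genotype, allele):
    """Count how many copies (0, 1 or 2) of a letter appear in a genotype such as 'AG'."""
    return genotype.count(allele)


genotypes = possible_genotypes({"A", "G"})
for genotype in genotypes:
    print(genotype, "carries", allele_count(genotype, "G"), "copies of G")
```

Counting copies of one letter (0, 1 or 2) is a common way to turn each person's result into a number that can be compared between groups.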

Researchers look for cases where a particular SNP version is strongly associated with a disease or a characteristic such as intelligence. For example, the “G” version of a particular SNP might be more common in people with ME/CFS than in healthy controls. This demonstrates a genetic influence on the risk of disease or on the characteristic. Knowing which genes are involved, and what those genes do, can reveal a great deal about the underlying biology.

What’s different about these studies?

At school, we learned about traditional genetics – for example, how a single gene determines our blood type. In a similar way, there are genetic diseases, such as cystic fibrosis, where anyone carrying two faulty copies of a particular gene will inevitably develop the disease.

Yet the role of genetics is usually more subtle than this. As a result of GWAS (confusingly, the same acronym is usually used for both the singular and the plural), we now know that hundreds of genes affect height, but each gene typically increases or decreases height by around 1 mm. (Environmental factors such as childhood nutrition affect height too.) Ponting told me in an email conversation:

“SNPs are like small pieces of paper that – placed under the legs of a pool table – tip the balance of the table so that a ball rolls in one direction more than another.”

The situation is similar for many diseases, such as rheumatoid arthritis and Type II diabetes, where many genes each have a small effect, rather than one making a big difference.
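The "many small effects" picture can be written as a simple sum. The sketch below assumes a purely additive model using the height example: every number in it (the 1,700 mm baseline, the 300 SNPs, the effects of exactly plus or minus 1 mm) is made up for illustration, and real effects vary in size and sit alongside environmental factors.

```python
import random


def predicted_height_mm(baseline_mm, effects_mm, allele_counts):
    """Add up per-SNP effects, each scaled by how many copies (0, 1 or 2) a person carries."""
    return baseline_mm + sum(e * c for e, c in zip(effects_mm, allele_counts))


rng = random.Random(0)  # fixed seed so the illustration is repeatable
n_snps = 300  # hypothetical number of height-related SNPs

# Hypothetical effects: each SNP nudges height up or down by 1 mm per copy.
effects_mm = [rng.choice([-1.0, 1.0]) for _ in range(n_snps)]

# One imaginary person: 0, 1 or 2 copies of the height-affecting version at each SNP.
person = [rng.choice([0, 1, 2]) for _ in range(n_snps)]

print(predicted_height_mm(1700.0, effects_mm, person))
```

No single SNP matters much in this model, yet the total can differ noticeably between two people.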

In these diseases, it is not usually the case of a gene being “faulty”, but rather that certain genetic differences can increase or decrease the risk of getting a disease. Researchers hope to find out what these genes are, and what they do in both sickness and in health, as a way of understanding the biological mechanisms that cause disease.

In the case of ME/CFS, Ponting told me, we are probably looking at genetic predictors of someone’s ability to recover from a viral infection, or other environmental challenges.

How do GWAS work?

An SNP chip (“snip chip”) to measure the SNP versions for one person. Thermofisher

GWAS need very large samples if they are to discover the very small effects that influence the likes of height and Type II diabetes. Even a study with 2,000 patients is now considered small and so studies typically use at least 10,000 patients to generate robust results. Studies of diseases include similar numbers of healthy people (controls) for comparison.

Researchers use genetic probes mounted on special chips to identify which version each person has of each of the roughly one million SNPs in a GWAS, similar to the information provided to individuals by companies such as 23andMe. Researchers then look to see if any of these SNPs are significantly more or less common in people with the disease than in those without.
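As a rough illustration of that comparison for a single SNP, the Python sketch below runs a standard chi-square test on a two-by-two table of letter counts. The counts are invented, and real GWAS analyses typically use regression models that also adjust for factors such as ancestry, so treat this as the simplest possible version rather than the method any particular study uses.

```python
import math


def allelic_chi_square(case_g, case_other, control_g, control_other):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Assumes every row and column total is greater than zero.
    """
    table = [[case_g, case_other], [control_g, control_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_g + control_g, case_other + control_other]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chance of a value this large is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical study: 10,000 patients and 10,000 controls, two letters per person.
chi2, p_value = allelic_chi_square(6400, 13600, 6000, 14000)
print(f"chi-square = {chi2:.2f}, p = {p_value:.2g}")
```

In this made-up example the G version makes up 32% of letters in patients and 30% in controls. A difference that small only stands out from chance because the groups are so large.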

Caption: A Manhattan plot showing the result of a GWAS. Along the horizontal axis are individual SNPs along 22 pairs of chromosomes (excluding the 23rd pair – the sex chromosomes – which are hard to analyse in a GWAS). The vertical axis shows how unlikely it is that the association with disease arose by chance (the p-value, plotted on a negative logarithmic scale). The further up the axis the spot occurs, the more likely it is that the SNP has a real association with disease. Spots above the dashed line are considered statistically significant.

The results are displayed in Manhattan plots, so named because they look like the Manhattan skyline. Each spot represents one SNP, and its height reflects the strength of the statistical evidence that the SNP is associated with disease. The higher up the plot the result, the more confident scientists can be that the association with disease is real. Only a very few of the DNA differences will reach significance: typically, a study of 10,000 patients might generate anywhere between a few and 50 significant hits.
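The arithmetic behind the significance line can be sketched in a few lines of Python. This assumes the widely used convention of dividing the usual 0.05 cut-off by about one million tests, which gives the common genome-wide threshold of 5 × 10⁻⁸. Whether the dashed line in the plot above sits at exactly that value is an assumption, and the SNP names and p-values are invented.

```python
import math

n_tests = 1_000_000   # roughly one million SNPs tested (figure from the text)
alpha = 0.05          # conventional single-test cut-off (assumption)
threshold = alpha / n_tests  # correction for running a million tests


def manhattan_height(p_value):
    """Height of a spot on the plot: the negative base-10 logarithm of its p-value."""
    return -math.log10(p_value)


# Hypothetical results for three SNPs.
p_values = {"snp_a": 0.2, "snp_b": 3e-6, "snp_c": 4e-9}

hits = [name for name, p in p_values.items() if p < threshold]

print("threshold p-value:", threshold)
print("height of the significance line:", round(manhattan_height(threshold), 2))
print("significant SNPs:", hits)
```

Taking the logarithm is what turns very small probabilities into tall spots, so the strongest results tower over the rest like skyscrapers.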

The power of GWAS

Normally in science, researchers form a theory about what the problem is (a hypothesis) and then devise an experiment to test that theory. But if no one guesses (hypothesises) what the problem is, they can’t find it.

GWAS are different: there is no need to guess what the problem is first. By scanning across the whole genome, scientists are effectively searching through all the biological processes in the body, looking for issues. And that means the researchers can find previously unsuspected problems. This makes the technique particularly suitable for studying ME/CFS.

With ME/CFS, scientists have pursued many different hypotheses about its cause over the years but have yet to make a breakthrough. University College London’s Emeritus Professor Jonathan Edwards argues that it is time for broad approaches like GWAS with the potential to generate new avenues to explore. He told me in a recent email,

“The success of research into causes of disease hinges on someone suddenly having a brilliant idea… yet for ME/CFS nobody has a strong enough lead to show everyone the way forward. So it makes sense to set up a comprehensive fishing trip to see if we can trawl up some clues. Genetic screening [a GWAS] is probably the best bet for finding such clues.”

GWAS findings reflect cause rather than effect

In addition, compared with most other techniques, a GWAS holds a distinct advantage: its significant results reflect cause rather than effect.

To see this, let’s compare GWAS results with changes in blood levels of cytokines (immune messenger molecules). Cytokine levels might go up or down as a downstream effect of an illness rather than being the underlying cause. But DNA doesn’t change with ME onset – it’s the same before and after the illness starts. So, because the DNA association with disease cannot be an effect of illness, it must instead reflect its cause.

And that means that clues emerging from GWAS are particularly valuable.

Discovery: from GWAS to biological mechanisms

Occasionally, researchers get lucky and a GWAS highlights an SNP that itself affects a gene with known function and the biological explanation of the disease is clear. For example, a rare mutation in an immune cell receptor gene significantly increases the risk of Alzheimer’s disease.

Usually, though, the connection with disease is harder to explain.

To understand why, it helps to appreciate that DNA differences that lie next to each other on a chromosome tend to be inherited together in blocks down the generations. Effectively, they are linked together and so if an SNP is associated with disease, all the rest of the DNA differences in that block are also associated with disease. And each block will almost certainly harbour numerous other SNPs.

In fact, a GWAS typically screens only around 10% of all common SNPs because the immediate neighbours of each screened SNP, the ones on the same block, will behave in a virtually identical way. Including neighbouring SNPs in a GWAS would add very little information.
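One common way to put a number on how closely two SNPs travel together is the squared correlation (r²) between people's results at the two sites. The Python sketch below is a simplified version that works from copy counts (0, 1 or 2) rather than from the separate chromosomes, and the ten people's data are invented.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two lists of allele counts (0, 1 or 2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r-squared is undefined when a SNP does not vary in the sample")
    return cov * cov / (var_x * var_y)


# Hypothetical copy counts for ten people at three SNPs.
tag_snp   = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]  # the SNP on the chip
neighbour = [0, 1, 2, 1, 0, 2, 1, 0, 2, 2]  # same block: differs in one person
far_away  = [1, 0, 1, 2, 2, 0, 1, 1, 0, 2]  # a SNP from a different block

print("tag vs neighbour:", round(r_squared(tag_snp, neighbour), 2))
print("tag vs far away: ", round(r_squared(tag_snp, far_away), 2))
```

A value close to 1 means that knowing one SNP tells you almost everything about the other, which is why measuring both adds so little.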

In practice, the SNPs in a GWAS serve as genetic markers, each tagging a block of DNA. The question is: which is the SNP within the block that is really having an impact on the disease?

To influence disease, an SNP must affect a gene. If it is in the gene coding region it can change the protein the gene makes. But it is more usual for it to lie elsewhere, in a regulatory region that affects how much protein the gene produces (see the diagram below).

Caption: A linked block of DNA that is inherited together. A linked SNP may be significant in a GWAS, but it is usually a nearby causative SNP, affecting gene functioning, that provides the real story.

Unfortunately, the association that emerges from a GWAS does not specify which is the causal SNP or which is the target gene, as a recent article points out.

Asked how researchers pinpoint the relevant SNP and gene, Ponting said that, very often, “it’s a guess”.

Sometimes, the answer seems obvious. For example, if a GWAS for an autoimmune disease identifies a DNA block containing a single immune gene, then that gene is an obvious candidate to be the link with disease.

Ponting and colleagues identified a potentially straightforward gene link to ME/CFS from an analysis of data in the UK Biobank (not the ME/CFS biobank). The biobank contains 1,829 people who report having a diagnosis of CFS. They found that people with CFS were slightly more likely to have a particular SNP within the gene for the ornithine transporter protein (ornithine is an amino acid, but not one that appears in proteins). Although the SNP has no effect itself, nearby ones affect the level of gene expression, making it plausible that this association is biologically relevant.

However, often the situation is more complicated, with many possible SNPs or genes that could be involved.

Ponting stressed that while the interpretation process begins with a guess, it is down to experimental biology, or medicine, to find out if the guess is right or wrong.

Summary: how GWAS help to find the causes of diseases

The genes and biology identified by this work are important clues that biomedical researchers can then pursue.

GWAS successes

Despite these limitations, GWAS and follow-up work have already thrown light on the mechanisms of several diseases and have helped to identify new or promising drug therapies.

Rheumatoid arthritis

Rheumatoid arthritis is a painful autoimmune condition in which the body’s own antibodies attack joints, particularly those in the wrist and hand, causing them to swell up.

Autoantibodies attack protein fragments via amino acids that have been modified by a process called citrullination. GWAS and follow-up work helped reveal the molecular reason why citrullinated proteins can trigger the autoantibodies, giving a better understanding of how rheumatoid arthritis starts.

GWAS also identified a particular group of enzymes responsible for citrullination, and inhibitors of these enzymes show promise in treating rheumatoid arthritis.

Other autoimmune diseases

Analysing DNA from one patient reveals little, but combining data from thousands of people in a GWAS can reveal much more.

Researchers have compared GWAS results from different illnesses to see if they have something in common – and autoimmune diseases often do. Findings from these studies have led to the identification of a common pathway for several diseases, one that includes an immune-regulating molecule called IL-23. As a result of this insight, existing drugs that are used to inhibit the IL-23 pathway in other diseases have become a mainstay treatment for several autoimmune conditions, including psoriasis and ankylosing spondylitis.

Type II diabetes

Type II diabetes is becoming a global health problem. In the illness, the body is no longer able to control blood sugar levels with insulin, leading to health problems including heart disease, sight problems and kidney issues. Although Type II diabetes is strongly linked to obesity, certain other genes separately increase the risk of diabetes as well.

GWAS have helped establish the identity of numerous genes involved, some of which either affect the production of insulin in the pancreas, or the action of insulin on fat cells, liver cells and some immune cells. This information could help guide drug development.

GWAS have also helped to identify an unsuspected role in Type II diabetes for a protein that transports zinc into cells, and drugs targeting this protein are being developed as treatments.

More generally, the potential of GWAS to guide drug development for many illnesses was underlined by a recent study. Its authors concluded from their analysis that using GWAS to guide the choice of which candidate drugs to develop could double the success rate for finding treatments that make it into the clinic.

Update, 5.8.19:

Eric Lander highlighted further GWAS successes at a recent lecture at the Broad Institute (which is based at Harvard and MIT).

Heart disease. Analysis of significant SNPs revealed that HDL-cholesterol is not protective (which explains why the $5 billion investment by the pharma industry in drugs that increase HDL came to nothing). Instead, they reveal that triglycerides are a risk. It turns out that HDL is negatively correlated with triglycerides, so the link between HDL and heart disease was merely correlation, not causation.

Inflammatory bowel disease. Ten new significant biological pathways were identified. Autophagy, where cells break down and recycle their own worn-out components, and TGF beta signalling were identified as therapeutic targets.

Alzheimer’s disease. Microglia, the immune cells of the brain, have been shown to play a key role in the disease.

Obesity. GWAS show that thermogenesis (where “brown fat” cells burn off fat to produce heat) is an important pathway affecting BMI.

Genome-wide association studies are coming of age

What’s currently limiting GWAS as a research tool is the difficulty in going from the SNPs that are significant in a GWAS to the genes that are having an impact on the disease.

Yet researchers are rapidly developing analytical techniques to identify the causal genetic changes that drive disease risk, such as the fiendishly titled two-sample Mendelian randomisation and its proteome-by-phenome spin-off.

Ponting’s lab is among many groups working on innovative methods. As a result of all these developments, says Ponting, “We could be on the brink of a surge of GWAS-based discoveries about diseases. ME could and should be part of that.”

GWAS can sweep across the whole of human biology looking for potential mechanisms that might cause ME/CFS, even unsuspected mechanisms. It’s a remarkable technique and this is the ideal time for an ME/CFS study.

UK researchers, including Chris Ponting, colleagues from the CMRC and the CureME team are aiming to put a proposal in for a large GWAS later this year. The study might need as many as 20,000 patients.

But recruiting so many ME/CFS patients would pose an unprecedented challenge for the researchers. The study would be the largest ever conducted in ME/CFS.

However, recent years have seen rapid growth of an action-orientated patient community around the world. MillionsMissing events, for example, have shown what patients can come together to achieve. Chris Ponting says,

“We can get this done, and done fast – but it will be people with ME who make it happen.”

It’s rare that any patient can take part in research about their illness, but this study gives us all this chance. It would be an incredible project, the world’s biggest ME/CFS study, that could help uncover the biological roots of our disease.

I’d love to take part in this research. If it is funded, and when the time comes, I hope you’ll join me.


Image credits: DNA/crowd, ID 73451852 © Roman Fedin | Dreamstime.com; SNP example, provenance unknown; Manhattan plot, Ikram, 2010/Wikipedia; Linked vs causal SNPs, University of Utah; DNA strings by Arek Socha from Pixabay; Man and DNA, (c) Can Stock Photo / DavidCarillet; DNA crowd, (c) Can Stock Photo / DavidCarillet