Guest blog by Professor Chris Ponting and colleagues.

UK Biobank – a national biobank different from the ME/CFS biobank – has data from around 500,000 individuals, including both healthy people and those with one or more of the many different diseases in the UK population. About 2,000 people in the sample reported that they had been given a diagnosis of CFS.

Analysis of data from this biobank indicates an inherited biological component for ME/CFS. The results show only one statistically significant change in a particular section of DNA and even this is problematic. This analysis indicates that a much bigger study, with many more ME/CFS cases, will be needed to indicate which genes and biological pathways are altered in people with ME/CFS.

Introduction

Myalgic encephalomyelitis (ME, also described as chronic fatigue syndrome, CFS) is a devastating long-term condition affecting 250,000 UK individuals. People with ME experience severe, disabling fatigue associated with post-exertional malaise. A few make good progress and may recover, while most others remain ill for years and may never recover. There is no known cause and, for most people, no effective treatment. Consequently, it is vital to try new approaches to understand the reasons for the development of the condition.

This blog sets out what we can glean from the release, last summer, of data from about 500,000 individuals who make up the UK Biobank. (This biobank is not to be confused with the UK ME/CFS Biobank, UKMEB.) The data were acquired in 2006-2010 from individuals living across the UK who were between 40 and 69 years of age. These people provided samples (e.g. blood, urine and saliva) and answered questionnaires. In addition, electronic health record data are being linked in for some of them. Importantly for this blog, the DNA variation (‘genotype’) of all the volunteer participants has been determined.

Genetic variation can provide insights into the causes of disease when these have a heritable component (i.e. are inherited down through the generations). DNA sequence is not altered by disease (except in cancer) and so variants can reveal the causes, rather than consequences, of disease.

Results

Here we draw heavily from an analysis of the UK Biobank data by Oriol Canela-Xandri, Konrad Rawlik and Albert Tenesa, which is described in a preprint available from bioRxiv. (The authors have kindly made their results available in this way so that others can see them before the findings have been peer reviewed.)

From this (specifically, Supplemental Table 1) we see that data were analysed from 1,829 people among the UK Biobank cohort who self-reported as having been diagnosed with ME/CFS. The table also provides five pieces of information:

(1) The prevalence of ME/CFS among UK Biobank individuals was 0.448%, or about 1 person in 223. In other words, a randomly chosen person in the UK who knows about 200 people has a roughly even chance of knowing someone with ME/CFS.
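The "even chance" statement in (1) can be checked with a short calculation. This is only a sketch: it assumes that each acquaintance independently has ME/CFS with probability equal to the UK Biobank prevalence, which real social networks need not satisfy. The function names are illustrative.

```python
import math

PREVALENCE = 0.00448  # 0.448%, the UK Biobank figure quoted in (1)


def chance_of_knowing_a_case(n_acquaintances, prevalence=PREVALENCE):
    """Probability that at least one of n acquaintances has ME/CFS.

    Assumes acquaintances are independent draws from the population.
    """
    return 1.0 - (1.0 - prevalence) ** n_acquaintances


def acquaintances_for_even_chance(prevalence=PREVALENCE):
    """Smallest number of acquaintances giving at least a 50% chance."""
    return math.ceil(math.log(0.5) / math.log(1.0 - prevalence))


print("one case per", round(1 / PREVALENCE), "people")
print("chance with 200 acquaintances:", round(chance_of_knowing_a_case(200), 2))
print("acquaintances for an even chance:", acquaintances_for_even_chance())
```

Under the independence assumption, the chance with 200 acquaintances comes out somewhat above even (about 0.59), and roughly 155 acquaintances are enough for an even chance.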

(2) There is a reasonably strong female bias: the prevalence rates are female = 0.611%; male = 0.255%; so ME/CFS is 2.4-fold more prevalent among females than among males in the UK Biobank cohort.

(3) Extrapolating these numbers to the UK as a whole, here are the full population prevalence predictions (using 2016 estimates for UK census populations).

             Female     Male    Total
ENGLAND     171,630   69,339  240,969
SCOTLAND     16,784    6,781   23,565
WALES         9,668    3,906   13,574
N IRELAND     5,783    2,336    8,119
UK (total)  203,865   82,362  286,227

One caveat should be mentioned with respect to these numbers. The 500,000 people assessed in the Biobank, despite being recruited for assessment at 22 centres in Scotland, Wales and England, are not fully representative of the general population. There appears to be a “healthy volunteer” selection bias, which would imply that the prevalence estimates are lower-bound values. Furthermore, if ME/CFS prevalence is different in groups outside the cohort (for example, people younger than 40 or older than 69) then this is not accounted for in the numbers above.
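The extrapolation in (3) amounts to multiplying a prevalence by a population size. The blog does not list the population estimates it used, so the sketch below does not supply them. Instead it checks that the table adds up and back-calculates the population sizes the table implies, assuming each count is the sex-specific prevalence from (2) multiplied by the 2016 population. The function names are illustrative.

```python
# Predicted case counts copied from the table in (3): (female, male, total).
TABLE = {
    "ENGLAND": (171_630, 69_339, 240_969),
    "SCOTLAND": (16_784, 6_781, 23_565),
    "WALES": (9_668, 3_906, 13_574),
    "N IRELAND": (5_783, 2_336, 8_119),
}
UK_TOTAL = (203_865, 82_362, 286_227)

FEMALE_PREVALENCE = 0.00611  # 0.611%, from (2)
MALE_PREVALENCE = 0.00255    # 0.255%, from (2)


def row_totals_consistent(table):
    """True if female + male equals the stated total in every row."""
    return all(f + m == t for f, m, t in table.values())


def column_sums(table):
    """Sum each column (female, male, total) over the four nations."""
    return tuple(sum(row[i] for row in table.values()) for i in range(3))


def implied_population(cases, prevalence):
    """Population size that would yield `cases` at the given prevalence."""
    return cases / prevalence


print("rows consistent:", row_totals_consistent(TABLE))
print("columns match UK total:", column_sums(TABLE) == UK_TOTAL)
for nation, (female, male, _) in TABLE.items():
    f_pop = implied_population(female, FEMALE_PREVALENCE) / 1e6
    m_pop = implied_population(male, MALE_PREVALENCE) / 1e6
    print(f"{nation}: implied females {f_pop:.2f}M, males {m_pop:.2f}M")
```

The implied population sizes are only as good as the assumption that the table was produced by this simple multiplication.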

(4) ME/CFS has an inherited biological component because its heritability is not zero. Canela-Xandri et al. estimate that the genetic heritability (liability scale) is 0.080. This is slightly lower than the median heritability of heritable binary traits (0.11; see Figure 1). So, among all such traits measured, ME/CFS falls in the lower half for heritability, but its heritability is not zero. Note that this doesn’t rule out non-heritable biological causes.

(5) The analysis identifies one, and only one, DNA position whose genetic variation is associated, in part, with ME/CFS susceptibility. (The plot below is called a Manhattan plot and any point above the dashed line is predicted to be a significant “hit”. Each dot represents a position (X axis) along a chromosome, with chromosomes shown alternately in red and blue, and its position on the Y-axis indicates the statistical significance of the association: the higher the dot, the stronger the statistical evidence.)

This proposed “significant hit” is on chromosome 10 (position 74828696; rs150954845). The calculated p-value is 2.5×10⁻¹². This DNA change (A-to-T) is predicted to alter a protein called P4HA1, changing an aspartic acid (“D”; GAT) for a valine (“V”; GTT) at its 124th amino acid position. P4HA1 is prolyl 4-hydroxylase subunit alpha 1: in other words, one part of prolyl 4-hydroxylase, a key enzyme in collagen synthesis. We know what this molecule looks like and where the aspartic acid (D124) occurs within it (below; courtesy of Luis Sanchez-Pulido).

We can even see at a resolution of 10⁻¹⁰ of a metre what effect such a change would have on the protein (below; courtesy of Luis Sanchez-Pulido).
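Two details of item (5) can be illustrated in a few lines. The first is how a single A-to-T change turns the codon GAT into GTT. The second is how high the reported p-value would sit on a Manhattan plot, assuming the Y-axis shows -log10(p), which is conventional but is not stated in the text. The codon dictionary holds only the two codons named above, and the function names are illustrative.

```python
import math

# Only the two codons named in the text; a full codon table is not needed.
CODON_TO_AMINO_ACID = {"GAT": "D", "GTT": "V"}


def substitute_base(codon, index, new_base):
    """Return the codon with the base at `index` (0-based) replaced."""
    return codon[:index] + new_base + codon[index + 1:]


def manhattan_height(p_value):
    """-log10(p): the assumed Y-axis value of a point on a Manhattan plot."""
    return -math.log10(p_value)


reference = "GAT"
variant = substitute_base(reference, 1, "T")  # A-to-T at the middle base
print(reference, CODON_TO_AMINO_ACID[reference], "->",
      variant, CODON_TO_AMINO_ACID[variant])
print("height on plot:", round(manhattan_height(2.5e-12), 1))
```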

Interpretation

So, should we believe that this amino acid change alters someone’s risk of developing ME? For five reasons we need to be cautious:

(a) First, ME is a complex condition, likely to be caused by many DNA changes each of small effect acting together with the environment, so the fact that only one association was found indicates that the study is under-powered. This means that it doesn’t have enough patients to provide the statistical power needed to detect the major DNA changes associated with the illness: more individuals means greater statistical power.

(b) Second, this part of the protein is not conserved across evolution. There is even a nematode worm known that has a valine at exactly the position (124) that would be predicted to alter risk for ME in humans. This isn’t conclusive, but an amino acid change at a position that is shared across different species would have given us greater confidence in the prediction.

(c) Third, very few people have this amino acid change. Only 0.01% of the population have this alteration, and at such low levels it is difficult to calculate levels of significance accurately, particularly when the number of people self-reporting with ME (here, n=1,829) is so much smaller than the entire cohort (500,000).

(d) Fourth, this association was not reported to be significant in a separate study.

(e) Fifth, the study relies on self-report of having received a diagnosis of chronic fatigue syndrome, so these cases have not been diagnosed by researchers as meeting any particular definition of ME/CFS.
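The concern in (c) can be made concrete with a rough count of how many carriers of the change one would expect to see. This is a sketch under stated assumptions: the 0.01% figure is read as the fraction of people carrying the change, the cohort size is rounded to 500,000, and the change is taken to have no link to ME/CFS. The function names are illustrative.

```python
CARRIER_FRACTION = 0.0001  # 0.01%, read here as the fraction of people who carry the change
N_CASES = 1_829            # people self-reporting an ME/CFS diagnosis
N_COHORT = 500_000         # approximate size of the whole cohort


def expected_carriers(n_people, carrier_fraction=CARRIER_FRACTION):
    """Expected number of carriers among n people if the change is unrelated to ME/CFS."""
    return n_people * carrier_fraction


def chance_of_no_carriers(n_people, carrier_fraction=CARRIER_FRACTION):
    """Probability that none of n independent people carries the change."""
    return (1.0 - carrier_fraction) ** n_people


print("expected carriers in cohort:", round(expected_carriers(N_COHORT)))
print("expected carriers among cases:", round(expected_carriers(N_CASES), 2))
print("chance of zero carriers among cases:", round(chance_of_no_carriers(N_CASES), 2))
```

With so few carriers expected among the cases, one or two extra carriers can shift the test statistic substantially, which is why significance is hard to calculate accurately for changes this rare.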

Conclusions

If the UK Biobank prevalence of ME/CFS holds across different populations, then about 34 million people worldwide have this disorder, with 2.4-fold more women affected than men.
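The worldwide figure follows from the same multiplication used for the UK. The blog does not state the world population it used. The value of 7.5 billion below is an assumption chosen for illustration, and the function name is illustrative.

```python
PREVALENCE = 0.00448         # 0.448%, UK Biobank prevalence of self-reported ME/CFS
FEMALE_PREVALENCE = 0.00611  # 0.611%
MALE_PREVALENCE = 0.00255    # 0.255%
WORLD_POPULATION = 7.5e9     # assumption: approximate world population, not given in the blog


def extrapolated_cases(population, prevalence=PREVALENCE):
    """Number of cases if the UK Biobank prevalence applied to `population`."""
    return population * prevalence


worldwide = extrapolated_cases(WORLD_POPULATION)
sex_ratio = FEMALE_PREVALENCE / MALE_PREVALENCE

print(f"worldwide cases: {worldwide / 1e6:.1f} million")
print(f"female-to-male prevalence ratio: {sex_ratio:.1f}")
```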

ME/CFS has a biological component, as shown by its non-zero heritability in UK Biobank.

To obtain robust indications of which genes and which biological pathways are altered in which cells or tissues in people living with ME/CFS, a much larger study is required. A genome-wide association study (GWAS) with ten or twenty thousand cases is likely to be necessary. Results will then need to be replicated in a separate cohort.
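Why more cases help can be sketched with an approximate power calculation for a single DNA variant. This is not the method used in the preprint. It is a normal-approximation comparison of allele frequencies between cases and controls. The control allele frequency (0.20), the odds ratio (1.10), the number of controls (450,000) and the significance threshold (5×10⁻⁸, a conventional genome-wide value) are all hypothetical choices for illustration.

```python
from statistics import NormalDist

GENOME_WIDE_ALPHA = 5e-8  # conventional GWAS threshold; not stated in the blog


def case_allele_frequency(control_freq, odds_ratio):
    """Allele frequency in cases implied by the control frequency and an odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def approximate_power(n_cases, n_controls, control_freq, odds_ratio,
                      alpha=GENOME_WIDE_ALPHA):
    """Approximate power of a two-sided allele-frequency comparison."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    # Each person contributes two alleles to the comparison.
    standard_error = (p1 * (1 - p1) / (2 * n_cases)
                      + p0 * (1 - p0) / (2 * n_controls)) ** 0.5
    z_expected = abs(p1 - p0) / standard_error
    z_critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Ignores the negligible chance of a significant result in the wrong direction.
    return NormalDist().cdf(z_expected - z_critical)


for n_cases in (1_829, 10_000, 20_000):
    power = approximate_power(n_cases, 450_000, control_freq=0.20, odds_ratio=1.10)
    print(f"{n_cases:>6} cases: power ~ {power:.3f}")
```

With these hypothetical inputs, power is close to zero at 1,829 cases and rises steeply as the number of cases approaches twenty thousand. Other allele frequencies and effect sizes would give different numbers.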

Chris Ponting, Luis Sanchez-Pulido, Katie Nicoll-Baines, Thibaud Boutin and Shona Kerr.

With thanks to Cathie Sudlow, Veronique Vitart, Oriol Canela-Xandri and Albert Tenesa for helpful comments.

MRC Human Genetics Unit at the MRC Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road South, Edinburgh, EH4 2XU, UK