An error in one of the most widely used methods in epigenetics, DIP-seq, can produce misleading results, researchers at Linköping University, Sweden, have shown. This may have major significance for the research field, where "big data" and advanced methods of DNA analysis are used to study vast amounts of epigenetic data. The error can be corrected in previously collected DIP-seq data, which may allow new discoveries to be made from earlier studies of human epigenetics. The results have been published in the journal Nature Methods.

In principle, every cell in our body has the same DNA sequence. However, different cell types use very different groups of genes. This means that additional signals are required to control which genes are used in each individual cell type. One type of such signal consists of chemical groups directly attached to the DNA sequence. These chemical modifications of the DNA sequence form part of what is commonly called the "epigenetic code." Epigenetic regulation of genes plays an important role in normal human development but is also associated with many diseases, such as cancer.

Researchers at Linköping University have now discovered a weakness in one of the most frequently used methods in epigenetic research, DNA immunoprecipitation sequencing (DIP-seq). Put simply, this method picks out the parts of the DNA that carry a particular epigenetic signal. To do this, researchers use antibodies that recognise a specific chemical modification and bind to it. The antibodies, together with the DNA they have bound, are then isolated, and the sequence of the bound DNA is determined. The research group led by Colm Nestor noticed that certain epigenetic marks always appeared in the same places, even in DNA that should not contain those marks at all.

"Our discovery highlights the importance of experimental validation when using high-throughput technologies in research. Without such experimental rigour, pervasive errors can hide in plain sight, concealed by their 'consistency' across studies," says Colm Nestor, Assistant Professor at the Department of Clinical and Experimental Medicine and lead investigator of the study.

By analysing more than 125 existing datasets, Nestor's group revealed that DIP-seq commonly detected DNA sequences that did not carry any epigenetic marks. These false positives make up 50–90% of the detected DNA regions, and the size of the effect varies between datasets. "Now that we know about this error, it's extremely simple to subtract it away. Correcting for these errors will allow novel discoveries to be made from the wealth of epigenetics data already in the public domain," says Colm Nestor.

The researchers point out that the vast majority of results from previous studies are correct.

"We should continue to use these methods but correct for these errors by using appropriate experimental design," says Colm Nestor.