Imagine trying to follow a complex novel many times longer than War and Peace, with hundreds of characters and plot twists. Every cancer has a unique story hidden inside its genetic code, and reading it is a similar challenge for modern researchers.

Now imagine trying to follow the plot when some of the pages are missing.

This week, our latest study, published in the journal Cancer Research, suggests that scientists deciphering a cancer’s story face exactly this problem. We’ve discovered that the technology we use to “read” cancer’s DNA can sometimes cause pieces of the story to be missed at potentially crucial moments in the plot. It’s as if your eReader or tablet kept inexplicably skipping paragraphs as you tapped through a novel.

By identifying where data are missing, we hope to develop techniques to examine these areas more closely and see whether they contain further clues about how to tackle cancer.

Written in code

Every tumour has its own unique story about how it turned from a normal cell into an invasive cancer. These tales are hidden in the genetic code that serves as the blueprint for each job a cell carries out. And the plot twists that make up each cancer’s story are caused by mistakes, or mutations, in this genetic blueprint.

Finding these mutations is a major challenge, but pinpointing them could help scientists develop new treatments that stop cancer cells from growing.

Fortunately, modern gene-reading technology has greatly improved our ability to spot these mutations. In our lab we’ve been using a technique called “Next Generation Sequencing”. Using this technology, we can now reveal most of the genetic story of a cancer (more than three billion letters of code) in a single experiment.

But while searching genetic data from cancer cells for new mutations to target, we spotted inconsistencies between the data sets shared by different research institutes around the world. The question was why.

We wanted to find out whether this was a common problem that might be preventing us from finding new cancer-causing mutations. To test this, we turned to cancer data from two major online databases – the Sanger Institute’s COSMIC database, and the Broad Institute’s Cancer Cell Line Encyclopedia.

Same cancer, different stories

We searched the databases for cases where the two institutes had collected genetic data from the same cancer cells. Because the cells were matched, you would expect the results to be similar, but what we found was surprising.

The databases showed that the genetic stories produced by the two institutes agreed on only around half of the mutations, and in some instances the agreement was much lower. We wanted to know why.

Missing pages

We picked out some of the cancer samples and homed in on sections of the story, looking for clues to explain why one institute was detecting a mutation when the other was not. We found that in many cases, the discrepancies arose because some samples had not been read as completely as others.

When we looked for a reason, we found that these poorly read regions often fell in areas where the code was less complex, such as stretches where the same few letters repeat over and over. Paradoxically, these simple regions of code were actually making it harder for the gene-reading technology to spot any changes.

Another way to look at it is that in some areas of the cancer story the pages were effectively sticking together. This meant that valuable information was missing and the correct version of events could not be established.

Are these missing pages important?

The question is: are these missing data just irrelevant filler, or do they contain important information that might reveal the strengths and weaknesses of a cancer? To answer this, we used other techniques to look at these hidden areas in lung cancer samples and found a mutation in a gene called PAK4 that had previously been missed.

When we examined the effects of this mutation we found it had the potential to make cancer cells grow more quickly. This indicates that some of these missing pages might carry important information about how a cancer behaves.

As we move into an era in which the genetic histories of more and more cancer biopsies are being read, it is important to understand the limitations of the technology.

This is especially important as genetic information is used more routinely to help decide on the best treatment for each patient.

Filling in the gaps

The good news is that gene-reading technology is improving and is beginning to fill in these missing areas. This will help us learn more about these “blind spots”, and we can use this information to develop new treatments in the future.

In the meantime, we’ll take a closer look at these regions, along with the other explanations we found for why data from different institutes don’t match up.

So far we’ve pinpointed more than 400 areas of missing data. By homing in on these missing pages, we hope to piece together cancer’s complex story and write important new chapters that could one day lead to new treatments for patients.