While sequencing an entire genome is eminently doable, sequencing an exome is easier. The exome contains all of the DNA that dictates the amino acid sequences of all the proteins a cell needs to make, so any mutations that will change a protein's sequence can be found in it. Exome sequencing is easier simply because only about two percent of the entire genome encodes proteins.

But not all disease-causing genetic mutations alter amino acid sequences. Evidence has been accumulating for some time that mutations in non-coding regions of DNA, which can dictate how much of a protein is produced, in which cells, and at which times, can also cause trouble. Computational models have been used to try to find different types of non-coding mutations. A new one has just been developed to find mutations that alter the processing of RNA and to correlate these mutations with disease; the results are reported in Science.

To retain its fidelity, DNA stays in the nucleus, much as we'd keep our valuables in a vault. The DNA has all the directions for how to make the proteins the cell needs, but the protein-making machinery is outside of the nucleus. So the cell makes copies of the DNA (messenger RNA molecules), and these RNA copies leave the nucleus to get translated into protein.

The protein-coding regions of DNA are called exons (hence the exome mentioned above); these are interspersed with regions called introns that do not encode protein. The exons and introns are copied into RNA together, then the introns are spliced out and the exons are joined together. When this process is done, only the exons get read into protein.
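The cut-and-paste nature of splicing can be sketched in a few lines of Python. This is only a toy illustration: the sequence, the exon coordinates, and the splice function are made up for the example, and a real cell finds its splice sites from sequence signals rather than from a list of coordinates. The second call shows what happens when the middle exon is left out, which is the kind of event the model described below tries to predict.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a transcript, dropping the introns between them.

    exons is a list of (start, end) half-open index pairs in transcript order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUGAA" + "guaugcccacag" + "CCGUAA"
exons = [(0, 6), (18, 24), (36, 42)]

# Normal splicing keeps all three exons.
mature = splice(pre_mrna, exons)

# Skipping the middle exon yields a shorter message and a different protein.
skipped = splice(pre_mrna, [exons[0], exons[2]])

print(mature)
print(skipped)
```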

RNA splicing is a complicated process and, if it goes wrong, it can result in the production of aberrant proteins. Since this can contribute to human disease, researchers made a computer model that predicts whether or not a given exon is included in an RNA molecule. For a given cell type, the computational model extracts the DNA code that regulates splicing and predicts whether nearby exons will be included in messenger RNAs. A splicing event can thus be correlated with particular genetic sequences nearby, and any mutations in those sequences can then be analyzed for any other effects they may have.

Importantly, the model was not trained with disease sequences, so it is not predisposed to find them.

Using this model, the researchers were able to categorize mutations that disrupt splicing. Rare variants disrupt splicing more than common ones do, especially rare variants associated with disease. Most intronic mutations that disrupt splicing are within thirty bases of the splice site, but some are further away.
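To make the thirty-base figure concrete, here is a minimal sketch of how one might measure how far an intronic mutation sits from the nearest splice site. The coordinate convention (0-based, start inclusive, end exclusive), the function name, and the example positions are all assumptions made for illustration; the paper's own analysis is not described at this level of detail here.

```python
NEAR_SPLICE_SITE = 30  # the thirty-base window mentioned above


def distance_to_nearest_splice_site(position, intron_start, intron_end):
    """Count bases from an intronic position to the closer intron boundary.

    Coordinates are 0-based; intron_start is inclusive, intron_end exclusive.
    The first and last bases of the intron are at distance 1.
    """
    if not intron_start <= position < intron_end:
        raise ValueError("position is not inside the intron")
    return min(position - intron_start + 1, intron_end - position)


# Hypothetical intron spanning positions 100 to 1099.
for pos in (110, 600, 1099):
    d = distance_to_nearest_splice_site(pos, 100, 1100)
    label = "near" if d <= NEAR_SPLICE_SITE else "distant"
    print(pos, d, label)
```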

Within exons, synonymous mutations—those that change a gene's DNA sequence but not the corresponding protein—can alter splicing, which can explain their contribution to human cancers. Synonymous mutations that are known to have a disease association were nine times more likely to disrupt splicing than benign mutations. Among the exon mutations that alter proteins, those that had a small effect on protein function were five times more likely to disrupt splicing than those that significantly altered protein function.
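What makes a mutation synonymous is that the genetic code is redundant: several three-letter codons can specify the same amino acid. The sketch below checks this for a single codon change, using a small hand-picked slice of the standard genetic code (the full table has 64 codons). The is_synonymous helper is an illustrative name, not something from the study. Note that a check like this says nothing about splicing, which is exactly why synonymous mutations are easy to dismiss as harmless.

```python
# A small subset of the standard genetic code (RNA codons).
CODON_TO_AMINO_ACID = {
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAU": "Asp", "GAC": "Asp",
    "GAA": "Glu", "GAG": "Glu",
}


def is_synonymous(ref_codon, alt_codon):
    """Return True if both codons encode the same amino acid."""
    return CODON_TO_AMINO_ACID[ref_codon] == CODON_TO_AMINO_ACID[alt_codon]


# GGU -> GGC still encodes glycine, so the protein is unchanged.
print(is_synonymous("GGU", "GGC"))

# GAU -> GAA swaps aspartate for glutamate, so the protein changes.
print(is_synonymous("GAU", "GAA"))
```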

After determining that their model was robust, the researchers used it to analyze three human diseases with very different types of genetic causes: spinal muscular atrophy, an autosomal recessive single-gene disorder that is a leading cause of infant mortality; hereditary nonpolyposis colorectal cancer, a condition in which ninety percent of cases are caused by mutations in two genes; and autism, in which over a hundred genes have been implicated.

In the first two, their model predicted that mutations that caused disease through unknown mechanisms generally worked by changing splicing. The rare splicing variants they identified in autistic people were clustered in genes that are highly expressed in the brain; this was not the case for controls. The researchers thus suggest that their computational model can serve two purposes. It can be used to find new genetic determinants of autism and other disorders. And it can be sent to sift through mutations we already know about but don't understand, to find out if they affect splicing.

Science, 2014. DOI: 10.1126/science.1254806