Systematically searching DNA for regulatory elements indicates limits of previous thinking.

All the tissues in the human body are made from proteins, and for every protein, there’s a stretch of DNA in the human genome that “codes” for it, or describes the sequence of amino acids that will produce it.

But these coding regions constitute only about 1 percent of the genome, and scattered throughout the other 99 percent are sequences involved in regulating gene expression, or determining which coding regions will be translated into proteins, and when.

In the latest issue of Nature Biotechnology, researchers at MIT and Harvard Medical School describe a new technique for systematically but efficiently searching long stretches of the genome for regulatory elements. And in their first application of the technique, they find evidence that current thinking about gene regulation is incomplete.

“Conventional assays have chopped out little pieces of the genome and asked whether they’re sufficient for driving gene expression,” says David Gifford, a professor of electrical engineering and computer science at MIT and one of the new paper’s senior authors. “There are two limitations to that. The first is that it may be that something is sufficient for gene activation, but that does not mean that it’s necessary. And vice versa: If something is necessary, it may not be sufficient. So these assays really don’t reveal the complete story on the function of genomic DNA.”

“The other problem with these assays is that they are not done in a native context,” Gifford adds — that is, the excised segments of DNA are not in their normal location within the genome. “We were interested in using a direct assay that revealed the necessity of a genomic sequence in its native context — in the cell, in the place in the genome where it normally resides. We mutate it right where it normally functions.”

Nisha Rajagopal, a postdoc in Gifford’s group, is lead author on the paper. Richard Sherwood, a research fellow at Harvard Medical School, is the other senior author, and they’re joined by seven other researchers in both Gifford’s and Sherwood’s groups.

Unmarked paths

In recent years, one of the main techniques for identifying regulatory elements in the genome has been the use of so-called histone marks. In the cell, DNA is usually wrapped into tight coils around proteins called histones. The ends of the histones frequently have modifications — such as the addition of acetyl or methyl groups.

Those modifications are the histone marks, and certain marks appear to be associated with the suppression or promotion of gene expression. Biologists find histones bearing these marks, slice out the segments of DNA wrapped around them, and sequence the DNA. When they find the corresponding sequences in the map of the genome, they can begin conducting narrowly targeted experiments to try to identify regulatory elements.

The MIT and Harvard researchers’ technique, however, has identified regions of the genome that appear to play a crucial role in gene regulation, but which have not previously been associated with histone marks. “Science is always subject to a set of assumptions,” Gifford says. “What this whole study demonstrates, in my opinion, is that it’s important to think carefully about our assumptions. It doesn’t disprove the assumptions categorically, but it in some sense demands further exploration of what is necessary for genomic function.”

Gifford and his colleagues’ technique is an application of the CRISPR gene-editing system. CRISPR is a method for cutting DNA; the researchers found a way to space those cuts at regular intervals around a protein-coding region of interest. In the new paper, they report experiments in which they exhaustively searched spans of tens of thousands of base pairs, or DNA letters, around each of four known protein-coding regions.

RNA guides

To make cuts at regular intervals, the researchers designed 4,000 guide RNAs, small biological molecules that lead the CRISPR cutting enzyme to the right locations in the genome.
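
The idea of spacing cuts at regular intervals can be sketched in a few lines of Python. This is only an illustration: the function name `tile_cut_sites`, the coordinates, and the guide count are hypothetical, and real guide design is also constrained by which sequences the cutting enzyme can actually target, which this sketch ignores.

```python
def tile_cut_sites(region_start, region_end, n_guides):
    """Return n_guides evenly spaced positions covering [region_start, region_end).

    Hypothetical helper for illustration; not the researchers' design method.
    """
    if n_guides <= 0:
        raise ValueError("n_guides must be positive")
    span = region_end - region_start
    if span <= 0:
        raise ValueError("region_end must be greater than region_start")
    step = span / n_guides
    # Place each cut in the middle of its interval so both ends are covered.
    return [region_start + int(step * (i + 0.5)) for i in range(n_guides)]


# Illustrative numbers only: a 40,000-base-pair window tiled by 1,000 guides.
sites = tile_cut_sites(0, 40_000, 1_000)
spacing = sites[1] - sites[0]
```

With these made-up inputs, neighboring cut sites fall 40 base pairs apart.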

In the researchers’ experiments, the guide RNAs are manufactured inside the cell. For all the guide RNAs, the researchers constructed DNA templates, which the cells naturally absorbed. On average, each DNA template showed up in about 1,000 cells. Each cell converted exactly one DNA template into a guide RNA, which led the CRISPR cutting enzyme to a specific location in the genome.

At each of those locations, the CRISPR enzyme cut the DNA. When the cells tried to repair those cuts, the DNA sequences at the repair sites became garbled. In some cases, the garbling prevented the cells from manufacturing the appropriate proteins, indicating regions of functional importance.
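
The article does not say how the loss of protein production was read out or scored. One plausible way to analyze a pooled experiment of this kind, shown here purely as an assumption, is to compare how often each guide appears among cells that lost expression with how often it appears in the whole population. The function name, the pseudocount, and the toy counts are all hypothetical.

```python
import math


def guide_enrichment(counts_lost, counts_all, pseudocount=1.0):
    """Log2 ratio of each guide's frequency in expression-lost cells vs. all cells.

    Both arguments map guide IDs to counts. Hypothetical scoring scheme,
    not the method used in the paper.
    """
    n_guides = len(counts_all)
    total_lost = sum(counts_lost.values()) + pseudocount * n_guides
    total_all = sum(counts_all.values()) + pseudocount * n_guides
    scores = {}
    for guide, n_all in counts_all.items():
        n_lost = counts_lost.get(guide, 0)
        # The pseudocount keeps the ratio defined when a guide has zero counts.
        freq_lost = (n_lost + pseudocount) / total_lost
        freq_all = (n_all + pseudocount) / total_all
        scores[guide] = math.log2(freq_lost / freq_all)
    return scores


# Toy data: guide "g2" is over-represented among cells that lost expression.
counts_all = {"g1": 1000, "g2": 1000, "g3": 1000}
counts_lost = {"g1": 10, "g2": 200, "g3": 12}
scores = guide_enrichment(counts_lost, counts_all)
top_guide = max(scores, key=scores.get)
```

A positive score marks a guide whose cut site is a candidate for functional importance; in the toy data only `g2` scores above zero.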

The researchers then conducted a second set of experiments, targeting just those regions, to determine the precise sequences whose modification interrupted protein production. Around each of the four protein-coding sequences they investigated, they found multiple stretches of roughly 1,000 base pairs that showed strong signs of regulatory activity, but which histone marks had not previously identified.

It could be, Gifford says, that those stretches were present in only a subset of cells, and that they were in fact associated with histone marks; the standard technique for identifying histone modifications relies on average measurements across the cell population, so it could miss outliers. Gifford, Rajagopal, and colleagues are continuing to investigate these regions, to determine just what’s going on in them.
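
Gifford's point about averaging can be made concrete with a toy calculation. The numbers below are invented for illustration, as is the detection threshold: if only a small fraction of cells carry a mark at a given site, the population-wide average can fall below what an assay would call a signal, even though the mark is strongly present in those cells.

```python
def population_average(fraction_marked, signal_marked=1.0, signal_unmarked=0.0):
    """Average signal over a mixed population of marked and unmarked cells."""
    if not 0.0 <= fraction_marked <= 1.0:
        raise ValueError("fraction_marked must be between 0 and 1")
    return fraction_marked * signal_marked + (1.0 - fraction_marked) * signal_unmarked


# Hypothetical values: 5 percent of cells carry the mark at full strength,
# and the assay only calls a site "marked" if the average exceeds 0.2.
DETECTION_THRESHOLD = 0.2
average = population_average(0.05)
detected = average > DETECTION_THRESHOLD
```

Under these assumed values the averaged signal is about 0.05, well under the threshold, so the site would go unflagged.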