A research collaboration that combines novel "big-data" informatics tools with expertise in basic biology has uncovered details of an essential process in life: how a crucial enzyme locates the site on DNA where it begins to direct the synthesis of RNA. This finding may aid in the discovery of new antimicrobial medicines, and the powerful technological approaches developed for this research may shed light on other essential cellular processes.

A bioinformatics group from The Children's Hospital of Philadelphia collaborated with researchers from Rutgers University on the study, which appeared online today in Science.

"The algorithms we developed enable us to tackle many questions across diverse areas of DNA and RNA biology," said study co-author Deanne M. Taylor, Ph.D., Director of Bioinformatics in the Department of Biomedical and Health Informatics at The Children's Hospital of Philadelphia (CHOP). "Understanding these fundamental processes may help in developing antimicrobial treatments to fight bacterial disease."

Taylor collaborated on the study with biochemist Bryce Nickels, Ph.D., and chemist Richard Ebright, Ph.D., both from Rutgers, the State University of New Jersey.

The research focuses on transcription--how cells read genetic information stored in DNA by first synthesizing a copy of that genetic information as RNA. The enzyme RNA polymerase is the molecular machine that carries out transcription. In the current study, the CHOP/Rutgers team determined how RNA polymerase locates the site on DNA where it starts transcription.

In particular, working in bacteria, the CHOP/Rutgers team showed that after RNA polymerase binds to DNA and partly unwinds the two strands of the DNA helix, it then continues unwinding those two strands, pulling the unwound DNA strands into itself until it engages the transcription start site (TSS). The researchers call this process--unwinding DNA and pulling strands into itself--"DNA scrunching." Nickels points out, "Scientists have known for more than three decades that transcription start sites vary, but did not previously know the mechanism."

To detect DNA scrunching during TSS selection, the researchers developed powerful new experimental approaches, called MASTER and MASTER-XL. The CHOP/Rutgers team first described MASTER (for "massively systematic transcript end readout") in a December 2015 paper in Molecular Cell.

MASTER-XL combines the MASTER technology with crosslinking, in which artificial amino acids introduced at specific sites on a protein form crosslinks to sites in DNA. Using high-throughput algorithms, the study team was able to precisely and rapidly pinpoint those crosslinking sites in a million different DNA sequences, each carrying a distinct TSS region. In each sequence, the team identified the TSS as well as the front (leading edge) and rear (trailing edge) positions where RNA polymerase attached to DNA.

Yuanchao Zhang, a graduate student working with Taylor's bioinformatics group at CHOP, developed the big-data algorithms with Taylor to analyze the sequencing data output from MASTER and MASTER-XL experiments. "Our algorithms rapidly process many millions of DNA and RNA sequence reads," said Taylor.

The rapid sequencing, combined with the advanced biochemical and chemical methods underlying the crosslinking, yielded a key finding on how DNA scrunching occurs during transcription. As the position of the TSS changes, the position of RNA polymerase's leading edge changes in lockstep, but the enzyme's trailing edge remains in the same position. This causes the DNA to scrunch: it remains fastened to RNA polymerase at its trailing edge, but RNA polymerase unwinds the adjacent DNA and pulls the unwound DNA into itself until it locates a new TSS.
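
For readers who think in code, the geometry of that finding can be put as a small arithmetic sketch in Python. It is an illustration only, not the study team's software: the `Footprint` class, both function names, and all coordinates are hypothetical. The one assumption taken from the finding is that the trailing edge stays put while the leading edge shifts by the same number of base pairs as the TSS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Footprint:
    """Positions on an arbitrary DNA coordinate axis, in base pairs (hypothetical)."""
    trailing_edge: int  # rear contact of RNA polymerase with DNA
    leading_edge: int   # front contact of RNA polymerase with DNA
    tss: int            # transcription start site


def footprint_for_tss(reference: Footprint, tss: int) -> Footprint:
    """Predict the footprint for a different TSS under the stated assumption:
    the leading edge moves in lockstep with the TSS, the trailing edge is fixed."""
    shift = tss - reference.tss
    return Footprint(
        trailing_edge=reference.trailing_edge,
        leading_edge=reference.leading_edge + shift,
        tss=tss,
    )


def extra_dna_enclosed(reference: Footprint, observed: Footprint) -> int:
    """Base pairs of DNA between the two edges beyond the reference span.
    With a fixed trailing edge this equals the TSS shift."""
    reference_span = reference.leading_edge - reference.trailing_edge
    observed_span = observed.leading_edge - observed.trailing_edge
    return observed_span - reference_span


# Made-up coordinates chosen only to make the arithmetic easy to follow.
reference = Footprint(trailing_edge=10, leading_edge=60, tss=50)

for tss in (50, 51, 52):
    fp = footprint_for_tss(reference, tss)
    print(tss, fp.trailing_edge, fp.leading_edge, extra_dna_enclosed(reference, fp))
```

In this toy model, each base pair by which the TSS moves downstream adds one base pair to the stretch of DNA spanned by the enzyme, which is the extra DNA it must pull in.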

"The crucial feature of our approach," explained Ebright, "is the combination of protein-DNA crosslinking with next-generation-sequencing of DNA. This enables us to perform crosslinking studies with a million different DNA sequences in the same amount of time that we previously would have needed to perform crosslinking studies with one DNA sequence." He added, "The million-fold increase in throughput allows biological problems to be solved that couldn't be solved before."

The CHOP/Rutgers collaborators are now investigating transcription in higher organisms, analyzing whether DNA scrunching occurs during TSS selection and, if so, how it compares to the process in bacteria. The team also hopes to apply MASTER and MASTER-XL to the analysis of other essential cellular processes, such as DNA replication.