Analyzing massive amounts of data officially became a national priority recently when the White House Office of Science and Technology Policy announced the Big Data Research and Development Initiative. A multidisciplinary team of University of Missouri researchers rose to the big data challenge by answering a major biological question, using a groundbreaking computer algorithm to find identical DNA sequences in different plant and animal species.

"Our algorithm found identical sequences of DNA located at completely different places on multiple plant genomes," said Dmitry Korkin, lead author and assistant professor of computer science. "No one has ever been able to do that before on such a scale."

"Our discovery helps solve some of the mysteries of plant evolution," said Gavin Conant, co-author and assistant professor of animal sciences. "Basic research on the plant genome provides raw materials and improves techniques for creating medicines and crops."

Previous studies found long strings of identical code in the DNA of different animal species. But before this new MU research, which was published in the Proceedings of the National Academy of Sciences, computer programs had not been powerful enough to find identical sequences in plant DNA, because the identical sections were not located at the same positions in each genome.
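To make the idea of "identical sequences at different positions" concrete, here is a minimal sketch in Python. It is not the MU team's algorithm, whose details this article does not describe. It is a simple illustration that indexes every fixed-length substring of each sequence and keeps those present in all of them. The function names, the toy sequences and the length of 8 are all assumptions chosen for illustration.

```python
from collections import defaultdict


def kmer_positions(seq, k):
    """Map each length-k substring of seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def shared_identical_kmers(genomes, k):
    """Return {kmer: {name: [positions]}} for substrings found in every genome."""
    indexes = {name: kmer_positions(seq, k) for name, seq in genomes.items()}
    # Keep only substrings that occur in all sequences, wherever they sit.
    common = set.intersection(*(set(idx) for idx in indexes.values()))
    return {
        kmer: {name: indexes[name][kmer] for name in genomes}
        for kmer in sorted(common)
    }


# Hypothetical toy sequences: the same 8-letter element sits at a
# different offset in each one.
toy_genomes = {
    "species_a": "TTTTACGTACGGAAAA",
    "species_b": "CCACGTACGGCCCCCC",
    "species_c": "GGGGGGGGACGTACGG",
}

hits = shared_identical_kmers(toy_genomes, k=8)
for kmer, where in hits.items():
    print(kmer, where)
```

Because the lookup is keyed on the sequence itself rather than on its location, the shared element is reported even though it starts at a different offset in each toy sequence. Real genomes run to hundreds of millions or billions of letters, so a practical search at that scale would need far more memory-efficient data structures than this dictionary-based sketch.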

The researchers compared the genomes of six animals (dog, chicken, human, mouse, macaque and rat) with one another. Likewise, they compared six plant species (Arabidopsis, soybean, rice, cottonwood, sorghum and grape) with one another. Comparing all the genetic sequences took four weeks on 48 computer processors, each performing about 1 million searches per hour, for a total of approximately 32 billion searches.
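The total can be checked with simple arithmetic. The short Python sketch below assumes that the quoted rate of 1 million searches per hour applies to each processor and that all 48 ran around the clock for the full four weeks. Neither assumption is stated outright in the figures above, but together they reproduce the reported total.

```python
HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7

weeks = 4
processors = 48
# Assumption: the quoted rate is per processor, not for the whole cluster.
searches_per_processor_hour = 1_000_000

hours = weeks * DAYS_PER_WEEK * HOURS_PER_DAY          # 672 hours
total_searches = hours * processors * searches_per_processor_hour

print(f"{total_searches:,}")  # 32,256,000,000
```

That works out to roughly 32 billion searches, matching the figure reported for the study.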

Although the scientists found identical sequences between plant species, just as they did between animals, they suggested the sequences evolved differently.

"You would expect to see convergent evolution, but we don't," Conant said. "Plants and animals are both complex multi-cellular organisms that have to deal with many of the same environmental conditions, like taking in air and water and dealing with weather variations, but their genomes code for solutions to these challenges in different ways."

The MU team's research laid the groundwork for future studies into the reasons plants and animals developed different genetic mechanisms and how they function. Their basic research created a foundation for discoveries that may improve human life. Besides advancing genetic science's potential to fight disease, the code-analyzing computer program itself could help in the development of new medicines.

"The same algorithm can be used to find identical sequential patterns in an organism's entire set of proteins," said Korkin. "That could potentially lead to finding new targets for existing drugs or studying these drugs' side effects."

The PNAS paper, titled "Long Identical Multispecies Elements in Plant and Animal Genomes," involved collaboration between the Universities of Missouri, California and Arizona. The computer algorithm was developed by Jeff Reneker, a senior research informatician at MU's Center for Computational Biology and Medicine, during his doctoral study at the MU Computer Science Department under the supervision of Chi-Ren Shyu, Director of the MU Informatics Institute.