Here's a look at how genome sequencing works, the computing techniques that make it possible, and the impact it is expected to have on human health.

Some genetics researchers say we're entering a golden age of genetics, thanks to a technique called next-generation sequencing, which makes it much faster and far less expensive to sequence the genomes of people, plants, and animals. It’s driven in part by advances in computing technology and access to ever-cheaper computing power.

What is genome sequencing?

Let’s start with the basics: a description of genomes and genome sequencing. A genome is an organism's complete set of DNA, including all of its genes. DNA is a double-helix-shaped molecule with two strands twisted around each other, similar to a twisted ladder. The rungs of the DNA molecule are composed of pairs of chemicals called bases. There are four bases: adenine (A), thymine (T), guanine (G), and cytosine (C). An A always pairs with a T (AT), and a C always pairs with a G (CG). Each gene carries the code to build specific proteins, as directed by the exact configuration of the AT and CG base pairs along its length. A single human gene is made up of anywhere from a few hundred bases to more than 2 million of them. In all, humans have between 19,000 and 20,000 genes, and the human genome as a whole contains approximately 3 billion base pairs. That represents a tremendous amount of information to decode.
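To make the pairing rule concrete, here is a minimal Python sketch. The function names `complement` and `base_pairs` are made up for this illustration; the only thing taken from the description above is the rule itself, that A pairs with T and C pairs with G. Given one strand, the rule is enough to work out the other.

```python
# Pairing rule from the text: A always pairs with T, and C always pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that pairs with each base of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand.upper())


def base_pairs(strand: str) -> list[str]:
    """Return the rungs of the ladder as two-letter pairs, such as 'AT' or 'CG'."""
    return [base + PAIRS[base] for base in strand.upper()]


print(complement("ATGC"))  # TACG
print(base_pairs("ATGC"))  # ['AT', 'TA', 'GC', 'CG']
```

Because each base determines its partner, a sequence is normally written as a single string of letters, which is the form the sketch uses.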

When you sequence a gene, you determine the precise order of all its base pairs. Knowing a gene's base-pair sequence can help determine its functions. Sequencing an entire genome means that you’ve mapped every gene in a plant or an animal.

The Human Genome Project, completed in 2003, produced the first sequence of the human genome and mapped all the genes in the human body. But the map was a general one, and it's important to keep in mind that there’s a great deal of variation in individuals’ genomes. Some people have genes for blue eyes, for example, and others for brown eyes. Some people have gene variants that help them live longer, while others have variants that lead to various diseases. Each of us has our own individual genetic makeup, our own unique genome. So the completion of the project in 2003 wasn’t the end of a process but just a beginning. The work continues with sequencing individuals’ genomes, which is leading to a better understanding of the functions of genes and can also lead to cures for diseases.

It took well over a decade and $3 billion to sequence the human genome. Today, though, thanks to high-power computing and a variety of new techniques, an individual’s genome can be sequenced in hours for as little as $1,000. In large part, that’s due to next-generation sequencing, made possible by advances in computational power and lower computing costs.

Enter next-generation sequencing

Much has changed in genetics technology since the human genome was first sequenced using a technique known as Sanger sequencing. The new techniques are called next-generation sequencing, although the term is somewhat out of date, according to Andrew Severin, a research scientist and manager of the Genome Informatics Facility (GIF) at Iowa State University.

“Next-generation sequencing came out after Sanger sequencing but was quickly replaced by what has been called third-generation sequencing technology,” Severin says. “Now we’re on the verge of moving even beyond that. So next-generation sequencing is a somewhat dated term. What we’re really talking about is high-throughput sequencing.”

Whatever term you use, the technology has dramatically sped up the ability to sequence genes and genomes, and made the price of doing so plummet. The process depends on the latest computing techniques, particularly high-powered parallel computing.

In the first step, cells are put into a sequencer, which starts the decoding process. The human genome is too big to be sequenced from start to finish in one pass, so the DNA in the cells is broken up into smaller, more manageable fragments. The sequencer essentially takes snapshots of all the pieces, uses computing power to do an initial analysis of them, and writes the results in a file format that the next phase can process.
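The idea of breaking a long sequence into short, overlapping pieces can be sketched in a few lines of Python. This is a toy illustration, not how a sequencer works internally: real fragments are produced chemically and land at random positions, whereas the hypothetical `fragment` function below cuts evenly spaced windows so the result is easy to follow. The `read_length` and `step` parameters are assumptions made for the example.

```python
def fragment(sequence: str, read_length: int, step: int) -> list[str]:
    """Cut `sequence` into overlapping reads of `read_length`, one every `step` bases."""
    if read_length <= 0 or step <= 0:
        raise ValueError("read_length and step must be positive")
    last_start = max(len(sequence) - read_length, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of the sequence is covered
    return [sequence[s:s + read_length] for s in starts]


reads = fragment("ATGCGTACGTTAGC", read_length=6, step=4)
print(reads)  # ['ATGCGT', 'GTACGT', 'GTTAGC']
```

Choosing a `step` smaller than `read_length` makes neighboring reads overlap, and that overlap is what lets the pieces be fitted back together later.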

“You have millions of files that get spit out of the sequencer, and it’s like a giant jigsaw puzzle,” explains Lisa Wright, HPC program manager, Life Science Vertical Solutions, at Hewlett Packard Enterprise. “There are multiple copies of each slice of DNA. Then, in what’s called the alignment process, you essentially put the puzzle together.”
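One common way to put the puzzle together is to work out where each piece fits along a known sequence. The Python sketch below shows a deliberately naive version of that idea: it slides each read along a reference string and keeps the position with the fewest mismatched letters. The function names and the sample data are invented for illustration, and production aligners rely on indexed data structures rather than this brute-force scan.

```python
def mismatches(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def align(read: str, reference: str) -> tuple[int, int]:
    """Return (position, mismatch_count) for the best placement of `read` on `reference`."""
    if not read or len(read) > len(reference):
        raise ValueError("read must be non-empty and no longer than the reference")
    best_pos, best_score = 0, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        score = mismatches(read, reference[pos:pos + len(read)])
        if score < best_score:  # on a tie, the leftmost placement is kept
            best_pos, best_score = pos, score
    return best_pos, best_score


reference = "ATGCGTACGTTAGC"
for read in ["GTTAGC", "ATGCGT", "TACCTT"]:
    print(read, align(read, reference))
# GTTAGC (8, 0)
# ATGCGT (0, 0)
# TACCTT (5, 1)
```

The last read fits best at position 5 with one letter that disagrees with the reference, which is the kind of difference the comparison step goes on to examine.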

Once that is done, the results are compared to what is called the reference genome, to look for variants in the genome being sequenced—genes that cause blonde hair versus brown hair, for example, or brown eyes rather than blue eyes. The end result is the entire individual genome of a human being, specific to that person. In Wright’s words, “It’s basically a blueprint of your body.”
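The comparison against the reference genome can be sketched as follows. This simplified Python example assumes the reads have already been placed at known positions (counted from zero), takes the most common letter seen at each position, and reports every position where that letter differs from the reference. The `call_variants` function and the sample reads are hypothetical, and real variant callers also weigh read quality and handle insertions and deletions, which this sketch ignores.

```python
from collections import Counter


def call_variants(reference: str, placed_reads: list[tuple[int, str]]) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, observed_base) wherever the reads disagree with the reference."""
    # One tally of observed bases per reference position.
    pileup = [Counter() for _ in reference]
    for pos, read in placed_reads:
        for offset, base in enumerate(read):
            if 0 <= pos + offset < len(reference):
                pileup[pos + offset][base] += 1

    variants = []
    for i, counts in enumerate(pileup):
        if not counts:
            continue  # no read covers this position
        consensus, _ = counts.most_common(1)[0]
        if consensus != reference[i]:
            variants.append((i, reference[i], consensus))
    return variants


reference = "ATGCGTACGTTAGC"
placed_reads = [(0, "ATGCGT"), (4, "GTACCT"), (5, "TACCTT"), (8, "CTTAGC")]
print(call_variants(reference, placed_reads))  # [(8, 'G', 'C')]
```

Three separate reads show a C where the reference has a G, which is why multiple copies of each slice matter: agreement across reads separates a genuine variant from a one-off reading error.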

What makes all of this possible is parallel computing, Wright says. “We’re not using GPUs; we’re not using machine learning or artificial intelligence,” she says. Instead, computing clusters with many cores, along with parallel storage, are used to perform parallel processing. The number of cores in the cluster speeds things up even more than the speed of the individual processors does.
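The many-cores approach follows a split, process, and merge pattern, shown here in miniature with Python's standard `concurrent.futures` module. The task (counting bases) and the function names are stand-ins chosen for the example; a real pipeline would spread alignment work across the nodes of a cluster rather than the cores of one machine.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor


def count_bases(chunk: str) -> Counter:
    """Work done independently on one chunk: tally each base."""
    return Counter(chunk)


def split_into_chunks(sequence: str, n_chunks: int) -> list[str]:
    """Split `sequence` into at most `n_chunks` pieces of near-equal size."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    size = max(1, -(-len(sequence) // n_chunks))  # ceiling division
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def merge_counts(partials) -> Counter:
    """Combine the per-chunk tallies into one overall tally."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_base_counts(sequence: str, workers: int = 4) -> Counter:
    """Tally bases using one worker process per chunk (CPU cores only, no GPUs)."""
    chunks = split_into_chunks(sequence, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return merge_counts(pool.map(count_bases, chunks))
```

Calling `parallel_base_counts(sequence, workers=8)` from a script's `if __name__ == "__main__":` block gives the same tally as `count_bases(sequence)` on its own. Because the chunks are independent of one another, adding workers shortens the wait on large inputs, which is the effect of adding cores to a cluster.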

What the advances mean for human health

The ability to do all this will have a dramatic impact on human health—and already has. No one knows that better than Eric Lander, president and founding director of the Broad Institute of MIT and Harvard, perhaps the foremost genetics research institute in the world. Lander was a principal leader of the Human Genome Project, and is a professor of biology at MIT and professor of systems biology at Harvard Medical School.

In an interview with The Atlantic, Lander succinctly laid out how the future of medicine and human health will be changed forever by next-generation sequencing:

Remember that it took 15 years and $3 billion just to get the first person’s sequence. The idea of doing that thousands of times over would have seemed crazy—except that an amazing transformation over the past 12 years brought down the cost of sequencing genomes by about a million-fold. The rate of progress is stunning. As costs continue to come down, we are entering a period where we are going to be able to get the complete catalog of disease genes. This will allow us to look at thousands of people and see the differences among them, to discover critical genes that cause cancer, autism, heart disease, or schizophrenia.

Once those genes are identified, cures can be crafted for the diseases. It’s already happening. One notable example is the treatment of breast cancer. Ten years ago, Wright notes, breast cancer was typically treated by performing a mastectomy, followed by radiation therapy and chemotherapy.

“After that,” she says, “people had to live with the side effects for the rest of their lives.”

But thanks to the ability to quickly sequence genes, that has changed. Doctors can now take a biopsy, sequence it, determine exactly what type of cancer it is, and decide upon the best way to treat it. For example, if the cancer is fed by estrogen, the doctor can remove it surgically and then prescribe a medicine that lowers or blocks estrogen in the body, reducing the risk of the cancer returning.

Even more individualized treatments are possible with precision medicine, in which treatments are customized for each person’s specific genetic makeup. Two such treatments have already been approved by the U.S. Food and Drug Administration. The first, Kymriah from Novartis, is for young adults and children who have an aggressive form of acute leukemia. The second is Kite Pharma's Yescarta, which has been approved to treat aggressive forms of non-Hodgkin’s lymphoma, a blood cancer.

In the Yescarta treatment, millions of a person’s T cells, important to the immune system, are frozen and sent to Kite Pharma, where they are genetically engineered, based on a person’s genetic makeup, to kill cancer cells. The engineered cells are then infused back into the person’s body.

Beyond the human genome

It’s not only people’s genomes that can be sequenced in this way. So can animal and plant genomes, at institutes such as the GIF at Iowa State University. The knowledge gained can lead to a wide range of benefits, including higher agricultural yields to help feed an increasingly hungry planet and new ways to save endangered species.

And that's just the beginning. Computing power will continue to increase while costs decrease. New breakthroughs are being made in parallel processing, as well as in machine learning and artificial intelligence, which will likely be applied to genomic research. So you can expect next-generation sequencing to keep making the kind of strides it has made over the past decade, with increasing benefits each year.