Introduction to Next-Generation Sequencing (NGS) Technology

In the previous lesson on basic DNA sequencing techniques, we covered a variety of sequencing methods that were used from the mid 80's to the early 2000's.

Although these techniques allowed us to sequence the first human genome, they were too costly and time-intensive. Because of this, there were a limited number of sample human genomes to base genetic studies on, making it difficult to come up with robust phenotype-genotype correlations.

It wasn't until pyrosequencing and other NGS techniques arrived that the price of sequencing a genome began to drop toward $1,000. By 2008, consumer genomics began to take hold, with hard data showing genetic mutations correlating with specific diseases.

Genome sequencers from the early 2000's

Resequencing vs. de novo sequencing

Before we talk about how much cheaper DNA sequencing has gotten, let's look at these two terms used to describe sequencing methodologies: resequencing and de novo sequencing.

Resequencing is the term for sequencing an organism that has already been sequenced. We only need to align our reads to a reference genome. Thus, our reads need only be a few hundred base pairs long. Many Next-Generation Sequencing platforms provide short reads that can be aligned to a reference genome.
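To make the idea of aligning reads to a reference concrete, here is a minimal sketch. It assumes exact matches only and uses made-up sequences; real aligners tolerate mismatches and use indexed data structures, so treat the function name and the data as purely illustrative.

```python
def align_reads(reference, reads):
    """Map each read to the 0-based offset of its first exact match, or None."""
    placements = {}
    for read in reads:
        offset = reference.find(read)  # str.find returns -1 when there is no match
        placements[read] = offset if offset != -1 else None
    return placements

reference = "ACGTTGCATGCCGATAGCTA"      # made-up reference genome
reads = ["TTGCAT", "GCCGAT", "AAAAAA"]  # made-up short reads
print(align_reads(reference, reads))
```

The last read has no match in the reference, which is the kind of read a resequencing pipeline would leave unaligned.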

De novo sequencing, on the other hand, is the term for sequencing a genome from scratch. It is much more costly, time-intensive, and limited to select techniques. Reads must be at least 1,000 bp long. The first human genome sequenced relied on these methods, which is one of the reasons it was so costly and time-intensive.

Forces behind NGS

Throughout the 2000's, scientists came up with a class of novel techniques to lower the cost of DNA sequencing. These methods were successful not only because of the new chemistries available, but also due to cheaper and more powerful computers.

Some may argue that computers allowed for the emergence of NGS technology, as faster processing powers allowed computers to assemble genomes at a rate much higher than before. Additionally, affordable data storage allowed for genomes to be stored and accessible through public databases, and novel algorithms provided immediate analysis and results.

The problem we face in bioinformatics is now not the lack of information, but the wealth of it! Scientists simply have too much data and not enough time to curate it all. This is why many biological databases are separated into primary (unfiltered) and secondary (curated) databases. In order to make sense of all this data, we are in need of well-trained and knowledgeable bioinformaticians.

What exactly is NGS?

The textbook definition of Next-Generation Sequencing is a high-throughput DNA sequencing methodology that makes use of parallelization to process up to half a million sequences concurrently. The process of running thousands of analytes at a time is known as multiplexing.

A new era

NGS can also be used to describe a new era. In this new time, we can see sequencing one's genome becoming commonplace. Imagine going to the doctor's office with some illness or concerns, and simply ordering a genetic test. The process will be affordable and easy - just like taking an MRI or performing a blood test. An era where this is commonplace is what some say NGS refers to.

Time cover in 2012 - The genetic revolution.

Distinguishing factors of NGS

A simpler library preparation

A commonality of Next-Generation Sequencing methods is the simplified workflow used to prepare genes for sequencing. With the advent of PCR and its variations, there is no longer a need to transform DNA fragments into bacterial cells to replicate DNA. Library preparation includes the following:

Fragmenting the DNA (through sonication, enzymatic cleavage, or any other method).

Ligation of an adapter sequence, barcode and primer.

Size selection of the fragments.
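The three library preparation steps can be mimicked on strings. This is only a toy sketch: the adapter sequences, fragment sizes, and function names are made up for illustration, and a real protocol also adds barcodes and primers.

```python
import random

ADAPTER_5 = "AAGG"  # hypothetical adapter sequences, for illustration only
ADAPTER_3 = "CCTT"

def fragment(dna, rng, min_len=5, max_len=40):
    """Cut dna into pieces of random size, mimicking random shearing."""
    pieces, start = [], 0
    while start < len(dna):
        size = rng.randint(min_len, max_len)
        pieces.append(dna[start:start + size])
        start += size
    return pieces

def prepare_library(dna, low, high, seed=0):
    """Fragment, ligate adapters to both ends, then size-select on insert length."""
    rng = random.Random(seed)
    ligated = [ADAPTER_5 + piece + ADAPTER_3 for piece in fragment(dna, rng)]
    overhead = len(ADAPTER_5) + len(ADAPTER_3)
    return [m for m in ligated if low <= len(m) - overhead <= high]

genome = "".join(random.Random(1).choice("ACGT") for _ in range(500))  # made-up genome
library = prepare_library(genome, low=20, high=30)
print(len(library), "molecules kept after size selection")
```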

Short reads limit de novo sequencing

Previous methods relied on capillary electrophoresis, which could only read up to 96 wells at a time. NGS's massively parallel technique allowed for millions of reads to run simultaneously; however, most reads come out as short, unless additional techniques such as mate-pair sequencing are used.

Two types of PCR

Instead of conventional PCR or amplification through bacterial species, NGS techniques use two different flavors of PCR to set the stage for sequencing.

There are two ways we are able to prepare the library: through emulsion PCR (ePCR) and bridge PCR.

With ePCR we have technologies such as Ion Torrent semiconductor sequencing, 454 Roche pyrosequencing, and sequencing by ligation (polony sequencing and SOLiD).

With bridge PCR, we have Illumina's Sequencing by Synthesis.

We'll first cover ePCR and the technologies that use it, then move on to bridge PCR.

Emulsion PCR is done within a water-in-oil emulsion, while bridge PCR is conducted on a flow cell.

Sequencing by...

There are alternative methods used to sequence the actual DNA. We have seen sequencing by synthesis already, where the base calls are read at the addition of each nucleotide. There is another technique called sequencing by ligation, which we'll see soon.

NGS terms

Here are some important NGS terms you should familiarize yourselves with.

Read: A raw sequence that comes from a sequencing machine. Usually 300-800 bp long.

Tag: Several reads coming from the same sequence can be merged into one tag.

Sequencing Depth: Total number of sequences, reads, or base pairs generated in a single sequencing experiment.

Coverage: Total number of bases generated / size of genome sequenced.
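The coverage definition is a simple ratio, sketched below. The run size and genome size are made-up numbers chosen only to show the arithmetic, and the sketch assumes every read has the same length.

```python
def coverage(num_reads, read_length_bp, genome_size_bp):
    """Average coverage = total bases generated / size of the genome sequenced."""
    return num_reads * read_length_bp / genome_size_bp

# Hypothetical run: one million 400 bp reads over a 4.6 Mb genome.
print(f"{coverage(1_000_000, 400, 4_600_000):.1f}x")
```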

This is just the beginning...

The term Next-Generation Sequencing is somewhat of a misnomer since it implies some technology of the future. However, as you're going through this lesson, note the limitations of NGS, as they exist. There is a Third-Generation Sequencing, which is supposed to be the next-Next-Generation of sequencing platforms, and improve upon these limitations. We will cover this in the future.

Emulsion PCR (ePCR)

Emulsion PCR is a PCR variation that some NGS technologies use to replicate DNA sequences. It is conducted on a bead surface within tiny water droplets suspended in oil.

Why replicate DNA?

This is a very important concept to understand, as all NGS techniques replicate DNA before sequencing is done. In short, DNA is replicated in order to amplify signals. No matter the method of sequencing, without a proper amount of amplification, it's near impossible to detect each base call.

Procedure

1) Fragmentation of DNA

The library is first fragmented either by sonication (high sound energy) or nebulization (forces DNA through a small hole) to fragments ranging from 300 to 800 bp.

2) Adapters are ligated

Adapters are then ligated onto the DNA fragments. These allow the strands to bind to the emulsion beads.

3) Denature to single strands

The double-stranded DNA fragments with adapters are then denatured by heating them to 95 °C. Denaturing DNA simply means going from double-stranded DNA (dsDNA) to two single strands (ssDNA): the hydrogen bonds keeping the two together are broken.

4) Formation of clonal bead populations

Each bead is coated with streptavidin, which is resistant to organic solvents, denaturants, detergents, proteolytic enzymes and extremes of temperature and pH.

Over a billion beads are used with a primer that matches the adapters attached earlier. The ssDNA is then attached to these beads.

Each bead is emulsified in a water-in-oil droplet with PCR reagents (DNA polymerase, primers, buffers, dNTPs).

Emulsion PCR components

5) ePCR amplifies DNA strands on beads

Within these droplets, PCR is conducted, cycling through denaturation, annealing, and elongation. First, the strand is elongated with DNA polymerase and dNTPs. Then the double strand is denatured, allowing the released strand to anneal to another primer site on the surface of the bead. Eventually, about 1 million copies of the target are amplified on the surface of each bead. The water-in-oil droplet is approximately 1 µm across.
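To get a feel for how many PCR cycles a million copies implies, here is a back-of-the-envelope sketch. It assumes idealised amplification where every copy is duplicated each cycle (efficiency 1.0); amplification on a bead is limited by the primer sites available, so the real number is an assumption of this model, not a protocol value.

```python
import math

def copies_after(cycles, efficiency=1.0, start=1):
    """Copies after a number of cycles; efficiency 1.0 means perfect doubling."""
    return start * (1 + efficiency) ** cycles

def cycles_needed(target_copies, efficiency=1.0, start=1):
    """Smallest whole number of cycles that reaches target_copies."""
    return math.ceil(math.log(target_copies / start) / math.log(1 + efficiency))

print(cycles_needed(1_000_000))       # cycles to reach a million copies
print(cycles_needed(1_000_000, 0.8))  # more cycles at 80% efficiency
```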

Follow the figure to see how each bead is able to replicate DNA on its surface.

ePCR amplifies DNA strands on beads. Figure adapted from Andy Vierstraete

6) Emulsion Breaking

After the DNA strands are amplified, the emulsion from the preceding step is broken using isopropanol and detergent buffer. The solution is then vortexed, centrifuged, and magnetically separated. The resulting solution is a suspension of empty, clonal and non-clonal beads, which will be filtered in the next step.

7) Bead enrichment

After PCR is conducted, you are left with a mixture of beads, some with amplified DNA attached to their surface and some without.

We can pull out the enriched beads by attaching streptavidin-coated magnetic enrichment beads to them. With a magnet, we can then pull out the beads with amplified DNA.

Bead enrichment using magnets.

There are other methods of bead enrichment that include using larger beads that are able to bind to beads with amplified DNA. After centrifugation, the beads with amplified DNA and without can then be separated.

8) Bead Capping

Attach a capping oligonucleotide to the 3' end of both unextended forward ePCR primers and the RDV segment of template DNA. This helps in coverslip arraying, which is used in polony sequencing, and prevents fluorescent probes from ligating to these ends.

9) Result

The beads with amplified sequences are then placed on a slide and are sequenced. Due to their high density of the same DNA molecule, the signal is amplified, allowing computers to read the sequencing data.

References

Emulsion PCR figure adapted and used with permission from Andy Vierstraete.

Sequencing by Ligation

Thus far we have seen methods that add a single base per cycle, known as sequencing by synthesis. In contrast, sequencing by ligation uses short segments of DNA called oligonucleotides instead of single bases to sequence DNA. Take a look at the diagram below to see the difference:

Sequencing by synthesis vs. sequencing by ligation.

Ligase vs. Polymerase

Since this is ligation, we use the enzyme DNA ligase rather than DNA polymerase. This enzyme joins together ends of DNA molecules.

Note that ligation is performed in the 3'-5' direction for multiple cycles, which is the opposite of how polymerase works.

Because DNA ligase has a low efficiency when there are mismatches between bases, we can be sure that only the oligonucleotides that match are ligated.

Procedure

There are five main steps to sequencing by ligation, as outlined below.

1) Anchor known sequences to target DNA

A known sequence is flanked onto the target DNA strand. A short anchor sequence is then brought in to bind to this known sequence.

A known sequence is ligated to flank the target DNA strand, and another anchor sequence is used to bind to that.

2) Oligonucleotide design

Oligonucleotides are short segments of DNA, and are characterized by a number of features, as outlined below.

Size

The oligonucleotides are either 8 (octamer) or 9 (nonamer) bases long.

Partially degenerate

The oligonucleotides are partially degenerate, meaning that only one of their positions holds a known nucleotide. For example, one oligonucleotide can have a known query position at 1, but unknown nucleotides at positions 2-9. Another oligonucleotide can have a known position at 4, but unknown nucleotides at 1-3 and 5-9.

For our example, let's assume we have a nonamer whose known position is at query position 1.

Fluorescently labeled

Each oligonucleotide is tagged at the 3' end with a fluorescent dye. The color depends on which nucleotide sits at the known query position.

Nonamers with known query position 1. Imagine a whole pool of these with gray bases being any random nucleotide.
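A pool of partially degenerate probes can be written down as patterns, with N marking a degenerate position. The sketch below is illustrative only: the dye colors are hypothetical, and for simplicity the probes are written as the template bases they report, ignoring strand complementarity and orientation.

```python
BASES = "ACGT"
DYES = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}  # hypothetical colors

def probe_pool(query_pos, length=9):
    """One pattern per known base; N marks a degenerate (any-base) position."""
    return {b: "N" * (query_pos - 1) + b + "N" * (length - query_pos) for b in BASES}

def probe_matches(pattern, window):
    """A probe fits a template window if every non-N position agrees."""
    return len(pattern) == len(window) and all(
        p in ("N", w) for p, w in zip(pattern, window)
    )

def dye_for_window(window, query_pos):
    """Color of the probe that would be ligated over this template window."""
    for base, pattern in probe_pool(query_pos, len(window)).items():
        if probe_matches(pattern, window):
            return DYES[base]
    return None

print(probe_pool(1))
print(dye_for_window("CATTGACGT", query_pos=4))
```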

3) Oligonucleotides hybridize

The pool of oligonucleotides is mixed in with the target DNA and allowed to hybridize with the target DNA sequence.

DNA ligase joins the molecule to the anchor when its bases match the unknown DNA sequence. Based on what color light is emitted, we are able to see the nucleotide at that position of the unknown sequence.

4) Fluorescent labels are cleaved

The fluorescent labels are cleaved away, regenerating a 5'-phosphate group on the ends of the ligated probes. This allows the next oligonucleotides to be ligated onto the rest of the unknown sequence.

5) Another round of ligation

Steps 3 and 4 are repeated until the nonamers have reached the end of the unknown DNA sequence. After this, the anchor sequence is reduced by one nucleotide, and the process is repeated.

How sequencing by ligation works, step-by-step.
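The bookkeeping of which positions each round reads can be sketched as follows. This is a simplified model of the procedure described above, assuming nonamers with query position 1 are ligated end to end and that shortening the anchor by one nucleotide shifts every probe by one base. Reading the dye color is replaced by looking up the template directly; the function names are made up.

```python
def interrogated_positions(read_length, probe_len=9, query_pos=1, anchor_offset=0):
    """1-based template positions read in one round of successive ligations."""
    first = query_pos - anchor_offset
    return [p for p in range(first, read_length + 1, probe_len) if p >= 1]

def sequence_by_ligation(template, probe_len=9, query_pos=1):
    """Combine the rounds for every anchor offset into a full read."""
    calls = {}
    for offset in range(probe_len):
        for pos in interrogated_positions(len(template), probe_len, query_pos, offset):
            calls[pos] = template[pos - 1]  # stands in for reading the dye color
    return "".join(calls[p] for p in sorted(calls))

print(interrogated_positions(27))                   # first round
print(interrogated_positions(27, anchor_offset=1))  # anchor shortened by one
print(sequence_by_ligation("ACGTTGCATGCCGATAGCTAGGCTTAC"))
```

Each round reads only every ninth base, so nine anchor offsets are needed before every position has been interrogated once.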

Pros and cons

A downside to this method is its limitation to short reads, and the time it takes to ligate oligonucleotides on and off. Additionally, there have been problems sequencing palindromic strands.

An upside of this method is that it is easy to implement with off-the-shelf reagents.

References

Yu-Feng Huang, Sheng-Chung Chen, Yih-Shien Chiang, Tzu-Han Chen & Kuo-Ping Chiu (2012). "Palindromic sequence impedes sequencing-by-ligation mechanism". BMC Systems Biology.

Wikipedia - Sequencing by ligation

Polony Sequencing

Polony sequencing, developed by George M. Church at Harvard Medical School, is a sequencing technique that uses paired-tag library emulsion PCR to amplify the target DNA, and sequencing by ligation to detect DNA bases. This is a combination of concepts we covered in the two previous pages.

George M. Church

When polony sequencing was released in 2003, its cost was less than 10% of that of Sanger sequencing. It was used to sequence a full E. coli genome in 2005 with an error rate of less than one per million consensus bases (0.0001%).

Open-source means freedom

One unique aspect of polony sequencing is that its technology is an open-source platform. This means the software and protocols are free and don't require licensing or a fee for use. Any modifications or improvements to the system are also made available. Additionally, the only machinery required is a computer-controlled fluidics system and an epifluorescence microscope.

Procedure

The procedure takes a total of 9 steps, but the most important parts (emulsion PCR and sequencing by ligation) were already covered in an earlier lesson.

1) Shearing DNA

The first step, as in any other NGS technique, is the library construction. We break apart the genomic DNA.

2) DNA Repair

Next we want to perform end-repair to fix any damaged or incompatible edges. We want to make our DNA ends blunt-ended with a phosphate group attached at the 5' end. This allows us to ligate any adapter oligonucleotides.

The DNA fragments also undergo A-tailing treatment. This adds an A to the 3' end of the sheared DNA.

The left dsDNA has blunt ends, while the right has sticky ends.

After the DNA molecules are repaired, those about 1 kb long are selected by loading them onto a 6% TBE PAGE gel.

3) DNA circularization

The next step is to circularize the DNA. We do this with a T-tailed, 30 bp long synthetic oligonucleotide (T30). It contains two outward-facing MmeI recognition sites.

Restriction Enzymes: Restriction enzymes are enzymes that recognize a specific sequence and cut either at that particular spot, or at a spot a certain number of nucleotides away from it. The cuts may be "sticky" or "blunt" depending on the type of restriction enzyme.

Circularization with T30.

4) Rolling circle replication

The circularized DNA undergoes rolling circle replication . This is a type of nucleic acid replication that rapidly synthesizes multiple copies of circular molecules of DNA.

Rolling circle amplification generates several copies of the circularized DNA.

The newly generated DNA is then digested by the restriction enzyme MmeI, a type IIS restriction endonuclease that cuts at a distance away from its recognition site. This releases the T30 fragment, flanked by 17-18 bp tags of the sequence (70 bp in total).

Digest the circularized DNA with MmeI, which cuts a specific number of bp away from the recognition site.

5) DNA Repair and primers added

The resulting DNA is repaired, and FDV2 and RDV2 are added to the two ends. In total, this results in 135 bp library molecules.

We now have DNA templates with 44 bp FDV sequence, a 17-18 bp proximal tag, the T30 sequence, a 17-18 bp distal tag, and a 25 bp RDV sequence.

Attach primers FDV2 and RDV2 for emulsion PCR.
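The component lengths listed above can be added up to check the quoted library size. The sketch below uses only the numbers from the text; the variable and function names are made up.

```python
# (name, shortest length in bp, longest length in bp), taken from the text above
PARTS = [
    ("FDV", 44, 44),
    ("proximal tag", 17, 18),
    ("T30", 30, 30),
    ("distal tag", 17, 18),
    ("RDV", 25, 25),
]

def library_length_range(parts):
    """Shortest and longest possible library molecule, in bp."""
    shortest = sum(low for _, low, _ in parts)
    longest = sum(high for _, _, high in parts)
    return shortest, longest

print(library_length_range(PARTS))
```

The upper end of the range matches the 135 bp figure; molecules are slightly shorter when a tag is 17 bp rather than 18 bp.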

6) ePCR

ePCR is used to amplify the 135 bp paired end-tag library molecules. This process takes place within a water droplet embedded within an oil solution.

Product of emulsion PCR - beads with amplified DNA on surface.

7) Coverslip arraying

Coverslips are washed and treated with aminosilane. This eliminates fluorescent contamination and allows template DNA and beads to attach through covalent coupling.

The resulting beads from ePCR are mixed with acrylamide and poured onto a teflon-masked microscope slide. The coverslip is placed on top of the acrylamide gel for 45 minutes to allow for polymerization.

The beads bind to the aminosilane coating of the coverslip, spreading out in a monolayer in the acrylamide gel. The coverslip with the gel, beads and template DNA is then inverted. The sequencing reagents will flow beneath it.

Coverslip and aminosilane.

8) DNA sequencing

The method used for DNA sequencing is sequencing by ligation. In short, a series of anchor primers are hybridized to the synthetic oligonucleotide sequences flanking the genomic DNA tags.

A group of degenerate nonamers (oligonucleotides of length 9) is used, each with a known query position and a fluorescent marker. In this round the known query is at position 9:

Depending on which nonamer binds, we can see which nucleotide is at position 9. We can then do this again to get the nucleotide at position 18, then 27, and so on. Now we can use a pool of nonamers whose known query position is shifted down by one nucleotide:

We may either use these, or simply shift the anchor by one base and reuse the nonamers from the previous round.

We throw in this pool of degenerate nonamers again to see the nucleotides at positions 8, 17, 26, 35 and so on. We repeat this with different known query positions until we are through with the sequence.

Cons

There may be failures in cleaving the dyes, which can mess up base calls.

Pros

Cheap, open-source, free software.

Flexible. Applications include BAC (bacterial artificial chromosome) and bacterial genome resequencing, as well as SAGE (serial analysis of gene expression) and barcode sequencing.

Easy to set up. Only a commonly available fluorescence microscope and a computer-controlled flow cell are needed.

Scalable by using 1 µm magnetic beads.

References

Wikipedia - Polony Sequencing

Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM (2005). "Accurate multiplex polony sequencing of an evolved bacterial genome". Science 309 (5741): 1728–32.

Pyrosequencing (454 Roche technology)

Pyrosequencing is considered to be one of the first of the second-generation sequencing technologies. It was commercialized through Roche's 454 sequencing instrument, and allowed scientists to garner large amounts of sequencing data in a single run.

Unlike polony sequencing, pyrosequencing falls under sequencing by synthesis, meaning the sequence is resolved while forming the sample's complementary strand. However, similar to polony sequencing, pyrosequencing uses emulsion PCR.

At its core, the pyrosequencing technique relies on the detection of pyrophosphate molecules that are released during DNA synthesis. This allows for the generation of light, which is then detected by a sensor.

What is Pyrophosphate?

In your typical dNTP molecule, there are three phosphates attached to the 5' carbon of the deoxyribose sugar. The first (which is attached to the sugar) is called the α-phosphate. The next is the β-phosphate and the last is the γ-phosphate.

During replication, the α-phosphate of each incoming complementary nucleotide is joined enzymatically by a phosphodiester linkage to the 3'-OH group of the last nucleotide in the growing strand.

During this reaction, the β- and γ-phosphates are cleaved off in a unit called the pyrophosphate (PPi).

Pyrophosphate is a by-product of DNA elongation.

Procedure

In second generation DNA sequence techniques, a cycle is established to resolve each nucleotide. Here is the cycle used in pyrosequencing:

1) Emulsion PCR

After emulsion PCR is performed, each enriched bead is placed in one of the many picoliter-volume wells of the sequencing machine.

2) Adding in a dNTP

One of the four dNTPs is added. If it is complementary to the next base on the template strand, it is incorporated into the growing strand and PPi is released. Below, dXTP stands for whichever dNTP was added.

3) PPi causes light to be released

ATP sulfurylase converts the PPi into ATP. Luciferase then uses this ATP to produce light. The flash of light is recorded by a camera, and its intensity is proportional to the number of dXTPs incorporated.

Here is the chemical reaction that takes place to generate light.

Steps to detecting a nucleotide by pyrosequencing: 1) PPi is generated if dXTP is incorporated. 2) PPi reacts and through a series of reactions, light is generated. 3) Light gets picked up by the sensor. 4) The more intense the light is, the more dXTP's are in that region of sequence.

4) Remaining reagents washed away

Any remaining deoxynucleoside triphosphate (dXTP) and ATP are degraded by apyrase and washed away.

5) Repeat in a cyclic fashion

This process is repeated from step 2 until all bases are sequenced.
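The cycle above can be simulated to show what the detector sees. This is a minimal sketch under idealised assumptions: the flow order "TACG" is an arbitrary choice, the light intensity is exactly equal to the number of bases incorporated, and the input is written as the strand being synthesised rather than the template.

```python
def flowgram(synthesized, flow_order="TACG"):
    """Return (nucleotide, intensity) for each dNTP flow needed to build the strand."""
    if not set(synthesized) <= set(flow_order):
        raise ValueError("sequence contains a base that is never flowed")
    flows, i, k = [], 0, 0
    while i < len(synthesized):
        nucleotide = flow_order[k % len(flow_order)]
        run = 0
        # every consecutive matching base is incorporated in the same flow
        while i < len(synthesized) and synthesized[i] == nucleotide:
            run += 1
            i += 1
        flows.append((nucleotide, run))  # intensity 0 means no light in this flow
        k += 1
    return flows

def call_bases(flows):
    """Turn flow intensities back into a sequence."""
    return "".join(nucleotide * round(intensity) for nucleotide, intensity in flows)

flows = flowgram("TTAGGGC")
print(flows)
print(call_bases(flows))
```

The GGG run shows up as a single flow of intensity 3, which is why telling a run of 3 from a run of 4 depends entirely on how precisely the intensity is measured.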

Sequencing Machines

Sequencing machines in industry that use pyrosequencing include Roche's 454 platform.

Two of Roche's DNA sequencing instruments.

                     GS Junior+    GS FLX Titanium XL+    GS FLX Titanium XLR70
Reads per run        ~100,000      ~1,000,000 shotgun     ~1,000,000 shotgun
Read length          ~700 bp       Up to 1,000 bp         Up to 600 bp
Mode read length     700 bp        700 bp                 450 bp
Run time             18 hours      23 hours               10 hours
Consensus accuracy   99%           99.997%                99.995%

Pros and Cons

Good points of pyrosequencing are its long read lengths and fast run times. However, runs are expensive, and homopolymer errors are frequent because the detector cannot reliably resolve the light intensity of long runs of the same base.

Semiconductor Sequencing (also called pH-mediated sequencing or silicon sequencing)

Semiconductor sequencing is another sequencing by synthesis method that is based on detection of H+ ions released during the polymerization of DNA. With this technique, Life Technologies released the Personal Genome Machine in 2011 as "a rapid, compact and economical bench top machine."

One great thing about this technology is that there is no need for modified nucleotides or labeled oligonucleotides, as we have seen with reversible chain terminators and Sanger sequencing.

Procedure

1) Emulsion PCR

Emulsion PCR produces enriched beads that are placed in micro-machined wells. Just underneath these microwells are pH sensors able to detect minuscule changes in pH.

Microwell containing the DNA template with an ISFET ion sensor. Each microwell is loaded up with a bead from emulsion PCR.

pH: Remember that pH is just a logarithmic scale that measures the amount of hydrogen ions (H+) in a solution. The lower the pH, the more hydrogen ions there are.
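As a quick illustration of the logarithmic scale, the sketch below computes pH from a hydrogen ion concentration using the standard definition pH = -log10([H+]). The concentrations are arbitrary example values, not measurements from a sequencing well.

```python
import math

def ph(h_concentration_molar):
    """pH from the hydrogen ion concentration in mol/L."""
    return -math.log10(h_concentration_molar)

neutral = ph(1e-7)
after_release = ph(2e-7)  # doubling the H+ concentration
print(neutral, after_release, neutral - after_release)
```

Doubling the hydrogen ion concentration lowers the pH by only about 0.3, which is why the sensors need to pick up very small changes.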

2) dNTP's flooded

A particular dNTP is released. If the growing sequence requires that particular dNTP, then an H+ ion and a pyrophosphate group are released.

Hydrogen ion being released with DNA polymerization.

3) Signal detection

The signal is picked up by the ISFET sensor and translated into a base call. A homopolymer (a run of the same base) incorporates several dNTPs in one flow and results in a proportionally stronger signal.

4) Wash and repeat

Unattached dNTP molecules are washed out, and the cycle repeats with a new dNTP.

Analysis

The Ion Proton System was released by Life Technologies in 2012.

                 Ion Torrent Personal Genome Machine    Ion Proton System
Bases per run    1 Gb                                   Up to 10 Gb
Read length      35-400 bp                              200 bp
Run time         4.5 hours                              2-4 hours

Pros

Rapid sequencing speed.

Low upfront and operating costs.

Real-time sequencing.

No need for modified nucleotides or special enzymes.

No need for expensive optical equipment.

Cons

Difficult to capture homopolymer regions such as CCCCCC. These result in multiple hydrogen ions being released at once, and a greater pH change. However, there is a loss of resolution as the number of repeated bases increases.

Short read lengths compared to Sanger sequencing and pyrosequencing.

Rate limited by dNTP flow.
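The homopolymer point in the list above can be illustrated with a little arithmetic. The sketch assumes the signal is exactly proportional to the number of bases incorporated, which is an idealisation; under that assumption, the relative difference between a run of n and a run of n + 1 bases shrinks as n grows.

```python
def relative_step(n):
    """Fractional signal increase when a homopolymer grows from n to n + 1 bases,
    assuming signal is proportional to the number of bases incorporated."""
    return 1 / n

for n in (1, 2, 5, 10):
    print(f"run of {n} vs {n + 1}: signal differs by {relative_step(n):.0%}")
```

Telling one base from two means spotting a signal that doubles, while telling ten from eleven means spotting a change of only a tenth.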

Bridge PCR

Bridge PCR is a PCR technique that amplifies DNA fragments attached to a surface into clonal clusters. It is used by Illumina's HiSeq platform.

Procedure

1) Adapters attached to ends of fragmented DNA

The DNA is fragmented (through sonication or any other method) and adapters are ligated to both ends.

Adapters are attached to fragmented DNA.

2) Denature and bind to flow cell surface

The DNA is then denatured into single-stranded molecules. These fragments are then floated onto a flow cell, whose surface carries corresponding adapter sequences that permit binding.

When the DNA strands are placed onto the slide, they attach to their corresponding adapter sequences.

DNA strands hybridize to the flow cell.

3) Bridge amplification

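Add dNTPs and DNA polymerase to elongate the DNA strands. The free end of each bound strand bends over and hybridizes to a neighboring adapter on the surface, forming the "bridge" that the polymerase copies.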

Bridge amplification occurs with regular PCR components.

4) Denature and repeat to generate clusters

Denature the newly formed DNA strands and repeat until dense clusters of dsDNA are generated in each channel of the flow cell.

Repeat until we have millions of dense clusters of DNA.

The reverse strands are then cleaved and washed away.

What is the purpose of high density regions?

After bridge PCR is conducted on the flow cell, Illumina uses fluorescently labeled dNTPs to detect each nucleotide base.

So for example, if a red fluorescent light goes off, then we know it's an A. If a blue light goes off, then we know it's a G, and so forth (colors here aren't accurate but you get the picture).

However, the signal produced by the incorporation of one dNTP on a single strand is not enough to be detected. This is why we need to amplify the DNA sequences, producing a dense cluster of identical sequences in each area of the flow cell.

We'll see how Illumina is able to sequence millions of these dense colonies in parallel.

References

Illumina Website

Illumina Sequencing-By-Synthesis (SBS) Technology

Instead of bead-based emulsion PCR, Illumina uses bridge PCR, which we just saw on the previous page. The sequencing is conducted on a flow cell using sequencing-by-synthesis methods with fluorescent labels. This requires the use of high-resolution optical devices.

History

The technology was originally developed by Shankar Balasubramanian and David Klenerman at the University of Cambridge. The two founded Solexa in 1998, commercializing their sequencing method. Illumina merged with Solexa in 2007 for $600m, together hoping to "reach and exceed the $100,000 genome." And reach and exceed they did.

Procedure

1) Library Preparation

Whole genomes are fragmented by nebulization or sonication. The randomly fragmented genomic DNA is then end-repaired by polymerase and exonuclease activity. The 5' ends are phosphorylated, while the 3' ends are adenylated. Size selection occurs through gel electrophoresis and PCR selection.

2) Clonal colony cluster creation

The DNA is then placed on a flow cell, a silica slide with eight lengthwise lanes. Flow cells are about the size of a microscope slide, and are sealed to minimize contamination and handling errors.

What an Illumina flowcell looks like with a US quarter for scale. Very similar to a microscope slide.

On the flow cell, the fragments are subjected to isothermal bridge amplification, creating clusters of up to 2,000 molecules each. The duplication of each genomic strand aids in amplifying the signals generated upon sequencing.

Several clusters are formed on the Illumina flow cell like the above.

3) Sequencing

Illumina sequencing devices incorporate fluorescent reversible terminators. Each dNTP has a corresponding fluorophore attached to it.

When polymerase elongates the strand with a fluorescently labeled dNTP, the clusters are excited by a light source and the color is recorded by an optical detector. After incorporation occurs, the fluorophore is cleaved, unblocking the strand so that the next nucleotide can be incorporated in the next cycle. Since each cycle permits the incorporation of only a single dNTP, homopolymers are determined precisely.

Sequencing by Synthesis. dNTP fluorescence is translated to a base call.
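The one-base-per-cycle idea can be sketched in a few lines. The color assignments below are hypothetical (as noted earlier, the real colors differ), and the sketch ignores strand orientation: it simply records the color of the complementary nucleotide incorporated opposite each template base.

```python
COLOR_OF = {"A": "red", "C": "green", "G": "blue", "T": "yellow"}  # hypothetical colors
BASE_OF = {color: base for base, color in COLOR_OF.items()}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def sbs_cycles(template):
    """One recorded color per cycle: the labeled nucleotide paired with each template base."""
    return [COLOR_OF[COMPLEMENT[base]] for base in template]

def base_calls(colors):
    """Translate the recorded colors back into the synthesized sequence."""
    return "".join(BASE_OF[color] for color in colors)

colors = sbs_cycles("AAC")
print(colors)
print(base_calls(colors))
```

Because the terminator allows only one incorporation per cycle, the two A's in the template give two separate cycles rather than one stronger signal.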

4) Paired-end reads

In order to elongate our reads, we may sequence starting from the other end. This would be helpful for de novo assemblies, detection of insertions/deletions and other genomic mutations.

After the forward strand is sequenced, we remove it by denaturation. Index 1 primer attaches, and index 1 is read off the template. 3' ends are unblocked and index 2 is then read. dsDNA clusters are regenerated by bridge amplification and DNA is denatured. The forward base strand is cleaved. Use the newly synthesized strands to sequence and produce the paired end sequence data.

5) Output

Illumina machines generate output in FASTQ format, which stores each read together with a per-base quality score that encodes the probability of that base call being incorrect.
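Here is a minimal sketch of how the quality characters in a FASTQ record map to error probabilities. It assumes the common Phred+33 encoding, where the quality score Q is the character's ASCII code minus 33 and the error probability is 10^(-Q/10); older files may use a different offset. The record itself is made up.

```python
def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)

def parse_fastq_record(lines, offset=33):
    """Split one four-line FASTQ record into (read id, sequence, Phred scores)."""
    header, sequence, _separator, quality = lines
    scores = [ord(char) - offset for char in quality]
    return header[1:], sequence, scores

record = ["@read1", "GATTACA", "+", "IIII5+!"]  # made-up example record
read_id, sequence, scores = parse_fastq_record(record)
for base, q in zip(sequence, scores):
    print(base, q, phred_to_error_prob(q))
```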

References

Illumina acquires Solexa

Illumina Website