The SMRT approach to sequencing has several advantages. First, consider the impact of the longer reads, especially for de novo assemblies of novel genomes. While typical next-generation sequencing can provide abundant coverage of a genome, the short read lengths and amplification biases of those technologies can lead to fragmented assemblies whenever a complex repeat or poorly amplified region is encountered. In particular, GC-rich and GC-poor regions, which tend to be poorly amplified, are especially susceptible to poor-quality sequencing. Resolving fragmented assemblies requires additional costly bench work and further sequencing. When the longer reads of SMRT sequencing runs are also included, the read set spans many more repeats and missing bases, thereby closing many of the gaps automatically and shortening, or even eliminating, the finishing stage (Figure 1). It is becoming routine for bacterial genomes to be completely assembled using this approach [3, 4], and we expect this practice will translate to larger genomes in the near future. A complete genome is far more useful than the poor-quality draft sequences that litter GenBank because it provides a complete blueprint for the organism; the genes encoded therein represent the full biological potential of that organism. With only draft assemblies available, one is always left with the nagging feeling that some crucial gene is missing, perhaps the very one in which you are most interested! The long read lengths also have more power to reveal complex structural variations present in DNA samples, for example by pinpointing precisely where copy number variations have occurred relative to the reference sequence [5]. They are also extremely powerful for resolving complex RNA splicing patterns from cDNA libraries, since a single long read may contain the entire transcript end-to-end, thus eliminating the need to infer the isoforms [6].

Figure 1 Idealized assembly graphs [18] of the 5.2 megabase-pair B. anthracis Ames Ancestor main chromosome using (a) 100 bp, (b) 1,000 bp and (c) 5,000 bp reads. The graphs encode the compressed de Bruijn graph derived from infinite coverage error-free reads, effectively representing the repeats in the genome and the upper bound of what could be achieved in a real assembly. Increasing the read length decreases the number of contigs because the longer reads will span more of the repeats. Note the assembly with 5,000 bp reads has a self-edge because the chromosome is circular.
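
To make the caption's point concrete, the following is a minimal toy sketch, not the method used to produce Figure 1. It assumes a made-up circular genome of about 1 kb containing a single exact 50 bp repeat, treats the read length as the k-mer size of the de Bruijn graph, and uses a hypothetical helper, `branching_nodes`, to count graph nodes that have more than one successor or predecessor. That count is only a rough proxy for the places where an assembly would break into separate contigs.

```python
import random
from collections import defaultdict


def branching_nodes(genome: str, k: int) -> int:
    """Count (k-1)-mer nodes with >1 distinct successor or predecessor.

    The genome is treated as circular and the reads as error-free k-mers
    with complete coverage (an idealization, as in the figure caption).
    """
    n = len(genome)
    wrapped = genome + genome[:k - 1]  # wrap so k-mers can cross the origin
    succ = defaultdict(set)
    pred = defaultdict(set)
    for i in range(n):
        kmer = wrapped[i:i + k]
        left, right = kmer[:-1], kmer[1:]
        succ[left].add(right)
        pred[right].add(left)
    nodes = set(succ) | set(pred)
    return sum(1 for v in nodes if len(succ[v]) > 1 or len(pred[v]) > 1)


def toy_genome(seed: int = 0) -> str:
    """Hypothetical circular genome with one exact 50 bp repeat (two copies)."""
    rng = random.Random(seed)

    def rand_seq(length: int) -> str:
        return "".join(rng.choice("ACGT") for _ in range(length))

    repeat = rand_seq(50)
    # Different flanking bases keep the two repeat copies from extending.
    return (rand_seq(300) + "A" + repeat + "C" + rand_seq(300)
            + "G" + repeat + "T" + rand_seq(300))


genome = toy_genome()
short_k = branching_nodes(genome, 21)   # k-mers shorter than the repeat
long_k = branching_nodes(genome, 101)   # k-mers longer than the repeat
print(short_k, long_k)
```

In this setup the 21 bp graph should contain branching nodes at the repeat boundaries, whereas the 101 bp graph should contain none (barring a chance repeat in the random sequence), because every 100-mer then spans the repeat and is unique. Real genomes contain many repeats of varying length, so the effect in Figure 1 is gradual rather than all-or-nothing.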

Second, consider DNA methyltransferases. These can exist as solitary entities or as parts of restriction-modification systems. In both cases, they methylate relatively short sequence motifs that can easily be recognized from SMRT sequencing data, because epigenetic modifications change the kinetics of the DNA polymerase as it moves along the template molecule. The altered kinetics change the timing with which the fluorescent colors are observed, thus enabling direct detection of epigenetic modifications, which can ordinarily only be inferred, and bypassing the usual need for enrichment or chemical conversion. Often, bioinformatic analysis can match the gene responsible for a given modification to the sequence motif in which the modification lies [7, 8]. When it cannot, simply cloning the gene into a plasmid, growing the plasmid in a non-modifying host and re-sequencing it can provide the match [9]. Moreover, SMRT sequencing has also been able to identify RNA base modifications through the same approach as DNA base modifications, but using a reverse transcriptase in place of the DNA polymerase [10]. In fact, SMRT sequencing represents an important step toward uncovering the biology that happens between DNA and proteins, including not only the study of mRNA sequences but also the regulation of translation [11, 12]. Thus, functional information emerges directly from the SMRT sequencing approach.
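
The kinetic signal described above can be illustrated with a minimal sketch. It assumes, beyond what is stated here, that the polymerase kinetics at each template position are summarized as an interpulse duration (IPD, the time between successive fluorescence pulses) and that a native sample is compared with an unmodified control of the same sequence. The function name `flag_modified_positions`, the cutoff of 2.0 and all of the numbers are hypothetical and chosen only for illustration; a real analysis would apply statistical tests over many reads.

```python
from statistics import mean


def flag_modified_positions(native_ipds, control_ipds, min_ratio=2.0):
    """Return (position, ratio) pairs whose mean IPD ratio reaches min_ratio.

    native_ipds / control_ipds: one list of interpulse durations (seconds)
    per template position, pooled over reads. min_ratio is an arbitrary
    illustrative cutoff, not a recommended value.
    """
    flagged = []
    for pos, (native, control) in enumerate(zip(native_ipds, control_ipds)):
        ratio = mean(native) / mean(control)
        if ratio >= min_ratio:
            flagged.append((pos, ratio))
    return flagged


# Made-up data: the polymerase pauses longer at position 2 in the native sample.
native = [[0.21, 0.19], [0.20, 0.22], [0.95, 1.05], [0.18, 0.20]]
control = [[0.20, 0.20], [0.21, 0.19], [0.20, 0.20], [0.19, 0.21]]
for pos, ratio in flag_modified_positions(native, control):
    print(f"position {pos}: IPD ratio {ratio:.1f}")
```

With these invented values only position 2 is flagged, with a ratio of about 5. Positions flagged in this way across a genome could then be searched for a shared sequence motif, which is the step that links a modification to a candidate methyltransferase.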

Third, we must consider the persistent rumor that SMRT sequencing is much less accurate than other next-generation sequencing platforms, a claim that has now been shown to be untrue in several ways. First, a direct comparison of several approaches to determining genetic polymorphisms has shown that SMRT sequencing performs comparably to other sequencing technologies [13]. Second, assembling a complete genome using SMRT sequencing in combination with other technologies has proved to be as reliable and accurate as more traditional approaches [3, 6, 14]. Moreover, Chin et al. [15] showed that an assembly using only long SMRT sequencing reads achieves comparable or even higher accuracy than other platforms (99.999% accuracy in three organisms with known reference sequences), and in doing so made 11 corrections to the Sanger reference sequences of these genomes. Koren et al. [6] showed that most microbial genomes could be assembled into a single contig per chromosome with this approach; it is by far the least expensive option for doing so.
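
To put the quoted 99.999% figure on a familiar scale, the short calculation below converts it to a Phred-style quality value and to an expected number of erroneous bases. It is a back-of-the-envelope sketch: the helper names are hypothetical, errors are assumed to be spread uniformly, and the 5.2 Mb genome length is borrowed from Figure 1 purely for scale; it is not stated here to be one of the genomes assessed by Chin et al.

```python
import math


def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality: Q = -10 * log10(error rate)."""
    return -10.0 * math.log10(1.0 - accuracy)


def expected_errors(genome_length: int, accuracy: float) -> float:
    """Expected number of erroneous bases, assuming a uniform error rate."""
    return genome_length * (1.0 - accuracy)


accuracy = 0.99999           # consensus accuracy quoted in the text
genome_length = 5_200_000    # illustrative size, borrowed from Figure 1
print(f"Q{phred_quality(accuracy):.0f}")
print(f"about {expected_errors(genome_length, accuracy):.0f} expected errors")
```

Under these assumptions, 99.999% accuracy corresponds to roughly Q50, or about 52 erroneous bases in a 5.2 Mb genome.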