`
Applications of next-generation sequencing
`
`Sequencing technologies —
`the next generation
`
`Michael L. Metzker*‡
`
`Abstract | Demand has never been greater for revolutionary technologies that deliver
`fast, inexpensive and accurate genome information. This challenge has catalysed the
`development of next-generation sequencing (NGS) technologies. The inexpensive
`production of large volumes of sequence data is the primary advantage over conventional
`methods. Here, I present a technical review of template preparation, sequencing and
`imaging, genome alignment and assembly approaches, and recent advances in current
`and near-term commercially available NGS instruments. I also outline the broad range of
`applications for NGS technologies, in addition to providing guidelines for platform
`selection to address biological questions of interest.
`
`Automated Sanger
`sequencing
`This process involves a mixture
`of techniques: bacterial
`cloning or PCR; template
`purification; labelling of DNA
`fragments using the chain
`termination method with
`energy transfer, dye-labelled
`dideoxynucleotides and a
`DNA polymerase; capillary
`electrophoresis; and
`fluorescence detection that
`provides four-colour plots to
`reveal the DNA sequence.
`
`*Human Genome Sequencing
`Center and Department of
`Molecular & Human
`Genetics, Baylor College of
`Medicine, One Baylor Plaza,
`N1409, Houston, Texas
`77030, USA.
`‡LaserGen, Inc., 8052 El Rio
`Street, Houston, Texas
`77054, USA.
`e‑mail: mmetzker@bcm.edu
`doi:10.1038/nrg2626
`Published online
`8 December 2009
`
`Over the past four years, there has been a fundamental
`shift away from the application of automated Sanger
`sequencing for genome analysis. Prior to this depar-
`ture, the automated Sanger method had dominated the
`industry for almost two decades and led to a number of
`monumental accomplishments, including the comple-
`tion of the only finished-grade human genome sequence1.
`Despite many technical improvements during this era,
`the limitations of automated Sanger sequencing showed
`a need for new and improved technologies for sequenc-
`ing large numbers of human genomes. Recent efforts
`have been directed towards the development of new
`methods, leaving Sanger sequencing with fewer reported
`advances. As such, automated Sanger sequencing is not
`covered here, and interested readers are directed to
`previous articles2,3.
The automated Sanger method is considered a
‘first-generation’ technology, and newer methods
`are referred to as next-generation sequencing (NGS).
`These newer technologies constitute various strategies
`that rely on a combination of template preparation,
`sequencing and imaging, and genome alignment and
`assembly methods. The arrival of NGS technologies in
`the marketplace has changed the way we think about
`scientific approaches in basic, applied and clinical
`research. In some respects, the potential of NGS is akin
`to the early days of PCR, with one’s imagination being
`the primary limitation to its use. The major advance
`offered by NGS is the ability to produce an enormous
`volume of data cheaply — in some cases in excess of
`one billion short reads per instrument run. This feature
`expands the realm of experimentation beyond just
`
`determining the order of bases. For example, in
`gene-expression studies microarrays are now being
`replaced by seq-based methods, which can identify and
`quantify rare transcripts without prior knowledge of a
`particular gene and can provide information regarding
`alternative splicing and sequence variation in identified
`genes4,5. The ability to sequence the whole genome of
`many related organisms has allowed large-scale com-
`parative and evolutionary studies to be performed that
`were unimaginable just a few years ago. The broadest
`application of NGS may be the resequencing of human
`genomes to enhance our understanding of how genetic
`differences affect health and disease. The variety of
`NGS features makes it likely that multiple platforms
`will coexist in the marketplace, with some having clear
`advantages for particular applications over others.
`This Review focuses on commercially available tech-
`nologies from Roche/454, Illumina/Solexa, Life/APG
`and Helicos BioSciences, the Polonator instrument and
`the near-term technology of Pacific Biosciences, who
`aim to bring their sequencing device to the market in
`2010. Nanopore sequencing is not covered, although
`interested readers are directed to an article by Branton
`and colleagues6, who describe the advances and remain-
`ing challenges for this technology. Here, I present a tech-
`nical review of template preparation, sequencing and
`imaging, genome alignment and assembly, and current
`NGS platform performance to provide guidance on how
`these technologies work and how they may be applied
`to important biological questions. I highlight the appli-
`cations of human genome resequencing using targeted
`and whole-genome approaches, and discuss the progress
`
Nature Reviews | Genetics

Volume 11 | January 2010 | 31
`
`
`
`
`R E V I E W S
`
`Finished grade
`A quality measure for a
`sequenced genome. A
`finished-grade genome,
`commonly referred to as a
`‘finished genome’, is of higher
`quality than a draft-grade
`genome, with more base
`coverage and fewer errors and
`gaps (for example, the human
`genome reference contains
`2.85 Gb, covers 99% of the
`genome with 341 gaps, and
`has an error rate of 1 in every
`100,000 bp).
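
The error rate quoted for the finished human reference can be put on the logarithmic Phred scale that is widely used for base quality. The following is a minimal sketch, assuming the standard definition Q = -10 log10(p); the function name is hypothetical and is not taken from this Review.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Convert a per-base error probability to a Phred-scaled quality score."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    return -10.0 * math.log10(error_rate)


# Finished-grade human reference: 1 error in every 100,000 bp.
finished_q = phred_quality(1 / 100_000)
print(round(finished_q))  # 50
```

On this scale, a tenfold reduction in error rate adds 10 to the score, so an error rate of 1 in 100 corresponds to roughly Q20.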
`
`Template
`This recombinant DNA
`molecule is made up of a
`known region, usually a vector
`or adaptor sequence to which
`a universal primer can bind,
`and the target sequence, which
`is typically an unknown portion
`to be sequenced.
`
`Seq-based methods
`Assays that use
`next-generation sequencing
`technologies. They include
`methods for determining the
`sequence content and
`abundance of mRNAs,
`non-coding RNAs and small
`RNAs (collectively called
`RNA–seq) and methods for
`measuring genome-wide
`profiles of immunoprecipitated
`DNA–protein complexes
`(ChIP–seq), methylation
`sites (methyl–seq) and
`DNase I hypersensitivity
`sites (DNase–seq).
`
`Polonator
`This Review mostly describes
`technology platforms that are
`associated with a respective
`company, but the Polonator
`G.007 instrument, which is
`manufactured and distributed
`by Danaher Motions (a Dover
`Company), is an open source
`platform with freely available
`software and protocols. Users
`manufacture their own reagents
`based on published reports or
`by collaborating with George
`Church and colleagues or other
`technology developers.
`
`Fragment templates
`A fragment library is prepared
`by randomly shearing genomic
DNA into small sizes of <1 kb,
`and requires less DNA than
`would be needed for a
`mate-pair library.
`
`and limitations of these methods, as well as upcoming
`advances and the impact they are expected to have over
`the next few years.
`
Next-generation sequencing technologies
`Sequencing technologies include a number of methods
`that are grouped broadly as template preparation,
`sequencing and imaging, and data analysis. The unique
`combination of specific protocols distinguishes one
`technology from another and determines the type of
`data produced from each platform. These differences
`in data output present challenges when comparing plat-
`forms based on data quality and cost. Although qual-
`ity scores and accuracy estimates are provided by each
`manufacturer, there is no consensus that a ‘quality base’
`from one platform is equivalent to that from another
platform. Various sequencing metrics are discussed later
`in the article.
`In the following sections, stages of template prepara-
`tion and sequencing and imaging are discussed as they
`apply to existing and near-term commercial platforms.
`There are two methods used in preparing templates
`for NGS reactions: clonally amplified templates origi-
`nating from single DNA molecules, and single DNA-
`molecule templates. The term sequencing by synthesis,
`which is used to describe numerous DNA polymer-
`ase-dependent methods in the literature, is not used
`in this article because it fails to delineate the different
`mechanisms involved in sequencing2,7. Instead, these
`methods are classified as cyclic reversible termination
`(CRT), single-nucleotide addition (SNA) and real-time
`sequencing. Sequencing by ligation (SBL), an approach
`in which DNA polymerase is replaced by DNA ligase,
`is also described. Imaging methods coupled with these
`sequencing strategies range from measuring biolumines-
`cent signals to four-colour imaging of single molecular
`events. The voluminous data produced by these NGS
`platforms place substantial demands on informa-
`tion technology in terms of data storage, tracking and
`quality control (see Ref. 8 for details).
`
Template preparation
`The need for robust methods that produce a representative,
`non-biased source of nucleic acid material from the
`genome under investigation cannot be overemphasized.
`Current methods generally involve randomly breaking
`genomic DNA into smaller sizes from which either frag-
`ment templates or mate-pair templates are created. A com-
`mon theme among NGS technologies is that the template
`is attached or immobilized to a solid surface or support.
`The immobilization of spatially separated template sites
`allows thousands to billions of sequencing reactions to
`be performed simultaneously.
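
To illustrate how a mate-pair template differs from a fragment template, the sketch below mimics circularization in silico: joining the two ends of a size-selected fragment produces a short template whose halves were originally far apart. This is a simplified picture under stated assumptions. The helper name and the `tag_length` parameter are hypothetical, and strand orientation and linker sequences are ignored.

```python
def mate_pair_template(fragment: str, tag_length: int) -> str:
    """Return the short linear template that spans the circle junction.

    Hypothetical helper: circularization brings the last `tag_length` bases
    of the fragment next to its first `tag_length` bases, and cutting around
    the junction keeps only those two end tags.
    """
    if tag_length <= 0 or 2 * tag_length > len(fragment):
        raise ValueError("tag_length must be positive and at most half the fragment")
    return fragment[-tag_length:] + fragment[:tag_length]


# Toy 20-base "fragment" standing in for a ~2 kb size-selected molecule.
fragment = "AAAAA" + "CCCCCCCCCC" + "GGGGG"
template = mate_pair_template(fragment, tag_length=5)
print(template)  # GGGGGAAAAA

# Distance between the start positions of the two tags in the original fragment.
print(len(fragment) - 5)  # 15
```

Because the original separation of the two tags is known from the size selection, reads from both halves of such a template constrain where distant parts of the genome lie relative to one another.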
`
Clonally amplified templates. Most imaging systems have
not been designed to detect single fluorescent events, so
amplified templates are required. The two most common
methods are emulsion PCR (emPCR)9 and solid-phase
amplification10. EmPCR is used to prepare sequencing
`templates in a cell-free system, which has the advantage
`of avoiding the arbitrary loss of genomic sequences — a
`
`problem that is inherent in bacterial cloning methods. A
`library of fragment or mate-pair targets is created, and
`adaptors containing universal priming sites are ligated to
`the target ends, allowing complex genomes to be ampli-
`fied with common PCR primers. After ligation, the
`DNA is separated into single strands and captured onto
`beads under conditions that favour one DNA molecule
per bead (FIG. 1a). After the successful amplification and
`enrichment of emPCR beads, millions can be immobi-
`lized in a polyacrylamide gel on a standard microscope
`slide (Polonator)11, chemically crosslinked to an amino-
`coated glass surface (Life/APG; Polonator)12 or deposited
`into individual PicoTiterPlate (PTP) wells (Roche/454)13
`in which the NGS chemistry can be performed.
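
The requirement for "one DNA molecule per bead" can be explored with a simple loading model. The sketch below assumes that templates distribute over beads according to a Poisson distribution; this is a common simplification that is not stated in this Review, and the function name and the example loading ratios are hypothetical.

```python
import math


def bead_loading(mean_templates_per_bead: float) -> dict:
    """Poisson estimate of empty, single-template and multi-template beads."""
    lam = mean_templates_per_bead
    if lam <= 0:
        raise ValueError("mean_templates_per_bead must be positive")
    p_empty = math.exp(-lam)
    p_single = lam * p_empty
    p_multiple = 1.0 - p_empty - p_single
    return {
        "empty": p_empty,
        "single": p_single,
        "multiple": p_multiple,
        # Share of template-carrying beads that started from one molecule.
        "clonal_among_loaded": p_single / (1.0 - p_empty),
    }


for ratio in (0.1, 0.5, 1.0):
    result = bead_loading(ratio)
    print(ratio, round(result["empty"], 3), round(result["clonal_among_loaded"], 3))
```

Under this model, dilute loading makes most loaded beads clonal but leaves most beads empty, which is consistent with the enrichment step mentioned above.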
`Solid-phase amplification can also be used to produce
`randomly distributed, clonally amplified clusters from
fragment or mate-pair templates on a glass slide (FIG. 1b).
`High-density forward and reverse primers are covalently
`attached to the slide, and the ratio of the primers to the
`template on the support defines the surface density of
`the amplified clusters. Solid-phase amplification can
`produce 100–200 million spatially separated template
`clusters (Illumina/Solexa), providing free ends to which
`a universal sequencing primer can be hybridized to
`initiate the NGS reaction.
`
`Single-molecule templates. Although clonally amplified
`methods offer certain advantages over bacterial cloning,
`some of the protocols are cumbersome to implement
`and require a large amount of genomic DNA material
`(3–20 μg). The preparation of single-molecule tem-
plates is more straightforward and requires less starting
material (<1 μg). More importantly, these methods
`do not require PCR, which creates mutations in clon-
`ally amplified templates that masquerade as sequence
`variants. AT-rich and GC-rich target sequences may
`also show amplification bias in product yield, which
`results in their underrepresentation in genome align-
`ments and assemblies. Quantitative applications, such as
`RNA–seq5, perform more effectively with non-amplified
`template sources, which do not alter the representational
`abundance of mRNA molecules.
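
A first step in checking for the composition-dependent bias described above is to classify target sequences by base composition. The sketch below is illustrative only: the function names are hypothetical, and the 35% and 65% GC thresholds are arbitrary placeholders, not values from this Review.

```python
def gc_fraction(sequence: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in bases) / len(bases)


def composition_class(sequence: str, low: float = 0.35, high: float = 0.65) -> str:
    """Label a target as AT-rich, GC-rich or balanced (thresholds are assumptions)."""
    gc = gc_fraction(sequence)
    if gc < low:
        return "AT-rich"
    if gc > high:
        return "GC-rich"
    return "balanced"


print(composition_class("ATATATTTAA"))  # AT-rich
print(composition_class("GGCCGCGGCA"))  # GC-rich
print(composition_class("ACGTACGTAC"))  # balanced
```

Comparing read depth between such classes is one way to see whether AT-rich or GC-rich targets are underrepresented after amplification.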
`Before the NGS reaction is carried out, single-
`molecule templates are usually immobilized on solid sup-
`ports using one of at least three different approaches. In
`the first approach, spatially distributed individual primer
`molecules are covalently attached to the solid support14.
`The template, which is prepared by randomly fragment-
`ing the starting material into small sizes (for example,
`~200–250 bp) and adding common adaptors to the frag-
`ment ends, is then hybridized to the immobilized primer
(FIG. 1c). In the second approach, spatially distributed
single-molecule templates are covalently attached to the solid
support14 by priming and extending single-stranded,
single-molecule templates from immobilized primers (FIG. 1c).
A common primer is then hybridized to the template
(FIG. 1d). In either approach, DNA polymerase can bind
`to the immobilized primed template configuration to
`initiate the NGS reaction. Both of the above approaches
`are used by Helicos BioSciences. In a third approach,
`spatially distributed single polymerase molecules
`
www.nature.com/reviews/genetics
`
`
`
[Figure 1 panels. a | Roche/454, Life/APG, Polonator: emulsion PCR. One DNA molecule per bead; clonal amplification to thousands of copies occurs in microreactors in an emulsion (primer, template, dNTPs and polymerase; PCR amplification; break emulsion; template dissociation); 100–200 million beads, chemically cross-linked to a glass slide. b | Illumina/Solexa: solid-phase amplification. One DNA molecule per cluster; sample preparation from DNA (5 µg); template, dNTPs and polymerase; bridge amplification and cluster growth; 100–200 million molecular clusters. c | Helicos BioSciences, one-pass sequencing. Single molecule: primer immobilized; billions of primed, single-molecule templates. d | Helicos BioSciences, two-pass sequencing. Single molecule: template immobilized; billions of primed, single-molecule templates. e | Pacific Biosciences, Life/VisiGen, LI-COR Biosciences. Single molecule: polymerase immobilized; thousands of primed, single-molecule templates.]
`
Figure 1 | Template immobilization strategies. In emulsion PCR (emPCR) (a), a reaction mixture consisting of
an oil–aqueous emulsion is created to encapsulate bead–DNA complexes into single aqueous droplets. PCR
amplification is performed within these droplets to create beads containing several thousand copies of the same
template sequence. EmPCR beads can be chemically attached to a glass slide or deposited into PicoTiterPlate
wells (FIG. 3c). Solid-phase amplification (b) is composed of two basic steps: initial priming and extending of the
single-stranded, single-molecule template, and bridge amplification of the immobilized template with immediately
adjacent primers to form clusters. Three approaches are shown for immobilizing single-molecule templates to a solid
support: immobilization by a primer (c); immobilization by a template (d); and immobilization of a polymerase (e).
dNTP, 2′-deoxyribonucleoside triphosphate.
`
`Mate-pair templates
`A genomic library is prepared
`by circularizing sheared DNA
`that has been selected for a
`given size, such as 2 kb,
`therefore bringing the ends
`that were previously distant
`from one another into close
`proximity. Cutting these circles
`into linear DNA fragments
`creates mate-pair templates.
`
`are attached to the solid support15, to which a primed
template molecule is bound (FIG. 1e). This approach is
used by Pacific Biosciences15 and is described in patents
from Life/VisiGen16 and LI-COR Biosciences17. Larger
`DNA molecules (up to tens of thousands of base pairs)
`can be used with this technique and, unlike the first two
`approaches, the third approach can be used with real-time
`methods, resulting in potentially longer read lengths.
`
Sequencing and imaging
`There are fundamental differences in sequencing
`clonally amplified and single-molecule templates. Clonal
amplification results in a population of identical templates,
each of which has undergone the sequencing
reaction. Upon imaging, the observed signal is a consensus
of the nucleotides or probes added to the identical
templates for a given cycle. This places a greater
`
`
`Dephasing
`This occurs with step-wise
`addition methods when
`growing primers move out of
`synchronicity for any given
`cycle. Lagging strands (for
`example, n – 1 from the
`expected cycle) result from
`incomplete extension, and
`leading strands (for example,
`n + 1) result from the addition
`of multiple nucleotides or
`probes in a population of
`identical templates.
`
`Dark nucleotides or probes
`A nucleotide or probe that
`does not contain a fluorescent
`label. It can be generated from
`its cleavage and carry-over
`from the previous cycle or be
`hydrolysed in situ from its
`dye-labelled counterpart in
`the current cycle.
`
`Total internal reflection
`fluorescence
`A total internal reflection
`fluorescence imaging device
`produces an evanescent
`wave — that is, a near-field
`stationary excitation wave with
`an intensity that decreases
`exponentially away from the
`surface. This wave propagates
`across a boundary surface,
`such as a glass slide, resulting
`in the excitation of fluorescent
`molecules near (<200 nm) or
`at the surface and the
`subsequent collection of their
`emission signals by a detector.
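
The exponential decay of the evanescent wave can be written as I(z) = I0 exp(-z/d), where z is the distance from the surface. The sketch below evaluates this expression; the penetration depth d of 100 nm is an illustrative assumption rather than a value given in this Review, and the function name is hypothetical.

```python
import math


def evanescent_intensity(distance_nm: float,
                         penetration_depth_nm: float,
                         surface_intensity: float = 1.0) -> float:
    """Excitation intensity at a given distance from the surface: I0 * exp(-z / d)."""
    if distance_nm < 0 or penetration_depth_nm <= 0:
        raise ValueError("distance must be >= 0 and penetration depth must be > 0")
    return surface_intensity * math.exp(-distance_nm / penetration_depth_nm)


# Assumed penetration depth of 100 nm (illustrative only).
for z in (0, 50, 100, 200):
    print(z, round(evanescent_intensity(z, penetration_depth_nm=100.0), 3))
```

With a decay of this form, fluorophores far from the surface receive little excitation, which is why labelled molecules in bulk solution contribute little background.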
`
`Libraries of mutant DNA
`polymerases
`Large numbers of genetically
`engineered DNA polymerases
`can be created by either
`site-directed or random
`mutagenesis, which leads
`to one or more amino acid
`substitutions, insertions and/or
`deletions in the polymerase.
`The goal of this approach is
`to incorporate modified
`nucleotides more efficiently
`during the sequencing reaction.
`
`Consensus reads
`These are only useful for
`single-molecule techniques and
`are produced by sequencing
`the same template molecule
`more than once. The data are
`then aligned to produce a
`‘consensus read’, reducing
`stochastic errors that may
`occur in a given sequence read.
`
`demand on the efficiency of the addition process, and
`incomplete extension of the template ensemble results
`in lagging-strand dephasing. The addition of multiple
`nucleotides or probes can also occur in a given cycle,
`resulting in leading-strand dephasing. Signal dephas-
`ing increases fluorescence noise, causing base-calling
`errors and shorter reads18. Because dephasing is not an
`issue with single-molecule templates, the requirement
`for cycle efficiency is relaxed. Single molecules, however,
`are susceptible to multiple nucleotide or probe additions
`in any given cycle. Here, deletion errors will occur owing
`to quenching effects between adjacent dye molecules or
`no signal will be detected because of the incorporation
`of dark nucleotides or probes. In the following sections,
`sequencing and imaging strategies that use both clonally
`amplified and single-molecule templates are discussed.
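
The effect of dephasing on a clonally amplified ensemble can be sketched with a simple per-cycle model. The model assumes that, in every cycle, each strand independently fails to extend with probability `p_lag`, extends by two positions with probability `p_lead`, and otherwise extends by one. These parameters, their example values and the function name are hypothetical; real chemistries will not follow such a uniform model exactly.

```python
def in_phase_fraction(cycles: int, p_lag: float, p_lead: float = 0.0) -> list:
    """Fraction of strands extended by exactly n bases after cycle n."""
    if p_lag < 0 or p_lead < 0 or p_lag + p_lead > 1:
        raise ValueError("probabilities must be non-negative and sum to at most 1")
    p_ok = 1.0 - p_lag - p_lead
    # lengths[k] = fraction of strands whose primer has been extended by k bases
    lengths = {0: 1.0}
    in_phase = []
    for cycle in range(1, cycles + 1):
        updated = {}
        for length, fraction in lengths.items():
            for step, prob in ((0, p_lag), (1, p_ok), (2, p_lead)):
                if prob > 0.0:
                    updated[length + step] = updated.get(length + step, 0.0) + fraction * prob
        lengths = updated
        in_phase.append(lengths.get(cycle, 0.0))
    return in_phase


# Assumed 1% lagging and 0.5% leading per cycle (illustrative values only).
profile = in_phase_fraction(cycles=100, p_lag=0.01, p_lead=0.005)
for cycle in (1, 25, 50, 100):
    print(cycle, round(profile[cycle - 1], 3))
```

Even small per-cycle inefficiencies compound, so the in-phase fraction falls steadily with cycle number. This is one way to see why dephasing limits read length for clonally amplified templates.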
`
`Cyclic reversible termination. As the name implies, CRT
`uses reversible terminators in a cyclic method that com-
`prises nucleotide incorporation, fluorescence imaging
`and cleavage2. In the first step, a DNA polymerase, bound
`to the primed template, adds or incorporates just one flu-
`orescently modified nucleotide (BOX 1), which represents
`the complement of the template base. The termination of
`DNA synthesis after the addition of a single nucleotide is
`an important feature of CRT. Following incorporation,
`the remaining unincorporated nucleotides are washed
`away. Imaging is then performed to determine the iden-
`tity of the incorporated nucleotide. This is followed by
`a cleavage step, which removes the terminating/inhibit-
`ing group and the fluorescent dye. Additional washing
`is performed before starting the next incorporation step.
FIG. 2a depicts a four-colour CRT cycle used by Illumina/
Solexa, and FIG. 2c illustrates a one-colour CRT cycle
used by Helicos BioSciences.
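
The steps of a four-colour CRT cycle described above (incorporate, wash, image, cleave) can be sketched as a loop. This is an idealized illustration that assumes error-free incorporation of exactly one terminator per cycle. The dye labels and function name are hypothetical, and the template is supplied in the 3′ to 5′ order in which the polymerase reads it.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical dye labels; the actual fluorophores are not specified here.
DYE_FOR_BASE = {"A": "dye_1", "C": "dye_2", "G": "dye_3", "T": "dye_4"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_BASE.items()}


def four_colour_crt(template_3_to_5: str, cycles: int) -> str:
    """Simulate idealized CRT cycles and return the called read (5' to 3')."""
    read = []
    for template_base in template_3_to_5[:cycles]:
        # Incorporation: one reversible terminator complementary to the template base.
        incorporated = COMPLEMENT[template_base]
        # Wash and image: the dye colour identifies the incorporated nucleotide.
        observed_dye = DYE_FOR_BASE[incorporated]
        read.append(BASE_FOR_DYE[observed_dye])
        # Cleavage: dye and terminating group are removed before the next cycle.
    return "".join(read)


print(four_colour_crt("TAGC", cycles=4))  # ATCG
```

Because termination limits each cycle to a single addition, the number of cycles sets the read length in this idealized picture.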
`The key to the CRT method is the reversible ter-
`minator, of which there are two types: 3′ blocked and
`3′ unblocked (BOX 1). The use of a dideoxynucleotide,
`which acts as a chain terminator in Sanger sequenc-
`ing, provided the basis for the initial development
`of reversible blocking groups attached to the 3′ end of
`nucleotides19,20. Blocking groups, such as 3′-O-allyl-
`2′-deoxyribonucleoside triphosphates (dNTPs)21 and
`3′-O-azidomethyl-dNTPs22, have been successfully used
`in CRT. 3′-blocked terminators require the cleavage of
`two chemical bonds to remove the fluorophore from the
`nucleobase and restore the 3′-OH group.
`Currently, the Illumina/Solexa Genome Analyzer
`(GA)23 dominates the NGS market. It uses the clonally
amplified template method illustrated in FIG. 1b, coupled
with the four-colour CRT method illustrated in FIG. 2a.
The four colours are detected by total internal reflection
fluorescence (TIRF) imaging using two lasers, the output
of which is depicted in FIG. 2b. The slide is partitioned
into eight channels, which allows independent samples
to be run simultaneously. TABLE 1 shows the current
sequencing statistics of the Illumina/Solexa GAII
platform operating at the Baylor College of Medicine
Human Genome Sequencing Center (BCM-HGSC;
D. Muzny, personal communication). Substitutions are
`the most common error type, with a higher portion of
`
`errors occurring when the previous incorporated
`nucleotide is a ‘G’ base24. Genome analysis of Illumina/
`Solexa data has revealed an underrepresentation of
`AT-rich24–26 and GC-rich regions25,26, which is probably
`due to amplification bias during template preparation25.
Sequence variants are called by aligning reads to a reference
genome using bioinformatics tools such as MAQ27
or ELAND23. Bentley and colleagues reported high concordance
(>99.5%) of single-nucleotide variant (SNV)28
calls with standard genotyping arrays using both alignment
tools, and a false-positive rate of 2.5% with novel
SNVs23. Other reports have described a higher false-positive
rate associated with novel SNV detection using these
alignment tools29,30.
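
The two metrics quoted above, concordance with array genotypes and the false-positive rate among novel calls, can be computed as follows. This is a minimal sketch: the function names and the toy data are hypothetical, and the cited studies may define these quantities in more detail than is shown here.

```python
def genotype_concordance(sequencing_calls: dict, array_calls: dict) -> float:
    """Fraction of sites genotyped by both methods at which the genotypes agree."""
    shared_sites = sequencing_calls.keys() & array_calls.keys()
    if not shared_sites:
        raise ValueError("no sites are shared between the two call sets")
    agreeing = sum(sequencing_calls[site] == array_calls[site] for site in shared_sites)
    return agreeing / len(shared_sites)


def novel_false_positive_rate(novel_calls: set, validated_calls: set) -> float:
    """Fraction of novel SNV calls that were not confirmed by validation."""
    if not novel_calls:
        raise ValueError("no novel calls supplied")
    return len(novel_calls - validated_calls) / len(novel_calls)


# Toy data: site -> genotype (illustrative only).
sequencing = {"chr1:100": "AG", "chr1:250": "CC", "chr2:40": "TT", "chr2:90": "AG"}
array = {"chr1:100": "AG", "chr1:250": "CC", "chr2:40": "CT", "chr2:90": "AG"}
print(genotype_concordance(sequencing, array))  # 0.75

novel = {"chr3:10", "chr3:55", "chr4:7", "chr5:301"}
validated = {"chr3:10", "chr3:55", "chr4:7"}
print(novel_false_positive_rate(novel, validated))  # 0.25
```

Concordance is measured only at sites present on the array, which is why the false-positive rate for novel SNVs has to be estimated separately.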
`The difficulty involved in identifying a modified
`enzyme that efficiently incorporates 3′-blocked termi-
`nators — a process that entails screening large libraries
`of mutant DNA polymerases — has spurred the develop-
`ment of 3′-unblocked reversible terminators. LaserGen,
`Inc. was the first group to show that a small terminating
`group attached to the base of a 3′-unblocked nucleotide
`can act as an effective reversible terminator and be effi-
`ciently incorporated by wild-type DNA polymerases31.
`This led to the development of Lightning Terminators32
(BOX 1). Helicos BioSciences has reported the development
of Virtual Terminators, which are 3′-unblocked
terminators with a second nucleoside analogue that
acts as an inhibitor33. The challenge for 3′-unblocked
terminators is creating the appropriate modifications
to the terminating (Lightning Terminators)32 or inhibiting
(Virtual Terminators)33 groups so that DNA synthesis
is terminated after a single base addition. This
`is important because an unblocked 3′-OH group is the
`natural substrate for incorporating the next incoming
`nucleotide. Cleavage of only a single bond is required
`to remove both the terminating or inhibiting group and
`the fluorophore group from the nucleobase, which is a
`more efficient strategy than 3′-blocked terminators for
`restoring the nucleotide for the next CRT cycle.
`Helicos BioSciences was the first group to commer-
`cialize a single-molecule sequencer, the HeliScope, which
`was based on the work of Quake and colleagues34. The
`HeliScope uses the single-molecule template methods
shown in FIG. 1c and FIG. 1d coupled with the one-colour
(Cy5 dye) CRT method shown in FIG. 2c. Incorporation
of a nucleotide results in a fluorescent signal. The
HeliScope also uses TIRF to image the Cy5 dye34, the
imaging output of which is shown in FIG. 2d. Harris and
colleagues14 used Cy5-12ss-dNTPs, which are earlier versions
of their Virtual Terminators that lack the inhibiting
group, and reported that deletion errors in homopolymeric
repeat regions were the most common error type
(~5% frequency) when using the primer-immobilized
strategy shown in FIG. 1c. This is likely to be related
to the incorporation of two or more Cy5-12ss-dNTPs
in a given cycle. These errors can be greatly reduced
with two-pass sequencing, which provides ~25-base
consensus reads using the template-immobilized strategy
shown in FIG. 1d. At the 2009 Advances in Genome
`Biology and Technology (AGBT) meeting, the Helicos
`group reported their recent progress in sequencing the
`
`
`
`
Box 1 | Modified nucleotides used in next-generation sequencing methods

[Box figure: chemical structures of modified nucleotides. a | 3′-blocked reversible terminators (Ju et al.; Illumina/Solexa). b | 3′-unblocked reversible terminators (Lightning Terminator, LaserGen, Inc.; Virtual Terminator, Helicos BioSciences). c | Real-time nucleotides (Life/VisiGen; LI-COR Biosciences).]

`At the core of most
`next-generation sequencing
`(NGS) methods is the use of
`dye-labelled modified nucleotides. Ideally, these nucleotides are
`incorporated specifically, cleaved efficiently during or following
`fluorescence imaging, and extended as modified or natural bases
`in ensuing cycles. In the figure, red chemical structures denote
`terminating functional groups, except in the Helicos BioSciences
`structure, which is characterized by an inhibitory function33.
`Arrows indicate the site of cleavage separating the fluorophore
`from the nucleotide, and the blue chemical structures denote
`residual linker structures or molecular scars that are attached to
`the base and accumulate with subsequent cycles. DNA synthesis
`is terminated by reversible terminators following the
`incorporation of one modified nucleotide by DNA polymerase.
`Two types of reversible terminators have been described:
`3′-blocked terminators, which contain a cleavable group
`attached to the 3′-oxygen of the 2′-deoxyribose sugar, and
`3′-unblocked terminators.
`Several blocking groups have been described
`(see the figure, part a), including 3′-O-allyl19,21,101
`(Ju & colleagues, who exclusively licensed their
`technology to Intelligent Bio-Systems) and 3′-O-
`azidomethyl22,23,101 (Illumina/Solexa). The blocking
`group attached to the 3′ end causes a bias against
`incorporation with DNA polymerase. Mutagenesis of DNA polymerase is required to facilitate
`the incorporation of 3′-blocked terminators.
`3′-unblocked reversible terminators (part b) show more favourable enzymatic incorporation
`and, in some cases, can be incorporated as well as a natural nucleotide using wild-type DNA
`polymerases31. Other groups, including Church and colleagues102 and Turcatti and colleagues103,
`have described 3′-unblocked terminators that rely on steric hindrance of the bulky dye group to
`inhibit incorporation after the addition of the first nucleotide.
`With real-time nucleotides (part c), the fluorophore is attached to the terminal phosphate
`group (Life/VisiGen16 and Pacific Biosciences15) rather than the nucleobase, which also reduces
`bias against incorporation with DNA polymerase. In addition to labelling the terminal phosphate
`group, LI-COR Biosciences’ nucleotides attach a quencher molecule to the base17. Gamma-
`labelled 2′-deoxyribonucleoside triphosphates (dNTPs) were first described in 1979 by Yarbrough
`et al.104, and more recently, Kumar et al. described their terminally labelled polyphosphate
`nucleotides105. With the exception of LI-COR Biosciences’ nucleotides, which leave the quencher
`group attached, natural bases are incorporated into the growing primer strand.
`
[Box figure, part c, continued: Phospholinked nucleotides (Pacific Biosciences).]
`
`
[Figure 2 panels. a | Illumina/Solexa, reversible terminators: incorporate all four nucleotides, each labelled with a different dye; wash, four-colour imaging; cleave dye and terminating groups, wash; repeat cycles. c | Helicos BioSciences, reversible terminators: incorporate single, dye-labelled nucleotides; wash, one-colour imaging; cleave dye and inhibiting groups, cap, wash; each cycle, add a different dye-labelled dNTP; repeat cycles. b, d | Image panels with 'Top' and 'Bottom' sequence read-outs: CTAGTG and CAGCTA; CATCGT and CCCCCC.]
`
Figure 2 | Four-colour and one-colour cyclic reversible termination methods. a | The four-colour cyclic reversible
termination (CRT) method uses Illumina/Solexa’s 3′-O-azidomethyl reversible terminator chemistry23,101 (BOX 1) using
solid-phase-amplified template clusters (FIG. 1b, shown as single templates for illustrative purposes). Following
imaging, a cleavage step removes the fluorescent dyes and regenerates the 3′-OH group using the reducing agent
tris(2-carboxyethyl)phosphine (TCEP)23. b | The four-colour images highlight the sequencing data from two clonally
amplified templates. c | Unlike Illumina/Solexa’s terminators, the Helicos Virtual Terminators33 are labelled with the
same dye and dispensed individually in a predetermined order, analogous to a single-nucleotide addition method.
Following total internal reflection fluorescence imaging, a cleavage step removes the fluorescent dye and inhibitory
groups using TCEP to permit the addition of the next Cy5-2′-deoxyribonucleoside triphosphate (dNTP) analogue. The
free sulphhydryl groups are then capped with iodoacetamide before the next nucleotide addition33 (step not shown).
d | The one-colour images highlight the sequencing data from two single-molecule templates.
`
`Caenorhabditis elegans genome. From a single HeliScope
`run using only 7 of the instrument’s 50 channels, approx-
`imately 2.8 Gb of high-quality data were generated in
`8 days from >25-base consensus reads with 0, 1 or 2
`errors. Greater than 99% coverage of the genome was
`reported, and for regions that showed >5-fold coverage,
the consensus accuracy was 99.999% (J. W. Efcavitch,
`personal communication).
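
The yield reported above can be related to sequencing depth with a short calculation. The sketch assumes a C. elegans genome size of roughly 100 Mb, a figure that is not given in this Review, and uses an idealized Poisson model of uniform random coverage that real data will only approximate. The function names are hypothetical.

```python
import math


def mean_fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def poisson_fraction_above(depth: int, mean_coverage: float) -> float:
    """P(X > depth) for X ~ Poisson(mean_coverage), an idealized coverage model."""
    cumulative = sum(
        math.exp(-mean_coverage) * mean_coverage ** k / math.factorial(k)
        for k in range(depth + 1)
    )
    return 1.0 - cumulative


YIELD_BASES = 2_800_000_000   # 2.8 Gb of high-quality data, as reported above
GENOME_SIZE = 100_000_000     # assumed ~100 Mb genome (not stated in the text)

coverage = mean_fold_coverage(YIELD_BASES, GENOME_SIZE)
print(coverage)  # 28.0
print(poisson_fraction_above(5, coverage) > 0.999)  # True
```

Under these assumptions the run corresponds to about 28-fold mean coverage, so almost every position would be expected to exceed 5-fold depth if reads were placed uniformly at random. Composition bias and repeats make real coverage less even than this model suggests.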
`
`Sequencing by ligation. SBL is another cyclic method
`that differs from CRT in its use of DNA ligase35 and
`either one-base-encoded probes or two-base-encoded
`probes. In its simplest form, a fluorescently labelled
`probe hybridizes to its complementary sequence adja-
`cent to the primed template. DNA ligase is then added
`to join the dye-labelled probe to the primer. Non-ligated
`probes are washed away, followed by fluorescence
`
`One-base-encoded probe
`An oligonucleotide sequence in
`which one interrogation base is
`associated with a particular
`dye (for example, A in the first
`position corresponds to a green
`dye). An example of a one-base
`degenerate probe set is
`‘1-probes’, which indicates
`that the first nucleotide is the
`interrogation base. The
`remaining bases consist of
`either degenerate (four possible
`bases) or universal bases.
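
The logic of a one-base-encoded '1-probe' set can be sketched as a pair of lookups: the probe whose interrogation base complements the template base next to the primer is ligated, and its dye identifies that base. Only the pairing of A with a green dye comes from the definition above. The other three colours and the function names are hypothetical placeholders.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# "A" -> green follows the example in the text; the other colours are placeholders.
DYE_FOR_INTERROGATION_BASE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {dye: base for base, dye in DYE_FOR_INTERROGATION_BASE.items()}


def ligated_probe_dye(template_base: str) -> str:
    """Dye carried by the 1-probe that can ligate opposite `template_base`."""
    return DYE_FOR_INTERROGATION_BASE[COMPLEMENT[template_base]]


def call_template_base(observed_dye: str) -> str:
    """Infer the template base from the dye observed after ligation and washing."""
    return COMPLEMENT[BASE_FOR_DYE[observed_dye]]


dye = ligated_probe_dye("T")
print(dye)                      # green
print(call_template_base(dye))  # T
```

Each dye maps to exactly one interrogation base in this scheme, so a single colour measurement is enough to call the interrogated position.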
`
`
`
`
Table 1 | Comparison of next-generation sequencing platforms

[Table fragment: only part of the 'Biological applications' and 'Refs' columns is recoverable; platform names and the 'Pros' and 'Cons' entries are missing from this extract. Rows are listed in the order in which they appear.]

Biological applications: Bacterial and insect genome de novo assemblies; medium scale (<3 Mb) exome capture; 16S in metagenomics. Refs: D. Muzny, pers. comm.
Biological applications: Variant discovery by whole-genome resequencing or whole-exome capture; gene discovery in metagenomics. Refs: D. Muzny, pers. comm.
Biological applications: Variant discovery by whole-genome resequencing or whole-exome capture; gene discovery in metagenomics. Refs: D. Muzny, pers. comm.
Biological applications: Bacterial genome resequencing for variant discovery. Refs: J. Edwards, pers. comm.
Biological applications: Seq-based methods. Refs: 91
Biological applications: (not recoverable). Refs: S. Turner, pers. comm.
`