Next-Generation Sequencing Platforms

Elaine R. Mardis

The Genome Institute at Washington University School of Medicine, St. Louis,
Missouri 63108; email: emardis@wustl.edu

Annu. Rev. Anal. Chem. 2013. 6:287-303

The Annual Review of Analytical Chemistry is online
at anchem.annualreviews.org

This article's doi:
10.1146/annurev-anchem-062012-092628

Copyright © 2013 by Annual Reviews.
All rights reserved

Keywords
massively parallel sequencing, next-generation sequencing, reversible dye
terminators, sequencing by synthesis, single-molecule sequencing,
genomics
`
Abstract

Automated DNA sequencing instruments embody an elegant interplay
among chemistry, engineering, software, and molecular biology and have
built upon Sanger's founding discovery of dideoxynucleotide sequencing to
perform once-unfathomable tasks. Combined with innovative physical mapping
approaches that helped to establish long-range relationships between
cloned stretches of genomic DNA, fluorescent DNA sequencers produced
reference genome sequences for model organisms and for the reference human
genome. New types of sequencing instruments that permit amazing
acceleration of data-collection rates for DNA sequencing have been developed.
The ability to generate genome-scale data sets is now transforming
the nature of biological inquiry. Here, I provide an historical perspective of
the field, focusing on the fundamental developments that predated the advent
of next-generation sequencing instruments and providing information
about how these instruments work, their application to biological research,
and the newest types of sequencers that can extract data from single DNA
molecules.
`
1. INTRODUCTION
Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering,
software, and molecular biology and have built upon Sanger's founding discovery of
dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative
physical mapping approaches that helped to establish long-range relationships between cloned
stretches of genomic DNA, fluorescent DNA sequencers have been used to produce reference
genome sequences for model organisms (Escherichia coli, Drosophila melanogaster, Caenorhabditis
elegans, Mus musculus, Arabidopsis thaliana, Zea mays) and for the reference human genome. Since
2005, however, new types of sequencing instruments that permit amazing acceleration of
data-collection rates for DNA sequencing have been introduced by commercial manufacturers. For
example, single instruments can generate data to decipher an entire human genome within only
2 weeks. Indeed, we anticipate instruments that will further accelerate this whole-genome
sequencing data-production timeline to days or hours in the near future. The ability to generate
genome-scale data sets is now transforming the nature of biological inquiry, and the resulting
increase in our understanding of biology will probably be extraordinary. In this review, I provide
an historical perspective of the field, focusing on the fundamental developments that predated
the advent of next-generation sequencing instruments, providing information about how
massively parallel instruments work and their application to biological research, and finally discussing
the newest types of sequencers that are capable of extracting sequence data from single DNA
molecules.
`
2. A BRIEF HISTORY OF DNA SEQUENCING
DNA sequencing and its manifest discipline, known as genomics, are relatively new areas of
endeavor. They are the result of combining molecular biology with nucleotide chemistry, both
of which blossomed as scientific disciplines in the 1950s. Dr. Frederick Sanger's laboratory at the
Medical Research Council (MRC) in Cambridge, United Kingdom, began research to devise a
method of DNA sequencing in the early 1970s (1-3) after having first published methods for RNA
sequencing in the late 1960s (4-6). Sanger et al.'s (7) seminal 1977 publication describes a method
for essentially tricking DNA polymerase into incorporating nucleotides with a slight chemical
modification: the exchange of the 3' hydroxyl group needed for chain elongation with a hydrogen
atom that is functionally unable to participate in the reaction with the incoming nucleotide to
extend the synthesized strand. Mixing proportions of the four native deoxynucleotides with one
of four of their analogs, termed dideoxynucleotides, yields a collection of nucleotide-specific
terminated fragments for each of the four bases (Figure 1). The fragments resulting from these
reactions were separated by size on thin slab polyacrylamide gels; the A, C, G, and T reactions
were performed for each template run in adjacent lanes. The fragment positions were identified
by virtue of 32P, which was supplied in the reaction as labeled dATP molecules. When dried and
exposed to X-ray film, the gel-separated fragments were visualized and subsequently read from
the exposed film from bottom to top (shortest to longest fragments) by the naked eye. Thus, a long
and labor-intensive process was completed, and the sequencing data for the DNA of interest were
in hand and ready for assembly, translation to amino acid sequence, or other types of analysis.
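The read-out logic described above can be illustrated with a small sketch. It is a toy model, not a description of any laboratory protocol: the function names and the example strand `CTAAG` (taken from the fragment labels in Figure 1) are illustrative assumptions, and every possible terminated fragment is assumed to be present exactly once.

```python
def terminated_fragments(synthesized: str) -> dict[str, list[int]]:
    """Toy model of the four termination reactions.

    For each base, return the lengths of the fragments that end in the
    corresponding dideoxynucleotide (one gel lane per base).
    """
    lanes: dict[str, list[int]] = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the lanes from bottom to top, i.e. shortest fragment first."""
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    return "".join(base for _, base in sorted(bands))


# Hypothetical strand extended from the primer (illustrative only).
synthesized = "CTAAG"
lanes = terminated_fragments(synthesized)
sequence = read_gel(lanes)
print(lanes)
print(sequence)
```

Sorting the bands by fragment length across all four lanes is the computational analog of reading the autoradiograph from the bottom of the gel upward.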
Sequencing by radiolabeled methods underwent numerous improvements following its invention
until the mid 1980s. These improvements included the invention of DNA synthesis chemistry
(8, 9) and, ultimately, of DNA synthesizers that can be used to make oligonucleotide primers for
the sequencing reaction (providing a 3'-OH for extension); improved enzymes from the original
E. coli Klenow fragment polymerase (more uniform incorporation of dideoxynucleotides) (10, 11);
use of 35S- in place of 32P-dATP for radiolabeling (sharper banding and hence longer read lengths);
and the use of thinner and/or longer polyacrylamide gels (improved separation and longer read
lengths), among others. Although there were attempts at automating various steps of the process,
notably the automated pipetting of sequencing reactions and the automated reading of the
autoradiograph banding patterns, most improvements were not sufficient to make this sequencing
approach truly scalable to high-throughput needs.

Figure 1
Sanger sequencing. [Schematic: a primer annealed to the template strand is extended by polymerase
with dNTPs plus ddNTP terminators, yielding nested fragments (5'-C, 5'-CT, 5'-CTAA, 5'-CTAAG);
electrophoresis separates long from short fragments, and the sequence is read from short to long.]
`
3. IMPACT OF FLUORESCENCE LABELING
A significant change in the scalability of DNA sequencing was introduced in 1986, when Applied
Biosystems, Inc. (ABI), commercialized a fluorescent DNA sequencing instrument that had been
invented in Leroy Hood's laboratory at the California Institute of Technology (12). In replacing
the use of radiolabeled dATP with reactions primed by fluorescently labeled primers (different
fluor for each nucleotide reaction), the laborious processes of gel drying, X-ray film exposure
and developing, reading autoradiographs, and performing hand entry of the resulting sequences
were eliminated. In this instrument, a raster scanning laser beam crossed the surface of the gel
plates to provide an excitation wavelength for the differentially labeled fluorescent primers to
be detected during the electrophoretic separation of fragments. Thus, significant manual effort
and several sources of error were eliminated. By use of the initial versions of this instrument,
great increases were made in the daily throughput of sequencing data production, and several
laboratories used newly available automated pipetting stations to decrease the effort and error
rate of the upstream sequencing reaction pipetting steps (13). During this time, investigators
made additional improvements to sequencing enzymology and processes, including the ability
to perform cycled sequencing reactions catalyzed by thermostable sequencing polymerases (14)
that were patterned after the polymerase chain reaction (PCR), which was first described in 1988
by Mullis and colleagues (15). By incorporating linear (cycled) amplification into the sequencing
reaction, one could begin with significantly lower input template DNA and hence could produce
uniform results across a range of DNA yields (from automated isolation methods in multiwell
plates, for example). Improvements to chemistry were also important, as fluorescent dye-labeled
dideoxynucleotides (known as terminators) were introduced (16). Because the terminating
nucleotide was identified by its attached fluor, all four reactions could be combined into a single
reaction, greatly decreasing the cost of reagents and the input DNA requirements. Finally, the
per-run throughput of the sequencers increased during this time (17), ultimately permitting
96 samples to be loaded on one gel. These technological breakthroughs combined to make
96-well and ultimately 384-well sequencing reactions a major contributor to scalability. These
high-throughput slab gel fluorescence instruments largely contributed to the sequencing of
several model organism genomes, and although they were impressive in their capacity to produce
data, they still contained several manual and hence labor-intensive and error-prone steps. These
limitations largely centered around casting polyacrylamide gels and loading samples by hand.
`
4. IMPACT OF CAPILLARY OVER SLAB GEL ELECTROPHORESIS
The rate-limiting manual steps in slab gels were addressed in 1999 with the introduction of
capillary sequencing instruments, first the MegaBACE™ sequencer from Molecular Dynamics
(18) and then the ABI PRISM® 3700. These instruments solved the slab gel problem by directly
injecting a polymeric separation matrix into capillaries that provided single-nucleotide resolution.
Samples, by definition, could also be loaded directly from the microtiter plate to the capillaries for
separation by use of electrical current pulses through a process known as electrokinetic injection.
Following the separation and detection of reaction products, the polymer matrix was replaced by
pumping in new matrix. Thus, these instruments eliminated an entire series of rate-limiting steps.
Downstream activities were further simplified because the capillaries were fixed in their positions,
so there was no need for tracking lanes on the slab gel image, and subsequent data extraction and
base-calling were much faster and more accurate. Lastly, the run times were greatly accelerated
due to the rapid heat dissipation of the capillaries over thick glass plates. The ABI PRISM 3700
instruments and a later upgrade (ABI 3730) were principal data-generating instruments for the
human and mouse genome projects, among others. Their scalability and ease of use came at a
crucial time, when large-scale robotics to perform DNA extraction and sequencing were available
in specialized facilities for the clone-based front end of the process.
Indeed, these reference genomes that were produced for major model organisms, human and
plant, provided not only a fundamental advance for biological studies in these organisms but also
the basis for the utility of next-generation sequencing instruments. Next-generation sequencing
is described in the next section.
`
5. GENERAL PRINCIPLES OF NEXT-GENERATION SEQUENCING
Beginning in 2005, the traditional Sanger-based approach to DNA sequencing has experienced
revolutionary changes (19, 20). The previous "top-down" approach involved characterizing large
clones by low-resolution mapping as a means to organize the high-resolution sequencing of smaller
subclones that were assembled and finished to recapitulate each originating, larger clone (21). The
sequences of the larger clones were then stitched together at their overlapped ends to reconstruct
entire chromosomes (with small gaps). By contrast, next-generation sequencing instruments do
not require a cloning step per se. Rather, the DNA to be sequenced is used to construct a library
of fragments that have synthetic DNAs (adapters) added covalently to each fragment end by use
of DNA ligase. These adapters are universal sequences, specific to each platform, that can be
used to polymerase-amplify the library fragments during specific steps of the process. Another
difference is that next-generation sequencing does not require performing sequencing reactions in
microtiter plate wells. Rather, the library fragments are amplified in situ on a solid surface, either
a bead or a flat glass microfluidic channel that is covalently derivatized with adapter sequences
that are complementary to those on the library fragments. This amplification is digital in nature;
in other words, each amplified fragment yields a single focus (a bead- or surface-borne cluster
of amplified DNA, all of which originated from a single fragment). Amplification is required
to provide sufficient signal from each of the DNA sequencing reaction steps that determine the
sequencing data for that library fragment. The scale and throughput of next-generation sequencing
are often referred to as massively parallel, which is an appropriate descriptor for the process
that follows fragment amplification to yield sequencing data. In Sanger sequencing, the reaction
that produces the nested fragment set is distinct from the process that separates and detects the
fragments by size to produce a linear sequence of bases. In massively parallel sequencing, the
process is a stepwise reaction series that consists of (a) a nucleotide addition step, (b) a detection
step that determines the identity of the incorporated nucleotides on each fragment focus being
sequenced, and (c) a wash step that may include chemistry to remove fluorescent labels or blocking
groups. In essence, next-generation sequencing instruments conduct sequencing and detection
simultaneously rather than as distinct processes, one of which is completed before the other takes
place. Moreover, these steps are performed in a format that allows hundreds of thousands to
billions of reaction foci to be sequenced during each instrument run and, hence, at a capacity per
instrument that can produce enormous data sets.
One final difference between Sanger sequencing data and next-generation sequencing data is
the read length, or the number of nucleotides obtained from each fragment being sequenced.
In Sanger sequencing, the read length was determined largely by a combination of gel-related
factors, such as the percentage of polyacrylamide, the electrophoresis conditions, the time of
separation, and the length and thickness of the gel. In next-generation sequencing, the read
length is a function of the signal-to-noise ratio. Because the sources of noise differ according to
the technology, specifics are described for each type of sequencing below. However, the major
impact of the signal-to-noise ratio is to limit the read length from all next-generation sequencing
instruments, all of which produce shorter reads than does Sanger sequencing.
Shorter read lengths, in turn, are a differentiation point because, although short reads can be
assembled as are traditional Sanger reads, based on shared sequence, the lower extent of shared
sequence (due to read length) limits the ability to assemble these reads, so the overall length of
contiguous sequence that can be assembled is limited. This limitation is exacerbated by genome
size and complexity (e.g., repetitive content and gene families), so genomes such as that of the
human (3 Gb and ~48% repetitive content) cannot be reassembled from the component reads
of a whole-genome shotgun of next-generation sequencing data. Rather, because a high-quality
reference genome exists for many model organisms and for humans, sequence read alignment is a
more practical approach to sequencing data analysis from next-generation read lengths. Specific
algorithms to approach short read alignment have been devised; they provide a score-based metric
indicative of that sequence's best fit in the genome, whereby sequences that contain mostly or
entirely repetitive content score lowest due to the uncertainty of their origin (22, 23). Improved
certainty can be obtained from longer read lengths, and several next-generation sequencers have
offered increases in read length over time and refinement of their signal-to-noise characteristics
to allow this certainty. Another fundamental improvement has resulted from so-called paired-end
sequencing, namely producing sequence data from both ends of each library fragment. Read pairs
can be obtained by one of two mechanisms: (a) paired ends or (b) mate pair (Figure 2).

Figure 2
Comparison between (a) paired-end and (b) mate-pair sequencing library-construction processes.
[Panel a: fragment genomic DNA (200-500 bp); ligate adapters; generate clusters; sequence first end;
regenerate clusters and sequence paired end. Panel b: fragment genomic DNA (2-5 kb); biotinylate
ends; circularize; fragment (400-600 bp); enrich biotinylated fragments; ligate adapters; generate
clusters; sequence first end; regenerate clusters and sequence paired end.]

In paired-end sequencing, a linear fragment with a length of less than 1 kb has adapter sequences
at each end with different priming sites on each adapter. The sequencing instrument is designed to
sequence from one adapter priming site by use of the stepwise sequencing described above; then,
in a subsequent reaction, the opposite adapter is primed and sequence data are obtained. These
reads are paired with one another during the alignment step in data analysis, which provides higher
overall certainty of placement than does a single end read of the same length. Most alignment
algorithms also take into account the average length of fragments in the sequencing library to
make the most accurate placement possible. In mate-pair sequencing, the library is constructed
of fragments longer than 1 kb, and instead of ligating two adapters at each fragment end, the
fragment is circularized around a single adapter and both fragment ends ligate to the adapter
ends (24). These circular molecules are then treated by various molecular biology schemes (e.g.,
by type IIS endonuclease digestion or by nick translation) to produce a single linear fragment
that holds both ends of the original DNA fragment with a central adapter. The remaining DNA
remnants are removed by washing steps, as the central adapter that carries the mate-pair ends
is biotinylated and can be captured using streptavidin magnetic beads. Typically, the resulting
linear fragments have distinct adapters ligated to their ends, and sequencing is obtained from two
sequential reads as described above. Again, the resulting reads are aligned as a pair to the genome
of interest, wherein the separation distance between the reads is longer overall than that obtained
with the paired-end approach. Often, mate-pair and paired-end reads are used in combination
to achieve genome coverage when attempting longer-range assemblies through difficult regions
of a genome or when attempting to assemble a genome for the first time (de novo sequencing)
(25). In this combined coverage approach, the mate-pair reads provide longer-range order and
orientation (a separation of up to 20 kb is possible), and the paired ends provide the ability to
assemble, in a localized way, difficult-to-sequence regions that can then be layered on top of the
scaffold provided by an assembly of mate-pair reads.
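The way an aligner can use the expected fragment length to judge a read pair can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular alignment program: the `Alignment` record, the `is_concordant` function, the inward-facing (forward then reverse) orientation, and the tolerance values are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    chrom: str      # reference sequence name
    start: int      # leftmost aligned reference coordinate (0-based)
    length: int     # number of aligned bases
    forward: bool   # True if the read aligns to the forward strand


def is_concordant(r1: Alignment, r2: Alignment,
                  mean_fragment: int, tolerance: int) -> bool:
    """Return True if the pair is consistent with one library fragment.

    Assumes an inward-facing pair: one read on the forward strand,
    its partner downstream on the reverse strand.
    """
    if r1.chrom != r2.chrom or r1.forward == r2.forward:
        return False
    fwd, rev = (r1, r2) if r1.forward else (r2, r1)
    if fwd.start > rev.start:
        return False
    # Implied fragment length: from the start of the forward read
    # to the end of the reverse read.
    span = rev.start + rev.length - fwd.start
    return abs(span - mean_fragment) <= tolerance


near = is_concordant(Alignment("chr1", 1000, 100, True),
                     Alignment("chr1", 1250, 100, False),
                     mean_fragment=350, tolerance=100)
far = is_concordant(Alignment("chr1", 1000, 100, True),
                    Alignment("chr1", 6000, 100, False),
                    mean_fragment=350, tolerance=100)
print(near, far)
```

The same check with a larger `mean_fragment` and `tolerance` would apply to mate-pair libraries, although the expected read orientation can differ by library protocol and would need to be adjusted accordingly.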
`
6. DIGITAL DATA TYPE AND RAMIFICATIONS

Next-generation sequencing libraries, carefully constructed to avoid sources of biasing and
duplication, are highly digital. Specifically, the fact that each read originates from a consistently
detected focus that results from the amplification of a single library fragment means that the
data are inherently digital in nature. Thus, a quantitation of abundance can be inferred from this
one-to-one relationship, which has ramifications for biological systems that are being investigated
by next-generation sequencing. For example, chromosomal amplifications that are common in
cancer genomes can be quantitated with respect to the extent of amplification (ploidy) on each
chromosome (26). Similarly, the read prevalence of expressed genes identified by RNA sequencing
can be directly correlated to their expression level and compared across replicates or with other
samples from the same study (27). In population-based studies that use next-generation sequencing
to characterize the individual species present in an isolate (metagenomics), a similar ability
to correlate the presence of each species as a proportion of the overall population can be derived
from the digital nature of next-generation sequencing data (28).
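Because each read is a count of one, abundance estimates reduce to simple counting. The sketch below illustrates this under simplifying assumptions: reads are already assigned to a gene or species label, no correction is made for gene length or library bias, and the function names and example numbers are hypothetical.

```python
from collections import Counter


def relative_abundance(assignments: list[str]) -> dict[str, float]:
    """Fraction of reads assigned to each label (gene, species, ...)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def copy_ratio(sample_reads: int, control_reads: int,
               sample_total: int, control_total: int) -> float:
    """Depth-normalized read-count ratio for one region.

    A ratio near 1 suggests no copy-number change relative to the
    control; larger values suggest amplification.
    """
    return (sample_reads / sample_total) / (control_reads / control_total)


# Hypothetical read assignments from an RNA or metagenomic sample.
reads = ["geneA"] * 6 + ["geneB"] * 3 + ["geneC"]
abundance = relative_abundance(reads)

# Hypothetical region with 400 tumor reads versus 100 normal reads,
# each from a library of one million reads.
ratio = copy_ratio(400, 100, 1_000_000, 1_000_000)
print(abundance, ratio)
```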
`
7. SOURCES OF NOISE AND ERROR MODELS
As mentioned above, although read length in next-generation sequencing is not limited by an
electrophoretic separation step, the major limitation of read length is the signal-to-noise ratio during
stepwise sequencing. Depending on the platform, the contributors to noise in the sequencing
reaction differ, and there is interplay between the sources of noise and the sequencing errors that may
result. This interplay gives rise to what is commonly referred to as the error model and is highly
instrument and chemistry specific. In general, one typically explores both read-length limitations
and error types by sequencing a reference set of genes or an entire genome, then comparing the
sequence obtained with the high-quality reference gene set or genome (29). In this approach, the
different types of errors (substitution errors or insertion and deletion errors) can be identified, and
the error model (random versus systematic errors) can be defined. Representation biases can also be
uncovered by this approach when one examines the aligned reads for evidence of complete or
partial lack of representation. If this lack of representation can be classified (for example, regions with
> 95% G + C content), then the bias can be defined. Typically, the more sequence reads are
examined, the better defined are the error model, coverage biases, and their contributing sources. For
example, the use of PCR or other types of enzymatic amplification may contribute systematic errors
during the library construction or amplification processes described above. One might address this
problem, independently of the instrument system used, by employing a high-fidelity polymerase
and/or by limiting the number of amplification cycles when possible. Some sources of error,
however, are simply instrument specific and may not be readily addressed by the end user (although
they may improve over time with new chemistry and software from the manufacturer). As discussed
below, instruments that use library amplification to enhance signal produced from the sequencing
process forego some of the signal-to-noise issues that are experienced in single-molecule systems
because there are so many identical fragments being sequenced per focus that the number of
fragments that are not misreporting far exceeds the number of fragments that are. In general, noise
accumulates during the stepwise sequencing process and ultimately limits the read length obtained
once the signal from any base incorporation step is outcompeted by incorrect or out-of-phase
incorporation events, residual signal from prior reactions or reactants, and other sources of noise.
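The comparison against a reference described above can be sketched as a per-cycle tally of mismatches. This is a deliberately reduced illustration: it assumes ungapped alignments with known start positions, so it captures substitution errors only and ignores insertions and deletions; the function name and the toy reference and reads are hypothetical.

```python
def substitution_rate_by_cycle(reference: str,
                               aligned_reads: list[tuple[int, str]]) -> dict[int, float]:
    """Fraction of reads that disagree with the reference at each cycle.

    aligned_reads holds (start, read) pairs, where start is the 0-based
    reference position of the first base of an ungapped alignment.
    """
    mismatches: dict[int, int] = {}
    totals: dict[int, int] = {}
    for start, read in aligned_reads:
        for cycle, base in enumerate(read, start=1):
            totals[cycle] = totals.get(cycle, 0) + 1
            if base != reference[start + cycle - 1]:
                mismatches[cycle] = mismatches.get(cycle, 0) + 1
    return {cycle: mismatches.get(cycle, 0) / totals[cycle]
            for cycle in sorted(totals)}


# Toy data: two of the three reads carry a substitution in the last cycle.
reference = "ACGTACGTAC"
reads = [(0, "ACGTA"), (2, "GTACT"), (4, "ACGTT")]
rates = substitution_rate_by_cycle(reference, reads)
print(rates)
```

A rate that rises with cycle number, as in this toy example, is the kind of pattern that would indicate accuracy decreasing along the read; a rate concentrated at particular reference positions would instead point to a systematic error or a true variant.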
`
8. NEXT-GENERATION SEQUENCING WITH REVERSIBLE
DYE TERMINATORS
It is informative to discuss some of the predominant approaches to next-generation sequencing
as a means of tying together the concepts presented herein. The first instrument system involves
the use of reversible dye terminators in enzymatic sequencing of amplified foci of library
fragments. This system was initially developed in 2007 by Solexa and was subsequently acquired by
Illumina®, Inc. (30). The library work flow follows steps similar to those outlined above, namely
fragmentation of high-molecular weight DNA, enzymatic trimming and adenylation of the
fragment ends, and ligation of specific adapters (Figure 3a). The Illumina microfluidic conduit is a
flow cell composed of flat glass with eight microfluidic channels, each decorated by covalent
attachment of adapter sequences complementary to the library adapters. By careful quantitation of
the library concentration, a precisely diluted solution of library fragments is amplified in situ on
the flow cell surfaces by use of a bridge amplification step to produce foci for sequencing (clusters)
(Figure 3b). A subsequent step chemically effects the release of fragment ends carrying the same
adapter, which is then primed with a complementary synthetic DNA (primer) to provide free
3'-OH groups that can be extended in subsequent stepwise sequencing reactions. In reversible dye
terminator sequencing, all four nucleotides are provided in each cycle because each nucleotide
carries an identifying fluorescent label. The sequencing occurs as single-nucleotide addition
reactions because a blocking group exists at the 3'-OH position of the ribose sugar, preventing
additional base incorporation reactions by the polymerase. As such, the series of events in each
step includes the following in order of occurrence: (a) The nucleotide is added by polymerase,
(b) unincorporated nucleotides are washed away, (c) the flow cell is imaged on both inner
surfaces to identify each cluster that is reporting a fluorescent signal, (d) the fluorescent groups are
chemically cleaved, and (e) the 3'-OH is chemically deblocked (Figure 3c). This series of steps
is repeated for up to 150 nucleotide addition reactions, whereupon the second read preparations
begin. To read from the opposite end of each fragment cluster, the instrument first removes the
synthesized strands by denaturation and regenerates the clusters by performing a limited bridge
amplification to improve the signal-to-noise ratio in the second read. After the amplification step,
the opposite ends of the fragments are released from the flow cell surfaces by a different chemical
cleavage reagent (corresponding to a labile group on the reverse adapter), and the fragments are
primed with the reverse primer. Sequencing proceeds as described above. All of these steps occur
on-instrument with the flow cell in place and without manual intervention, so the correlation of
position from forward (first) to reverse (second) reads is maintained and yields a very high read-pair
concordance upon read alignment to the reference genome.

Figure 3
(a) Illumina® library-construction process. (b) Illumina cluster generation by bridge amplification.
(c) Sequencing by synthesis with reversible dye terminators.
[Panel a: DNA fragments; blunting by fill-in and exonuclease; phosphorylation; addition of
A-overhang; ligation to adapters. Panel b: grafted flow cell (P7, P5); template hybridization; initial
extension; denaturation; first and second cycles of denaturation, annealing, and extension; cluster
amplification; P5 linearization; block with ddNTPs; denaturation and sequencing primer
hybridization. Panel c: incorporate; detect; deblock; cleave fluor.]

Illumina data have an error model that is described as having decreasing accuracy with
increasing nucleotide addition steps. When errors occur, they are predominantly substitution
errors, in which an incorrect nucleotide identity is assigned to the base. The error percentage
of most Illumina reads is approximately 0.5% at best (i.e., 1 error in 200 bases). Sources of noise
include (a) phasing, wherein increasing numbers of fragments fall out of phase with the majority
of fragments in the cluster due to incomplete deblocking in prior cycles or, conversely, due to lack
of a blocking group that allows an additional base to be incorporated and (b) residual fluorescence
interference noise due to incomplete fluorescent label cleavage from previous cycles.
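The cumulative effect of phasing on signal can be illustrated with a simple decay model. This is a sketch under strong assumptions and is not taken from the instrument's documentation: each strand is assumed to fall behind or run ahead with a fixed, independent probability per cycle, strands never return to phase, and the rates used below are hypothetical values chosen only to show the trend.

```python
def in_phase_fraction(cycles: int, lag_rate: float, lead_rate: float) -> list[float]:
    """Fraction of strands in a cluster still in phase after each cycle.

    lag_rate:  per-cycle chance of incomplete deblocking (strand falls behind)
    lead_rate: per-cycle chance of a missing block (strand runs ahead)
    """
    stay = 1.0 - lag_rate - lead_rate
    return [stay ** n for n in range(1, cycles + 1)]


# Hypothetical per-cycle rates, for illustration only.
fractions = in_phase_fraction(150, lag_rate=0.005, lead_rate=0.002)
for cycle in (1, 50, 100, 150):
    print(cycle, round(fractions[cycle - 1], 3))
```

Under this model the in-phase signal shrinks geometrically with cycle number while the out-of-phase background grows, which is one way to see why accuracy falls with increasing nucleotide addition steps and why read length is ultimately bounded.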
Read lengths have increased from the original Solexa instrument at 25-bp single-end reads, to
the current Illumina HiSeq 2000 instrument's 150-bp paired-end reads. Increased read length has
been one component that is contributing to an explosion in throughput-per-instrument run over
a relatively short time frame (5 years), from 1 Gb for the Solexa 1G to 600 Gb for the HiSeq 2000.
The latter instrument can thus produce sufficient data coverage for six whole human genome
sequences in approximately 11 days. The coverage per genome needed is approximately 30-fold,
and with a 3-Gb genome wherein approximately 90% of the reads will map, 100 Gb are required
to produce the necessary 90 Gb of data per genome.
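The arithmetic behind these figures can be written out explicitly. The sketch below uses only the numbers quoted in the text (3-Gb genome, 30-fold coverage, 90% of reads mapping, 600 Gb per run); the function names and the choice to express the mapping rate as an integer percentage are illustrative assumptions.

```python
def raw_yield_needed_gb(genome_size_gb: int, fold_coverage: int,
                        mapped_percent: int) -> float:
    """Raw sequence (Gb) needed so that the mapped portion reaches the target."""
    mapped_needed = genome_size_gb * fold_coverage   # 3 Gb x 30-fold = 90 Gb
    return mapped_needed * 100 / mapped_percent      # inflate for unmapped reads


def genomes_per_run(run_yield_gb: int, per_genome_gb: float) -> int:
    """Whole genomes that fit in one instrument run."""
    return int(run_yield_gb // per_genome_gb)


per_genome = raw_yield_needed_gb(genome_size_gb=3, fold_coverage=30, mapped_percent=90)
genomes = genomes_per_run(run_yield_gb=600, per_genome_gb=per_genome)
print(per_genome, genomes)
```

With these inputs the calculation reproduces the values stated above: 100 Gb of raw data per genome and six genomes per 600-Gb run.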
The other contributor to throughput has been the ability to use increasingly more-concentrated
library dilutions onto the flow cell, resulting in significant increases in cluster density. The HiSeq
2000 was the first instrument to read clusters from both surfaces of the flow cell channel, effectively
doubling the throughput per run. Improvements in chemistry have made deblocking and
fluor-removal steps more complete; polymerase engineering has improved incorporation fidelity and
decreased errors and has decreased the G + C biases associated with the instrument at the bridge
amplification step.
`
9. NEXT-GENERATION SEQUENCING BY pH CHANGE MONITORING
A completely different approach to next-generation sequencing is embodied in an instrument
system that detects the release of hydrogen ions, a by-product of nucleotide incorporation, as
quantitated changes in pH through a novel coupled silicon detector. This instrument was commercialized
in 2010 by Ion Torrent (31), a company that was later purchased by Life Technologies™ Corp.
For this approach, library construction includes DNA fragmentation, enzymatic end polishing,
and adapter ligation. Amplification of library fragments occurs by a unique approach known as
emulsion PCR, which quantitates the library fragments and dilutes them to be mixed in equimolar
quantities with small beads, PCR reactants, and DNA polymerase molecules (32). The beads have
covalently linked adapter complementary sequences on their surfaces to facilitate amplification
on the bead. This mixture is then shaken to form an emulsion so that the beads and DNA are
encapsulated in a 1:1 ratio (on average) in oil micelles that also contain the reactants needed for
PCR-based amplification. The resulting mixture is placed into a specific apparatus that performs
thermal cycling of the emulsion, effectively allowing hundreds of thousands of individual PCR
amplifications to occur in parallel in one vessel. Subsequent steps are required first to separate
the oil from the aqueous solution and beads (so-called emulsion breaking) and then to enrich the
beads that were successfully amplified (to remove beads with insufficient DNA). Enriched beads
are primed for sequencing by annealing a
