`
The Annual Review of Analytical Chemistry is online
at anchem.annualreviews.org
`This article’s doi:
`10.1146/annurev-anchem-062012-092628
`
`Copyright © 2013 by Annual Reviews.
`All rights reserved
`
`Next-Generation Sequencing
`Platforms
`
`Elaine R. Mardis
`
The Genome Institute at Washington University School of Medicine, St. Louis,
Missouri 63108; email: emardis@wustl.edu
`
`Keywords
`massively parallel sequencing, next-generation sequencing, reversible dye
`terminators, sequencing by synthesis, single-molecule sequencing,
`genomics
`
`Abstract
`
Automated DNA sequencing instruments embody an elegant interplay
among chemistry, engineering, software, and molecular biology and have
built upon Sanger’s founding discovery of dideoxynucleotide sequencing to
perform once-unfathomable tasks. Combined with innovative physical mapping
approaches that helped to establish long-range relationships between
cloned stretches of genomic DNA, fluorescent DNA sequencers produced
reference genome sequences for model organisms and for the reference human
genome. New types of sequencing instruments that permit amazing
acceleration of data-collection rates for DNA sequencing have been developed.
The ability to generate genome-scale data sets is now transforming
the nature of biological inquiry. Here, I provide an historical perspective of
the field, focusing on the fundamental developments that predated the advent
of next-generation sequencing instruments and providing information
about how these instruments work, their application to biological research,
and the newest types of sequencers that can extract data from single DNA
molecules.
`
`1. INTRODUCTION
`
Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering,
software, and molecular biology and have built upon Sanger’s founding discovery of
dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative
physical mapping approaches that helped to establish long-range relationships between cloned
stretches of genomic DNA, fluorescent DNA sequencers have been used to produce reference
genome sequences for model organisms (Escherichia coli, Drosophila melanogaster, Caenorhabditis
elegans, Mus musculus, Arabidopsis thaliana, Zea mays) and for the reference human genome. Since
2005, however, new types of sequencing instruments that permit amazing acceleration of
data-collection rates for DNA sequencing have been introduced by commercial manufacturers. For
example, single instruments can generate data to decipher an entire human genome within only
2 weeks. Indeed, we anticipate instruments that will further accelerate this whole-genome
sequencing data-production timeline to days or hours in the near future. The ability to generate
genome-scale data sets is now transforming the nature of biological inquiry, and the resulting
increase in our understanding of biology will probably be extraordinary. In this review, I provide
an historical perspective of the field, focusing on the fundamental developments that predated
the advent of next-generation sequencing instruments, providing information about how massively
parallel instruments work and their application to biological research, and finally discussing
the newest types of sequencers that are capable of extracting sequence data from single DNA
molecules.
`
`2. A BRIEF HISTORY OF DNA SEQUENCING
`
DNA sequencing and its manifest discipline, known as genomics, are relatively new areas of
endeavor. They are the result of combining molecular biology with nucleotide chemistry, both
of which blossomed as scientific disciplines in the 1950s. Dr. Frederick Sanger’s laboratory at the
Medical Research Council (MRC) in Cambridge, United Kingdom, began research to devise a
method of DNA sequencing in the early 1970s (1-3) after having first published methods for RNA
sequencing in the late 1960s (4-6). Sanger et al.’s (7) seminal 1977 publication describes a method
for essentially tricking DNA polymerase into incorporating nucleotides with a slight chemical
modification: the exchange of the 3′ hydroxyl group needed for chain elongation with a hydrogen
atom that is functionally unable to participate in the reaction with the incoming nucleotide to
extend the synthesized strand. Mixing proportions of the four native deoxynucleotides with one
of four of their analogs, termed dideoxynucleotides, yields a collection of nucleotide-specific
terminated fragments for each of the four bases (Figure 1). The fragments resulting from these
reactions were separated by size on thin slab polyacrylamide gels; the A, C, G, and T reactions
were performed for each template and run in adjacent lanes. The fragment positions were identified
by virtue of ³²P, which was supplied in the reaction as labeled dATP molecules. When dried and
exposed to X-ray film, the gel-separated fragments were visualized and subsequently read from
the exposed film from bottom to top (shortest to longest fragments) by the naked eye. Thus, a long
and labor-intensive process was completed, and the sequencing data for the DNA of interest were
in hand and ready for assembly, translation to amino acid sequence, or other types of analysis.
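The chain-termination logic described above can be illustrated with a short simulation. The following Python sketch is only a toy model: the function names, the 10% termination probability, and the number of simulated molecules are illustrative assumptions rather than parameters of Sanger's protocol, and the short strand CTAAGCT serves purely as an example. Each of the four reactions yields fragments ending at one base type, and reading all bands from shortest to longest recovers the synthesized sequence.

```python
import random

def terminated_lengths(strand, dd_base, p_terminate=0.1, n_molecules=2000, seed=0):
    """Simulate one ddNTP reaction and return the set of fragment lengths produced.

    strand is the newly synthesized strand (5'->3'). At every position whose base
    equals dd_base, the dideoxy analog is incorporated with probability
    p_terminate (an assumed value), which ends extension of that molecule.
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        for position, base in enumerate(strand, start=1):
            if base == dd_base and rng.random() < p_terminate:
                lengths.add(position)
                break
    return lengths

def read_gel(lanes):
    """Read bands from the bottom of the gel (shortest) to the top (longest)."""
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)

strand = "CTAAGCT"
# One lane per terminator, as in the four adjacent gel lanes.
lanes = {base: terminated_lengths(strand, base) for base in "ACGT"}
print(read_gel(lanes))
```

Because termination is random, a real reaction needs enough template molecules for every position to be represented; the simulation uses a large molecule count for the same reason.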
Sequencing by radiolabeled methods underwent numerous improvements following its invention
until the mid 1980s. These improvements included the invention of DNA synthesis chemistry
(8, 9) and, ultimately, of DNA synthesizers that can be used to make oligonucleotide primers for
the sequencing reaction (providing a 3′-OH for extension); improved enzymes from the original
E. coli Klenow fragment polymerase (more uniform incorporation of dideoxynucleotides) (10, 11);

Figure 1
Sanger sequencing. Labels recoverable from the figure: primer annealed to the template strand;
polymerase + dNTPs with +ddATPs, +ddTTPs, +ddCTPs, or +ddGTPs; terminated fragments (CT through
CTAAGCT) ordered from short to long; direction of electrophoresis; direction of sequence read.

use of ³⁵S- in place of ³²P-dATP for radiolabeling (sharper banding and hence longer read lengths);
and the use of thinner and/or longer polyacrylamide gels (improved separation and longer read
lengths), among others. Although there were attempts at automating various steps of the process,
notably the automated pipetting of sequencing reactions and the automated reading of the
autoradiograph banding patterns, most improvements were not sufficient to make this sequencing
approach truly scalable to high-throughput needs.
`
`3. IMPACT OF FLUORESCENCE LABELING
`
A significant change in the scalability of DNA sequencing was introduced in 1986, when Applied
Biosystems, Inc. (ABI), commercialized a fluorescent DNA sequencing instrument that had been
invented in Leroy Hood’s laboratory at the California Institute of Technology (12). In replacing
the use of radiolabeled dATP with reactions primed by fluorescently labeled primers (different
fluor for each nucleotide reaction), the laborious processes of gel drying, X-ray film exposure
and developing, reading autoradiographs, and performing hand entry of the resulting sequences
were eliminated. In this instrument, a raster scanning laser beam crossed the surface of the gel
plates to provide an excitation wavelength for the differentially labeled fluorescent primers to
be detected during the electrophoretic separation of fragments. Thus, significant manual effort
and several sources of error were eliminated. By use of the initial versions of this instrument,
great increases were made in the daily throughput of sequencing data production, and several
laboratories used newly available automated pipetting stations to decrease the effort and error
rate of the upstream sequencing reaction pipetting steps (13). During this time, investigators
made additional improvements to sequencing enzymology and processes, including the ability
to perform cycled sequencing reactions catalyzed by thermostable sequencing polymerases (14)
that were patterned after the polymerase chain reaction (PCR), which was first described in the
mid-1980s by Mullis and colleagues (15). By incorporating linear (cycled) amplification into the
sequencing reaction, one could begin with significantly lower input template DNA and hence could
produce uniform results across a range of DNA yields (from automated isolation methods in multiwell
plates, for example). Improvements to chemistry were also important, as fluorescent dye-labeled
dideoxynucleotides (known as terminators) were introduced (16). Because the terminating
nucleotide was identified by its attached fluor, all four reactions could be combined into a single
reaction, greatly decreasing the cost of reagents and the input DNA requirements. Finally, the
per-run throughput of the sequencers increased during this time (17), ultimately permitting
96 samples to be loaded on one gel. These technological breakthroughs combined to make
96-well and ultimately 384-well sequencing reactions a major contributor to scalability. These
high-throughput slab gel fluorescence instruments largely contributed to the sequencing of
several model organism genomes, and although they were impressive in their capacity to produce
data, they still contained several manual and hence labor-intensive and error-prone steps. These
limitations largely centered around casting polyacrylamide gels and loading samples by hand.
`
`4. IMPACT OF CAPILLARY OVER SLAB GEL ELECTROPHORESIS
`
The rate-limiting manual steps in slab gels were addressed in 1999 with the introduction of
capillary sequencing instruments, first the MegaBACE™ sequencer from Molecular Dynamics
(18) and then the ABI PRISM® 3700. These instruments solved the slab gel problem by directly
injecting a polymeric separation matrix into capillaries that provided single-nucleotide resolution.
Samples, by definition, could also be loaded directly from the microtiter plate to the capillaries for
separation by use of electrical current pulses through a process known as electrokinetic injection.
Following the separation and detection of reaction products, the polymer matrix was replaced by
pumping in new matrix. Thus, these instruments eliminated an entire series of rate-limiting steps.
Downstream activities were further simplified because the capillaries were fixed in their positions,
so there was no need for tracking lanes on the slab gel image, and subsequent data extraction and
base-calling were much faster and more accurate. Lastly, the run times were greatly accelerated
due to the rapid heat dissipation of the capillaries over thick glass plates. The ABI PRISM 3700
instruments and a later upgrade (ABI 3730) were principal data-generating instruments for the
human and mouse genome projects, among others. Their scalability and ease of use came at a
crucial time, when large-scale robotics to perform DNA extraction and sequencing were available
in specialized facilities for the clone-based front end of the process.
Indeed, these reference genomes that were produced for major model organisms, human and
plant, provided not only a fundamental advance for biological studies in these organisms but also
the basis for the utility of next-generation sequencing instruments. Next-generation sequencing
is described in the next section.
`
`5. GENERAL PRINCIPLES OF NEXT-GENERATION SEQUENCING
Beginning in 2005, the traditional Sanger-based approach to DNA sequencing has experienced
revolutionary changes (19, 20). The previous “top-down” approach involved characterizing large
clones by low-resolution mapping as a means to organize the high-resolution sequencing of smaller
subclones that were assembled and finished to recapitulate each originating, larger clone (21). The
sequences of the larger clones were then stitched together at their overlapped ends to reconstruct
entire chromosomes (with small gaps). By contrast, next-generation sequencing instruments do
not require a cloning step per se. Rather, the DNA to be sequenced is used to construct a library
of fragments that have synthetic DNAs (adapters) added covalently to each fragment end by use
of DNA ligase. These adapters are universal sequences, specific to each platform, that can be
used to polymerase-amplify the library fragments during specific steps of the process. Another
difference is that next-generation sequencing does not require performing sequencing reactions in
microtiter plate wells. Rather, the library fragments are amplified in situ on a solid surface, either
a bead or a flat glass microfluidic channel that is covalently derivatized with adapter sequences
that are complementary to those on the library fragments. This amplification is digital in nature;
in other words, each amplified fragment yields a single focus (a bead- or surface-borne cluster
of amplified DNA, all of which originated from a single fragment). Amplification is required
to provide sufficient signal from each of the DNA sequencing reaction steps that determine the
sequencing data for that library fragment. The scale and throughput of next-generation sequencing
are often referred to as massively parallel, which is an appropriate descriptor for the process
that follows fragment amplification to yield sequencing data. In Sanger sequencing, the reaction
that produces the nested fragment set is distinct from the process that separates and detects the
fragments by size to produce a linear sequence of bases. In massively parallel sequencing, the
process is a stepwise reaction series that consists of (a) a nucleotide addition step, (b) a detection
step that determines the identity of the incorporated nucleotides on each fragment focus being
sequenced, and (c) a wash step that may include chemistry to remove fluorescent labels or blocking
groups. In essence, next-generation sequencing instruments conduct sequencing and detection
simultaneously rather than as distinct processes, one of which is completed before the other takes
place. Moreover, these steps are performed in a format that allows hundreds of thousands to
billions of reaction foci to be sequenced during each instrument run and, hence, at a capacity per
instrument that can produce enormous data sets.
One final difference between Sanger sequencing data and next-generation sequencing data is
the read length, or the number of nucleotides obtained from each fragment being sequenced.
In Sanger sequencing, the read length was determined largely by a combination of gel-related
factors, such as the percentage of polyacrylamide, the electrophoresis conditions, the time of
separation, and the length and thickness of the gel. In next-generation sequencing, the read
length is a function of the signal-to-noise ratio. Because the sources of noise differ according to
the technology, specifics are described for each type of sequencing below. However, the major
impact of the signal-to-noise ratio is to limit the read length from all next-generation sequencing
instruments, all of which produce shorter reads than does Sanger sequencing.
Shorter read lengths, in turn, are a differentiation point because, although short reads can be
assembled as are traditional Sanger reads, based on shared sequence, the lower extent of shared
sequence (due to read length) limits the ability to assemble these reads, so the overall length of
contiguous sequence that can be assembled is limited. This limitation is exacerbated by genome
size and complexity (e.g., repetitive content and gene families), so genomes such as that of the
human (3 Gb and ~48% repetitive content) cannot be reassembled from the component reads
of a whole-genome shotgun of next-generation sequencing data. Rather, because a high-quality
reference genome exists for many model organisms and for humans, sequence read alignment is a
more practical approach to sequencing data analysis from next-generation read lengths. Specific
algorithms to approach short read alignment have been devised; they provide a score-based metric
indicative of that sequence’s best fit in the genome, whereby sequences that contain mostly or
entirely repetitive content score lowest due to the uncertainty of their origin (22, 23). Improved

Figure 2
Comparison between (a) paired-end and (b) mate-pair sequencing library-construction processes.
Steps recoverable from the figure: (a) fragment genomic DNA (200-500 bp), ligate adapters
(A1/SP1 and A2/SP2), generate clusters on the flow cell, sequence first end, regenerate clusters
and sequence paired end; (b) fragment genomic DNA, biotinylate and circularize, fragment
(400-600 bp), enrich biotinylated fragments, ligate adapters, generate clusters, sequence first
end, regenerate clusters and sequence paired end.
`
certainty can be obtained from longer read lengths, and several next-generation sequencers have
offered increases in read length over time and refinement of their signal-to-noise characteristics
to allow this certainty. Another fundamental improvement has resulted from so-called paired-end
sequencing, namely producing sequence data from both ends of each library fragment. Read pairs
can be obtained by one of two mechanisms: (a) paired ends or (b) mate pairs (Figure 2).
In paired-end sequencing, a linear fragment with a length of less than 1 kb has adapter sequences
at each end with different priming sites on each adapter. The sequencing instrument is designed to
sequence from one adapter priming site by use of the stepwise sequencing described above; then,
in a subsequent reaction, the opposite adapter is primed and sequence data are obtained. These
reads are paired with one another during the alignment step in data analysis, which provides higher
overall certainty of placement than does a single end read of the same length. Most alignment
algorithms also take into account the average length of fragments in the sequencing library to
make the most accurate placement possible. In mate-pair sequencing, the library is constructed
of fragments longer than 1 kb, and instead of ligating two adapters at each fragment end, the
fragment is circularized around a single adapter and both fragment ends ligate to the adapter
ends (24). These circular molecules are then treated by various molecular biology schemes (e.g.,
by type IIS endonuclease digestion or by nick translation) to produce a single linear fragment
that holds both ends of the original DNA fragment with a central adapter. The remaining DNA
remnants are removed by washing steps, as the central adapter that carries the mate-pair ends
is biotinylated and can be captured using streptavidin magnetic beads. Typically, the resulting
linear fragments have distinct adapters ligated to their ends, and sequencing is obtained from two
sequential reads as described above. Again, the resulting reads are aligned as a pair to the genome
of interest, wherein the separation distance between the reads is longer overall than that obtained
with the paired-end approach. Often, mate-pair and paired-end reads are used in combination
to achieve genome coverage when attempting longer-range assemblies through difficult regions
of a genome or when attempting to assemble a genome for the first time (de novo sequencing)
(25). In this combined coverage approach, the mate-pair reads provide longer-range order and
orientation (a separation of up to 20 kb is possible), and the paired ends provide the ability to
assemble, in a localized way, difficult-to-sequence regions that can then be layered on top of the
scaffold provided by an assembly of mate-pair reads.
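The way an aligner can use the expected fragment length of a library when placing read pairs may be sketched as follows. This Python example is a simplified illustration under stated assumptions: the AlignedRead record, the function names, and the tolerance values are hypothetical, and real alignment software also weighs strand orientation and alignment scores, which are omitted here.

```python
from dataclasses import dataclass

@dataclass
class AlignedRead:
    chrom: str
    start: int   # 0-based leftmost aligned position on the reference
    length: int  # number of aligned bases

def pair_span(read1, read2):
    """Outer distance covered by a read pair, or None if on different chromosomes."""
    if read1.chrom != read2.chrom:
        return None
    left = min(read1.start, read2.start)
    right = max(read1.start + read1.length, read2.start + read2.length)
    return right - left

def is_concordant(read1, read2, expected_span, tolerance):
    """True if the pair's span agrees with the library's expected fragment length."""
    span = pair_span(read1, read2)
    return span is not None and abs(span - expected_span) <= tolerance

first = AlignedRead("chr1", 1000, 100)
near = AlignedRead("chr1", 1250, 100)    # spans 350 bp with `first`
far = AlignedRead("chr1", 21000, 100)    # spans 20,100 bp with `first`

# Paired-end library (assumed ~350-bp fragments) versus mate-pair library (assumed ~20 kb).
print(is_concordant(first, near, expected_span=350, tolerance=100))
print(is_concordant(first, far, expected_span=20000, tolerance=2000))
```

The same check with a short expected span rejects the distant pair, which is how an unexpectedly large separation in a paired-end library can flag a misplacement or a structural difference from the reference.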
`
`6. DIGITAL DATA TYPE AND RAMIFICATIONS
`
Next-generation sequencing libraries, carefully constructed to avoid sources of biasing and
duplication, are highly digital. Specifically, the fact that each read originates from a consistently
detected focus that results from the amplification of a single library fragment means that the
data are inherently digital in nature. Thus, a quantitation of abundance can be inferred from this
one-to-one relationship, which has ramifications for biological systems that are being investigated
by next-generation sequencing. For example, chromosomal amplifications that are common in
cancer genomes can be quantitated with respect to the extent of amplification (ploidy) on each
chromosome (26). Similarly, the read prevalence of expressed genes identified by RNA sequencing
can be directly correlated to their expression level and compared across replicates or with other
samples from the same study (27). In population-based studies that use next-generation sequencing
to characterize the individual species present in an isolate (metagenomics), a similar ability
to correlate the presence of each species as a proportion of the overall population can be derived
from the digital nature of next-generation sequencing data (28).
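Because each read corresponds to one library fragment, abundance can be estimated by counting. The following Python sketch shows the idea in its simplest form; the function names and the toy read assignments are hypothetical, and practical analyses additionally correct for factors such as feature length and library size that are ignored here.

```python
from collections import Counter

def relative_abundance(read_assignments):
    """Fraction of reads assigned to each label (a gene, species, or chromosome)."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def fold_difference(abundance_a, abundance_b, label):
    """Ratio of one label's relative abundance between two samples."""
    return abundance_a[label] / abundance_b[label]

# Toy data: each list entry is the label to which one read was assigned.
sample_a = relative_abundance(["geneX"] * 6 + ["geneY"] * 2)
sample_b = relative_abundance(["geneX"] * 3 + ["geneY"] * 5)

print(sample_a)
print(fold_difference(sample_a, sample_b, "geneX"))
```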
`
`7. SOURCES OF NOISE AND ERROR MODELS
`
As mentioned above, although read length in next-generation sequencing is not limited by an
electrophoretic separation step, the major limitation of read length is the signal-to-noise ratio during
stepwise sequencing. Depending on the platform, the contributors to noise in the sequencing
reaction differ, and there is interplay between the sources of noise and the sequencing errors that may
result. This interplay gives rise to what is commonly referred to as the error model and is highly
instrument and chemistry specific. In general, one typically explores both read-length limitations
and error types by sequencing a reference set of genes or an entire genome, then comparing the
sequences obtained with the high-quality reference gene set or genome (29). In this approach, the
different types of errors (substitution errors or insertion and deletion errors) can be identified, and
the error model (random versus systematic errors) can be defined. Representation biases can also be
uncovered by this approach when one examines the aligned reads for evidence of complete or
partial lack of representation. If this lack of representation can be classified (for example, regions with
>95% G + C content), then the bias can be defined. Typically, the more sequence reads are
examined, the better defined are the error model, coverage biases, and their contributing sources. For
example, the use of PCR or other types of enzymatic amplification may contribute systematic errors
during the library construction or amplification processes described above. One might address this
problem, independently of the instrument system used, by employing a high-fidelity polymerase
and/or by limiting the number of amplification cycles when possible. Some sources of error,
however, are simply instrument specific and may not be readily addressed by the end user (although
they may improve over time with new chemistry and software from the manufacturer). As discussed
below, instruments that use library amplification to enhance signal produced from the sequencing
process forego some of the signal-to-noise issues that are experienced in single-molecule systems
because there are so many identical fragments being sequenced per focus that the number of
fragments that are not misreporting far exceeds the number of fragments that are. In general, noise
accumulates during the stepwise sequencing process and ultimately limits the read length obtained
once the signal from any base incorporation step is outcompeted by incorrect or out-of-phase
incorporation events, residual signal from prior reactions or reactants, and other sources of noise.
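The comparison of reads against a trusted reference described above can be sketched in a few lines of Python. This is a minimal illustration, not a published tool: the function name and the input convention (each read already aligned to its reference segment as two equal-length strings, with "-" marking a gap) are assumptions made for the example.

```python
def profile_errors(aligned_pairs):
    """Tally error types from (read, reference) pairs of gapped, equal-length strings.

    Returns overall counts plus substitution counts per read position (cycle),
    which helps show whether errors are random or rise systematically with cycle.
    """
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    substitutions_by_cycle = {}
    for read, reference in aligned_pairs:
        cycle = 0
        for read_base, ref_base in zip(read, reference):
            if read_base == "-":
                counts["deletion"] += 1    # reference base missing from the read
                continue
            cycle += 1                     # only real read bases advance the cycle
            if ref_base == "-":
                counts["insertion"] += 1   # extra base present in the read
            elif read_base == ref_base:
                counts["match"] += 1
            else:
                counts["substitution"] += 1
                substitutions_by_cycle[cycle] = substitutions_by_cycle.get(cycle, 0) + 1
    return counts, substitutions_by_cycle

example_pairs = [
    ("ACGTAC", "ACGTAC"),    # perfect read
    ("ACGTTC", "ACGTAC"),    # substitution at cycle 5
    ("AC-TAC", "ACGTAC"),    # deletion
    ("ACGGTAC", "ACG-TAC"),  # insertion
]
counts, by_cycle = profile_errors(example_pairs)
print(counts, by_cycle)
```

With many reads, a rise in substitutions at later cycles would point to a cycle-dependent error model, whereas errors concentrated at particular reference positions would suggest a systematic source.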
`
`8. NEXT-GENERATION SEQUENCING WITH REVERSIBLE
`DYE TERMINATORS
`
It is informative to discuss some of the predominant approaches to next-generation sequencing
as a means of tying together the concepts presented herein. The first instrument system involves
the use of reversible dye terminators in enzymatic sequencing of amplified foci of library
fragments. This system was initially developed in 2007 by Solexa and was subsequently acquired by
Illumina®, Inc. (30). The library work flow follows steps similar to those outlined above, namely
fragmentation of high-molecular weight DNA, enzymatic trimming, and adenylation of the
fragment ends and ligation of specific adapters (Figure 3a). The Illumina microfluidic conduit is a
flow cell composed of flat glass with eight microfluidic channels, each decorated by covalent
attachment of adapter sequences complementary to the library adapters. By careful quantitation of
the library concentration, a precisely diluted solution of library fragments is amplified in situ on
the flow cell surfaces by use of a bridge amplification step to produce foci for sequencing (clusters)
(Figure 3b). A subsequent step chemically effects the release of fragment ends carrying the same
adapter, which is then primed with a complementary synthetic DNA (primer) to provide free
3′-OH groups that can be extended in subsequent stepwise sequencing reactions. In reversible dye
terminator sequencing, all four nucleotides are provided in each cycle because each nucleotide
carries an identifying fluorescent label. The sequencing occurs as single-nucleotide addition
reactions because a blocking group exists at the 3′-OH position of the ribose sugar, preventing
additional base incorporation reactions by the polymerase. As such, the series of events in each
step includes the following, in order of occurrence: (a) The nucleotide is added by polymerase,
(b) unincorporated nucleotides are washed away, (c) the flow cell is imaged on both inner
surfaces to identify each cluster that is reporting a fluorescent signal, (d) the fluorescent groups are
chemically cleaved, and (e) the 3′-OH is chemically deblocked (Figure 3c). This series of steps
is repeated for up to 150 nucleotide addition reactions, whereupon the second read preparations
begin. To read from the opposite end of each fragment cluster, the instrument first removes the

Figure 3
(a) Illumina® library-construction process. (b) Illumina cluster generation by bridge amplification.
(c) Sequencing by synthesis with reversible dye terminators. Steps recoverable from the figure:
(a) DNA fragments, blunting by fill-in and exonuclease, phosphorylation, addition of A-overhang,
ligation to adapters; (b) template hybridization to the flow cell, initial extension, denaturation,
repeated cycles of annealing, extension, and denaturation, cluster amplification, blocking with
ddNTPs, denaturation and sequencing primer hybridization; (c) incorporate, detect, deblock,
cleave fluor.
`
`
`
synthesized strands by denaturation and regenerates the clusters by performing a limited bridge
amplification to improve the signal-to-noise ratio in the second read. After the amplification step,
the opposite ends of the fragments are released from the flow cell surfaces by a different chemical
cleavage reagent (corresponding to a labile group on the reverse adapter), and the fragments are
primed with the reverse primer. Sequencing proceeds as described above. All of these steps occur
on-instrument with the flow cell in place and without manual intervention, so the correlation of
position from forward (first) to reverse (second) reads is maintained and yields a very high read-pair
concordance upon read alignment to the reference genome.
Illumina data have an error model that is described as having decreasing accuracy with
increasing nucleotide addition steps. When errors occur, they are predominantly substitution
errors, in which an incorrect nucleotide identity is assigned to the base. The error percentage
of most Illumina reads is approximately 0.5% at best (i.e., 1 error in 200 bases). Sources of noise
include (a) phasing, wherein increasing numbers of fragments fall out of phase with the majority
of fragments in the cluster due to incomplete deblocking in prior cycles or, conversely, due to lack
of a blocking group that allows an additional base to be incorporated and (b) residual fluorescence
interference noise due to incomplete fluorescent label cleavage from previous cycles.
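The way phasing erodes signal over successive cycles can be illustrated with a deliberately simple model. The Python sketch below assumes that every strand in a cluster has the same fixed probability of falling out of phase in each cycle, so the in-phase fraction decays geometrically; this model, the function names, and the numerical values are illustrative assumptions and do not describe the manufacturer's actual signal processing.

```python
import math

def in_phase_fraction(cycle, p_dephase):
    """Fraction of strands in a cluster still in phase after `cycle` cycles."""
    return (1.0 - p_dephase) ** cycle

def max_usable_cycles(p_dephase, min_fraction):
    """Largest cycle count for which the in-phase fraction stays >= min_fraction."""
    return math.floor(math.log(min_fraction) / math.log(1.0 - p_dephase))

# Assumed values: 1% of strands lose phase per cycle; at least half must stay in phase.
print(max_usable_cycles(p_dephase=0.01, min_fraction=0.5))
print(round(in_phase_fraction(100, 0.005), 3))
```

Under this toy model, halving the per-cycle dephasing probability roughly doubles the usable read length, which is consistent with the emphasis placed on more complete deblocking and fluor removal.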
Read lengths have increased from the original Solexa instrument at 25-bp single-end reads to
the current Illumina HiSeq 2000 instrument’s 150-bp paired-end reads. Increased read length has
been one component that is contributing to an explosion in throughput-per-instrument run over
a relatively short time frame (5 years), from 1 Gb for the Solexa 1G to 600 Gb for the HiSeq 2000.
The latter instrument can thus produce sufficient data coverage for six whole-human genome
sequences in approximately 11 days. The coverage per genome needed is approximately 30-fold,
and with a 3-Gb genome wherein approximately 90% of the reads will map, 100 Gb are required
to produce the necessary 90 Gb of data per genome.
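The throughput arithmetic in the preceding paragraph can be checked directly. The short Python sketch below uses only the figures quoted in the text (3-Gb genome, 30-fold coverage, 90% of reads mapping, 600 Gb per run); the function names are hypothetical.

```python
import math

def required_yield_gb(genome_size_gb, fold_coverage, mapped_fraction):
    """Raw sequence needed so that the mapped portion reaches the target coverage."""
    return genome_size_gb * fold_coverage / mapped_fraction

def genomes_per_run(run_yield_gb, per_genome_gb):
    """Whole genomes that fit in one run (small epsilon guards float rounding)."""
    return math.floor(run_yield_gb / per_genome_gb + 1e-9)

mapped_needed = 3 * 30                         # Gb of mapped data per genome
per_genome = required_yield_gb(3, 30, 0.9)     # Gb of raw data per genome
print(mapped_needed, round(per_genome), genomes_per_run(600, per_genome))
```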
The other contributor to throughput has been the ability to use increasingly more-concentrated
library dilutions onto the flow cell, resulting in significant increases in cluster density. The HiSeq
2000 was the first instrument to read clusters from both surfaces of the flow cell channels, effectively
doubling the throughput per run. Improvements in chemistry have made deblocking and
fluor-removal steps more complete; polymerase engineering has improved incorporation fidelity and
decreased errors and has decreased the G + C biases associated with the instrument at the bridge
amplification step.
`
`9. NEXT-GENERATION SEQUENCING BY pH CHANGE MONITORING
A completely different approach to next-generation sequencing is embodied in an instrument
system that detects the release of hydrogen ions, a by-product of nucleotide incorporation, as
quantitated changes in pH through a novel coupled silicon detector. This instrument was commercialized
in 2010 by Ion Torrent (31), a company that was later purchased by Life Technologies™ Corp.
For this approach, library construction includes DNA fragmentation, enzymatic end polishing,
and adapter ligation. Amplification of library fragments occurs by a unique approach known as
emulsion PCR, which quantitates the library fragments and dilutes them to be mixed in equimolar
quantities with small beads, PCR reactants, and DNA polymerase molecules (32). The beads have
covalently linked adapter complementary sequences on their surfaces to facilitate amplification
on the bead. This mixture is then shaken to form an emulsion so that the beads and DNA are
encapsulated in a 1:1 ratio (on average) in oil micelles that also contain the reactants needed for
PCR-based amplification. The resulting mixture is placed into a specific apparatus that performs
thermal cycling of the emulsion, effectively allowing hundreds of thousands of individual PCR
amplifications to occur in parallel in one vessel. Subsequent steps are required first to separate
the oil from the aqueous solution and beads (so-called emulsion breaking) and then to enrich the
beads that were successfully amplified (to remove beads with insufficient DNA). Enriched beads
are primed for sequencing by annealing a sequencing primer and are deposited into the wells of an
Ion Chip, a specialized silicon chip designed to detect pH changes within individual wells of the
sequencer as the reaction progresses stepwise. Figure 4a shows that the Ion Chip has an upper
surface that serves as a microfluidic conduit to deliver the reactants needed for the sequencing
reaction. The lower surface of the Ion Chip interfaces directly with a hydrogen ion detector that
translates released hydrogen ions from each well into a quantitative readout of nucleotide bases
that were incorporated in each reaction step (Figure 4b). In this instrument, the reactant flow
is by nucleotide in a systematic order because there is no label to provide base-specific identity
upon incorporation. The adapter sequence contains a series of four single bases downstream of
the primer’s 3′-OH, in a sequence that matches the first four individual nucleotide flows across

Figure 4
(a) Structure of the Ion Torrent Ion Chip used in pH-based sequencing. (b) pH sensing of nucleotide
incorporation.
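Because unlabeled nucleotides are flowed one type at a time in a fixed order, the base sequence must be reconstructed from the order of the flows and the size of the signal in each flow. The Python sketch below illustrates that idea under explicit assumptions: the flow order "TACG", the function name, and the signal values are hypothetical, and the signal is assumed to be proportional to the number of identical bases incorporated in a flow, with simple rounding standing in for the instrument's actual base-calling.

```python
def decode_flows(flow_order, signals):
    """Convert per-flow signals into a base sequence.

    flow_order: repeating order in which single nucleotides are flowed (assumed).
    signals: one normalized measurement per flow; about 0 means no incorporation,
             about 1 one base, about 2 a run of two identical bases, and so on.
    """
    bases = []
    for flow_index, signal in enumerate(signals):
        nucleotide = flow_order[flow_index % len(flow_order)]
        n_incorporated = int(round(signal))
        bases.append(nucleotide * n_incorporated)
    return "".join(bases)

# Eight flows (two passes through T, A, C, G) with noisy, made-up signals.
signals = [1.02, 0.05, 0.97, 1.10, 0.00, 2.08, 0.10, 0.95]
print(decode_flows("TACG", signals))
```

In this scheme, a signal that falls between integer levels for a long run of identical bases would be rounded to the wrong run length, producing an insertion or deletion rather than a substitution.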