Next-Generation Sequencing
Elaine R. Mardis
The Genome Institute at Washington University School of Medicine, St. Louis,
Missouri 63108; email: emardis@wustl.edu
massively parallel sequencing, next-generation sequencing, reversible dye
terminators, sequencing by synthesis, single-molecule sequencing
Automated DNA sequencing instruments embody an elegant interplay
among chemistry, engineering, software, and molecular biology and have
built upon Sanger’s founding discovery of dideoxynucleotide sequencing to
perform once-unfathomable tasks. Combined with innovative physical mapping
approaches that helped to establish long-range relationships between
cloned stretches of genomic DNA, fluorescent DNA sequencers produced
reference genome sequences for model organisms and for the reference human
genome. New types of sequencing instruments that permit amazing
acceleration of data-collection rates for DNA sequencing have been developed.
The ability to generate genome-scale data sets is now transforming
the nature of biological inquiry. Here, I provide an historical perspective of
the field, focusing on the fundamental developments that predated the advent
of next-generation sequencing instruments and providing information
about how these instruments work, their application to biological research,
and the newest types of sequencers that can extract data from single DNA
molecules.
Automated DNA sequencing instruments embody an elegant interplay among chemistry,
engineering, software, and molecular biology and have built upon Sanger’s founding discovery of
dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative
physical mapping approaches that helped to establish long-range relationships between cloned
stretches of genomic DNA, fluorescent DNA sequencers have been used to produce reference
genome sequences for model organisms (Escherichia coli, Drosophila melanogaster, Caenorhabditis
elegans, Mus musculus, Arabidopsis thaliana, Zea mays) and for the reference human genome. Since
2005, however, new types of sequencing instruments that permit amazing acceleration of
data-collection rates for DNA sequencing have been introduced by commercial manufacturers. For
example, single instruments can generate data to decipher an entire human genome within only
2 weeks. Indeed, we anticipate instruments that will further accelerate this whole-genome
sequencing data-production timeline to days or hours in the near future. The ability to generate
genome-scale data sets is now transforming the nature of biological inquiry, and the resulting
increase in our understanding of biology will probably be extraordinary. In this review, I provide
an historical perspective of the field, focusing on the fundamental developments that predated
the advent of next-generation sequencing instruments, providing information about how
massively parallel instruments work and their application to biological research, and finally discussing
the newest types of sequencers that are capable of extracting sequence data from single DNA
molecules.
DNA sequencing and its manifest discipline, known as genomics, are relatively new areas of
endeavor. They are the result of combining molecular biology with nucleotide chemistry, both
of which blossomed as scientific disciplines in the 1950s. Dr. Frederick Sanger’s laboratory at the
Medical Research Council (MRC) in Cambridge, United Kingdom, began research to devise a
method of DNA sequencing in the early 1970s (1-3) after having first published methods for RNA
sequencing in the late 1960s (4-6). Sanger et al.’s (7) seminal 1977 publication describes a method
for essentially tricking DNA polymerase into incorporating nucleotides with a slight chemical
modification: the exchange of the 3' hydroxyl group needed for chain elongation with a hydrogen
atom that is functionally unable to participate in the reaction with the incoming nucleotide to
extend the synthesized strand. Mixing proportions of the four native deoxynucleotides with one
of four of their analogs, termed dideoxynucleotides, yields a collection of nucleotide-specific
terminated fragments for each of the four bases (Figure 1). The fragments resulting from these
reactions were separated by size on thin slab polyacrylamide gels; the A, C, G, and T reactions
were performed for each template run in adjacent lanes. The fragment positions were identified
by virtue of ³²P, which was supplied in the reaction as labeled dATP molecules. When dried and
exposed to X-ray film, the gel-separated fragments were visualized and subsequently read from
the exposed film from bottom to top (shortest to longest fragments) by the naked eye. Thus, a long
and labor-intensive process was completed, and the sequencing data for the DNA of interest were
in hand and ready for assembly, translation to amino acid sequence, or other types of analysis.
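The gel-reading logic described above can be illustrated with a small simulation. The following is a minimal Python sketch under idealized assumptions: every position of the synthesized strand is terminated at least once, bands resolve perfectly, and the function names (`terminated_fragment_lengths`, `read_gel`) are hypothetical names chosen for illustration. It is not a model of the actual chemistry.

```python
def terminated_fragment_lengths(synthesized: str) -> dict[str, list[int]]:
    """Group fragment lengths by the dideoxynucleotide that terminated them.

    In the reaction containing ddATP, synthesis can stop wherever an A is
    incorporated, so that lane holds one fragment length per A position.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest fragment (bottom to top of the gel)."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


lanes = terminated_fragment_lengths("CTAAG")
print(lanes)            # one list of band positions per lane
print(read_gel(lanes))  # CTAAG
```

Reading across the four lanes in order of fragment length recovers the synthesized strand, which is what a person reading the autoradiograph from bottom to top was doing by eye.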
Sequencing by radiolabeled methods underwent numerous improvements following its invention
until the mid 1980s. These improvements included the invention of DNA synthesis chemistry
(8, 9) and, ultimately, of DNA synthesizers that can be used to make oligonucleotide primers for
the sequencing reaction (providing a 3'-OH for extension); improved enzymes from the original
E. coli Klenow fragment polymerase (more uniform incorporation of dideoxynucleotides) (10, 11);
use of ³⁵S- in place of ³²P-dATP for radiolabeling (sharper banding and hence longer read lengths);
and the use of thinner and/or longer polyacrylamide gels (improved separation and longer read
lengths), among others. Although there were attempts at automating various steps of the process,
notably the automated pipetting of sequencing reactions and the automated reading of the
autoradiograph banding patterns, most improvements were not sufficient to make this sequencing
approach truly scalable to high-throughput needs.
Figure 1
Sanger sequencing.
A significant change in the scalability of DNA sequencing was introduced in 1986, when Applied
Biosystems, Inc. (ABI), commercialized a fluorescent DNA sequencing instrument that had been
invented in Leroy Hood’s laboratory at the California Institute of Technology (12). In replacing
the use of radiolabeled dATP with reactions primed by fluorescently labeled primers (different
fluor for each nucleotide reaction), the laborious processes of gel drying, X-ray film exposure
and developing, reading autoradiographs, and performing hand entry of the resulting sequences
were eliminated. In this instrument, a raster scanning laser beam crossed the surface of the gel
plates to provide an excitation wavelength for the differentially labeled fluorescent primers to
be detected during the electrophoretic separation of fragments. Thus, significant manual effort
and several sources of error were eliminated. By use of the initial versions of this instrument,
great increases were made in the daily throughput of sequencing data production, and several
www.annualreviews.org • Next-Generation Sequencing Platforms
laboratories used newly available automated pipetting stations to decrease the effort and error
rate of the upstream sequencing reaction pipetting steps (13). During this time, investigators
made additional improvements to sequencing enzymology and processes, including the ability
to perform cycled sequencing reactions catalyzed by thermostable sequencing polymerases (14)
that were patterned after the polymerase chain reaction (PCR), which was first described in the
mid-1980s by Mullis and colleagues (15). By incorporating linear (cycled) amplification into the
sequencing reaction, one could begin with significantly lower input template DNA and hence could
produce uniform results across a range of DNA yields (from automated isolation methods in multiwell
plates, for example). Improvements to chemistry were also important, as fluorescent dye-labeled
dideoxynucleotides (known as terminators) were introduced (16). Because the terminating
nucleotide was identified by its attached fluor, all four reactions could be combined into a single
reaction, greatly decreasing the cost of reagents and the input DNA requirements. Finally, the
per-run throughput of the sequencers increased during this time (17), ultimately permitting
96 samples to be loaded on one gel. These technological breakthroughs combined to make
96-well and ultimately 384-well sequencing reactions a major contributor to scalability. These
high-throughput slab gel fluorescence instruments largely contributed to the sequencing of
several model organism genomes, and although they were impressive in their capacity to produce
data, they still contained several manual and hence labor-intensive and error-prone steps. These
limitations largely centered around casting polyacrylamide gels and loading samples by hand.
The rate-limiting manual steps in slab gels were addressed in 1999 with the introduction of
capillary sequencing instruments, first the MegaBACE™ sequencer from Molecular Dynamics
(18) and then the ABI PRISM® 3700. These instruments solved the slab gel problem by directly
injecting a polymeric separation matrix into capillaries that provided single-nucleotide resolution.
Samples, by definition, could also be loaded directly from the microtiter plate to the capillaries for
separation by use of electrical current pulses through a process known as electrokinetic injection.
Following the separation and detection of reaction products, the polymer matrix was replaced by
pumping in new matrix. Thus, these instruments eliminated an entire series of rate-limiting steps.
Downstream activities were further simplified because the capillaries were fixed in their positions,
so there was no need for tracking lanes on the slab gel image, and subsequent data extraction and
base-calling were much faster and more accurate. Lastly, the run times were greatly accelerated
due to the rapid heat dissipation of the capillaries over thick glass plates. The ABI PRISM 3700
instruments and a later upgrade (ABI 3730) were principal data-generating instruments for the
human and mouse genome projects, among others. Their scalability and ease of use came at a
crucial time, when large-scale robotics to perform DNA extraction and sequencing were available
in specialized facilities for the clone-based front end of the process.
Indeed, these reference genomes that were produced for major model organisms, human and
plant, provided not only a fundamental advance for biological studies in these organisms but also
the basis for the utility of next-generation sequencing instruments. Next-generation sequencing
is described in the next section.
Since 2005, the traditional Sanger-based approach to DNA sequencing has experienced
revolutionary changes (19, 20). The previous “top-down” approach involved characterizing large
clones by low-resolution mapping as a means to organize the high-resolution sequencing of smaller
subclones that were assembled and finished to recapitulate each originating, larger clone (21). The
sequences of the larger clones were then stitched together at their overlapped ends to reconstruct
entire chromosomes (with small gaps). By contrast, next-generation sequencing instruments do
not require a cloning step per se. Rather, the DNA to be sequenced is used to construct a library
of fragments that have synthetic DNAs (adapters) added covalently to each fragment end by use
of DNA ligase. These adapters are universal sequences, specific to each platform, that can be
used to polymerase-amplify the library fragments during specific steps of the process. Another
difference is that next-generation sequencing does not require performing sequencing reactions in
microtiter plate wells. Rather, the library fragments are amplified in situ on a solid surface, either
a bead or a flat glass microfluidic channel that is covalently derivatized with adapter sequences
that are complementary to those on the library fragments. This amplification is digital in nature;
in other words, each amplified fragment yields a single focus (a bead- or surface-borne cluster
of amplified DNA, all of which originated from a single fragment). Amplification is required
to provide sufficient signal from each of the DNA sequencing reaction steps that determine the
sequencing data for that library fragment. The scale and throughput of next-generation sequencing
are often referred to as massively parallel, which is an appropriate descriptor for the process
that follows fragment amplification to yield sequencing data. In Sanger sequencing, the reaction
that produces the nested fragment set is distinct from the process that separates and detects the
fragments by size to produce a linear sequence of bases. In massively parallel sequencing, the
process is a stepwise reaction series that consists of (a) a nucleotide addition step, (b) a detection
step that determines the identity of the incorporated nucleotides on each fragment focus being
sequenced, and (c) a wash step that may include chemistry to remove fluorescent labels or blocking
groups. In essence, next-generation sequencing instruments conduct sequencing and detection
simultaneously rather than as distinct processes, one of which is completed before the other takes
place. Moreover, these steps are performed in a format that allows hundreds of thousands to
billions of reaction foci to be sequenced during each instrument run and, hence, at a capacity per
instrument that can produce enormous data sets.
One final difference between Sanger sequencing data and next-generation sequencing data is
the read length, or the number of nucleotides obtained from each fragment being sequenced.
In Sanger sequencing, the read length was determined largely by a combination of gel-related
factors, such as the percentage of polyacrylamide, the electrophoresis conditions, the time of
separation, and the length and thickness of the gel. In next-generation sequencing, the read
length is a function of the signal-to-noise ratio. Because the sources of noise differ according to
the technology, specifics are described for each type of sequencing below. However, the major
impact of the signal-to-noise ratio is to limit the read length from all next-generation sequencing
instruments, all of which produce shorter reads than does Sanger sequencing.
Shorter read lengths, in turn, are a differentiation point because, although short reads can be
assembled as are traditional Sanger reads, based on shared sequence, the lower extent of shared
sequence (due to read length) limits the ability to assemble these reads, so the overall length of
contiguous sequence that can be assembled is limited. This limitation is exacerbated by genome
size and complexity (e.g., repetitive content and gene families), so genomes such as that of the
human (3 Gb and ~48% repetitive content) cannot be reassembled from the component reads
of a whole-genome shotgun of next-generation sequencing data. Rather, because a high-quality
reference genome exists for many model organisms and for humans, sequence read alignment is a
more practical approach to sequencing data analysis from next-generation read lengths. Specific
algorithms to approach short read alignment have been devised; they provide a score-based metric
indicative of that sequence’s best fit in the genome, whereby sequences that contain mostly or
entirely repetitive content score lowest due to the uncertainty of their origin (22, 23). Improved
certainty can be obtained from longer read lengths, and several next-generation sequencers have
offered increases in read length over time and refinement of their signal-to-noise characteristics
to allow this certainty. Another fundamental improvement has resulted from so-called paired-end
sequencing, namely producing sequence data from both ends of each library fragment. Read pairs
can be obtained by one of two mechanisms: (a) paired ends or (b) mate pairs (Figure 2).
Figure 2
Comparison between (a) paired-end and (b) mate-pair sequencing library-construction processes.
In paired-end sequencing, a linear fragment with a length of less than 1 kb has adapter sequences
at each end with different priming sites on each adapter. The sequencing instrument is designed to
sequence from one adapter priming site by use of the stepwise sequencing described above; then,
in a subsequent reaction, the opposite adapter is primed and sequence data are obtained. These
reads are paired with one another during the alignment step in data analysis, which provides higher
overall certainty of placement than does a single end read of the same length. Most alignment
algorithms also take into account the average length of fragments in the sequencing library to
make the most accurate placement possible. In mate-pair sequencing, the library is constructed
of fragments longer than 1 kb, and instead of ligating two adapters at each fragment end, the
fragment is circularized around a single adapter and both fragment ends ligate to the adapter
ends (24). These circular molecules are then treated by various molecular biology schemes (e.g.,
by type IIS endonuclease digestion or by nick translation) to produce a single linear fragment
that holds both ends of the original DNA fragment with a central adapter. The remaining DNA
remnants are removed by washing steps, as the central adapter that carries the mate-pair ends
is biotinylated and can be captured using streptavidin magnetic beads. Typically, the resulting
linear fragments have distinct adapters ligated to their ends, and sequencing is obtained from two
sequential reads as described above. Again, the resulting reads are aligned as a pair to the genome
of interest, wherein the separation distance between the reads is longer overall than that obtained
with the paired-end approach. Often, mate-pair and paired-end reads are used in combination
to achieve genome coverage when attempting longer-range assemblies through difficult regions
of a genome or when attempting to assemble a genome for the first time (de novo sequencing)
(25). In this combined coverage approach, the mate-pair reads provide longer-range order and
orientation (a separation of up to 20 kb is possible), and the paired ends provide the ability to
assemble, in a localized way, difficult-to-sequence regions that can then be layered on top of the
scaffold provided by an assembly of mate-pair reads.
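The way an aligner can use the expected library fragment length to judge a read pair can be sketched in a few lines. This is a hypothetical illustration only, not the logic of any particular alignment program: the coordinate convention (leftmost mapped position of each read), the function names, and the tolerance value are all assumptions made for the example.

```python
def observed_fragment_length(forward_start: int, reverse_start: int,
                             read_length: int) -> int:
    """Fragment length implied by a read pair mapped to the same chromosome.

    Both start values are the leftmost reference coordinate of the read;
    the fragment spans from the forward read's start to the reverse read's end.
    """
    return reverse_start + read_length - forward_start


def is_concordant(forward_start: int, reverse_start: int, read_length: int,
                  expected_length: int, tolerance: int) -> bool:
    """True if the implied fragment length is close to the library average."""
    observed = observed_fragment_length(forward_start, reverse_start, read_length)
    return abs(observed - expected_length) <= tolerance


# Hypothetical paired-end library with ~300-bp fragments and 100-bp reads.
print(observed_fragment_length(1000, 1250, 100))   # 350
print(is_concordant(1000, 1250, 100, 300, 100))    # True
print(is_concordant(1000, 5000, 100, 300, 100))    # False
```

For a mate-pair library the same check would apply with a much larger expected separation, which is what gives those pairs their longer-range ordering value.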
Next-generation sequencing libraries, carefully constructed to avoid sources of biasing and
duplication, are highly digital. Specifically, the fact that each read originates from a consistently
detected focus that results from the amplification of a single library fragment means that the
data are inherently digital in nature. Thus, a quantitation of abundance can be inferred from this
one-to-one relationship, which has ramifications for biological systems that are being investigated
by next-generation sequencing. For example, chromosomal amplifications that are common in
cancer genomes can be quantitated with respect to the extent of amplification (ploidy) on each
chromosome (26). Similarly, the read prevalence of expressed genes identified by RNA sequencing
can be directly correlated to their expression level and compared across replicates or with other
samples from the same study (27). In population-based studies that use next-generation sequencing
to characterize the individual species present in an isolate (metagenomics), a similar ability
to correlate the presence of each species as a proportion of the overall population can be derived
from the digital nature of next-generation sequencing data (28).
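Because each read corresponds to one library fragment, abundance estimates reduce to counting. The sketch below is a minimal Python illustration, assuming each read has already been assigned a label (a gene, a chromosome, or a species); the labels and read counts are invented for the example, and real analyses normally also normalize for feature length and sequencing depth.

```python
from collections import Counter


def relative_abundance(assignments: list[str]) -> dict[str, float]:
    """Fraction of reads assigned to each label (one read = one count)."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hypothetical read assignments from a small metagenomic sample.
reads = ["speciesA"] * 3 + ["speciesB"]
print(relative_abundance(reads))  # {'speciesA': 0.75, 'speciesB': 0.25}
```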
As mentioned above, although read length in next-generation sequencing is not limited by an
electrophoretic separation step, the major limitation of read length is the signal-to-noise ratio during
stepwise sequencing. Depending on the platform, the contributors to noise in the sequencing
reaction differ, and there is interplay between the sources of noise and the sequencing errors that may
result. This interplay gives rise to what is commonly referred to as the error model and is highly
instrument and chemistry specific. In general, one typically explores both read-length limitations
and error types by sequencing a reference set of genes or an entire genome, then comparing the
sequences obtained with the high-quality reference gene set or genome (29). In this approach, the
different types of errors (substitution errors or insertion and deletion errors) can be identified, and
the error model (random versus systematic errors) can be defined. Representation biases can also be
uncovered by this approach when one examines the aligned reads for evidence of complete or
partial lack of representation. If this lack of representation can be classified (for example, regions with
>95% G + C content), then the bias can be defined. Typically, the more sequence reads are
examined, the better defined are the error model, coverage biases, and their contributing sources. For
example, the use of PCR or other types of enzymatic amplification may contribute systematic errors
during the library construction or amplification processes described above. One might address this
problem, independently of the instrument system used, by employing a high-fidelity polymerase
and/or by limiting the number of amplification cycles when possible. Some sources of error,
however, are simply instrument specific and may not be readily addressed by the end user (although
they may improve over time with new chemistry and software from the manufacturer). As discussed
below, instruments that use library amplification to enhance signal produced from the sequencing
process forego some of the signal-to-noise issues that are experienced in single-molecule systems
because there are so many identical fragments being sequenced per focus that the number of
fragments that are not misreporting far exceeds the number of fragments that are. In general, noise
accumulates during the stepwise sequencing process and ultimately limits the read length obtained
once the signal from any base incorporation step is outcompeted by incorrect or out-of-phase
incorporation events, residual signal from prior reactions or reactants, and other sources of noise.
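The comparison of reads against a reference to tally error types can be illustrated as follows. This is a minimal Python sketch that assumes the read has already been aligned to the reference and that both are supplied as equal-length strings with `-` marking gaps; that input convention and the function name `classify_errors` are assumptions for the example rather than the format of any specific tool.

```python
def classify_errors(aligned_ref: str, aligned_read: str) -> dict[str, int]:
    """Count matches, substitutions, insertions, and deletions column by column.

    A gap in the reference means the read carries an extra base (insertion);
    a gap in the read means a reference base is missing (deletion).
    """
    if len(aligned_ref) != len(aligned_read):
        raise ValueError("aligned sequences must have equal length")
    counts = {"match": 0, "substitution": 0, "insertion": 0, "deletion": 0}
    for ref_base, read_base in zip(aligned_ref, aligned_read):
        if ref_base == "-" and read_base == "-":
            continue  # uninformative column, should not occur in a pairwise alignment
        if ref_base == "-":
            counts["insertion"] += 1
        elif read_base == "-":
            counts["deletion"] += 1
        elif ref_base == read_base:
            counts["match"] += 1
        else:
            counts["substitution"] += 1
    return counts


print(classify_errors("ACGT-ACGT", "ACCTTAC-T"))
# {'match': 6, 'substitution': 1, 'insertion': 1, 'deletion': 1}
```

Summing such tallies over many reads, and recording the cycle and sequence context of each error, is one way to see whether errors look random or systematic.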
It is informative to discuss some of the predominant approaches to next-generation sequencing
as a means of tying together the concepts presented herein. The first instrument system involves
the use of reversible dye terminators in enzymatic sequencing of amplified foci of library
fragments. This system was initially developed in 2007 by Solexa and was subsequently acquired by
Illumina®, Inc. (30). The library work flow follows steps similar to those outlined above, namely
fragmentation of high-molecular-weight DNA, enzymatic trimming and adenylation of the
fragment ends, and ligation of specific adapters (Figure 3a). The Illumina microfluidic conduit is a
flow cell composed of flat glass with eight microfluidic channels, each decorated by covalent
attachment of adapter sequences complementary to the library adapters. By careful quantitation of
the library concentration, a precisely diluted solution of library fragments is amplified in situ on
the flow cell surfaces by use of a bridge amplification step to produce foci for sequencing (clusters)
(Figure 3b). A subsequent step chemically effects the release of fragment ends carrying the same
adapter, which is then primed with a complementary synthetic DNA (primer) to provide free
3'-OH groups that can be extended in subsequent stepwise sequencing reactions. In reversible dye
terminator sequencing, all four nucleotides are provided in each cycle because each nucleotide
carries an identifying fluorescent label. The sequencing occurs as single-nucleotide addition
reactions because a blocking group exists at the 3'-OH position of the ribose sugar, preventing
additional base incorporation reactions by the polymerase. As such, the series of events in each
step includes the following, in order of occurrence: (a) The nucleotide is added by polymerase,
(b) unincorporated nucleotides are washed away, (c) the flow cell is imaged on both inner
surfaces to identify each cluster that is reporting a fluorescent signal, (d) the fluorescent groups are
chemically cleaved, and (e) the 3'-OH is chemically deblocked (Figure 3c). This series of steps
is repeated for up to 150 nucleotide addition reactions, whereupon the second read preparations
begin. To read from the opposite end of each fragment cluster, the instrument first removes the
synthesized strands by denaturation and regenerates the clusters by performing a limited bridge
amplification to improve the signal-to-noise ratio in the second read. After the amplification step,
the opposite ends of the fragments are released from the flow cell surfaces by a different chemical
cleavage reagent (corresponding to a labile group on the reverse adapter), and the fragments are
primed with the reverse primer. Sequencing proceeds as described above. All of these steps occur
on-instrument with the flow cell in place and without manual intervention, so the correlation of
position from forward (first) to reverse (second) reads is maintained and yields a very high read-pair
concordance upon read alignment to the reference genome.
Figure 3
(a) Illumina® library-construction process. (b) Illumina cluster generation by bridge amplification. (c) Sequencing by synthesis with
reversible dye terminators.
Illumina data have an error model that is described as having decreasing accuracy with
increasing nucleotide addition steps. When errors occur, they are predominantly substitution
errors, in which an incorrect nucleotide identity is assigned to the base. The error percentage
of most Illumina reads is approximately 0.5% at best (i.e., 1 error in 200 bases). Sources of noise
include (a) phasing, wherein increasing numbers of fragments fall out of phase with the majority
of fragments in the cluster due to incomplete deblocking in prior cycles or, conversely, due to lack
of a blocking group that allows an additional base to be incorporated, and (b) residual fluorescence
interference noise due to incomplete fluorescent label cleavage from previous cycles.
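The cumulative effect of phasing can be illustrated with a simple probabilistic model. The following is a minimal Python sketch, assuming that in every cycle each strand in a cluster independently lags by one base with probability `p_lag` (incomplete deblocking) or runs ahead by one base with probability `p_lead` (missing blocking group); the model, the function name, and the probability values are illustrative assumptions, not measured instrument parameters.

```python
def in_phase_fraction(cycles: int, p_lag: float, p_lead: float) -> list[float]:
    """Expected fraction of strands in phase with the cluster after each cycle.

    The state tracked is each strand's offset from the intended position:
    0 means in phase, negative means lagging, positive means running ahead.
    """
    offsets = {0: 1.0}  # before the first cycle, every strand is in phase
    history = []
    steps = ((-1, p_lag), (0, 1.0 - p_lag - p_lead), (1, p_lead))
    for _ in range(cycles):
        updated: dict[int, float] = {}
        for offset, fraction in offsets.items():
            for step, probability in steps:
                if probability > 0.0:
                    key = offset + step
                    updated[key] = updated.get(key, 0.0) + fraction * probability
        offsets = updated
        history.append(offsets.get(0, 0.0))
    return history


# Hypothetical per-cycle failure rates, chosen only to show the trend.
fractions = in_phase_fraction(150, p_lag=0.005, p_lead=0.002)
print(f"cycle 1: {fractions[0]:.3f}, cycle 150: {fractions[-1]:.3f}")
```

Even small per-cycle failure rates erode the in-phase fraction steadily, which is consistent with accuracy decreasing as the number of nucleotide addition steps grows.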
Read lengths have increased from the original Solexa instrument at 25-bp single-end reads to
the current Illumina HiSeq 2000 instrument’s 150-bp paired-end reads. Increased read length has
been one component that is contributing to an explosion in throughput-per-instrument run over
a relatively short time frame (5 years), from 1 Gb for the Solexa 1G to 600 Gb for the HiSeq 2000.
The latter instrument can thus produce sufficient data coverage for six whole-human genome
sequences in approximately 11 days. The coverage per genome needed is approximately 30-fold,
and with a 3-Gb genome wherein approximately 90% of the reads will map, 100 Gb are required
to produce the necessary 90 Gb of data per genome.
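The coverage arithmetic above can be reproduced with a short calculation. This is a minimal Python sketch using the figures quoted in the text (3-Gb genome, 30-fold coverage, 90% of reads mapping, 600 Gb per run); the function and parameter names are chosen here for illustration.

```python
def raw_gb_per_genome(genome_gb: float, fold_coverage: float,
                      mapped_fraction: float) -> float:
    """Raw sequence (Gb) needed so that the mapped reads reach the target coverage."""
    mapped_gb_needed = genome_gb * fold_coverage   # 3 Gb x 30 = 90 Gb
    return mapped_gb_needed / mapped_fraction      # 90 Gb / 0.9 = 100 Gb


def genomes_per_run(run_output_gb: float, genome_gb: float,
                    fold_coverage: float, mapped_fraction: float) -> float:
    """Number of genomes at the target coverage that one instrument run can supply."""
    return run_output_gb / raw_gb_per_genome(genome_gb, fold_coverage, mapped_fraction)


required = raw_gb_per_genome(3.0, 30.0, 0.9)
print(f"raw data per genome: {required:.0f} Gb")                      # 100 Gb
print(f"genomes per run: {genomes_per_run(600.0, 3.0, 30.0, 0.9):.0f}")  # 6
```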
The other contributor to throughput has been the ability to use increasingly more-concentrated
library dilutions onto the flow cell, resulting in significant increases in cluster density. The HiSeq
2000 was the first instrument to read clusters from both surfaces of the flow cell channels, effectively
doubling the throughput per run. Improvements in chemistry have made deblocking and
fluor-removal steps more complete; polymerase engineering has improved incorporation fidelity and
decreased errors and has decreased the G + C biases associated with the instrument at the bridge
amplification step.
A completely different approach to next-generation sequencing is embodied in an instrument
system that detects the release of hydrogen ions, a by-product of nucleotide incorporation, as
quantitated changes in pH through a novel coupled silicon detector. This instrument was commercialized
in 2010 by Ion Torrent (31), a company that was later purchased by Life Technologies™ Corp.
For this approach, library construction includes DNA fragmentation, enzymatic end polishing,
and adapter ligation. Amplification of library fragments occurs by a unique approach known as
emulsion PCR, which quantitates the library fragments and dilutes them to be mixed in equimolar
quantities with small beads, PCR reactants, and DNA polymerase molecules (32). The beads have
covalently linked adapter complementary sequences on their surfaces to facilitate amplification
on the bead. This mixture is then shaken to form an emulsion so that the beads and DNA are
encapsulated in a 1:1 ratio (on average) in oil micelles that also contain the reactants needed for
PCR-based amplification. The resulting mixture is placed into a specific apparatus that performs
thermal cycling of the emulsion, effectively allowing hundreds of thousands of individual PCR
amplifications to occur in parallel in one vessel. Subsequent steps are required first to separate
the oil from the aqueous solution and beads (so-called emulsion breaking) and then to enrich the
beads that were successfully amplified (to remove beads with insufficient DNA). Enriched beads
are primed for sequencing by annealing a sequencing primer and are deposited into the wells of an
Ion Chip, a specialized silicon chip designed to detect pH changes within individual wells of the
sequencer as the reaction progresses stepwise. Figure 4a shows that the Ion Chip has an upper
surface that serves as a microfluidic conduit to deliver the reactants needed for the sequencing
reaction. The lower surface of the Ion Chip interfaces directly with a hydrogen ion detector that
translates released hydrogen ions from each well into a quantitative readout of nucleotide bases
that were incorporated in each reaction step (Figure 4b). In this instrument, the reactant flow
is by nucleotide in a systematic order because there is no label to provide base-specific identity
upon incorporation. The adapter sequence contains a series of four single bases downstream of
the primer’s 3'-OH, in a sequence that matches the first four individual nucleotide flows across
Figure 4
(a) Structure of the Ion Torrent Ion Chip used in pH-based sequencing. (b) pH sensing of nucleotide incorporation.
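Because nucleotides are flowed one at a time in a fixed order and the signal scales with the number of bases incorporated, a well's series of signals can be turned into sequence by rounding each flow's signal to a base count. The sketch below is a simplified Python illustration: the flow order `TACG`, the normalization of signals so that one incorporation is about 1.0, and the function name are assumptions for the example, and real base callers apply considerably more signal processing.

```python
def decode_flows(signals: list[float], flow_order: str = "TACG") -> str:
    """Convert normalized per-flow signals into a base sequence.

    Each flow delivers a single nucleotide type; a signal near 0 means no
    incorporation, near 1 a single base, near 2 a two-base homopolymer, etc.
    """
    bases = []
    for index, signal in enumerate(signals):
        nucleotide = flow_order[index % len(flow_order)]
        count = max(0, round(signal))  # nearest whole number of incorporations
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical normalized signals for six consecutive flows in one well.
print(decode_flows([1.02, 0.03, 2.10, 0.90, 0.00, 1.10]))  # TCCGA
```

The rounding step also shows why long homopolymer runs are harder to call in this kind of readout: distinguishing a signal of 7 from 8 requires finer resolution than distinguishing 0 from 1.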

