`
`doi: 10.1111/j.1755-0998.2011.03024.x
`
`Field guide to next-generation DNA sequencers
`
`TRAVIS C. GLENN
`Department of Environmental Health Science and Georgia Genomics Facility, Environmental Health Science Building, University of
`Georgia, Athens, GA 30602, USA
`
`Abstract
`
`The diversity of available 2nd and 3rd generation DNA sequencing platforms is increasing rapidly. Costs for these systems
`range from <$100 000 to more than $1 000 000, with instrument run times ranging from minutes to weeks. Extensive trade-
`offs exist among these platforms. I summarize the major characteristics of each commercially available platform to enable
direct comparisons. In terms of cost per megabase (Mb) of sequence, the Illumina and SOLiD platforms are clearly superior
(≤$0.10/Mb vs. >$10/Mb for 454 and some Ion Torrent chips). In terms of cost per nonmultiplexed sample and instrument
`run time, the Pacific Biosciences and Ion Torrent platforms excel, with the 454 GS Junior and Illumina MiSeq also notable
`in this regard. All platforms allow multiplexing of samples, but details of library preparation, experimental design and data
`analysis can constrain the options. The wide range of characteristics among available platforms provides opportunities
`both to conduct groundbreaking studies and to waste money on scales that were previously infeasible. Thus, careful
`thought about the desired characteristics of these systems is warranted before purchasing or using any of them. Updated
`information from this guide will be maintained at: http://dna.uga.edu/ and http://tomato.biol.trinity.edu/blog/.
`
`Keywords: 2nd and 3rd generation sequencing, 454, Helicos, Illumina, Ion Torrent, Life Technologies, massively parallel
sequencing, Pacific Biosciences, Roche, SOLiD
`
`Received 17 March 2011; revision accepted 22 March 2011
`
`Background
`
`DNA sequencing technologies and platforms are being
`updated at a blistering pace, so much so that reviews of
`sequencing platforms resemble the work of Sisyphus. It
`is important, however, for molecular ecologists to keep
`pace with these technologies, because they are transform-
`ing what we can do, how we should do it, and how much
`it will cost. Institutions and researchers are committing
`up to a million dollars to purchase massively parallel
`sequencing instruments. Such purchases lock laborato-
`ries and institutions into specific paths for large annual
`expenditures in both consumable supplies and service
`contracts. Differences in instrument engineering, plat-
`form chemistry and economics related to design con-
`strain what can be done with those instruments once
`they are purchased.
Several recent major announcements and acquisitions
make this an opportune time to evaluate available
platforms and what is likely to be available in the
immediate future. In this brief guide, I summarize
instruments currently available and those that have been
announced by major companies. Although the vendors tout
very different strengths for several of these platforms,
the weaknesses are often much less clear. I have therefore
summarized available information in tables with
categories of primary interest to purchasers and to users
so that direct comparisons can be made. I use the term
2nd generation to indicate a platform that requires
amplification of the template molecules prior to
sequencing, 3rd generation to indicate platforms that
directly sequence individual DNA molecules, and
next-generation sequencing (NGS) to indicate 2nd or 3rd
generation instruments generically.

Correspondence: Travis C. Glenn, Fax: 706 542 7472;
E-mail: travisg@uga.edu

© 2011 Blackwell Publishing Ltd
This guide is intended to provide information for
readers with either little or advanced understanding of
NGS platforms. I assume, however, that readers who are
not familiar with these systems are learning the details
by reading relevant publications (e.g. Mardis 2008;
Shendure & Ji 2008; Ansorge 2009; Richardson 2010; Tautz
et al. 2010), reading information at company and
independent websites, and talking with staff of the
companies making NGS instruments.
`My purpose is not to explain how these systems work
`in detail (that information is readily available from the
`sources noted above), but instead to focus on generally
`important traits of these systems and to provide relevant
`details for prospective buyers and users. In particular,
`my goal is to present information useful to researchers
`
`760 T . C . G L E N N
`
who must determine what platform to use for their own
experiments or who will recommend purchasing
instruments, so that they can make informed decisions and
readily summarize the basis for those decisions (e.g. for
institutional purchasing support staff, administrators and
in publications). I do not include information on Complete
`Genomics, deCode genetics, Knome or similar companies
`because they are focused solely on analysing human sam-
`ples. I also will not cover the Polonator, Intelligent Bio-
`Systems, or other similar companies that have not yet
`been able to make significant commercial impact. I pro-
`vide some information on Helicos because this company
`has only recently stopped selling instruments and
`reagents in favour of adopting a service-provider model,
`and their services are available for organisms of interest
`to molecular ecologists.
`
`Comparing the platforms
`
`Caveats to the comparisons – need for standards
`
All companies put out data and statements that cast their
systems in the best possible light. I have generally
accepted the companies’ values to obtain measures
that can then be compared, but these comparisons have
`inherent flaws. There are no accepted standards for what
`measures the companies need to report, let alone particu-
`lars of how the data are analysed. The templates used,
`types of pre-analysis data filters used and number of runs
`used (e.g. best single run, average of many runs, etc.) can
`have significant impacts. Independent testing of NGS plat-
`forms to determine yield, error rates, etc. would be ideal,
`but is expensive and problematic because companies
`frequently update chemistry, software and other compo-
`nents of their systems. In several cases, available data give
`a broad range of values and I generally condense these
`data into a single number from the middle of the available
data distribution, and I indicate the dispersion of the
values in only a few places. For these reasons, many
comparisons below are less than ideal. As in all field guides, the
`purpose here is to illustrate typical phenotypes.
`Everyone using NGS data would benefit from the
`development of a standard set of conditions, analyses
`and a complex template (e.g. Escherichia coli genomic
`DNA) or set of templates (e.g. specific clones, E. coli
`genomic DNA, mouse cDNA, etc.) that could be adopted
`and used for testing of all platforms. Results from these
`templates could then be used to determine values that
`would allow direct comparison of NGS platforms, chem-
`istry and software upgrades. Ideally, the standard tem-
`plate(s) would be similar to US National Institute of
`Standards and Technology (NIST) DNA standards for
`forensics and could be obtained from NIST or similar
`entities. Until
`such standards are developed and
`
`adopted, comparisons will remain difficult and inher-
`ently subjective, especially measures of error rate and
`mappable reads.
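To make the idea of a per-base error rate measured against a known standard template concrete, the sketch below (Python) counts substitutions, insertions and deletions between each read and the stretch of the standard it came from, using a plain edit (Levenshtein) distance, and divides the total by the number of reference bases. It assumes each read has already been placed on the standard template (real pipelines do this with an aligner); the function names and the toy template are hypothetical, and vendors do not necessarily compute their reported rates this way.

```python
"""Per-base error rate of reads against a known standard template (sketch)."""


def edit_distance(read: str, ref: str) -> int:
    """Levenshtein distance: substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(ref) + 1))
    for i, r in enumerate(read, start=1):
        cur = [i]
        for j, t in enumerate(ref, start=1):
            cur.append(min(
                prev[j] + 1,             # extra base in the read (insertion)
                cur[j - 1] + 1,          # base missing from the read (deletion)
                prev[j - 1] + (r != t),  # match (0) or substitution (1)
            ))
        prev = cur
    return prev[-1]


def error_rate_percent(pairs: list[tuple[str, str]]) -> float:
    """Total edits divided by total reference bases, as a percentage.

    Assumes each (read, ref) pair is already matched to its source segment.
    """
    edits = sum(edit_distance(read, ref) for read, ref in pairs)
    bases = sum(len(ref) for _, ref in pairs)
    return 100.0 * edits / bases


if __name__ == "__main__":
    standard = "ACGTACGTAC"  # hypothetical stretch of a standard template
    reads = [("ACGTACGTAC", standard), ("ACGTACGAC", standard)]
    print(error_rate_percent(reads))
```

Because every run would be scored against the same template with the same rule, values computed this way would be comparable across platforms and chemistry upgrades, which is the point of a shared standard.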
`
`Basic characteristics
`
Six 2nd and 3rd generation sequencing platforms are
currently available, and a seventh is in advanced
development (Table 1). Most platforms require that
template DNA be short (200–1000 bp) and that each
template contain forward and reverse primer binding
sites (i.e. a library of templates is needed). Libraries can
be constructed in many different ways (see Cost per
sample); an entire review on this subject alone is
warranted. Below, I describe the most salient features of
the platforms.
`454 (http://www.454.com) was the 1st commercial
NGS platform. 454 was acquired by Roche, but is still
known by the name 454. 454 uses beads that start with
`a single template molecule which is amplified via emPCR
`(Box 1). Millions of beads are loaded onto a picotitre
`plate designed so that each well can hold only a single
`bead. All beads are then sequenced in parallel by flowing
`pyrosequencing reagents across the plate.
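Because each bead should carry copies of only one template, templates are diluted before emPCR. Assuming template molecules fall into droplets independently (a Poisson process; this ignores variation in droplet size and amplification failures), the sketch below gives the expected share of empty, single-template and mixed droplets for a hypothetical mean number of templates per droplet. The function name and the mean values are illustrative assumptions, not a 454 protocol.

```python
import math


def loading_fractions(mean_templates: float) -> dict[str, float]:
    """Poisson fractions of droplets holding 0, 1 or >=2 template molecules."""
    p0 = math.exp(-mean_templates)  # no template: bead stays blank
    p1 = mean_templates * p0        # exactly one template: clonal bead
    p_multi = 1.0 - p0 - p1         # two or more templates: mixed bead
    # Of the droplets that received any template, the share that are clonal.
    clonal_share = p1 / (1.0 - p0) if p0 < 1.0 else 0.0
    return {"empty": p0, "single": p1, "mixed": p_multi, "clonal_share": clonal_share}


if __name__ == "__main__":
    for mean in (0.1, 0.5, 1.0):  # hypothetical mean templates per droplet
        print(mean, {k: round(v, 3) for k, v in loading_fractions(mean).items()})
```

Comparing a mean of 0.1 with a mean of 1.0 shows the trade-off: the lower mean keeps about 95% of template-bearing beads clonal but leaves about 90% of droplets empty, whereas the higher mean fills more droplets at the cost of many mixed beads.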
`Solexa (http://www.illumina.com) developed the 2nd
`commercial NGS platform. Solexa was subsequently
`acquired by Illumina and is now known by the name Illu-
`mina. Illumina uses a solid glass surface (similar to a
`microscope slide) to capture individual molecules and
`bridge PCR (Box 1) to amplify DNA into small clusters of
`identical molecules. These clusters are then sequenced
with a strategy that is similar to Sanger sequencing,
except that only dye-labelled (reversible) terminators are
added: the base at that position is determined for all
clusters, then the dye is cleaved and another round of
dye-labelled terminators is added.
`SOLiD (http://www.appliedbiosystems.com) was the
`3rd commercial NGS platform.
`Invitrogen acquired
`Applied Biosystems, forming Life Technologies, but the
name SOLiD has remained stable. SOLiD uses ligation to
determine sequences and, until the most recent release of
Illumina’s software and reagents, had always produced
more reads (at lower cost) than Illumina.
`Helicos (http://www.helicosbio.com) developed the
`HeliScope, which was the first commercial single-mole-
`cule sequencer. Unfortunately, the high cost of the instru-
`ments and short read lengths limited adoption of this
`platform. Helicos no longer sells instruments, but con-
`ducts sequencing via a service centre model.
Ion Torrent (http://www.iontorrent.com) uses a
sequencing strategy similar to that of the 454, except that
(i) hydrogen ions (H+) are detected (instead of light from
the pyrophosphate-triggered enzyme cascade used by 454)
and (ii) sequencing chips conform to common design and
manufacturing standards used for
FIELD GUIDE TO NEXT-GEN SEQUENCERS 761

Table 1 2nd and 3rd generation DNA sequencing platforms listed in the order of commercial availability

Platform | Current company | Former company | Sequencing method | Amplification method | Primary applications | Claim to fame
454 | Roche | 454 | Synthesis (pyrosequencing) | emPCR | 1*, 2, 3*, 4, 7, 8* | First Next-Gen Sequencer; long reads
Illumina | Illumina | Solexa | Synthesis | Bridge PCR | 1*, 2, 3*, 4, 5, 6, 7, 8 | First short-read sequencer; current leader in advantages†
SOLiD | Life Technologies | Applied Biosystems | Ligation | emPCR | 3*, 5, 6, 8 | Second short-read sequencer; low error rates
HeliScope | Helicos | N/A | Synthesis | None | 5, 8 | First single-molecule sequencer
Ion Torrent | Life Technologies | Ion Torrent | Synthesis (H+ detection) | emPCR | 1, 2, 3, 4, 8 | First Post-light sequencer; first system <$100 000
PacBio | Pacific Biosciences | N/A | Synthesis | None | 1, 2, 3, 7, 8 | First real-time single-molecule sequencing
Starlight‡ | Life Technologies | N/A | Synthesis | None | 1, 2, 7, 8 | Single-molecule sequencing with quantum dots

Bold indicates applications that are most often used, economical or growing.
1 = de novo BACs, plastids, microbial genomes.
2 = transcriptome characterization.
3 = targeted re-sequencing.
4 = de novo plant and animal genomes.
5 = re-sequencing and transcript counting.
6 = mutation detection.
7 = metagenomics.
8 = other (ChIP-Seq, lRNA-Seq, Methyl-Seq, etc.; see Brautigam & Gowik 2010, Shendure & Ji 2008).
*Pooling multiple samples with sequence tags (i.e. MIDs or indexes) is required for efficient use of this application.
†Illumina currently leads in number and percentage of error-free reads; Illumina HiSeqs with v3 chemistry lead in reads per run, GB/run and cost/GB.
‡A commercial launch date for the Starlight system is not yet known, but it is included here because it is in advanced development, and some information about its performance characteristics is known.
`
`commercial microchips. Use of H+ means that no lasers,
`cameras or fluorescent dyes are needed. Using common
`microchip design standards means that low-cost manu-
`facturing can be used. Ion Torrent was purchased by Life
`Technologies in 2010, but is still known as Ion Torrent.
`The first early access instruments were deployed in late
`2010.
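Both 454 and Ion Torrent flow one nucleotide species at a time across the plate or chip, and a whole homopolymer run is extended in a single flow, so each flow yields a run length rather than a single base. The idealised, noise-free sketch below shows this conversion in both directions; the flow order `TACG` and the function names are assumptions for illustration, since actual flow orders and signal processing are platform-specific.

```python
def flowgram(sequence: str, flow_order: str = "TACG", n_flows: int = 16) -> list[int]:
    """Idealised flow signal: number of bases incorporated in each flow."""
    signal = []
    pos = 0
    for i in range(n_flows):
        nucleotide = flow_order[i % len(flow_order)]
        run = 0
        # The whole homopolymer run is extended during a single flow.
        while pos < len(sequence) and sequence[pos] == nucleotide:
            run += 1
            pos += 1
        signal.append(run)
    return signal


def call_bases(signal: list[int], flow_order: str = "TACG") -> str:
    """Inverse of flowgram(): rebuild the sequence from per-flow counts."""
    return "".join(flow_order[i % len(flow_order)] * n for i, n in enumerate(signal))


if __name__ == "__main__":
    signal = flowgram("TTACGGA", n_flows=8)
    print(signal)              # [2, 1, 1, 2, 0, 1, 0, 0]
    print(call_bases(signal))  # TTACGGA
```

In real data each count must be estimated from signal strength (light for 454, H+ for Ion Torrent), and the difference between, say, five and six identical bases is a small part of the total signal, so miscalled homopolymer lengths appear as insertions or deletions. This is consistent with indels being listed as the primary error type for both platforms in Table 3.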
PacBio (http://www.pacificbiosciences.com) has
developed an instrument that sequences individual DNA
molecules in real time. Individual DNA polymerases are
attached to the surface of microscope slides. The
sequence of individual DNA strands can be determined
because each dNTP has a unique fluorescent label that is
detected immediately prior to being cleaved off during
synthesis. The first early access instruments were
deployed in late 2010. The low cost per experiment, fast
run times and cool factor have generated much
enthusiasm for this platform, especially among investors.
`Starlight uses quantum dots to achieve single-mole-
`cule sequencing. DNA is attached to the surface of a
`microscope slide where sequencing occurs in a manner
`similar to PacBio. A major advantage of Starlight relative
`to PacBio is that the DNA polymerase can be replaced
`after it has lost activity. Thus, sequencing can continue
`
`along the entire length of a template. Many characteris-
`tics of the Starlight technology are known (e.g. Karrow
`2010), but timing of a commercial launch, target costs,
`etc. are unknown.
`
`Broad characteristics
`
The first three platforms (Table 1) are currently widely
available through academic core laboratories and
commercial service providers (see
http://pathogenomics.bham.ac.uk/hts/ for a hyperlinked
global map of many NGS instruments; see
http://seqanswers.com/forums/showthread.php?t=948/ for a
list of NGS service providers; see Karrow & Toner 2011 for
a recent survey).
`These three platforms have traditionally split their focus
`into fewer long reads (454) vs. more short reads (Illumina
`and SOLiD; see Box 1 for definitions). Long reads are
`optimal for initial genome and transcriptome character-
`ization because longer pieces assemble more efficiently
`than shorter pieces. Alternatively, the lower costs and
`increased number of reads associated with shorter
`read-lengths are better suited for re-sequencing and for
`frequency-based applications (i.e. counting, such as in
`gene expression studies).
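One way to see why longer pieces assemble (and map) more easily is to ask what fraction of read-length windows in a genome occur only once: a read can be placed unambiguously only if it is long enough to extend beyond the repeats it falls in. The sketch below computes that fraction for a hypothetical toy genome containing a repeated element; the function name and genome are assumptions, it considers one strand only, ignores sequencing errors and is not an assembly algorithm.

```python
from collections import Counter


def unique_fraction(genome: str, read_length: int) -> float:
    """Fraction of read-length windows whose sequence occurs exactly once."""
    windows = [genome[i:i + read_length] for i in range(len(genome) - read_length + 1)]
    if not windows:
        return 0.0
    counts = Counter(windows)
    return sum(1 for w in windows if counts[w] == 1) / len(windows)


if __name__ == "__main__":
    repeat = "GATTACAGATTACA"  # hypothetical repeated element
    genome = "CCGTA" + repeat + "TTGCA" + repeat + "AGGCT"
    for length in (4, 10, 20):
        print(length, round(unique_fraction(genome, length), 2))
```

The uniqueness fraction rises with read length because windows longer than the repeat always include some unique flanking sequence.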
`
`Box 1 Glossary
`
Barcode, index, MID or tag – a short, unique sequence of DNA added to samples so they can be pooled, then processed
and sequenced in parallel, with each resulting sequence containing information that identifies the source sample;
used, with some variation, by all platforms.
`Bridge PCR – PCR that occurs between primers bound to a surface, used by Illumina sequencers (see Shendure & Ji
`2008, and references therein).
`cBot – a required accessory instrument for many Illumina sequencers in which Bridge PCR is completed.
`de novo – from the beginning (i.e. without prior information).
`emPCR or emulsion PCR – PCR that occurs within aqueous microdroplets separated by oil so that up to thousands
`of independent reactions can occur per microlitre of volume; for NGS, one primer is usually covalently linked to a
`bead so PCR only occurs in microdroplets with beads, and a single template molecule per bead ⁄ microdroplet is
`needed, resulting in each bead having a homogeneous set of template molecules, used in 454, Ion Torrent, and SOLiD
`sequencers (see Shendure & Ji 2008, and references therein).
`Flow cell – single-use sequencing chip ⁄ plate ⁄ slide used by Illumina sequencers (most use 8-channel flow cells; all
`channels must be used within a run); the SOLiD 5500 adopts a similar design, but channels may be run one at a time.
`
`Reads
`
`Mappable reads – very short DNA sequences that can be determined to originate from a single location in the gen-
`ome (20–40 bases, length depends on genome complexity).
`Mate-paired reads – DNA sequences from ends of DNA templates that have been circularized so that distant ends
`are physically ligated and read together (also known as Paired-end tags, PET or jump libraries; see Fig. 1a).
`Paired-end reads – DNA sequences from each end of DNA templates (see Fig. 1c).
Strobed reads – DNA sequences determined at intermittent locations along the length of a single template; when
illuminated, the sequence is determined; when dark, the polymerase continues at the same pace but is not degraded
by the light. This is a way, for example, of spreading 900 bases of sequence data among three 300-base reads, each
separated by 300 bases (Fig. 1d).
`
[Fig. 1, panels (a)–(d)]

Fig. 1 Illustration of the methods used for the four types of reads. Arrowheads indicate 3′ ends of DNA. F, forward primer;
R, reverse primer. Double-stranded adapters of F plus its complement, and R plus its complement, are added during the library
construction phase for NGS. (a) Mate-pair libraries are constructed from fragments of double-stranded DNA (dsDNA) that are much
longer than can be used directly for NGS libraries. In some embodiments, the join site may contain a linker that is used for selection
purposes and to mark the join site. Following library construction, fragments are read using single- or paired-end reads. (b) Single-end
reads yield data that are similar to Sanger sequences. (c) Paired-end reads allow both ends of a template to be sequenced. (d) Strobed
reads spread the read length out along the template molecule by turning off the light source periodically, which allows synthesis to
proceed at a known rate without photodegradation of the DNA polymerase. The data are used for the same purpose as mate-pair libraries.
`
`
`No generally accepted standards exist for read length, but the following guidelines apply:
Short reads – sequences ≤50 consecutive bases.
Mid-length reads – sequences ≥51, but <400 consecutive bases.
Long reads – sequences ≥400, but <1000 consecutive bases (i.e. similar to Sanger/capillary).
`Extended reads – sequences >1000 bases; a small proportion of PacBio reads are up to a few kb; Starlight uses a
`replaceable polymerase allowing reads of indefinite length (up to the full length of the template).
`
`Computing
`
`Cloud computing – remote computational resources available (usually on a fee-for-use basis) via the internet [e.g.
`Amazon’s Elastic Compute Cloud (http://aws.amazon.com/ec2)].
`Commodity alternatives ⁄ computing ⁄ resources – computer parts and systems that conform to open standards
`and are thus available from many manufacturers and retailers (generally at low cost).
`Sneakernet – transferring files by physically transporting hardware (i.e. carrying or shipping hard drives containing
`data).
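The read-length guidelines in Box 1 can be written as a small classifier. The guidelines define long reads as <1000 bases and extended reads as >1000 bases, leaving exactly 1000 unassigned; the sketch below treats 1000 as extended, which is an assumption, and the function name is hypothetical. Because the categories refer to consecutive bases, each end of a paired read (e.g. the ‘100 + 100’ entries in Table 2) would be classified separately.

```python
def read_category(length: int) -> str:
    """Classify a stretch of consecutive bases using the Box 1 guidelines."""
    if length < 1:
        raise ValueError("read length must be positive")
    if length <= 50:
        return "short"
    if length < 400:   # 51 to 399 bases
        return "mid-length"
    if length < 1000:  # 400 to 999 bases
        return "long"
    # Box 1 gives long as <1000 and extended as >1000; assigning exactly
    # 1000 bases to "extended" is an assumption of this sketch.
    return "extended"


if __name__ == "__main__":
    for n in (35, 100, 400, 700, 1100):
        print(n, read_category(n))
```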
`
`The older NGS platforms have progressed signifi-
`cantly since they were first introduced. For example, 454
`has progressed from reads of 100, to 250, to 400–500
`bases, and is now on the verge of making 800-base reads
available (mode = 800, average = 700). Illumina has
progressed from reads of less than 36 bases to ≥100 bases on
each end of templates, with SOLiD making slightly less
`striking increases. Thus, many of the platforms can be
`used for the same applications (Table 1) and such overlap
`is increasing.
`Because it is possible to use most platforms for most
`applications, economics, length of time to data acquisi-
`tion, length of time in the queue and downstream analy-
`sis constraints become important for selecting a platform.
`As the number and variety of instruments increase and
`costs continue to decrease, we will become constrained
`only by our knowledge of the systems and our creativity
`to develop and adapt techniques to obtain data effi-
`ciently. In particular, developments in sample multiplex-
`ing and sequence capture will drastically increase the
`amount of data available at affordable costs for molecular
`ecological studies.
`
`Cost per run and cost per Mb
`
`Although all companies are continuously upgrading their
`platforms so that several fit into multiple read-length cat-
`egories, the platforms can still be grouped into those that
`offer smaller numbers of middle-to-extended reads at rel-
`atively high cost per megabase (Mb) of sequence (i.e. 454,
`Ion Torrent, PacBio and Starlight) and those that offer lar-
`ger numbers of short-to-middle-length reads at lower
cost per Mb (i.e. Illumina, SOLiD, Helicos; Table 2).
Technologies still in development (e.g. Oxford Nanopore,
Roche+IBM, etc.) and expected updates to the current
3rd generation sequencing technologies (Karrow & Toner
`2011) have the potential for many extended reads at low
`cost, but initial releases of the PacBio and Starlight plat-
`forms will not match the number of reads or cost per Mb
`of the short-read platforms (Table 2).
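The reagent cost per Mb in Table 2 is essentially the reagent cost per run divided by the yield per run. The sketch below reproduces four of the Table 2 values that way and sorts the instruments by the result; the list name and function name are hypothetical. It covers reagents only (labour and service contracts are reflected in the minimum unit cost column instead), and for entries given as bounds (e.g. ‘>1000’ Mb) the computed value is itself only a bound.

```python
# (instrument, reagent cost per run in US$, yield in Mb per run), from Table 2
RUNS = [
    ("454 GS Jr. Titanium", 1100, 50),
    ("454 FLX Titanium", 6200, 500),
    ("Illumina MiSeq", 750, 1020),
    ("Illumina HiSeq 2000", 20120, 200000),
]


def cost_per_mb(reagent_cost_per_run: float, yield_mb_per_run: float) -> float:
    """Reagent cost per megabase of sequence."""
    return reagent_cost_per_run / yield_mb_per_run


if __name__ == "__main__":
    for name, cost, yield_mb in sorted(RUNS, key=lambda run: cost_per_mb(run[1], run[2])):
        print(f"{name:<22} ${cost_per_mb(cost, yield_mb):.2f}/Mb")
```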
`There is clearly a continuum of performance charac-
`teristics for massively parallel sequencers, with a reason-
`ably strong dichotomy of these platforms in terms of the
`number of reads per run, cost per Mb and instrument
time to conduct a run (Table 2). Variation in read
lengths and in supply costs per run is also important
(Table 2). Because the read lengths of
`the Illumina
`sequencers can now equal or exceed 100 bases from each
`end of the template molecule, Illumina data can be used
`for de novo assemblies [e.g. Li et al. 2009 (but see Worley
`& Gibbs 2010); Paszkiewicz & Studholme 2010], espe-
`cially when supplemented with mate-paired reads
`(Gnerre et al. 2011), and ⁄ or data from one of the longer-
`read platforms (e.g. Dalloul et al. 2010). Indeed, it is clear
that the combination of Illumina or SOLiD data with
mate-paired reads on the 454 or Illumina, strobed reads
`from PacBio or extended reads from Starlight will facili-
`tate many genome assemblies in the near future.
`
`Cost per sample
`
`A major difference between the typical biomedical
`experiments targeted by NGS platforms and the uses for
`which molecular ecologists wish to employ these instru-
`ments is that the latter often want to process many
`samples (100s) at relatively modest numbers of loci
`(10s–1000s), and to do it with limited funds. A key to
`accomplishing low per-sample cost is to be able to attach
`an identifying tag (see Box 1) to each sample prior to
`expensive processing and sequencing. In this way, the
`cost of processing and sequencing can be divided among
`many samples.
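The sketch below illustrates the pooling logic: reads are assigned back to their source samples by a tag at the start of each read, tolerating up to one mismatch. Tag position, length and error tolerance differ among platforms and protocols, so the function names, barcodes, sample names and the one-mismatch rule here are all hypothetical. To correct one mismatch unambiguously, every pair of tags should differ at three or more positions.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads: list[str], barcodes: dict[str, str],
                max_mismatches: int = 1) -> tuple[dict[str, list[str]], list[str]]:
    """Assign reads to samples by a barcode at the start of each read.

    Returns (sample -> reads with the barcode trimmed off, unassigned reads).
    Reads matching no barcode, or more than one, are left unassigned.
    """
    lengths = {len(bc) for bc in barcodes.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch needs at least one barcode, all of equal length")
    bc_len = lengths.pop()
    bins = {sample: [] for sample in barcodes}
    unassigned = []
    for read in reads:
        prefix = read[:bc_len]
        hits = [s for s, bc in barcodes.items()
                if len(prefix) == bc_len and hamming(prefix, bc) <= max_mismatches]
        if len(hits) == 1:
            bins[hits[0]].append(read[bc_len:])
        else:
            unassigned.append(read)
    return bins, unassigned


if __name__ == "__main__":
    tags = {"sample_1": "ACGT", "sample_2": "TGCA"}  # hypothetical 4-base tags
    reads = ["ACGTAAAA", "ACGAGGGG", "TGCACCCC", "GGGGTTTT"]
    by_sample, unassigned = demultiplex(reads, tags)
    print(by_sample, unassigned)
```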
`All NGS platforms allow the use of sample tags. The
`importance of developing low-cost library preparations
`
`
Table 2 Comparison of sequencing instruments, sorted by cost/Mb, with expected performance by mid 2011

Instrument | Run time(a) | Millions of reads/run | Bases/read(b) | Yield (Mb/run) | Reagent cost/run(c) | Reagent cost/Mb | Minimum unit cost (% run)(d)
3730xl (capillary) | 2 h | 0.000096 | 650 | 0.06 | $96 | $1500 | $6 (1%)
Ion Torrent – ‘314’ chip | 2 h | 0.10 | 100 | >10 | $500 | <$50 | $750 (100%)
454 GS Jr. Titanium | 10 h | 0.10 | 400 | 50 | $1100 | $22 | $1500 (100%)
Starlight* | † | 0.01 | >1000 | † | † | † | †
PacBio RS | 0.5–2 h | 0.01 | 860–1100 | 5–10 | $110–900 | $11–180 | †
454 FLX Titanium | 10 h | 1 | 400 | 500 | $6200 | $12.4 | $2000 (10%)
454 FLX+(e) | 18–20 h | 1 | 700 | 900 | $6200 | $7 | $2000 (10%)
Ion Torrent – ‘316’ chip* | 2 h | 1 | >100 | >100 | $750 | <$7.5 | $1000 (100%)
Helicos(f) | N/A | 800 | 35 | 28 000 | N/A | N/A | $1100 (2%)
Ion Torrent – ‘318’ chip* | 2 h | 4–8 | >100 | >1000 | $925 | $0.93 | $1200 (100%)
Illumina MiSeq* | 26 h | 3.4 | 150 + 150 | 1020 | $750 | $0.74 | $1000 (100%)
Illumina HiScanSQ | 8 days | 250 | 100 + 100 | 50 000 | $10 220 | $0.20 | $3000 (14%)
Illumina GAIIx | 14 days | 320 | 150 + 150 | 96 000 | $11 524 | $0.12 | $3200 (14%)
SOLiD – 4 | 12 days | >840(g) | 50 + 35 | 71 400 | $8128 | <$0.11 | $2500 (12%)
Illumina HiSeq 1000 | 8 days | 500 | 100 + 100 | 100 000 | $10 220 | $0.10 | $3000 (12%)
Illumina HiSeq 2000 | 8 days | 1000 | 100 + 100 | 200 000 | $20 120(h) | $0.10 | $3000 (6%)
SOLiD – 5500 (PI)* | 8 days | >700(g) | 75 + 35 | 77 000 | $6101 | <$0.08 | $2000 (12%)
SOLiD – 5500xl (4hq)* | 8 days | >1410(g) | 75 + 35 | 155 100 | $10 503(h) | <$0.07 | $2000 (12%)
Illumina HiSeq 2000 – v3(i)* | 10 days | ≤3000 | 100 + 100 | ≤600 000 | $23 470(h) | ≥$0.04 | $3500 (6%)

(a) Instrument time for maximum read length.
(b) Average length for high-quality reads >200 bases (mode is higher); typical maximum for reads ≤150 bases (most reads reach this length).
(c) Includes all stages of sample preparation for a single sample (i.e. library preparation through sequencing; capillary = sequencing only).
(d) Typical full cost (i.e. including labour, service contract, etc.) of the smallest generally available unit of purchase at an academic core laboratory provider for the longest available read (and percentage of reads relative to a full run, rounded to the nearest whole percentage).
(e) Upgrade of the FLX instrument, due out summer 2011.
(f) Instruments and reagents are no longer sold; services are available for any organism.
(g) Mappable reads [number of raw high-quality reads (as reported for all other platforms) is higher].
(h) More reads are obtained than are needed from any single sample within most experiments, but the value illustrates the costs.
(i) Announced TruSeq v3 reagents and software; reads and yield are half for HiSeq 1000.
*Information based on company sources alone (independent data not yet available).
†Detail not yet available.
‘’ Indicates a likely value based on unpublished information available in March 2011 (i.e. author speculation).
`
`and sample tagging has been clear for several years, dur-
`ing which time a variety of schemes have been developed
`(e.g. Binladen et al. 2007; Hoffmann et al. 2007; Meyer
`et al. 2007; Craig et al. 2008). Molecular ecologists (among
`others) will benefit from further development of low-cost
`library preparations in which many different sample tags
are employed, so that many samples can be pooled
together and sequencing costs divided among them.
`To understand the importance of library construction
`costs, consider that when the Illumina HiSeq was intro-
`duced in early 2010,
`standard Illumina RNA-Seq
`libraries cost about $400 each to construct. Researchers
`could pool 12 indexed samples per lane and yield about
`5 million reads per sample (a sufficient number of reads
`for gene expression studies seeking modest sensitivity).
Sequencing reagents for 192 samples were estimated at
<$7000, but library preparations were estimated at
192 × $400 = $76 800 (i.e. library preparation was more
than ten times as expensive as sequencing the libraries).
Thus, library preparation and sample tagging continue
to be an active area of research with important ongoing
developments.
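The arithmetic of the RNA-Seq example above can be written out as follows. The only inputs are the figures in the text ($400 per library, 192 samples, and <$7000 for sequencing reagents, used here as an upper bound); the function name is hypothetical. Library preparation comes to about 11 times the sequencing cost, and roughly $400 of the resulting per-sample total of about $436.

```python
def per_sample_cost(n_samples: int, library_cost_each: float, sequencing_total: float) -> float:
    """Library preparation is paid per sample; sequencing is shared by the pool."""
    return library_cost_each + sequencing_total / n_samples


if __name__ == "__main__":
    n_samples = 192
    library_each = 400       # US$ per standard RNA-Seq library (early 2010)
    sequencing_total = 7000  # US$, upper bound for sequencing reagents
    library_total = n_samples * library_each
    print(library_total)                               # total library cost
    print(round(library_total / sequencing_total, 1))  # library : sequencing ratio
    print(round(per_sample_cost(n_samples, library_each, sequencing_total), 2))
```

Because the sequencing cost is divided by the number of pooled samples while the library cost is not, adding more tags mainly shrinks the already small sequencing share; lowering the per-sample library cost is what moves the total.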
`
`Purchase costs
`
`NGS platforms currently range from $49 500 to $695 000
(Table 3). Additional ancillary equipment, extended
service contracts and required computers bring the total
costs of most systems to between about $75 000 and more
than $1 000 000. Costs in Table 3 assume that equipment will
`be housed in an equipped, fully functioning laboratory. If
`the new sequencer will be placed into a new facility, then
`the costs will increase considerably.
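A rough multi-year cost of ownership can be assembled from the Table 3 columns (in thousands of US dollars). In the sketch below, the function name and the three-year service-contract period are hypothetical choices, and consumables, labour and any new facility are excluded. With these assumptions the GS Junior comes to about $167k, and the HiSeq 2000 to about $1.19 million, in line with the upper end of the range given above.

```python
def ownership_cost_k(purchase: float, additional: float, computing: float,
                     service_per_year: float, years: int) -> float:
    """Rough cost of ownership in thousands of US$ (list prices, Table 3).

    Excludes consumables, labour and facility costs.
    """
    return purchase + additional + computing + service_per_year * years


if __name__ == "__main__":
    years = 3  # hypothetical service-contract period
    # Purchase, additional instruments, computing and annual service from Table 3.
    gs_junior = ownership_cost_k(108, 16, 5, 12.6, years)
    hiseq_2000 = ownership_cost_k(690, 55, 222, 75.9, years)
    print(f"454 GS Jr. Titanium: ${gs_junior:.1f}k")
    print(f"Illumina HiSeq 2000: ${hiseq_2000:.1f}k")
```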
Table 3 Instrument purchase cost, additional instrument costs, service agreement costs, computational resources needed, size of data
files, primary errors and error rates for commercially available DNA sequencing platforms in 2011. All costs are list price in thousands of
US dollars

Instrument | Purchase cost | Additional instruments(a) | Service contract(b) | Computational resources(c) | Data file sizes (GB)(d) | Primary errors | Error rate (%)(e)
3730xl (capillary) | $376 | – | $19.8 | Desktop | 0.03 | Substitution | 0.1–1
454 GS Jr. Titanium | $108 | $16 | $12.6 | $5 (desktop) | <3 images, <1 sff | Indel | 1
454 FLX Titanium | $500 | $30 | $50.0 | $5 (desktop) | 20 images, 4 sff | Indel | 1
454 FLX+(f) | $29.5 | $30 | $50.0 | $5 (desktop) | 40 images, 8 sff | Indel | 1*
PacBio RS | $695 | – | $85 | $65 cluster | 20 pulsed, 2 Fastq | CG deletions | 16
Ion Torrent – 314 chip | $49.5 | $18(g) | $7.5 | Desktop – $35 | 0.1 Fastq | Indel | 1
Ion Torrent – 316 chip | $49.5 | $18(g) | $7.5 | Desktop – $35 | 0.6 Fastq | Indel | 1*
Ion Torrent – 318 chip | $49.5 | $18(g) | $7.5 | Desktop – $35 | TBD | Indel | 1*
SOLiD – 4 | $475 | $54(h) | $38.4 | $35 cluster(i) | 680(j) | A-T bias | >0.06*
SOLiD – 5500 | $349 | $54(h) | $29.0 | $35 cluster(i) | 74(k)* | A-T bias | >0.01*
SOLiD – 5500xl | $595 | $54(h) | $38.4 | $35 cluster(i) | 148(k)* | A-T bias | >0.01*
Illumina MiSeq | $125 | – | $12.5 | Desktop | 1(k)* | Substitution | >0.1*
Illumina HiScanSQ | $405 | $55(l) | $41.5 | $222 cluster(m) | 50(k)* | Substitution | ≥0.1
Illumina GAIIx | $250 | $100(n) | $44.5 | $222 cluster(m) | 600 | Substitution | ≥0.1
Illumina HiSeq 1000 | $560(o) | $55(l) | $62.0 | $222 cluster(m) | ≤300(k)* | Substitution | ≥0.1
Illumina HiSeq 2000 | $690 | $55(l) | $75.9 | $222 cluster(m) | ≤600(k)* | Substitution | ≥0.1

(a) Does not include general purpose and library preparation equipment (e.g. Covaris [$45k], Agilent Bioanalyzer [$18k], thermal cyclers, general purpose centrifuges, MilliQ water, etc.), but includes bead counters for emPCR (up to $20k), TissueLysers or similar for emPCR (up to $10k), specialty centrifuges, etc. when required by the instrument manufacturer. Many laboratories will need additional general purpose instruments.
(b) Annual maintenance agreements include on-site service, but do not include extra premiums for the fastest available service.
(c) Desktops assume higher-end models with multiple processors, ≥8 GB RAM, ≥1 TB HD, etc. (up to $5k; except capillary = $2k desktop).
(d) Data file size transferred from instrument server to offline cluster.
(e) Percentage of errors per base within single reads of the maximum length given in Table 2; rates among platforms are not exactly comparable; reported Ion Torrent rates range from 0.46% to 2.4%; SOLiD rates are from reads with bases consistent on double or triple sequencing only; for Illumina, the 0.1% rate applies to >85% of reads (not all reads); see text for additional details.
(f) Upgrade to the 454 FLX instrument; FLX+ new purchases = $500k.
(g) Required $16.5k Ion Torrent server for conversion of raw signals to base calls; $1k ULTRA-TURRAX® Tube Drive; Argon gas tank (<$0.5k).
(h) Includes EZ Bead ePCR automation and required UPS; a Covaris ($45k) is also required but is not included to facilitate comparisons (because one is usually bought with any of the other sequencing systems).
(i) Compute cluster available from Life Technologies.
(j) 85 Gb run is a 2 × 50 run (the highest throughput on a SOLiD 4).
(k) New compressed binary data format saves base and quality-value data in a 1 byte : 1 base ratio.
(l) Cost of cBot (required). Additional instruments for library preparation needed (Covaris, etc., similar to 454 FLX).
(m) Illumina Compute – Tier 1 system: 3 cluster nodes with 8 cores and 48 GB RAM per node (i.e. 24 cores and 144 GB RAM) and 24 TB usable data storage; commodity equivalent systems are available for much less, but will require technical support (see text).
(n) Cost of cBot and Paired-end module (required for GAIIx).
(o) HiSeq 1000 is upgradable to HiSeq 2000 for $175k.
‘’ Indicates a likely value based on unpublished information available in February 2011 (i.e. author speculation).
*Information based on company sources alone (independent data not yet available); also applies to Illumina TruSeq v3 chemistry.

There are now three instruments that are <$150 000
(Table 3), putting them within the reach of many
individual researchers. These instruments have
significantly reduced total costs per run and/or per
experiment.
The smaller footprint of these instruments and the
potential portability of at least some of them are also
attractive features. Although these features will
facilitate new research opportunities, researchers
should carefully weigh the significantly higher cost per
read and cost per Mb of these instruments relative to
other instruments and to the costs of outsourcing.
These lower-cost instruments and low-cost runs will
clearly be invaluable for small-scale experiments,
gathering pilot data, quality control and validation.
Researchers with these instruments will, however, still
often find it most economical to send out fully
processed and validated samples to be run on other
instruments with lower cost per read/Mb (e.g. library
pools with many indexes could be validated on the
MiSeq/Ion Torrent 314/GS Junior, but then
`
`
`Table 4 Primary advantages and disadvantages of each next-generation sequencing instrument
`
`Instrument
`
`3730xl (capillary)
`454 GS Jr. Titanium
`
`454 FLX Titanium
`454 FLX+
`Helicos
`
`PacBio
`
`Ion Torrent
`
`Ion Torrent – 314 chip
`
`Ion Torrent – 316 chip
`
`Ion Torrent – 318 chip
`
`SOLiD – 4
`
`SOLiD – 5500
`
`SOLiD – 5500xl
`Illumina MiSeq
`
`Illumina HiScanSQ
`
`Illumina GAIIx
`
`Illumina HiSeq 1000
`
`Illumina HiSeq 2000
`
`Primary
`advantages
`
`Low cost for very small studies
`Long-read length; low capital cost; low cost per
`experiment
`Long-read length
`Double the maximum read length of Titanium
`Large numbers