Molecular Ecology Resources (2011) 11, 759–769
`
`doi: 10.1111/j.1755-0998.2011.03024.x
`
`Field guide to next-generation DNA sequencers
`
`TRAVIS C. GLENN
`Department of Environmental Health Science and Georgia Genomics Facility, Environmental Health Science Building, University of
`Georgia, Athens, GA 30602, USA
`
`Abstract
`
`The diversity of available 2nd and 3rd generation DNA sequencing platforms is increasing rapidly. Costs for these systems
`range from <$100 000 to more than $1 000 000, with instrument run times ranging from minutes to weeks. Extensive trade-
`offs exist among these platforms. I summarize the major characteristics of each commercially available platform to enable
`direct comparisons. In terms of cost per megabase (Mb) of sequence, the Illumina and SOLiD platforms are clearly superior
(≤$0.10 ⁄ Mb vs. >$10 ⁄ Mb for 454 and some Ion Torrent chips). In terms of cost per nonmultiplexed sample and instrument
`run time, the Pacific Biosciences and Ion Torrent platforms excel, with the 454 GS Junior and Illumina MiSeq also notable
`in this regard. All platforms allow multiplexing of samples, but details of library preparation, experimental design and data
`analysis can constrain the options. The wide range of characteristics among available platforms provides opportunities
`both to conduct groundbreaking studies and to waste money on scales that were previously infeasible. Thus, careful
`thought about the desired characteristics of these systems is warranted before purchasing or using any of them. Updated
`information from this guide will be maintained at: http://dna.uga.edu/ and http://tomato.biol.trinity.edu/blog/.
`
Keywords: 2nd and 3rd generation sequencing, 454, Helicos, Illumina, Ion Torrent, Life Technologies, massively parallel
sequencing, Pacific Biosciences, Roche, SOLiD
`
`Received 17 March 2011; revision accepted 22 March 2011
`
`Background
`
`DNA sequencing technologies and platforms are being
`updated at a blistering pace, so much so that reviews of
`sequencing platforms resemble the work of Sisyphus. It
`is important, however, for molecular ecologists to keep
`pace with these technologies, because they are transform-
`ing what we can do, how we should do it, and how much
`it will cost. Institutions and researchers are committing
`up to a million dollars to purchase massively parallel
`sequencing instruments. Such purchases lock laborato-
`ries and institutions into specific paths for large annual
`expenditures in both consumable supplies and service
`contracts. Differences in instrument engineering, plat-
`form chemistry and economics related to design con-
`strain what can be done with those instruments once
`they are purchased.
Several recent major announcements and acquisitions make this an opportune time to evaluate available
platforms and what is likely to be available in the immediate future. In this brief guide, I summarize
instruments currently available and those that have been announced by major companies. Although several
of these platforms have very different strengths touted by the vendors, the weaknesses are often much
less clear. I have therefore summarized available information in tables with categories of primary
interest to purchasers and to users so that direct comparisons can be made. I will use the convention
of 2nd generation to indicate a platform that requires amplification of the template molecules prior to
sequencing, 3rd generation to indicate platforms that sequence individual DNA molecules directly, and
next-generation sequencing (NGS) platforms to generically indicate 2nd or 3rd generation instruments.

Correspondence: Travis C. Glenn, Fax: 706 542 7472; E-mail: travisg@uga.edu

© 2011 Blackwell Publishing Ltd

`This guide is intended to provide information for
`readers with little or advanced understanding of NGS
`platforms. I assume, however, that readers who are not
`familiar with these systems are learning details by: read-
`ing relevant publications (e.g. Mardis 2008; Shendure & Ji
`2008; Ansorge 2009; Richardson 2010; Tautz et al. 2010),
`reading information at company and independent web-
`sites and talking with staff of the companies making NGS
`instruments.
`My purpose is not to explain how these systems work
`in detail (that information is readily available from the
`sources noted above), but instead to focus on generally
`important traits of these systems and to provide relevant
`details for prospective buyers and users. In particular,
`my goal is to present information useful to researchers
`who must determine what platform to use for their own
`experiments or who will recommend purchasing instru-
`ments so that they can make informed decisions and
`facilitate summaries of their decisions (e.g. for institu-
`tional purchasing support staff, administrators and in
`publications). I do not include information on Complete
`Genomics, deCode genetics, Knome or similar companies
`because they are focused solely on analysing human sam-
`ples. I also will not cover the Polonator, Intelligent Bio-
`Systems, or other similar companies that have not yet
`been able to make significant commercial impact. I pro-
`vide some information on Helicos because this company
`has only recently stopped selling instruments and
`reagents in favour of adopting a service-provider model,
`and their services are available for organisms of interest
`to molecular ecologists.
`
`Comparing the platforms
`
`Caveats to the comparisons – need for standards
`
All companies put out data and statements that cast their systems in the best possible light. I have
generally accepted values from the companies to get at measures that can then be compared, but these
comparisons have inherent flaws. There are no accepted standards for what
`measures the companies need to report, let alone particu-
`lars of how the data are analysed. The templates used,
`types of pre-analysis data filters used and number of runs
`used (e.g. best single run, average of many runs, etc.) can
`have significant impacts. Independent testing of NGS plat-
`forms to determine yield, error rates, etc. would be ideal,
`but is expensive and problematic because companies
`frequently update chemistry, software and other compo-
`nents of their systems. In several cases, available data give
`a broad range of values and I generally condense these
`data into a single number from the middle of the available
`data distribution. There are few places where I indicate
`dispersion of the values. For these reasons, many compari-
`sons below are less than ideal. As in all field guides, the
`purpose here is to illustrate typical phenotypes.
`Everyone using NGS data would benefit from the
`development of a standard set of conditions, analyses
`and a complex template (e.g. Escherichia coli genomic
`DNA) or set of templates (e.g. specific clones, E. coli
`genomic DNA, mouse cDNA, etc.) that could be adopted
`and used for testing of all platforms. Results from these
`templates could then be used to determine values that
`would allow direct comparison of NGS platforms, chem-
`istry and software upgrades. Ideally, the standard tem-
`plate(s) would be similar to US National Institute of
`Standards and Technology (NIST) DNA standards for
`forensics and could be obtained from NIST or similar
entities. Until such standards are developed and adopted, comparisons will remain difficult and
inherently subjective, especially measures of error rate and mappable reads.
`
`Basic characteristics
`
Six 2nd and 3rd generation sequencing platforms are currently available, and a seventh is in advanced
development (Table 1). Most platforms require that template DNA be short (200–1000 bp) and that each
template contain forward and reverse primer binding sites (i.e. a library of templates is needed).
Libraries can be constructed in many different ways (see Cost per sample); an entire review on this
subject alone is warranted. In the
`next section, I describe the most salient features of the
`platforms.
454 (http://www.454.com) was the 1st commercial NGS platform. 454 was acquired by Roche, but is still
known by the name 454. 454 uses beads that start with
`a single template molecule which is amplified via emPCR
`(Box 1). Millions of beads are loaded onto a picotitre
`plate designed so that each well can hold only a single
`bead. All beads are then sequenced in parallel by flowing
`pyrosequencing reagents across the plate.
`Solexa (http://www.illumina.com) developed the 2nd
`commercial NGS platform. Solexa was subsequently
`acquired by Illumina and is now known by the name Illu-
`mina. Illumina uses a solid glass surface (similar to a
`microscope slide) to capture individual molecules and
`bridge PCR (Box 1) to amplify DNA into small clusters of
identical molecules. These clusters are then sequenced with a strategy that is similar to Sanger
sequencing, except that only dye-labelled terminators are added, the sequence at that position is
determined for all clusters, then the dye is cleaved and another round of dye-labelled terminators
is added.
SOLiD (http://www.appliedbiosystems.com) was the 3rd commercial NGS platform. Invitrogen acquired
Applied Biosystems, forming Life Technologies, but the name SOLiD has remained stable. SOLiD uses
ligation to determine sequences and, until the most recent release of Illumina's software and reagents,
SOLiD had always had more reads (at lower cost) than Illumina.
`Helicos (http://www.helicosbio.com) developed the
`HeliScope, which was the first commercial single-mole-
`cule sequencer. Unfortunately, the high cost of the instru-
`ments and short read lengths limited adoption of this
`platform. Helicos no longer sells instruments, but con-
`ducts sequencing via a service centre model.
Ion Torrent (http://www.iontorrent.com) uses a sequencing strategy similar to the 454, except that (i)
hydrogen ions (H+) are detected (instead of a pyrophosphatase cascade) and (ii) sequencing chips conform
to common design and manufacturing standards used for commercial microchips. Use of H+ means that no
lasers, cameras or fluorescent dyes are needed. Using common microchip design standards means that
low-cost manufacturing can be used. Ion Torrent was purchased by Life Technologies in 2010, but is still
known as Ion Torrent. The first early access instruments were deployed in late 2010.

Table 1 2nd and 3rd Generation DNA sequencing platforms listed in the order of commercial availability

Platform | Current company | Former company | Sequencing method | Amplification method | Claim to fame | Primary applications
454 | Roche | 454 | Synthesis (pyrosequencing) | emPCR | First Next-Gen Sequencer, Long reads | 1*, 2, 3*, 4, 7, 8*
Illumina | Illumina | Solexa | Synthesis | BridgePCR | First short-read sequencer; current leader in advantages† | 1*, 2, 3*, 4, 5, 6, 7, 8
SOLiD | Life Technologies | Applied Biosystems | Ligation | emPCR | Second short-read sequencer; low error rates | 3*, 5, 6, 8
HeliScope | Helicos | N/A | Synthesis | None | First single-molecule sequencer | 5, 8
Ion Torrent | Life Technologies | Ion Torrent | Synthesis (H+ detection) | emPCR | First Post-light sequencer; first system <$100 000 | 1, 2, 3, 4, 8
PacBio | Pacific Biosciences | N/A | Synthesis | None | First real-time single-molecule sequencing | 1, 2, 3, 7, 8
Starlight‡ | Life Technologies | N/A | Synthesis | None | Single-molecule sequencing with quantum dots | 1, 2, 7, 8

Bold indicates applications that are most often used, economical or growing.
1 = de novo BACs, plastids, microbial genomes.
2 = transcriptome characterization.
3 = targeted re-sequencing.
4 = de novo plant and animal genomes.
5 = re-sequencing and transcript counting.
6 = mutation detection.
7 = metagenomics.
8 = other (ChIP-Seq, μRNA-Seq, Methyl-Seq, etc.; see Brautigam & Gowik 2010, Shendure & Ji 2008).
*Pooling multiple samples with sequence tags (i.e. MIDs or indexes) is required for efficient use of this application.
†Illumina currently leads in number and percentage of error-free reads; Illumina HiSeqs with v3 chemistry lead in reads per run,
GB/run, and cost/GB.
‡A commercial launch date for the Starlight system is not yet known, but it is included here because it is in advanced development, and
some information about its performance characteristics is known.

PacBio (http://www.pacificbiosciences.com) has developed an instrument that sequences individual DNA
molecules in real time. Individual DNA polymerases are attached to the surface of microscope slides. The
sequence of individual DNA strands can be determined because each dNTP has a unique fluorescent label
that is detected immediately prior to being cleaved off during synthesis. The first early access
instruments were deployed in late 2010. The low cost per experiment, fast run times and cool factor have
generated much enthusiasm for this platform, especially among investors.
`Starlight uses quantum dots to achieve single-mole-
`cule sequencing. DNA is attached to the surface of a
`microscope slide where sequencing occurs in a manner
`similar to PacBio. A major advantage of Starlight relative
`to PacBio is that the DNA polymerase can be replaced
`after it has lost activity. Thus, sequencing can continue
`along the entire length of a template. Many characteris-
`tics of the Starlight technology are known (e.g. Karrow
`2010), but timing of a commercial launch, target costs,
`etc. are unknown.
`
`Broad characteristics
`
The first three platforms (Table 1) are currently widely available through academic core laboratories
and commercial service providers (see http://pathogenomics.bham.ac.uk/hts/ for a hyperlinked global map
of many NGS instruments; see http://seqanswers.com/forums/showthread.php?t=948/ for a list of NGS
service providers; see Karrow & Toner 2011 for a recent survey).
`These three platforms have traditionally split their focus
`into fewer long reads (454) vs. more short reads (Illumina
`and SOLiD; see Box 1 for definitions). Long reads are
`optimal for initial genome and transcriptome character-
`ization because longer pieces assemble more efficiently
`than shorter pieces. Alternatively, the lower costs and
`increased number of reads associated with shorter
`read-lengths are better suited for re-sequencing and for
`frequency-based applications (i.e. counting, such as in
`gene expression studies).
`
`Box 1 Glossary
`
`Barcode, index, MID or tag – a short, unique sequence of DNA added to samples so they can be pooled, then pro-
`cessed and sequenced in parallel with each resulting sequence containing information to determine the source sample,
`used with some variance by all platforms.
`Bridge PCR – PCR that occurs between primers bound to a surface, used by Illumina sequencers (see Shendure & Ji
`2008, and references therein).
`cBot – a required accessory instrument for many Illumina sequencers in which Bridge PCR is completed.
`de novo – from the beginning (i.e. without prior information).
`emPCR or emulsion PCR – PCR that occurs within aqueous microdroplets separated by oil so that up to thousands
`of independent reactions can occur per microlitre of volume; for NGS, one primer is usually covalently linked to a
`bead so PCR only occurs in microdroplets with beads, and a single template molecule per bead ⁄ microdroplet is
`needed, resulting in each bead having a homogeneous set of template molecules, used in 454, Ion Torrent, and SOLiD
`sequencers (see Shendure & Ji 2008, and references therein).
`Flow cell – single-use sequencing chip ⁄ plate ⁄ slide used by Illumina sequencers (most use 8-channel flow cells; all
`channels must be used within a run); the SOLiD 5500 adopts a similar design, but channels may be run one at a time.
`
`Reads
`
`Mappable reads – very short DNA sequences that can be determined to originate from a single location in the gen-
`ome (20–40 bases, length depends on genome complexity).
`Mate-paired reads – DNA sequences from ends of DNA templates that have been circularized so that distant ends
`are physically ligated and read together (also known as Paired-end tags, PET or jump libraries; see Fig. 1a).
`Paired-end reads – DNA sequences from each end of DNA templates (see Fig. 1c).
`Strobed reads – DNA sequences determined at intermittent locations along the length of a single template; when
`illuminated the sequence is determined, when dark the polymerase continues at the same pace, but it is not degraded
`by the light. This is a way, for example, of spreading 900 bases of sequence data among three 300 base reads each sepa-
`rated by 300 bases (Fig. 1d).
`
Fig. 1 Illustration of the methods used for the four types of reads. Arrowheads indicate 3′ ends of DNA.
F, forward primer; R, reverse primer. Double-stranded adapters of F plus its complement, and R plus its
complement are added during the library construction phase for NGS. (a) Mate-pair libraries are
constructed from fragments of double-stranded DNA (dsDNA) that are much longer than can be used directly
for NGS libraries. In some embodiments, the join site may contain a linker that is used for selection
purposes and to mark the join site. Following library construction, fragments are read using single- or
paired-end reads. (b) Single-end reads yield data that are similar to Sanger sequences. (c) Paired-end
reads allow both ends of a template to be sequenced. (d) Strobed reads spread the read length out along
the template molecule by turning off the light source periodically, which allows synthesis to proceed at
a known rate without photodegradation of the DNA polymerase. The data are used for the same purpose as
mate-pair libraries.
`
`No generally accepted standards exist for read length, but the following guidelines apply:
Short reads – sequences ≤50 consecutive bases.
Mid-length reads – sequences ≥51, but <400 consecutive bases.
Long reads – sequences ≥400, but <1000 consecutive bases (i.e. similar to Sanger ⁄ capillary).
`Extended reads – sequences >1000 bases; a small proportion of PacBio reads are up to a few kb; Starlight uses a
`replaceable polymerase allowing reads of indefinite length (up to the full length of the template).
`
`Computing
`
`Cloud computing – remote computational resources available (usually on a fee-for-use basis) via the internet [e.g.
`Amazon’s Elastic Compute Cloud (http://aws.amazon.com/ec2)].
`Commodity alternatives ⁄ computing ⁄ resources – computer parts and systems that conform to open standards
`and are thus available from many manufacturers and retailers (generally at low cost).
`Sneakernet – transferring files by physically transporting hardware (i.e. carrying or shipping hard drives containing
`data).
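
The strobed-read example in Box 1 (900 bases of sequence spread among three 300-base reads, each separated by 300 bases) can be laid out as template coordinates. The following Python sketch is illustrative only: the function name and the 0-based, end-exclusive coordinate convention are assumptions made here, not part of any vendor software.

```python
def strobed_windows(read_len, gap_len, n_reads):
    """Return (start, end) template coordinates for each illuminated window.

    Coordinates are 0-based and end-exclusive (an assumption of this sketch).
    """
    windows = []
    pos = 0
    for _ in range(n_reads):
        windows.append((pos, pos + read_len))
        # Skip the dark interval during which synthesis continues unrecorded.
        pos += read_len + gap_len
    return windows


windows = strobed_windows(read_len=300, gap_len=300, n_reads=3)
sequenced_bases = sum(end - start for start, end in windows)
template_span = windows[-1][1] - windows[0][0]
print(windows)
print(sequenced_bases, template_span)
```

Under these assumptions, 900 sequenced bases cover a 1500-base stretch of the template, which is why strobed reads can serve the same purpose as mate-pair libraries.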
`
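The read-length guidelines in Box 1 can be written as a small classifier. This is a minimal sketch: the function name is hypothetical, and because Box 1 leaves a read of exactly 1000 bases unassigned (long is <1000, extended is >1000), the sketch assumes such a read counts as extended.

```python
def read_length_class(n_bases):
    """Classify a read by the Box 1 guidelines.

    Assumption: a read of exactly 1000 bases is treated as 'extended',
    since Box 1 does not assign that boundary value to either class.
    """
    if n_bases <= 50:
        return "short"
    if n_bases < 400:
        return "mid-length"
    if n_bases < 1000:
        return "long"
    return "extended"


# Read lengths taken from Table 2 (single-end values).
for platform, bases in [("Helicos", 35), ("Illumina HiSeq 2000", 100),
                        ("454 FLX Titanium", 400), ("454 FLX+", 700)]:
    print(platform, read_length_class(bases))
```
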
`The older NGS platforms have progressed signifi-
`cantly since they were first introduced. For example, 454
`has progressed from reads of 100, to 250, to 400–500
`bases, and is now on the verge of making 800-base reads
available (mode = 800, average = 700). Illumina has progressed from reads of less than 36 bases to
≥100 bases on each end of templates, with SOLiD making slightly less
`striking increases. Thus, many of the platforms can be
`used for the same applications (Table 1) and such overlap
`is increasing.
`Because it is possible to use most platforms for most
`applications, economics, length of time to data acquisi-
`tion, length of time in the queue and downstream analy-
`sis constraints become important for selecting a platform.
`As the number and variety of instruments increase and
`costs continue to decrease, we will become constrained
`only by our knowledge of the systems and our creativity
`to develop and adapt techniques to obtain data effi-
`ciently. In particular, developments in sample multiplex-
`ing and sequence capture will drastically increase the
`amount of data available at affordable costs for molecular
`ecological studies.
`
`Cost per run and cost per Mb
`
Although all companies are continuously upgrading their platforms so that several fit into multiple
read-length categories, the platforms can still be grouped into those that offer smaller numbers of
middle-to-extended reads at relatively high cost per megabase (Mb) of sequence (i.e. 454, Ion Torrent,
PacBio and Starlight) and those that offer larger numbers of short-to-middle-length reads at lower cost
per Mb (i.e. Illumina, SOLiD, Helicos; Table 2). Technologies still in development (e.g. Oxford Nanopore,
Roche+IBM, etc.) and expected updates to the current 3rd generation sequencing technologies (Karrow &
Toner 2011) have the potential for many extended reads at low cost, but initial releases of the PacBio
and Starlight platforms will not match the number of reads or cost per Mb of the short-read platforms
(Table 2).
There is clearly a continuum of performance characteristics for massively parallel sequencers, with a
reasonably strong dichotomy of these platforms in terms of the number of reads per run, cost per Mb and
instrument time to conduct a run (Table 2). The variance in read lengths and supply costs per run is
also important (Table 2). Because the read lengths of the Illumina sequencers can now equal or exceed
100 bases from each end of the template molecule, Illumina data can be used for de novo assemblies [e.g.
Li et al. 2009 (but see Worley & Gibbs 2010); Paszkiewicz & Studholme 2010], especially when
supplemented with mate-paired reads (Gnerre et al. 2011), and ⁄ or data from one of the longer-read
platforms (e.g. Dalloul et al. 2010). Indeed, it is clear that the combination of Illumina or SOLiD data
with mate-paired reads on the 454 or Illumina, strobed reads from PacBio or extended reads from
Starlight will facilitate many genome assemblies in the near future.
`
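The reagent cost per Mb reported in Table 2 appears to follow from dividing reagent cost per run by yield per run. The Python sketch below checks that relationship for a few rows; the figures are copied from Table 2, while the function and variable names are hypothetical. Rows given as ranges or bounds in the table (e.g. PacBio RS, Ion Torrent) are left out because they do not reduce to a single number.

```python
# (reagent cost per run in USD, yield in Mb per run), copied from Table 2.
runs = {
    "454 GS Jr. Titanium": (1100, 50),
    "454 FLX Titanium": (6200, 500),
    "Illumina MiSeq": (750, 1020),
    "Illumina GAIIx": (11524, 96000),
    "Illumina HiSeq 2000": (20120, 200000),
}


def cost_per_mb(reagent_cost_usd, yield_mb):
    """Reagent cost per megabase for a single run."""
    return reagent_cost_usd / yield_mb


for name, (cost, yield_mb) in runs.items():
    print(f"{name}: ${cost_per_mb(cost, yield_mb):.2f} per Mb")
```

The same division makes the dichotomy in the text concrete: the two 454 rows come out above $10 per Mb, whereas the HiSeq 2000 row comes out near $0.10 per Mb.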
`Cost per sample
`
`A major difference between the typical biomedical
`experiments targeted by NGS platforms and the uses for
`which molecular ecologists wish to employ these instru-
`ments is that the latter often want to process many
`samples (100s) at relatively modest numbers of loci
`(10s–1000s), and to do it with limited funds. A key to
`accomplishing low per-sample cost is to be able to attach
`an identifying tag (see Box 1) to each sample prior to
`expensive processing and sequencing. In this way, the
`cost of processing and sequencing can be divided among
`many samples.
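Once tagged samples have been pooled and sequenced, each read must be assigned back to its source sample. The following Python sketch shows the idea in its simplest form. It assumes the tag sits at the start of each read and must match exactly; the tag sequences, sample names and function name are all hypothetical, and real platforms differ in where the tag is read and in how mismatches are tolerated.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_sample, tag_len):
    """Group reads by their leading tag and strip the tag from each read.

    Reads whose tag is not in tag_to_sample are collected under 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        bins[tag_to_sample.get(tag, "unassigned")].append(insert)
    return dict(bins)


# Hypothetical 4-base tags and reads, for illustration only.
tags = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCATGG", "NNNNAAAAAA"]
bins = demultiplex(reads, tags, tag_len=4)
print(bins)
```
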
All NGS platforms allow the use of sample tags. The importance of developing low-cost library
preparations and sample tagging has been clear for several years, during which time a variety of schemes
have been developed (e.g. Binladen et al. 2007; Hoffmann et al. 2007; Meyer et al. 2007; Craig et al.
2008). Molecular ecologists (among others) will benefit from further development of low-cost library
preparations in which many different sample tags are employed, thereby facilitating many samples being
pooled together and thus dividing sequencing costs among many samples.

Table 2 Comparison of sequencing instruments, sorted by cost/Mb, with expected performance by mid 2011

Instrument | Run time [a] | Millions of reads/run | Bases/read [b] | Yield (Mb/run) | Reagent cost/run [c] | Reagent cost/Mb | Minimum unit cost (% run) [d]
3730xl (capillary) | 2 h | 0.000096 | 650 | 0.06 | $96 | $1500 | $6 (1%)
Ion Torrent – ‘314’ chip | 2 h | 0.10 | 100 | >10 | $500 | <$50 | $750 (100%)
454 GS Jr. Titanium | 10 h | 0.10 | 400 | 50 | $1100 | $22 | $1500 (100%)
Starlight* | † | 0.01 | >1000 | † | † | † | †
PacBio RS | 0.5–2 h | 0.01 | 860–1100 | 5–10 | $110–900 | $11–180 | †
454 FLX Titanium | 10 h | 1 | 400 | 500 | $6200 | $12.4 | $2000 (10%)
454 FLX+ [e] | 18–20 h | 1 | 700 | 900 | $6200 | $7 | $2000 (10%)
Ion Torrent – ‘316’ chip* | 2 h | 1 | >100 | >100 | $750 | <$7.5 | $1000 (100%)
Helicos [f] | N/A | 800 | 35 | 28 000 | N/A | N/A | $1100 (2%)
Ion Torrent – ‘318’ chip* | 2 h | 4–8 | >100 | >1000 | $925 | $0.93 | $1200 (100%)
Illumina MiSeq* | 26 h | 3.4 | 150 + 150 | 1020 | $750 | $0.74 | $1000 (100%)
Illumina iScanSQ | 8 days | 250 | 100 + 100 | 50 000 | $10 220 | $0.20 | $3000 (14%)
Illumina GAIIx | 14 days | 320 | 150 + 150 | 96 000 | $11 524 | $0.12 | $3200 (14%)
SOLiD – 4 | 12 days | >840 [g] | 50 + 35 | 71 400 | $8128 | <$0.11 | $2500 (12%)
Illumina HiSeq 1000 | 8 days | 500 | 100 + 100 | 100 000 | $10 220 | $0.10 | $3000 (12%)
Illumina HiSeq 2000 | 8 days | 1000 | 100 + 100 | 200 000 | $20 120 [h] | $0.10 | $3000 (6%)
SOLiD – 5500 (PI)* | 8 days | >700 [g] | 75 + 35 | 77 000 | $6101 | <$0.08 | $2000 (12%)
SOLiD – 5500xl (4hq)* | 8 days | >1410 [g] | 75 + 35 | 155 100 | $10 503 [h] | <$0.07 | $2000 (12%)
Illumina HiSeq 2000 – v3 [i]* | 10 days | ≤3000 | 100 + 100 | ≤600 000 | $23 470 [h] | ≥$0.04 | $3500 (6%)

[a] Instrument time for maximum read length.
[b] Average length for high-quality reads >200 bases (mode is higher); typical maximum for reads ≤150 bases (most reads reach this
length).
[c] Includes all stages of sample preparation for a single sample (i.e. library preparation through sequencing; capillary = sequencing only).
[d] Typical full cost (i.e. including labour, service contract, etc.) of the smallest generally available unit of purchase at an academic core
laboratory provider for the longest available read (and percentage of reads relative to a full run, rounded to the nearest whole percentage).
[e] Upgrade of the FLX instrument, due out summer 2011.
[f] Instruments and reagents are no longer sold; services are available for any organism.
[g] Mappable reads [number of raw high-quality reads (as reported for all other platforms) is higher].
[h] More reads are obtained than is needed from any single sample within most experiments, but the value illustrates the costs.
[i] Announced TruSeq v3 reagents & software; reads and yield are half for HiSeq 1000.
*Information based on company sources alone (independent data not yet available).
†Detail not yet available.
‘’ Indicates a likely value based on unpublished information available in March 2011 (i.e. author speculation).

To understand the importance of library construction costs, consider that when the Illumina HiSeq was
introduced in early 2010, standard Illumina RNA-Seq libraries cost about $400 each to construct.
Researchers could pool 12 indexed samples per lane and yield about 5 million reads per sample (a
sufficient number of reads for gene expression studies seeking modest sensitivity). Sequencing reagents
for 192 samples were estimated at <$7000, but library preparations were estimated at
192 × $400 = $76 800 (i.e. library preparation was more than ten times as expensive as sequencing the
libraries). Thus, library preparation and sample tagging continue to be an active area of research with
important ongoing developments.
`
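The library-cost arithmetic above can be reproduced in a few lines. This Python sketch uses only the figures quoted in the text (192 samples, about $400 per library, 12 indexed samples per lane); the function name is hypothetical, and the $7000 sequencing figure is used as an upper bound because the text gives it as "<$7000".

```python
import math


def experiment_costs(n_samples, library_cost, sequencing_total, samples_per_lane):
    """Summarize library versus sequencing costs for a pooled experiment."""
    library_total = n_samples * library_cost
    total = library_total + sequencing_total
    return {
        "lanes": math.ceil(n_samples / samples_per_lane),
        "library_total": library_total,
        "per_sample": total / n_samples,
        "library_to_sequencing_ratio": library_total / sequencing_total,
    }


# $7000 is an upper bound on sequencing reagents, so the ratio is a lower bound.
costs = experiment_costs(n_samples=192, library_cost=400,
                         sequencing_total=7000, samples_per_lane=12)
print(costs)
```

With these inputs, library preparation dominates the per-sample cost, which is the point the worked figure in the text makes.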
`Purchase costs
`
NGS platforms currently range from $49 500 to $695 000 (Table 3). Additional ancillary equipment,
extended service contracts and required computers extend the costs of most systems from about $75 000
to more than $1 000 000. Costs in Table 3 assume that equipment will be housed in an equipped, fully
functioning laboratory. If the new sequencer will be placed into a new facility, then the costs will
increase considerably.
`There are now three instruments that are <$150 000
`(Table 3), making them within the reach of many indi-
`
`Ó 2011 Blackwell Publishing Ltd
`
`00006
`
`

`

`F I E L D G U I D E T O N E X T - G E N S E Q U E N C E R S 765
`
`Table 3 Instrument purchase cost, additional instrument costs, service agreement costs, computational resources needed, size of data
`files, primary errors and error rates for commercially available DNA sequencing platforms in 2011. All costs are list price in thousands of
`US dollars
`
`Instrument
`
`3730xl (capillary)
`454 GS Jr. Titanium
`454 FLX Titanium
`454 FLX+f
`PacBio RS
`Ion Torrent – 314 chip
`Ion Torrent – 316 chip
`Ion Torrent – 318 chip
`SOLiD – 4
`SOLiD – 5500
`SOLiD – 5500xl
`Illumina MiSeq
`Illumina HiScanSQ
`Illumina GAIIx
`Illumina HiSeq1000
`Illumina HiSeq2000
`
`Purchase
`cost
`
`Additional
`instrumentsa
`
`Service
`contractb
`
`Computational
`resourcesc
`
`Data file
`sizes (GB)d
`
`Primary errors
`
`Error rate
`(%)e
`
`$376
`$108
`$500
`$29.5
`$695
`$49.5
`$49.5
`$49.5
`$475
`$349
`$595
`$125
`$405
`$250
`$560o
`$690
`
`–
`$16
`$30
`$30
`–
`$18g
`$18g
`$18g
`$54h
`$54h
`$54h
`–
`$55l
`$100n
`$55l
`$55l
`
`$19.8
`$12.6
`$50.0
`$50.0
`$85
`$7.5
`$7.5
`$7.5
`$38.4
`$29.0
`$38.4
`$12.5
`$41.5
`$44.5
`$62.0
`$75.9
`
`Desktop
`$5 (desktop)
`$5 (desktop)
`$5 (desktop)
`$65 cluster
`Desktop – $35
`Desktop – $35
`Desktop – $35
`$35 clusteri
`$35 clusteri
`$35 clusteri
`Desktop
`$222 clusterm
`$222 clusterm
`$222 clusterm
`$222 clusterm
`
`0.03
`<3 images, <1 sff
`20 images, 4 sff
`40 images, 8 sff
`20 pulsed, 2 Fastq
`0.1Fastq
`0.6Fastq
`TBD
`680j
`74k*
`148k*
`1k*
`50k*
`600
`£300k*
`£600k*
`
`Substitution
`Indel
`Indel
`Indel
`CG deletions
`Indel
`Indel
`Indel
`A-T bias
`A-T bias
`A-T bias
`Substitution
`Substitution
`Substitution
`Substitution
`Substitution
`
`0.1–1
`1
`1
`1*
`16
`1
`1*
`1*
`>0.06*
`>0.01*
`>0.01*
`>0.1*
`‡0.1
`‡0.1
`‡0.1
`‡0.1
`
`aDoes not include general purpose and library preparation equipment (e.g. Covaris [$45k], Agilent bioanalyzer [$18k], thermal cyclers,
`general purpose centrifuges, MilliQ water, etc.), but includes bead counters for emPCR (up to $20k), TissueLysers or similar for emPCR
`(up to $10k), specialty centrifuges, etc. when required by the instrument manufacturer. Many laboratories will need additional general
`purpose instruments.
`bAnnual maintenance agreements include on-site service, but do not include extra premiums for the fastest available service.
`cDesktops assume higher-end models with multiple processors, ‡8 GB RAM, ‡1 TB HD, etc. (up to $5k; except capillary = $2k desktop).
`dData file size transferred from instrument server to offline cluster.
`ePercentage of errors per base within single reads of the maximum length given in Table 2; rates among platforms are not exactly com-
`parable; reported Ion Torrent rates range from 0.46% to 2.4%; SOLiD rates are from reads with bases consistent on double or triple
`sequencing only; for Illumina, the 0.1% rate applies to > 85% of reads (not all reads); see text for additional details.
`fUpgrade to the 454 FLX instrument; FLX+ new purchases = $500k.
`gRequired $16.5k IonTorrent server for conversion of raw signals to basecalling; $1k ULTRA-TURRAXÒ Tube Drive; Argon gas tank
`(<$0.5k).
`hIncludes EZ Bead ePCR automation and required UPS; a Covaris ($45K) is also required but is not included to facilitate comparisons
`(because one is usually bought with any of the other sequencing systems).
`iCompute cluster available from Life Technologies.
`j85Gb run is a 2 · 50 run (the highest throughput on a SOLiD 4).
`kNew compressed binary data format saves base and quality-value data in a 1byte:1base ratio.
`lCost of cBot (required). Additional instruments for library preparation needed (Covaris, etc., similar to 454 FLX).
`mIllumina Compute – Tier 1 system: 3 cluster nodes with 8 cores and 48 GB RAM per node (i.e. 24 cores and 144 GB RAM) and 24 TB
`usable data storage; commodity equivalent systems are available for much less, but will require technical support (see text).
`nCost of cBot and Paired-end module (required for GAIIx).
`oHiSeq 1000 is upgradable to HiSeq 2000 for $175k.
`‘’ Indicates a likely value based on unpublished information available in February 2011 (i.e. author speculation).
`*Information based on company sources alone (independent data not yet available); also applies to Illumina TruSeq v3 chemistry.
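The arithmetic behind footnotes k and m can be checked with a short sketch. It assumes the 1 byte:1 base ratio of footnote k and, purely for illustration, applies it to the 85 Gb run size of footnote j; the table may attach these two footnotes to different instruments. The function names are hypothetical and decimal units (1 GB = 10^9 bytes) are an assumption.

```python
def estimated_file_size_gb(bases: int, bytes_per_base: float = 1.0) -> float:
    """Approximate file size in decimal GB for a run yielding `bases` bases.

    bytes_per_base = 1.0 reflects the 1 byte:1 base ratio in footnote k.
    """
    return bases * bytes_per_base / 10**9


def cluster_totals(nodes: int, cores_per_node: int, ram_gb_per_node: int) -> tuple[int, int]:
    """Total cores and total RAM (GB) across identical cluster nodes."""
    return nodes * cores_per_node, nodes * ram_gb_per_node


# Illustrative only: an 85 Gb run (footnote j) stored at 1 byte per base.
run_bases = 85 * 10**9
size_gb = estimated_file_size_gb(run_bases)

# Footnote m: 3 nodes, each with 8 cores and 48 GB RAM.
total_cores, total_ram_gb = cluster_totals(3, 8, 48)

print(f"Estimated file size: {size_gb:.0f} GB")
print(f"Cluster totals: {total_cores} cores, {total_ram_gb} GB RAM")
```

The cluster totals agree with the figures given in footnote m (24 cores and 144 GB RAM).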
`
`vidual researchers. These instruments have significantly reduced total costs per run and/or experiment.
`The smaller footprint of these instruments and potential portability of at least some are also attractive
`features. Although these features will facilitate new research opportunities, researchers should be careful
`to weigh the significantly increased cost per read and cost per Mb of these instruments relative to other
`instruments and to the costs of outsourcing. It is obvious that these lower-cost instruments and low-cost
`runs will be invaluable in small-scale experiments, gathering pilot data, quality control and validation.
`Researchers with these instruments will, however, still often find it economically most advantageous to send
`out fully processed and validated samples to be run on other lower-cost per read/Mb instruments (e.g.
`library pools with many indexes could be validated on the MiSeq/Ion Torrent 314/GS Junior, but then
`
`
`Table 4 Primary advantages and disadvantages of each next-generation sequencing instrument
`
`Instrument                 Primary advantages
`
`3730xl (capillary)         Low cost for very small studies
`454 GS Jr. Titanium        Long-read length; low capital cost; low cost per experiment
`454 FLX Titanium           Long-read length
`454 FLX+                   Double the maximum read length of Titanium
`Helicos                    Large numbers
`PacBio
`Ion Torrent
`Ion Torrent – 314 chip
`Ion Torrent – 316 chip
`Ion Torrent – 318 chip
`SOLiD – 4
`SOLiD – 5500
`SOLiD – 5500xl
`Illumina MiSeq
`Illumina HiScanSQ
`Illumina GAIIx
`Illumina HiSeq 1000
`Illumina HiSeq 2000
