`
`Parallel tagged sequencing on the 454 platform
`
`Matthias Meyer, Udo Stenzel & Michael Hofreiter
`
`Max Planck Institute for Evolutionary Anthropology, Department of Evolutionary Genetics, Deutscher Platz 6, D-04103 Leipzig, Germany. Correspondence should be
`addressed to M.M. (mmeyer@eva.mpg.de).
`
`Published online 31 January 2008; doi:10.1038/nprot.2007.520
`
`Parallel tagged sequencing (PTS) is a molecular barcoding method designed to adapt the recently developed high-throughput 454
`parallel sequencing technology for use with multiple samples. Unlike other barcoding methods, PTS can be applied to any type of
`double-stranded DNA (dsDNA) sample, including shotgun DNA libraries and pools of PCR products, and requires no amplification or
`gel purification steps. The method relies on attaching sample-specific barcoding adapters, which include sequence tags and a
`restriction site, to blunt-end repaired DNA samples by ligation and strand-displacement. After pooling multiple barcoded samples,
molecules without sequence tags are effectively excluded from sequencing by dephosphorylation and restriction digestion, and the source of each DNA sequence can be traced using the tag sequences. This protocol allows for sequencing 300 or more complete mitochondrial genomes on a single 454 GS FLX run, or twenty-five 6-kb plasmid sequences on a single 1/16th plate region. Most of the reactions can be performed in a multichannel setup on 96-well reaction plates, allowing up to several hundred samples to be processed in a few days.
`
`INTRODUCTION
`Rationale
`Over the last three decades, Sanger sequencing1 has been the
`dominant DNA sequencing technology in all areas of life sciences,
`used to retrieve individual sequences or to decipher entire genomes.
`Although the throughput of this technology has gradually increased
`over time, it has now been exceeded by recently developed next-
`generation sequencing technologies2, such as 454 (Roche)3, Solexa
`(Illumina)4 and SOLiD (ABI). These technologies have increased
`the number of sequences obtained in a single run of a machine by
`several orders of magnitude, from mere hundreds to hundreds of
thousands or even millions. Their superior efficiency in terms of both cost and time per sequenced nucleotide has not only opened up new fields of sequencing, for example, ultra-deep amplicon sequencing5 or paleogenomics6, but has also replaced Sanger sequencing in some of its traditional domains, such as genome sequencing3,7 and serial analysis of gene expression8.
Among the next-generation sequencing technologies, 454 currently offers by far the longest reads, ~250 bp on the GS FLX platform, not far from the 700 bp achieved by routine Sanger sequencing. However, despite its comparatively low throughput, Sanger sequencing is still used for many everyday applications, for example, amplicon sequencing and the sequencing of DNA fragments a few kilobases long by primer walking. One important reason for this lies in a conceptual difference between Sanger and 454 sequencing, which affects the number of samples that can be processed in parallel. Whereas in Sanger sequencing each sequence read is derived from a separate sequencing reaction, 454 uses emulsion PCR9 to amplify a pool of templates in a single reaction vessel before sequencing. Within one emulsion PCR, no information is retained about a sequence's sample origin. Thus, to process several samples in parallel, they must be kept in separate pools, physically separated from each other not only during library preparation, but also during emulsion amplification and sequencing. However, the 454 sequencing plate can be divided into a maximum of only 16 regions, each of which yields on average ~3 Mb of sequence on the GS FLX. In many cases, this amount of sequence data produces unnecessarily high redundancy in coverage. For example, if a 6-kb plasmid is shotgun sequenced on one 1/16th GS FLX plate region, it will be covered 500-fold on average. If it were possible to retain information about the sample origin of the obtained sequence reads, the same capacity could be used to sequence 25 such plasmids to 20-fold coverage.
The method described here, called parallel tagged sequencing (PTS), allows parallel sequencing of large numbers of double-stranded DNA (dsDNA) samples on the 454 platform10. This is achieved by barcoding each sample with a specific sequence tag. After pooling the tagged DNA samples, library preparation and sequencing, the tag sequences are used to identify each sequence's sample origin. The protocol (illustrated in Fig. 1) begins by blunt-end repairing each sample in a separate reaction. Subsequently, barcoding adapters are ligated to both ends of the molecules. These adapters consist of single self-hybridized oligos containing a sequence tag and an SrfI restriction site. After ligation, the resulting single-strand nicks are removed by fill-in using a strand-displacing polymerase. The barcoded samples are then quantified and pooled in ratios reflecting the desired relative sequence representation. After dephosphorylation, half of the adapter is cut off using SrfI (ref. 11), a rare-cutting restriction enzyme with restriction sites approximately every 150 kb in the human genome. SrfI leaves 5′ phosphates for the ligation of universal 454 adapters during sequencing library preparation. The dephosphorylation step excludes unreacted molecule ends (those that did not receive a barcoding adapter) from sequencing, because they lack the 5′ phosphates required for ligating the universal 454 adapters.
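The adapter layout given in the Figure 1 legend (a palindromic oligo with the SrfI site GCCCGGGC in the middle, the tag at the 3′ end and its reverse complement at the 5′ end) can be checked in silico before oligos are ordered. The Python sketch below works from that description only and is not the authors' design software; the function names and example tags are hypothetical. It builds the oligo for a tag, confirms that the oligo is its own reverse complement (which is what lets a single oligo self-hybridize into a blunt duplex), checks the tag rules from the legend (start with A or T, end with C or G, no homopolymers), reports the minimum number of substitutions separating any two tags in a set, and derives the adapter remnant expected in front of the tag after blunt SrfI cleavage between GCCC and GGGC, i.e., the run of Gs mentioned in the text.

```python
from itertools import combinations

SRFI_SITE = "GCCCGGGC"  # SrfI recognition site; a palindrome
SRFI_CUT = 4            # SrfI cuts bluntly between GCCC and GGGC
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tag_problems(tag):
    """List violations of the tag rules in the Figure 1 legend (empty list = OK)."""
    if not tag or set(tag) - set("ACGT"):
        return ["empty or non-ACGT tag"]
    problems = []
    if tag[0] not in "AT":
        problems.append("must start with A or T")
    if tag[-1] not in "CG":
        problems.append("must end with C or G")
    if any(a == b for a, b in zip(tag, tag[1:])):
        problems.append("contains a homopolymer")
    return problems


def adapter_oligo(tag):
    """5'-revcomp(tag) + SrfI site + tag-3': a self-complementary oligo."""
    oligo = revcomp(tag) + SRFI_SITE + tag
    if oligo != revcomp(oligo):  # always true for ACGT input; kept as a sanity check
        raise ValueError("oligo is not palindromic")
    return oligo


def remnant_after_srfi(tag):
    """Adapter sequence expected 5' of the insert after SrfI digestion."""
    return SRFI_SITE[SRFI_CUT:] + tag


def min_pairwise_distance(tags):
    """Smallest number of substitutions between any two equal-length tags."""
    return min(sum(a != b for a, b in zip(s, t)) for s, t in combinations(tags, 2))


tags = ["ACTGAC", "TCAGTC", "AGTCAG"]  # hypothetical 6-nt tags
for t in tags:
    print(t, tag_problems(t), adapter_oligo(t), remnant_after_srfi(t))
print("minimum distance:", min_pairwise_distance(tags))
```

For the tag ACTGAC this gives the oligo GTCAGTGCCCGGGCACTGAC and the remnant GGGCACTGAC. The three example tags are at least three substitutions apart, so a single substitution still leaves a read closer to its true tag than to any other.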
PTS offers several important features providing both efficient use of sequencing resources and high data reliability. First, all reactions are completed with ~100% efficiency, ensuring highly homogeneous sequence representation among samples. Second, background sequences without a sequence tag are efficiently excluded from sequencing by dephosphorylation in conjunction with the use of a restriction enzyme. The SrfI restriction site produces a run of Gs before the tag, immediately adjacent to the key sequence used by the 454 system for quality control. As the last nucleotide of this key is also G, all nucleotides remaining from the SrfI restriction site are incorporated within a single flow cycle without significantly reducing the read length of the following sequence. Third, the tag design is particularly robust to sequencing errors in and around homopolymers, which are known to be the most common errors in 454 sequencing. In our experience, ~97% of the sequences can be assigned to their sample origin with an extremely low false-assignment rate. Finally, the protocol is optimized for reaction setup using multichannel pipettes and a 96-well plate format, minimizing the time required for setup.

NATURE PROTOCOLS | VOL.3 NO.2 | 2008 | 267
© 2008 Nature Publishing Group http://www.nature.com/natureprotocols

PROTOCOL

[Figure 1 image. (a) Workflow: I, blunt-end repair; II, adapter ligation; III, adapter fill-in; IV, pooling, dephosphorylation, restriction digestion; V, 454 sequencing library preparation (universal adapters A and B′); VI, 454 sequencing. (b) Barcoding adapter: SrfI site, specific tag, target sequence, complementary tag/SrfI site.]

Figure 1 | Overview of the tagging protocol. (a) Each DNA sample is blunt-end repaired (I, Steps 2–8) before sample-specific barcoding adapters are ligated to both ends of the molecules (II, Steps 9–13). Nicks resulting from the ligation are removed by strand displacement with Bst polymerase (III, Steps 14–16). The barcoded samples are pooled in equimolar ratios, and unligated molecule ends are excluded from sequencing through dephosphorylation and restriction digestion (IV, Steps 18–25). A single-stranded 454 sequencing library is prepared from the sample pool; this includes the blunt-end ligation of universal 454 adapters to the template molecules and isolation of correctly ligated molecules as single strands (V, Step 26). After sequencing (VI, Steps 27 and 28), the sequence reads are sorted according to their tag sequences (Step 29). Before downstream data processing, the sequence tags are removed from the 5′ ends and, if applicable, the 3′ ends of the reads. (b) Barcoding adapters consist of single self-hybridized palindromic oligonucleotides, carrying an SrfI restriction site in the middle (GCCCGGGC), a sequence tag at the 3′ end and the reverse-complementary tag sequence at the 5′ end. Each sequence tag may start with either an A or a T, followed by several freely chosen nucleotides, and ends in C or G. No homopolymers are allowed within the tag sequence.

Applications of PTS
In existing 454 applications, physical subdivision of the 454 sequencing plate is frequently used to process up to 16 different samples in one run. As this requires covering the sequencing plate with a gasket, the overall number of sequences retrieved from one run is reduced by up to half (Table 1). Using PTS instead of physical subdivision for such applications therefore roughly doubles the sequencing throughput. Moreover, as in theory an unlimited number of tags can be produced, PTS removes any limit on the number of samples that can be processed in parallel. In principle, PTS can be applied to all types of double-stranded nucleic acid samples, allowing an efficient switch from Sanger to 454 sequencing. We currently use PTS mainly for two sequencing applications, which are discussed below.

Shotgun sequencing contiguous DNA segments. Owing to its high throughput and the absence of microbial subcloning, the 454 technology enables faster and cheaper shotgun sequencing than the Sanger methodology. Using PTS, this power can be fully exploited for parallel sequencing of contiguous DNA segments from numerous samples, such as plasmids or target regions pre-amplified by long-range PCR (see ref. 10 and Fig. 2). For example, when sequencing is performed on the new 454 GS FLX platform, up to 300 complete mitochondrial genomes or a comparable number of nuclear DNA fragments of similar length (~17 kb) can theoretically be sequenced to 20-fold coverage in parallel in a single run (see Table 1). In this way, population data produced by re-sequencing can be obtained with unprecedented speed. In contrast to Sanger-based primer-walking approaches, shotgun sequencing of long-range PCR products does not require a priori sequence information for designing sequencing primers, and it saves the time and cost of setting up individual PCRs and sequencing reactions.

Figure 2 | PTS of 14 complete human mtDNA genomes (~16.5 kb) on a small GS FLX plate region to an average of 14-fold coverage. The mtDNA genomes were amplified in two overlapping long-range PCRs as described previously10. The long-range PCR products were then quantified, pooled in equimolar ratios and nebulized. Between 100 and 200 ng of each sample were used as templates for the tagging reactions, using barcoding adapters with 7-bp sequence tags differing by at least three substitutions. Of 40 barcoding adapters that had been synthesized and diluted in a single batch in order from 1 to 40, no immediate neighbors were used, to obtain full power for detecting cross-contamination among barcoding adapters. After barcoding, the samples were pooled in equal mass ratios. (a) Table of sequencing results showing the number of sequences and coverage obtained for each sample. As a result of inaccuracies in quantification or pipetting when pooling the long-range PCR products, one sample (tag 13) exhibited uncovered positions. These could be filled in by deeper sequencing or single Sanger reads. No sequences with sequence tags from unused barcoding adapters were observed, indicating no detectable cross-contamination among the barcoding adapters and no sequencing errors leading to false assignment of sequences to their sample origin. Thus, the best estimate of the false-assignment frequency in this experiment is zero. (b) Bar chart visualizing the sequence representation among samples, ordered from lowest to highest. (c) Exemplary coverage plot for one of the mitochondrial genomes (tag 23).

(a)
Sample | Sequences (number) | Sequences (% of total) | Average coverage | Fully assembled
Total | 14,780 | | |
Tag 1 | 683 | 4.6 | 9.6 | Yes
Tag 3 | 938 | 6.3 | 13.0 | Yes
Tag 5 | 1,058 | 7.2 | 14.6 | Yes
Tag 7 | 1,091 | 7.4 | 15.0 | Yes
Tag 9 | 1,019 | 6.9 | 14.1 | Yes
Tag 11 | 1,107 | 7.5 | 15.5 | Yes
Tag 13 | 776 | 5.3 | 12.6 | No
Tag 15 | 901 | 6.1 | 12.5 | Yes
Tag 17 | 987 | 6.7 | 13.8 | Yes
Tag 19 | 888 | 6.0 | 12.5 | Yes
Tag 21 | 1,027 | 6.9 | 14.1 | Yes
Tag 23 | 933 | 6.3 | 12.9 | Yes
Tag 25 | 1,525 | 10.3 | 20.3 | Yes
Tag 27 | 1,284 | 8.7 | 17.6 | Yes
No tag | 563 | 3.8 | |

[Figure 2 panels (b) and (c): (b) bar chart, number of sequences per tag (axis 0–2,000), ordered from No tag (lowest) to Tag 25 (highest); (c) coverage plot for tag 23, sequence reads (0–30) versus nucleotide position (0–18,000).]

TABLE 1 | Sequencing throughput of the GS 20 and GS FLX platforms. 454 sequencing plates can physically be subdivided into a minimum of two and a maximum of 16 regions. When 16 plate regions are used, roughly half of the output is lost, as parts of the plate are covered with a gasket. In contrast, PTS allows for sequencing hundreds of samples in parallel without requiring physical subdivision, thereby retaining the maximum throughput.

Platform (average read length) | Plate region | Reads per region | Base pairs per region | Base pairs per plate | 17-kb segments | 6-kb segments | PCR products
GS 20 (~100 bp) | 1/16th | 6,300 | 630 kb | 10 Mb | 2 | 5 | 158
GS 20 (~100 bp) | 1/4th | 33,000 | 3.3 Mb | 13.2 Mb | 10 | 28 | 825
GS 20 (~100 bp) | 1/2 | 100,000 | 10 Mb | 20 Mb | 29 | 83 | 2,500
GS FLX (~250 bp) | 1/16th | 12,000 | 2.88 Mb | 46 Mb | 8 | 24 | 300
GS FLX (~250 bp) | 1/4th | 70,000 | 16.8 Mb | 67 Mb | 49 | 140 | 1,750
GS FLX (~250 bp) | 1/2 | 210,000 | 50.4 Mb | 101 Mb | 148 | 420 | 5,250

The last three columns give the number of samples per plate region that can be processed in parallel using PTS: 17-kb segments, for example, mtDNA genomes, and 6-kb segments, for example, plasmids (both shotgun sequenced to an average of 20-fold coverage); and PCR products (<100 bp on the GS 20 or <250 bp on the GS FLX, average 40-fold coverage).
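The sample numbers in Table 1, and the 6-kb plasmid example in the introduction, follow from simple division. The Python sketch below reproduces them; the formulas are inferred from the table's own values rather than stated in the text. For shotgun-sequenced segments, samples per region = base pairs per region / (segment length × target coverage); for PCR products, where one read spans the whole amplicon, samples per region = reads per region / target coverage.

```python
def fold_coverage(bp_per_region, segment_bp, n_samples=1):
    """Average fold coverage when n_samples equal-length segments share one region."""
    return bp_per_region / (segment_bp * n_samples)


def shotgun_samples(bp_per_region, segment_bp, coverage):
    """Shotgun-sequenced segments per region at the target average coverage."""
    return round(bp_per_region / (segment_bp * coverage))


def amplicon_samples(reads_per_region, coverage):
    """PCR products per region, assuming each read covers the whole amplicon."""
    return round(reads_per_region / coverage)


# Per-region yields copied from Table 1: (reads, base pairs)
REGIONS = {
    ("GS 20", "1/16"): (6_300, 630e3),
    ("GS 20", "1/4"): (33_000, 3.3e6),
    ("GS 20", "1/2"): (100_000, 10e6),
    ("GS FLX", "1/16"): (12_000, 2.88e6),
    ("GS FLX", "1/4"): (70_000, 16.8e6),
    ("GS FLX", "1/2"): (210_000, 50.4e6),
}

for (platform, region), (reads, bp) in REGIONS.items():
    print(platform, region,
          shotgun_samples(bp, 17_000, 20),  # e.g., mtDNA genomes at 20-fold
          shotgun_samples(bp, 6_000, 20),   # e.g., plasmids at 20-fold
          amplicon_samples(reads, 40))      # PCR products at 40-fold

print(fold_coverage(3e6, 6_000), fold_coverage(3e6, 6_000, 25))
```

Python's round() rounds exact halves to the nearest even integer; for the two halves that occur here (157.5 and 27.5) this gives 158 and 28, matching Table 1. The last line reproduces the introduction's example: one 6-kb plasmid in a ~3 Mb region is covered 500-fold, and 25 plasmids share the region at 20-fold.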
`
Sequencing pooled amplicons. This application is useful when short sequences within the 454 read-length limit are desired, as in population studies using ancient DNA or DNA from museum specimens. As 454 sequence reads stem from single template molecules, miscoding base damage and contamination can be readily identified without microbial subcloning of the PCR products. This allows for cost- and time-efficient sequencing of pooled amplicons from multiple samples while retaining the highest standards in ancient DNA and museum research12,13. Phylogenetic and population genetic studies are also increasingly performed using multiple short nuclear sequences, totaling from a few to 30 kb of sequence14–16. In such applications, complete data sets for whole species groups as well as large population samples could be obtained in a single or partial 454 run. In addition to low-coverage sequencing of many pooled amplicons, PTS can be used for deep sequencing of fewer amplicons in parallel.

Comparison to other barcoding methods
Although other methods for barcoding and sample multiplexing on the 454 platform have been introduced previously, these methods are limited to the parallel sequencing of PCR products. In contrast, PTS is the first barcoding method to allow for parallel sequencing of all types of dsDNA samples. It is also the only currently available method for barcoding shotgun DNA libraries and pooled PCR products.
The previously reported barcoding methods used 5′-tagged PCR primers to distinguish PCR products derived from different sources17–19. The universal 454 adapter sequences were either included as additional 5′ tails or added in the regular 454 library preparation process. This approach is simple, quick and efficient for sequencing short (<250 bp) homologous PCR products from different samples, because combinations of tagged forward and reverse primers can be used to barcode a large number of samples. However, when dealing with many different targets, this approach becomes cost-prohibitive and prone to confusion, because sets of primers must be synthesized for each target under study, and the primers must be added separately both to each PCR and to the corresponding control reaction. PTS is preferable in this case, as it is suitable for simultaneously barcoding a pool of PCR products. It does not require changes to the experimental design of existing PCR applications and provides the flexibility of choosing the sequencing strategy after amplification. Furthermore, because they consist only of single short self-hybridized oligos, barcoding adapters for PTS are cheap to synthesize and can be reused in subsequent experiments.
Another method has recently been introduced for parallel sequencing of small RNAs20. It involves stepwise single-stranded ligation of universal adapters to both ends of the RNA molecules. Barcoding is then achieved by re-amplification with tagged PCR primers. Although this method may be suitable for barcoding small RNAs, the protocol is very complex and has yet to be applied to dsDNA samples.

Limitations of the method
One limitation of the method arises through the use of a restriction enzyme. With its GC-rich 8-bp recognition sequence, SrfI is a rare cutter in mammalian genomes, with restriction sites only approximately every 150 kb in the human genome. However, it may cut more
`
often in GC-rich bacterial genomes. As the universal 454 adapters are added after the SrfI restriction step, only sequence coverage immediately around an SrfI site is lost, and gaps can be filled with single Sanger sequence reads. If sequencing templates are known to contain SrfI sites, it is possible to enzymatically methylate all SrfI sites before adapter ligation using CpG methyltransferase according to the supplier's protocol (http://www.neb.com). Because SrfI cannot cut CpG-methylated restriction sites, this should effectively mask all restriction sites, thereby eliminating restriction within template molecules.
Another important issue is the quality of the resulting sequences. With regard to substitution errors, well above 99.99% accuracy was consistently reported for shotgun consensus sequences on the GS 20 platform3,10,21,22, and 99.92% accuracy was estimated for single reads in a recent study23. However, single-base-pair insertions and deletions (indels) occur with considerable frequency both within and around homopolymer regions, often persisting even at high coverage. In shotgun consensus sequences of human mitochondrial genomes10, we recently observed indel errors at a frequency of 0.27%, although previous estimates from shotgun consensus sequences were about ten times lower3,21,22. However, all current estimates should be treated with caution, as the error rate varies among different versions of the 454 assembly programs newbler and runMapper (see ref. 21, and M.M., unpublished data), and may also differ among runs. As single-base-pair indels can be identified as frame-shift mutations in coding sequences or by comparison to closely related sequences, they usually have no practical impact on sequence usability.

Experimental design
Several points should be considered before large-scale sequencing projects are performed using PTS.

Sample requirements. In principle, every double-stranded nucleic acid sample with natural 5′ ends (hydroxyl or phosphate) is a suitable template for parallel sequencing using PTS. However, there are upper and lower limits on template size. The upper limit is defined by the 454 process, as the maximum read lengths obtained on the GS 20 and GS FLX platforms are ~100 and ~250 bp, respectively. In addition, fragments of 800 bp or more amplify poorly in emulsion PCR. Adequate fragment length distributions, for example, for shotgun sequencing, can be achieved by DNA shearing. The lower size limit is introduced through the SPRI bead purification steps in the PTS protocol. SPRI bead purification24 is quick and efficient, but does not recover molecules <80–100 bp. If shorter molecules (50–100 bp) need to be sequenced, all SPRI purification steps can be replaced by MinElute spin column purification (Qiagen), using the same elution volumes and buffers without additional changes.
The minimal material requirements for PTS are very low. As 454 sequencing is possible from picogram amounts of DNA25, and there are no significant losses in the tagging protocol, <1 ng of initial material per sample is theoretically sufficient. However, the sequence representation of each barcoded sample depends on its relative concentration in the sample pool, and is thus affected by the accuracy of DNA quantification. For optimal results, we recommend measuring DNA concentration after barcoding (Step 18). This strategy requires at least 100 ng of initial material for PicoGreen quantification. If less material is available, DNA amounts can be measured and adjusted before barcoding (after Step 1). Although it requires very little material, the latter strategy yields less homogeneous sequence representation, as handling variation may cause different recoveries during the purification steps.
For shotgun sequencing of long-range PCR products, we generally recommend using long-range PCR kits from Roche (Expand Long-Range dNTPack, Expand 20kbPlus PCR Systems), which in our hands yield superior results compared with those from other suppliers.

Choosing a tag length. To avoid falsely assigning sequences to samples, not all possible tags of a given length should be used, as single substitutions attributable to sequencing errors could convert one tag into another. A tag length of 6 nt produces only 72 different tags that are at least two substitutions apart, and 21 different tags that are at least three substitutions apart. These numbers are 173 and 52 for 7-nt tags, and 475 and 130 for 8-nt tags, respectively. With a minimal distance of two substitutions between 6-nt tags, we previously estimated a false-assignment rate of ~0.35% on the GS 20 platform10. Using the new GS FLX platform and 7-nt tags, this number drops to 0.03% at a minimal distance of two substitutions, and to <0.01% at a minimal distance of three substitutions (M.M., unpublished data). However, these numbers should be considered rough estimates only, as they may vary among runs and with the purity of oligos. We recommend independently estimating the false-assignment rate in each experiment (see Box 1 and QUALITY CONTROL, below). Because the read length on the GS FLX platform has increased to ~250 bp compared with 100 bp on the previous GS 20 system, tag lengths of 7 or 8 nt do not significantly reduce the amount of usable sequence data obtained by PTS, but provide the opportunity to sequence hundreds of samples in parallel at extremely low false-assignment rates.

BOX 1 | ESTIMATING THE FALSE-ASSIGNMENT FREQUENCY
The reliability of PTS should be independently evaluated in each experiment. This can be achieved by estimating the false-assignment frequency, that is, the frequency at which sequences are expected to be falsely assigned to a sample origin, based on the occurrence of sequence reads that carry tag sequences from unused barcoding adapters. The ability to detect false assignment improves as the number of barcoding adapters that remain unused in an experiment increases.
$$\text{False-assignment frequency} = \frac{F}{T} \times \frac{N}{A - N}$$
F, number of sequences carrying tags from unused barcoding adapters
T, total number of sequence reads obtained in the experiment
N, total number of barcoded samples that were sequenced in parallel
A, total number of barcoding adapters within the chosen category that have actually been synthesized (e.g., 52 if all 7-bp tags differing by at least three substitutions were synthesized)
The formula is based on the assumption that all tags can be converted into one another with the same probability. It is a composite estimate of false assignment caused by sequencing errors and by cross-contamination of barcoding adapters during synthesis and dilution. It does not consider the possibility that cross-contamination is introduced while preparing samples for PTS or setting up the blunt-end repair and ligation reactions (Steps 1–10), and it is unlikely to detect isolated contamination events. Thus, careful pipetting is strongly advised.

Coverage requirements and sequencing strategy. An issue that requires careful evaluation before PTS is begun is the amount of sequence coverage required. For shotgun sequencing, in our experience 10- to 20-fold average coverage is sufficient for re-sequencing mono-allelic sequences, such as mitochondrial genomes. Indel sequencing errors around homopolymers can be eliminated by comparison to a reference sequence. If no closely related sequences are available for comparison, higher sequence coverage (~30-fold) is preferable for obtaining sequences with a low indel rate. If nuclear sequences with two potential alleles are sequenced, higher coverage is necessary to ensure that both alleles are detected in heterozygous samples. In general, the coverage requirements must be chosen according to the specific needs of a study.
When estimating the coverage requirements for a study, it is important to understand that sequence representation is approximately normally distributed among samples. Whereas most samples will be covered by the desired number of sequence reads, a few samples will be covered more or less deeply. For samples quantified after barcoding (see SAMPLE REQUIREMENTS, above), we usually observed a maximum deviation of ~50% from the mean coverage (Fig. 2 and M.M., unpublished data). This can be compensated for either by sequencing to higher coverage a priori, or by subsequently filling in sequences from under-represented samples in an additional run on a small plate region. For large-scale projects, where one or several full runs will be completed, the sequencing resources can be optimally exploited by initially sequencing part of the sample pool on only a small plate region. Subsequently, samples can be re-pooled according to the observed sequence representation, thereby guaranteeing optimal sequence representation among samples during the large-scale sequencing phase.
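The re-pooling strategy above can be sketched as a small calculation. This is a minimal illustration under an assumption the text does not state explicitly: that the number of reads a barcoded sample yields scales linearly with its input amount in the pool. Each sample's yield per unit input is estimated from the pilot run, and its new input is set to the target read share divided by that yield. Sample names and read counts are hypothetical.

```python
def repool_weights(pilot_reads, current_share=None, target_share=None):
    """New relative input amounts (summing to 1) for re-pooling barcoded samples.

    pilot_reads: sample -> tagged reads observed in the pilot run.
    current_share: sample -> relative input used in the pilot pool (default: equal).
    target_share: sample -> desired fraction of reads (default: equal).
    Assumes (hypothetically) that reads scale linearly with input amount.
    """
    samples = list(pilot_reads)
    if any(pilot_reads[s] <= 0 for s in samples):
        raise ValueError("every sample needs reads in the pilot run")
    equal = {s: 1 / len(samples) for s in samples}
    current_share = current_share or equal
    target_share = target_share or equal
    total = sum(pilot_reads.values())
    raw = {}
    for s in samples:
        observed = pilot_reads[s] / total
        efficiency = observed / current_share[s]  # reads per unit input, up to a constant
        raw[s] = target_share[s] / efficiency     # boost under-represented samples
    norm = sum(raw.values())
    return {s: w / norm for s, w in raw.items()}


# Hypothetical pilot run from an equal-input pool
print(repool_weights({"A": 500, "B": 1000, "C": 1500}))
```

In this example sample A, which yielded only half of its expected share in the pilot, receives three times as much input as sample C (6/11 versus 2/11 of the new pool).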
`
Quality control. Finally, as the assignment of tags to the correct sample source is critical, some quality control should be performed. The two major factors leading to false assignment of sequences are cross-contamination of barcoding adapters and sequencing errors. By testing a subset of barcoding adapters in a small-scale experiment before large-scale adoption of PTS, it is possible to monitor whether cross-contamination of adapters occurred during synthesis or dilution. If cross-contamination goes undetected, sequences will be misassigned. Moreover, in such a small-scale preliminary experiment, the occurrence of tag sequences from unused adapters can be monitored and used to estimate the false-assignment rate due to sequencing errors and/or errors occurring during oligo synthesis. The same quality control is advised for every experiment using PTS; it can be implemented by randomly omitting a small subset of the barcoding adapters available for a study (Box 1). As all molecules carry sequence tags on both ends, another independent, albeit less stringent, quality check can be performed by comparing the 5′ and 3′ tag sequences in reads that reach the ends of the molecules. The repeated occurrence of identical false tag pairs indicates that cross-contamination persists among barcoding adapters or was introduced during pipetting. Note, however, that the identification of 3′ tag sequences is less reliable because of the higher sequencing error rate near the ends of reads and possible misidentification of the starting points of 3′ adapter sequences. In addition, for many samples with relatively long mean fragment sizes, such as shotgun libraries, the majority of sequences will terminate before reaching the end of the molecule and the 3′ adapter.
When combining PTS with physical separation of the sequencing plate, avoid sequencing different libraries containing the same sequence tags on neighboring regions, as leakage of sequencing beads between regions occasionally occurs.
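Sorting reads by tag (Step 29) and the Box 1 estimate can be combined in a short script. The Python sketch below is illustrative only and is not the analysis software distributed at http://bioinf.eva.mpg.de/pts/. It assumes that reads have already been trimmed so that they start directly with the 5′ tag, that all tags have the same length, and that only exact tag matches are accepted (a conservative choice made here for simplicity). Reads whose prefix matches a synthesized but unused adapter are counted as F and entered into the Box 1 formula, False-assignment frequency = (F/T) × (N/(A − N)). Tag names and reads are hypothetical.

```python
def sort_reads(reads, tags, used):
    """Bin reads by exact match of their 5' prefix to a synthesized tag.

    reads: read sequences assumed to start directly with the 5' tag.
    tags: tag name -> tag sequence for every synthesized adapter (equal lengths).
    used: names of the tags actually assigned to samples.
    Returns (bins, unassigned, f_unused): bins maps each used tag to its reads
    with the tag trimmed off; f_unused counts reads carrying unused tags.
    """
    lengths = {len(seq) for seq in tags.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch expects tags of a single length")
    (k,) = lengths
    name_of = {seq: name for name, seq in tags.items()}
    bins = {name: [] for name in used}
    unassigned, f_unused = [], 0
    for read in reads:
        name = name_of.get(read[:k])
        if name is None:
            unassigned.append(read)
        elif name in bins:
            bins[name].append(read[k:])  # trim the 5' tag
        else:
            f_unused += 1  # tag of a synthesized but unused adapter
    return bins, unassigned, f_unused


def false_assignment_frequency(F, T, N, A):
    """Box 1 estimate: (F / T) * (N / (A - N))."""
    if T <= 0 or not 0 < N < A:
        raise ValueError("need T > 0 and 0 < N < A")
    return (F / T) * (N / (A - N))


tags = {"t1": "ACTGAC", "t2": "TCAGTC", "t3": "AGTCAG"}  # hypothetical
reads = ["ACTGACGGTT", "TCAGTCAAAA", "AGTCAGCCCC", "GGGGGGGGGG"]
bins, unassigned, F = sort_reads(reads, tags, used={"t1", "t2"})
print(bins, unassigned, F)
print(false_assignment_frequency(F, T=len(reads), N=2, A=len(tags)))
```

With the counts from Figure 2 (F = 0, T = 14,780, N = 14 of A = 40 synthesized adapters), false_assignment_frequency returns 0, matching the estimate given in that legend.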
`
`MATERIALS
`REAGENTS
• T4 DNA polymerase (Fermentas, cat. no. EP0062)
• T4 polynucleotide kinase (Fermentas, cat. no. EK0032)
• T4 ligase (Fermentas, cat. no. EL0331), including 50% PEG-4000 solution and 10× ligation buffer
• Bst DNA polymerase, large fragment (NEB, cat. no. M0275S), including 10× ThermoPol buffer
• Calf-intestine phosphatase (NEB, cat. no. M0290S), including 10× NEBuffer 3
• SrfI restriction enzyme (Stratagene, cat. no. 501064), including 10× universal buffer
• 10× Buffer Tango (Fermentas, cat. no. BY5)
• ATP (Fermentas, cat. no. R0441), 100 mM stock solution
• dNTPs (GE Healthcare, cat. no. US77119-500UL), 25 mM each
• BSA (Sigma-Aldrich, cat. no. B4287), powder for preparation of a 10 mg ml−1 stock solution in water
• Water, HPLC-grade (Sigma, cat. no. 270733)
• Absolute ethanol (Merck, cat. no. 1.00983.2500)
• TE buffer (many suppliers or self-made); 10 mM Tris–HCl, 0.1 mM EDTA, pH 8.0
• EB buffer (supplied with MinElute PCR Purification kit); 10 mM Tris–HCl, pH 8.5
• DNA-loading dye (Fermentas, cat. no. R0611)
• Ethidium bromide (Sigma, cat. no. 46067), 1% solution ! CAUTION Mutagen and potential carcinogen.
• TBE electrophoresis buffer (Sigma, cat. no. 51309), 10× concentrate
• MinElute PCR purification kit (Qiagen, cat. no. 28006)
• PicoGreen dsDNA quantitation reagent (Invitrogen, cat. no. P11495)
• Oligonucleotides (Metabion), desalted, lyophilized. Sequences for sets of barcoding oligos with 6–8-nt tags are available in Supplementary Table 1 online (see also REAGENT SETUP) ▲ CRITICAL Basic post-synthesis purification (desalting) is sufficient. Additional purification, such as HPLC or PAGE, increases the risk of cross-contaminating oligonucleotides. The oligos should be synthesized on a larger scale to suffice for several rounds of PTS.
• AMPure SPRI PCR purification kit (Agencourt, cat. no. 000130)
• GS DNA Library Preparation Kit (Roche, cat. no. 04852265001), including nebulizers and nebulization buffer ▲ CRITICAL Only 10 nebulizers and 20 ml nebulization buffer are supplied. Additional nebulizers can be obtained from Graham-Field (cat. no. BF61402). Nebulization buffer consists of 53.1% glycerol, 37 mM Tris–HCl, 5.5 mM EDTA, pH 7.5 (ref. 3).
EQUIPMENT
• 96-Well PCR plates
• Multichannel reagent basin (e.g., Thermo Scientific, cat. no. 9510027)
• Filter tips
• SPRIPlate 96R—Ring Magnet Plate (Agencourt, cat. no. 000219)
`
`
• Agarose gel electrophoresis unit
• NanoDrop spectrophotometer (NanoDrop Technologies)
• Stratagene MX 3005P QPCR System or any other qPCR system or microplate reader suitable for PicoGreen fluorescence measurements
• GS 20/GS FLX Genome Sequencer and associated equipment
• Software for data analysis (available with accompanying usage instructions at http://bioinf.eva.mpg.de/pts/)
`REAGENT SETUP
Preparing barcoding adapters Barcoding adapters consist of single self-hybridized oligos. Dissolve the oligos in TE to obtain 500 µM stock solutions. Create barcoding adapters by setting up separate reactions in PCR tubes, each containing 400 µM of one oligo and 1× T4 ligase buffer (final concentrations). Incubate in a thermal cycler with a temperature profile of 95 °C for 10 s followed by a ramp to 25 °C at a rate of 0.1 °C s−1. Immediately freeze the barcoding adapters at −20 °C until further use. ▲ CRITICAL The barcoding adapters may be prepared and stored in PCR strip tubes or plates to allow for subsequent handling with multichannel pipettes. However, be extremely careful not to cross-contaminate the adapters during preparation or later handling. Preparing aliquots may be useful for minimizing the number of handling cycles.
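The annealing reaction above is specified only by final concentrations. As a worked example, the Python sketch below applies C1V1 = C2V2 for an assumed 50 µl reaction (the volume is hypothetical and not taken from the protocol) and computes the duration of the 95 °C to 25 °C ramp at 0.1 °C s−1.

```python
def annealing_mix(final_ul=50.0, stock_um=500.0, final_um=400.0, buffer_x=10.0):
    """Volumes (µl) of oligo stock, 10x T4 ligase buffer and water for one adapter.

    final_ul is an assumed reaction volume; concentrations follow REAGENT SETUP.
    """
    oligo_ul = final_um * final_ul / stock_um  # C1V1 = C2V2
    buffer_ul = final_ul / buffer_x            # 10x buffer diluted to 1x
    water_ul = final_ul - oligo_ul - buffer_ul
    if water_ul < 0:
        raise ValueError("stock too dilute to reach the final concentration")
    return {"oligo stock": oligo_ul, "10x buffer": buffer_ul, "water": water_ul}


def ramp_seconds(start_c=95.0, end_c=25.0, rate_c_per_s=0.1):
    """Duration (s) of a linear temperature ramp."""
    return (start_c - end_c) / rate_c_per_s


print(annealing_mix())
print(ramp_seconds() / 60, "min")
```

At these concentrations the oligo stock alone takes up 80% of the reaction volume (40 µl of 50 µl), and the ramp lasts 700 s, or about 11.7 min.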
`
Generating a positive control template To control the efficiency of the tagging reactions, we strongly recommend processing a PCR product as a control template alongside the samples for sequencing. Any PCR product producing a single distinct band between 100 and 200 bp is suitable, as long as it is generated with Taq polymerase and unmodified PCR primers. Purify the PCR product using a MinElute spin column and adjust it to a concentration of 25 ng µl−1. As 24 µl is needed for one control experiment, several PCRs