Methods in Next-Generation Sequencing

Research Article • DOI: 10.2478/mngs-2013-0001 • MNGS • 2013 • 10–20

An improved approach to mate-paired library preparation for Illumina sequencing
Abstract
High-quality data from mate-pair libraries provide long-range sequence linkage across the genome, which is crucial for de novo assembly and structural-variant detection. Current commercial methods for constructing such libraries have differing limitations and are often tied to a single sequencing platform in a kit format, which may not be cost effective. We present an alternative mate-paired protocol, demonstrated on Illumina sequencing platforms, that combines the specificity of hybridisation and ligation to circularise fragments with high yield. An adapter sequence is incorporated at the junction of the mate pairs, and the length of genomic sequence on either side of it is evenly controlled by nick translation. We compare results from 3 Kb E. coli and Plasmodium falciparum 3D7 mate-pair libraries made with our protocol alongside commercial mate-pair methods. Furthermore, we present the results of a set of 3 Kb and 6 Kb mate-pair libraries from seven different mouse strains made with our protocol, demonstrating its reliability and robustness.
Naomi Park*, Lesley Shirley, Yong Gu, Thomas M. Keane, Harold Swerdlow, Michael A. Quail

The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

Keywords
Circularisation • Next Generation Sequencing • Mate-Pair • Long Insert • Illumina • Nextera • Pippin Prep

Received 18 February 2013
Accepted 26 June 2013

© Versita Sp. z o.o.
* E-mail: nh4@sanger.ac.uk

Introduction

Mate-pair libraries, also known as long-insert libraries, have been used successfully to aid de novo sequencing, structural-variant detection and genome finishing [1]. Distance information from mate-pair reads is particularly valuable for joining contigs that flank repetitive sequences, and mapping mate-pair reads to a reference sequence aids the resolution of larger structural rearrangements such as insertions, deletions and inversions [2]. However, the construction of mate-pair libraries is notoriously difficult [3], particularly for degraded samples or samples with limited amounts of DNA. During mate-pair library preparation, distal sequences are brought together in a circularisation reaction, after which most of the large fragment is removed, leaving the original ends juxtaposed as a mate pair; insert sizes are typically between 2 and 40 Kb. Mate-pair reads should map onto the reference sequence as 'outward-facing' pairs, with a gap between the mapped reads approximately equal to the size of the originally selected fragments. Low-quality, damaged or degraded DNA samples often lead to an increase in undesirable 'inward-facing' reads, which align to the reference sequence pointing towards each other and tend to map close together.

The desired insert size is dictated by the specific project, and this in turn directs the choice of a suitable methodology. Techniques such as Illumina's (San Diego, CA, USA) Mate Pair Library Preparation Kit v2 and Nextera Mate Pair Sample Prep Kit, as well as both the SOLiD 4 and SOLiD 5500 methods (Life Technologies; Staley Road, Grand Island, NY, USA), use intramolecular circularisation to bring together the ends of smaller fragments (e.g. 3-10 Kb). Cre-Lox recombination has been used for both smaller and larger fragment libraries (e.g. 3-20 Kb) [3,4]. Even larger fragments (e.g. 40 Kb) may be inserted into a fosmid or BAC vector, as in the Lucigen (Middleton, WI, USA) NxSeq 40 Kb mate-pair cloning kit [5], and inserts of up to 300 Kb can be obtained by using existing BAC libraries previously prepared for Sanger capillary sequencing [6]. Each circularisation technique is suitable for sequencing on Illumina NGS platforms or may easily be adapted to them: as in the Illumina mate-pair methods, linear mate-pair fragments generated by other techniques can be captured on streptavidin beads for end repair, A-tailing, adapter ligation and PCR amplification. However, in our experience, no single commercially available circularisation method for generating 3 Kb mate-pair libraries is both reliable and optimal for all sample types.
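The outward/inward distinction described in the Introduction can be expressed as a small check on mapped coordinates. The sketch below is illustrative only (the function names are ours, not part of any published pipeline); it assumes both reads map to the same chromosome, that each position is the leftmost aligned reference base (as reported in SAM/BAM), and that strand is given as '+' or '-'.

```python
from typing import Literal

Orientation = Literal["outward", "inward", "tandem"]


def pair_orientation(pos1: int, strand1: str, pos2: int, strand2: str) -> Orientation:
    """Classify a read pair whose mates map to the same chromosome.

    pos1/pos2 are leftmost aligned reference coordinates; strand1/strand2 are
    '+' (forward) or '-' (reverse-complemented alignment).
    """
    # Sort by position so that the "left" and "right" mates are well defined.
    (_, left_strand), (_, right_strand) = sorted([(pos1, strand1), (pos2, strand2)])
    if left_strand == right_strand:
        return "tandem"   # both mates point the same way
    if left_strand == "-":
        return "outward"  # mates point away from each other: expected mate pair
    return "inward"       # mates point towards each other: short-insert artefact


def outer_span(pos1: int, len1: int, pos2: int, len2: int) -> int:
    """Reference distance from the leftmost to the rightmost aligned base."""
    return max(pos1 + len1, pos2 + len2) - min(pos1, pos2)


if __name__ == "__main__":
    print(pair_orientation(1_000, "-", 3_900, "+"))  # outward
    print(pair_orientation(1_000, "+", 1_250, "-"))  # inward
    print(outer_span(1_000, 100, 3_900, 100))        # 3000
```

For a 3 Kb library, an outward pair with an outer span near 3,000 bp is the expected outcome, while inward pairs with small spans correspond to the undesirable reads described above.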
`
Illumina mate-paired libraries
Illumina mate-paired libraries use blunt-ended circularisation of 3-5 Kb fragments followed by a secondary fragmentation step. During library construction with the Mate Pair Library Preparation Kit v2 (Illumina v2), biotinylated nucleotides are incorporated at the ends of the sheared fragments. Degraded samples may contain nicks into which biotin can also be incorporated; after the secondary fragmentation step, these biotinylated fragments are bound by streptavidin beads alongside genuine mate pairs. As ligation between blunt ends is generally more difficult to achieve than ligation between cohesive ends [7], the circularisation yield may be poor, leading to a lower-complexity final library. Ligation between two independent fragments can also generate an undesirable proportion of chimeric reads. Random secondary fragmentation of circularised fragments produces uneven lengths of genomic sequence on either side of the junction and, as the junction contains no adapter sequence, sequencing reads that pass through the junction of the two joined ends cannot be identified and cause problems during mapping and de novo assembly [4].

The Nextera Mate Pair Sample Prep Kit (Nextera) circumvents a number of these issues. Transposome-mediated fragmentation and biotinylated adapter tagging of genomic DNA generate an identifiable mate-pair junction sequence. Whilst biotin will not be incorporated into the nicks of degraded DNA, tagmentation of poor-quality samples is likely to fragment such DNA below the desired size range (Nextera Mate Pair Sample Preparation Guide). As with Illumina v2, circularisation proceeds via blunt-ended ligation and random secondary fragmentation.
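Why random secondary fragmentation is a problem can be seen with a deliberately simple model, which is our assumption rather than a measurement from this study: the junction falls at a uniformly random offset within a sheared fragment of length L, and each end is read for r bases. A read crosses the junction whenever the arm on its side is shorter than r, so roughly 2r/L of fragments contain at least one junction-spanning read. The function below enumerates the offsets exactly.

```python
def junction_readthrough_fraction(fragment_len: int, read_len: int) -> float:
    """Fraction of fragments in which at least one read crosses the junction.

    Hypothetical model: after random shearing, the junction is equally likely
    to sit at any offset 0..fragment_len within the fragment.
    """
    offsets = range(fragment_len + 1)  # each offset = length of the left arm
    crossing = sum(
        1
        for left_arm in offsets
        # Read 1 starts on the left arm, read 2 on the right arm.
        if left_arm < read_len or (fragment_len - left_arm) < read_len
    )
    return crossing / len(offsets)


if __name__ == "__main__":
    for frag in (300, 500, 800):
        print(frag, round(junction_readthrough_fraction(frag, 100), 3))
```

Under this model, about 40% of 500 bp fragments sequenced with 100-base reads would contain a junction-spanning read, which is why approaches that control the arm length on each side of the junction, such as the nick translation used by the SOLiD methods, are attractive.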
`
SOLiD mate-paired libraries
The Life Technologies 2x50 bp Mate-Paired Library kit for the SOLiD 4 system uses hybridisation and ligation to circularise fragments. As described in the manufacturer's protocol, an adapter with a 2-base overhang is ligated to both ends of the 3 Kb fragment. A biotinylated internal adapter, complementary to the two overhangs, is added and the ligation reaction is held at 20°C to complete circularisation. Circularisation efficiency is often low, probably because the short 2 bp hybridisation region limits the efficiency of ligation [8]. The mate-pair protocol for the SOLiD 5500 series improves the circularisation yield compared with the SOLiD 4 method (Life Technologies press release). Distinct left and right adapters are ligated; these carry longer overhangs (of undisclosed length) that are blocked by short oligonucleotides. The reaction is heated to 70°C and cooled, during which the blocking oligonucleotides denature, allowing the complementary left and right overhangs to anneal to each other and form a circle. Because the left and right adapters differ in sequence, only ~50% of adapter-ligated fragments are amenable to circularisation. Additionally, the 70°C circularisation temperature may be detrimental to AT-rich genomes and/or degraded samples. All SOLiD circularised fragments contain two nicks on opposite strands on either side of the biotinylated adapter; these nicks are translated into the inserted genomic sequence, and the DNA is then digested with T7 exonuclease and S1 nuclease at the translated nick sites to give linear dsDNA for streptavidin-bead capture. This process of nick translation and digestion from an adapter sequence compares favourably with random shearing, which produces uneven lengths of genomic sequence on either side of the junction. As the junction of the joined DNA ends is marked by a known adapter sequence, the reads can easily be trimmed or split.
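The ~50% figure for the SOLiD 5500 design follows from a counting argument. In the sketch below (a hypothetical model, not taken from the manufacturer's documentation), each fragment end independently receives the left adapter with probability `p_left` and otherwise the right adapter; only fragments carrying one of each have complementary overhangs and can close into a circle.

```python
from itertools import product


def circularisable_fraction(p_left: float = 0.5) -> float:
    """Probability that a fragment ends up with one left and one right adapter.

    Hypothetical model: each fragment end independently receives the left
    adapter with probability p_left, otherwise the right adapter. Only
    fragments carrying one of each have complementary overhangs and can close.
    """
    prob = {"L": p_left, "R": 1.0 - p_left}
    return sum(
        prob[a] * prob[b]
        for a, b in product("LR", repeat=2)
        if a != b  # L-R or R-L: complementary overhangs
    )


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.7):
        print(p, round(circularisable_fraction(p), 3))
```

Under this model, 50% is the best case (reached when both adapters are equally likely); any imbalance lowers it. A design that ligates the same adapter to both ends, as in the method described below, makes every adapter-ligated fragment a candidate for circularisation.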
`
Cre-Lox recombination libraries
The Roche (Penzberg, Germany) GS-FLX paired-end protocol generates 3-20 Kb mate pairs by Cre-recombinase-mediated recombination, and adaptations of this method to generate Illumina sequencing-ready libraries have been reported previously [3,4]. Circularisation adapters containing LoxP sites are ligated to both ends of the fragment, and the product undergoes Cre recombination to generate circularised DNA with a biotinylated adapter sequence at the junction. Although this method of circularisation is highly efficient, random secondary fragmentation of the circles produces uneven lengths of sequence on either side of the LoxP adapter, so sequencing reads may pass through the adapter into the other side, causing mapping problems.
`
Improved (Sanger) mate-paired libraries
To generate unbiased and diverse Illumina mate-paired libraries containing even lengths of genomic sequence on either side of a common adapter sequence, we modified the Illumina mate-pair protocol to use a hybridisation-and-ligation circularisation approach adapted from the SOLiD 4 method. A single double-stranded adapter (coloured green in Figure 1) is ligated to each end of the sheared fragment, leaving a 9-base overhang. As shown in Figure 2, this longer sticky end forms a stable structure for subsequent ligation to a biotinylated internal adapter (coloured red in Figure 1). Because the 5' end of the "Adapter Bottom" oligonucleotide lacks a phosphate, the circularised fragments contain one nick on each strand, which is used for nick translation into the genomic sequence, as in both SOLiD methods. The translated nicks are then extended outward from the mate-pair region by T7 exonuclease (New England Biolabs; Ipswich, MA, USA), generating single-stranded regions. These single-stranded regions are digested by S1 nuclease (Life Technologies), releasing a linear biotinylated mate-paired fragment from the rest of the circle. Only this biotinylated mate-paired fragment is then captured on streptavidin beads for Illumina library preparation.
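The overhang design can be checked directly from the oligonucleotide sequences given in Figure 1. The sketch below (illustrative only; the helper names are ours) confirms that Adapter Bottom leaves a 9-base 3' overhang on Adapter Top, that the two internal-adapter strands form a 12 bp duplex with identical 9-base 3' overhangs, and that the adapter and internal-adapter overhangs are complementary. It also applies the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C, to compare the 9-base overhang with a hypothetical 2-base overhang like that of the SOLiD 4 adapter (whose sequence is not given here). The rule ignores salt, oligonucleotide concentration and nearest-neighbour effects, so the numbers are rough.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Very rough duplex melting temperature (°C): 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Oligonucleotides from Figure 1, written 5'->3' with the 5' phosphates omitted.
ADAPTER_TOP = "CTGCTTGTGGACGTTGTACATCGTGGTGC"
ADAPTER_BOTTOM = "TGTACAACGTCCACAAGCAG"
INTERNAL_TOP = "GGAGCCTAGTGCGCACCACGA"
INTERNAL_BOTTOM = "GCACTAGGCTCCGCACCACGA"

# Adapter Bottom pairs with the first 20 bases of Adapter Top,
# leaving a 9-base 3' overhang on Adapter Top.
assert revcomp(ADAPTER_BOTTOM) == ADAPTER_TOP[:20]
adapter_overhang = ADAPTER_TOP[20:]

# The internal-adapter strands form a 12 bp duplex and share the same
# 9-base 3' overhang.
assert revcomp(INTERNAL_BOTTOM)[9:] == INTERNAL_TOP[:12]
internal_overhang = INTERNAL_TOP[-9:]
assert INTERNAL_BOTTOM[-9:] == internal_overhang

# The two overhangs are complementary, so they can anneal and be ligated.
assert revcomp(adapter_overhang) == internal_overhang

print(adapter_overhang, wallace_tm(adapter_overhang))          # TCGTGGTGC 30
print("hypothetical 2-base 'GC' overhang:", wallace_tm("GC"))  # 8
```

By this crude estimate the 9-base overhang (about 30 °C) sits above the 20 °C at which the SOLiD 4 ligation is held, whereas no 2-base overhang can exceed 8 °C, which is consistent with the argument that the longer sticky end forms a more stable structure.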
This new method (see Supplementary method) was compared with the Illumina v2, Nextera, SOLiD 4, SOLiD 5500 and cre-lox methods of circularisation by making 3 Kb E. coli and Plasmodium falciparum 3D7 (pf3D7) mate-pair libraries. Each method was adapted for sequencing on Illumina platforms, and all libraries were sequenced in a multiplexed pool. The robustness and reproducibility of our mate-pair method were further demonstrated with 3 Kb and 6 Kb mate-paired libraries made from seven mouse strains. For each library, we present the post-circularisation yield and the total library yield, together with sequencing quality metrics: mapped-read percentage, total mapped reads and proper-paired reads, along with the numbers of singletons, duplicates and chimeras.
`
Results and Discussion

An ideal mate-pair library will have the following features: a diverse population of reads that align to the reference genome in an outward-facing orientation; reads that do not reach the adapter sequence; an average distance between outward-facing paired reads that matches the desired size; no chimeric reads; and no GC bias. Generating fragments of the desired size depends on the genomic starting material being sufficiently intact, on the fragmentation method being reliable and on the size-selection method (if any) being accurate and selective. With the hybridisation/ligation/circularisation approach used in this paper, inward-facing paired reads with a small insert size are caused by non-specific capture of non-biotinylated library fragments. This contrasts with the Illumina v2 mate-pair method, in which biotinylated bases can be incorporated into nicked sites of damaged DNA; the resulting molecules can be captured on the streptavidin-coated beads and yield inward-facing paired reads. The diversity of the final mate-pair library depends on a number of factors, including the amount of adapter-ligated DNA going into the circularisation reaction, the circularisation yield (which can be determined after Plasmid Safe® digestion (Epicentre; Madison, WI, USA) of the remaining linear DNA) and the number of PCR cycles.

Adapter Top:             5' pCTGCTTGTGGACGTTGTACATCGTGGTGC 3'
Adapter Bottom:          5' TGTACAACGTCCACAAGCAG 3'
Internal Adapter Top:    5' pGGAGCCTAGTGCGCACCACGA 3'
Internal Adapter Bottom: 5' pGCACTAGGCTCCGCACCACGA 3'

Double-stranded Adapter after annealing:
5' pCTGCTTGTGGACGTTGTACATCGTGGTGC 3'
3'  GACGAACACCTGCAACATGT 5'

Double-stranded Internal Adapter after annealing:
5'         pGGAGCCTAGTGCGCACCACGA 3'
3' AGCACCACGCCTCGGATCACGp 5'

Figure 1. Adapter and Internal Adapter oligonucleotides. The double-stranded adapter (green) has a 9-base overhang that is complementary to each 9-base overhang of the internal adapter (red). The biotinylated thymidine (T) is highlighted in yellow.

Post adapter ligation:
5'          TGTACAACGTCCACAAGCAG NNNNNNNN CTGCTTGTGGACGTTGTACATCGTGGTGC 3'
3' CGTGGTGCTACATGTTGCAGGTGTTCGTC NNNNNNNN GACGAACACCTGCAACATGT          5'

Post circularisation with the Internal Adapter:
5' NNNN CTGCTTGTGGACGTTGTACATCGTGGTGCGGAGCCTAGTGCGCACCACGATGTACAACGTCCACAAGCAG NNNN 3'
3' NNNN GACGAACACCTGCAACATGTAGCACCACGCCTCGGATCACGCGTGGTGCTACATGTTGCAGGTGTTCGTC NNNN 5'

Figure 2. Schematic of the library preparation steps. NNNN denotes the 3 Kb DNA fragments ligated to the adapter. The internal adapter hybridises and ligates to the adapter, leaving a nick (underlined) to enable translation into the genomic region. The biotinylated thymidine (T) is highlighted in yellow.
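Figure 2 also shows why junction-spanning reads are easy to handle in this design. Reading into the junction from either side, a read first meets CTGCTTGTGGACGTTGTACA, which is both the first 20 bases of Adapter Top and the reverse complement of Adapter Bottom. The sketch below is a hypothetical exact-match trimmer, not the software used in this study: it cuts a read at the first full occurrence of that 20-base sequence, or at a partial prefix of at least `min_overlap` bases at the 3' end of the read. Practical trimmers also tolerate sequencing errors, which this sketch does not.

```python
# 20-base sequence met by a read running into the junction from either side
# (first 20 bases of Adapter Top = reverse complement of Adapter Bottom).
JUNCTION_PREFIX = "CTGCTTGTGGACGTTGTACA"


def trim_at_junction(read: str, adapter: str = JUNCTION_PREFIX, min_overlap: int = 6) -> str:
    """Return the genomic part of `read`, cut before any junction adapter.

    Exact matching only: a full adapter hit anywhere in the read, otherwise the
    longest read suffix (>= min_overlap bases) that equals an adapter prefix.
    """
    hit = read.find(adapter)
    if hit != -1:
        return read[:hit]
    # Look for a partial adapter at the 3' end, longest overlap first.
    for overlap in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[: len(read) - overlap]
    return read


if __name__ == "__main__":
    genomic = "ACGTACGTAAGGTTCC"
    print(trim_at_junction(genomic + JUNCTION_PREFIX + "GGAG"))  # full adapter
    print(trim_at_junction(genomic + "CTGCTTGT"))                # partial adapter
```

The `min_overlap` threshold trades sensitivity for specificity: very short partial matches would occur by chance at the end of many genomic reads.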
`
Mate-Pair Method Comparison
We compared the method presented in this paper with the Illumina v2, Nextera, SOLiD 4, SOLiD 5500 and cre-lox methods of circularisation. Two Nextera protocols were carried out: one with 1 µg input and AMPure (Beckman Coulter; Brea, CA, USA) bead size selection, and one with 4 µg input and gel size selection. Each experiment was performed in duplicate. Where possible, variables such as fragmentation, size selection, circularisation input amount and Illumina sequencing preparation were standardised to allow a fair comparison. Each method was evaluated by preparing and sequencing libraries from both the E. coli (50.8% GC) and Plasmodium falciparum 3D7 (19.3% GC) genomes.
`
Circularisation
In theory, any improvement in circularisation yield should directly increase the complexity of the final mate-pair library. The yield of each circularisation reaction from a normalised input of 400 ng is presented in Table 1. For both genomes, the Sanger method gave a ~1.7-fold higher yield than the SOLiD 5500 method and a ~4-fold higher yield than the SOLiD 4 adapter protocol. The estimated library size (defined as the total number of unique fragments; Table 2) of the SOLiD 4 E. coli libraries reflects this positive correlation between circularisation yield and final library complexity. The correlation is also seen with the SOLiD 5500 pf3D7 library; however, the estimated library size of the E. coli SOLiD 5500 library is twice that of the Sanger library, indicating that the relationship between circularisation yield and final library diversity may be influenced by other factors. In the case of the SOLiD 4 method, extremely poor circularisation of pf3D7 (Table 1) made it impossible to produce a successful library. The Illumina v2 protocol gave a ~1.3-fold higher circularisation yield than the Sanger method. However, the sequencing metrics (Table 2) indicate that this is at least partially due to the formation of chimeric circles from two unrelated fragments (~15% of mapped reads) rather than genuine mate pairs. Conversely, the Nextera protocol gave a 1.7-2.9-fold higher circularisation yield that is not as markedly associated with an increase in chimeric reads (although this is still elevated, at ~2.3%). Circularisation yields of the cre-lox libraries were undetectable by high-sensitivity Qubit, and these libraries had a ~5-fold lower estimated library size than the Sanger libraries.
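The yields and fold differences discussed above are simple ratios of the Table 1 values. The sketch below uses the E. coli means (the helper names are ours). As an aside, every pMol value in Table 1 equals 0.02 times the corresponding nmol/l value (for example, 1.4 nmol/l and 0.028 pMol for the Sanger E. coli libraries), which is consistent with a final library volume of 20 µl; that volume is our inference and is not stated in the text.

```python
INPUT_NG = 400.0  # normalised circularisation input used for Table 1

# Mean E. coli output after Plasmid Safe digestion (ng), from Table 1.
ECOLI_OUTPUT_NG = {
    "Sanger": 37.2,
    "SOLiD 5500": 22.5,
    "SOLiD 4": 9.6,
    "Illumina v2": 51.3,
}


def circularisation_yield(output_ng: float, input_ng: float = INPUT_NG) -> float:
    """Percentage of the input recovered as circles (Plasmid Safe resistant)."""
    return 100.0 * output_ng / input_ng


def pmol_from_nmol_per_l(conc_nmol_per_l: float, volume_ul: float) -> float:
    """Amount (pmol) from a concentration (nmol/l) and a volume (µl)."""
    # nmol/l x µl = 1e-15 mol (fmol); divide by 1000 to get pmol.
    return conc_nmol_per_l * volume_ul / 1000.0


yields = {method: circularisation_yield(ng) for method, ng in ECOLI_OUTPUT_NG.items()}
for method, pct in yields.items():
    print(f"{method:<12} yield {pct:5.2f}%  Sanger/method {yields['Sanger'] / pct:4.2f}")

# 20 µl is inferred from Table 1 (pMol = 0.02 x nmol/l), not stated in the text.
print(pmol_from_nmol_per_l(1.4, 20.0))
```

For E. coli alone, the printed ratios reproduce the ~1.7-fold (SOLiD 5500) and ~4-fold (SOLiD 4) advantages of the Sanger method and place Illumina v2 about 1.4-fold above it.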
`
Mate-Pair Size
Genomic DNA was mechanically sheared for each 3 Kb library, with the exception of the Nextera libraries, which underwent transposome-mediated fragmentation and adapter tagging of genomic DNA. Transposome-mediated fragmentation depends on high-quality, accurately quantified starting material. In our experience, accurate quantification of "real-life" samples is often difficult because of impurities, even with fluorometric methods specific for duplex DNA such as the Qubit dsDNA BR kit. Additionally, the GC content of the genome alters the fragmentation pattern, and, despite proportional scaling up of all reaction components, the 4 µg tagmentation of pf3D7 generated a ~2.7-fold larger mean fragment size than the 1 µg tagmentation. With the exception of the 1 µg Nextera libraries, all libraries were size selected using the Blue Pippin (Sage Science; Beverly, MA, USA), with the conditions shown in Supplementary Tables 1 and 2, to target the peak size maximum as determined by the Agilent Bioanalyzer (Agilent; Santa Clara, CA, USA).
`
Library quality
Library quality statistics for reads mapped to the E. coli and pf3D7 genomes are given in Table 2. Despite multiple attempts, the pf3D7 SOLiD 4 libraries failed to yield a final library. All methods yielded a high proportion (>75%) of mappable reads, with the exception of one E. coli SOLiD 4 library (38%), of which only 34% were proper pairs, and the cre-lox libraries (59-67%), of which only 36-55% were proper pairs. All other libraries had 70-87% proper pairs, the Nextera libraries being at the lower end of this range. Singleton reads ranged from 2 to 5%, except for the E. coli Nextera libraries (~11%) and the cre-lox libraries (8-24%). Inward-facing reads were low for all libraries, the highest being the Sanger (0.8-1.4%) and Illumina v2 (0.9-1.8%) libraries. Although still low, these inward-facing reads are likely due to insufficient removal of non-biotinylated material during the washing steps for these particular libraries. Duplicate rates ranged from 0.2 to 12%, with the Nextera libraries generating the lowest values and the cre-lox libraries the highest.
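Duplicate rate and estimated library size are two views of the same quantity. The text does not say how library size was estimated; one common approach is the Lander-Waterman model, which (as we understand it) underlies the ESTIMATED_LIBRARY_SIZE value in Picard's duplication metrics. It assumes that N sequenced pairs are drawn uniformly from X unique molecules, so the expected number of distinct pairs is C = X(1 - exp(-N/X)). The sketch below recovers X from N and C by bisection; it is an illustration of the model, not the authors' procedure.

```python
import math


def estimate_library_size(n_pairs: float, n_unique: float, rel_tol: float = 1e-9) -> float:
    """Solve n_unique = X * (1 - exp(-n_pairs / X)) for X by bisection.

    Lander-Waterman assumption: every unique molecule is equally likely to be
    sequenced. Requires 0 < n_unique < n_pairs, i.e. at least one duplicate.
    """
    if not 0 < n_unique < n_pairs:
        raise ValueError("need 0 < n_unique < n_pairs")

    def expected_unique(x: float) -> float:
        return x * (1.0 - math.exp(-n_pairs / x))

    # expected_unique() rises monotonically from below n_unique (at x = n_unique)
    # towards n_pairs, so double the upper bound until it brackets the root.
    lo = hi = float(n_unique)
    while expected_unique(hi) < n_unique:
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_unique(mid) < n_unique:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def duplicate_rate(n_pairs: float, n_unique: float) -> float:
    """Fraction of read pairs that duplicate an earlier pair."""
    return 1.0 - n_unique / n_pairs
```

Under this model, for a fixed number of sequenced pairs, a lower duplicate rate always implies a larger estimated library, which matches the pattern of Nextera libraries having both the lowest duplicate rates and the largest estimated sizes.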
Intermolecular circularisation may occur if two different fragments concatemerise or, in the case of the Sanger, SOLiD 4, SOLiD 5500 and cre-lox libraries, if two fragments ligate to each other as well as to the circularisation adapter. Either scenario produces chimeric reads, which appear as artefactual structural variation. Chimeric reads pose a major problem in the generation of mate-pair libraries, and eliminating them is highly desirable. For both genomes, the Sanger and cre-lox methods of circularisation generated the lowest proportions of chimeric reads (0.06% and 0.1%, respectively, for E. coli and 0.7% and 0.3%, respectively, for pf3D7). Due to the poor performance of the cre-lox libraries and the high performance
`
Table 1. E. coli/pf3D7 circularisation and final library yields. 400 ng of material was used as input into each circularisation reaction; all libraries went through 13 cycles of PCR. All values are mean ± stdev.

E. coli
Method | Output post Plasmid Safe digestion (ng) | Circularisation yield (%) | Library yield (pMol) | Library yield (nmol/l)
Sanger | 37.2 ±3 | 9.3 ±0.7 | 0.028 ±0.001 | 1.4 ±0.07
Nextera 1 µg | 110 ±12 | 27.6 ±3 | 2.921 ±0.2 | 146 ±10.4
Nextera 4 µg | 107.4 ±14 | 26.9 ±3 | 2.200 ±0.4 | 110 ±18.8
SOLiD 5500 | 22.5 ±0.6 | 5.6 ±0.2 | 0.052 ±0.005 | 2.6 ±0.2
SOLiD 4 | 9.6 ±2 | 2.4 ±0.4 | 0.008 ±0.003 | 0.4 ±0.2
Illumina v2 | 51.3 ±17 | 12.8 ±4 | 0.112 ±0.001 | 5.6 ±0.06
454 (cre-lox) | ND | NA | 0.010 ±0.003 | 0.5 ±0.2

P. falciparum 3D7
Method | Output post Plasmid Safe digestion (ng) | Circularisation yield (%) | Library yield (pMol) | Library yield (nmol/l)
Sanger | 41.7 ±10 | 10.4 ±2 | 0.0248 ±0.01 | 1.2 ±0.5
Nextera 1 µg | 74.0 ±6 | 18.5 ±1 | 1.941 ±1 | 97.1 ±55.0
Nextera 4 µg | 69.4 ±1 | 17.3 ±0.2 | 0.504 ±0.008 | 25.2 ±0.4
SOLiD 5500 | 23.0 ±0.1 | 5.7 ±0.02 | 0.012 ±0.0002 | 0.59 ±0.008
SOLiD 4 | 9.9 ±3 | 2.5 ±0.7 | Fail | Fail
Illumina v2 | 48.0 ±0 | 12.0 ±0 | 0.202 ±0.04 | 10.1 ±2.1
454 (cre-lox) | ND | NA | 0.003 ±0.00001 | 0.15 ±0.0007
`2522,2913,3456
`
`2501,2906,3489
`
`1,666,050
`
`1,436,903
`
`2872,3191,3962
`
`106,831,975
`
`2874,3176,3647
`
`96,778,532
`
`4281(0.34)
`
`3455(0.35)
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`3199,3589,4189
`
`3216,3592,4188
`
`3,181,087
`
`3,119,007
`
`11066(1.17)
`
`11413(1.09)
`
`6695,7976,10,109
`
`221,719,888
`
`6777,8032,10,223
`
`217,591,242
`
`2079,3010,4549
`
`474,453,473
`
`2073,3086,4689
`
`596,244,158
`
`2593,2867,3293
`
`2614,2907,3332
`
`6,662,099
`
`7,092,996
`
`3072,3406,3985
`
`3062,3415,4000
`
`2506,3017,3893
`
`2527,3050,3889
`
`3167,3409,3808
`
`3128,3386,3808
`
`3198,3531,4063
`
`3249,3582,4137
`
`1,578,265
`
`1,022,158
`
`45,559,230
`
`46,305,416
`
`748,054
`
`4,697,911
`
`17,608,752
`
`13,748,108
`
`4094,4740,5336
`
`408,363,590
`
`2570,3222,4323
`
`231,344,363
`
`1862,2521,3570
`
`637,651,262
`
`1418,2023,2925
`
`532,093,793
`
`3148,3407,3818
`
`2496,2884,3414
`
`8,446,293
`
`6,721,919
`
Insert Size Quartiles

Estimated Library Size
`
`21267(2.3)
`
`40831(2.3)
`
`6919(0.67)
`
`4242(0.67)
`
`1421(0.1)
`
`1695(0.12)
`
`8379(0.71)
`
`762(0.06)
`
`844(0.05)
`
`109748(10.92)
`
`76244(9.82)
`
`2226(0.14)
`
`1606(0.12)
`
`1477532(15.89)
`
`194408(2.25)
`
`167792(1.79)
`
`1920467(15.91)
`
`348911(3.1)
`
`217956(1.79)
`
`38810(2.51)
`
`25111(2.45)
`
`-
`
`-
`
`41250(5.21)
`
`54131(6.04)
`
`4414(0.32)
`
`2025(0.22)
`
`1624(0.2)
`
`5040(0.33)
`
`33446(3.48)
`
`12540(2.13)
`
`102108(9.45)
`
`124106(11.98)
`
`167799(15.39)
`
`11938(1.2)
`
`260410(15.56)
`
`16788(1.24)
`
`10824(0.76)
`
`13334(0.99)
`
`36445(2.25)
`
`34923(2.22)
`
`37013(2.03)
`
`41127(2.25)
`
`27408(1.79)
`
`40488(7.75)
`
`74499(5.83)
`
`19212(1.82)
`
`20201(2)
`
`29072(2.11)
`
`23894(1.83)
`
`34952(2.24)
`
`33331(2.28)
`
`43832(3.81)
`
`90133(5.68)
`
`-
`
`-
`
`3202(0.32)
`
`3722(0.35)
`
`1994(0.13)
`
`1286(0.12)
`
`1310(0.14)
`
`2386(0.13)
`
`16178(1.42)
`
`10290(1.51)
`
P. falciparum 3D7
`
`4722(0.29)
`
`6486(0.37)
`
`9832(0.9)
`
`14522(0.86)
`
`1362(0.1)
`
`2686(0.19)
`
`2024(0.17)
`
`1242(0.09)
`
`5704(0.35)
`
`3478(0.22)
`
`8822(0.48)
`
`6060(0.33)
`
`10770(0.8)
`
`17220(0.99)
`
E. coli
`
`Chimeras (%)
`
` Duplicates (%)
`
`Inward reads (%)
`
`1546478
`
`1326766
`
`9395990
`
`11251194(92.32)
`
`12186512
`
`1545406(85.01)
`
`1080721(66.96)
`
`1035672(59.66)
`
`1534005(91.36)
`
`1277513(88.73)
`
`1058409(89.03)
`
`1008034(74.73)
`
`1379340(84.36)
`
`1305714(82.32)
`
`1558622(84.70)
`
`1462202(79.28)
`
`1149143(85.18)
`
`1585574(91.22)
`
`-
`
`-
`
`1006516
`
`1068608
`
`1576742
`
`1046822
`
`944780
`
`1817908
`
`1135602
`
`682164
`
`1613924
`
`1735912
`
`1635128
`
`1840202
`
`1093040
`
`1679048
`
`1374586
`
`1439736
`
`1188802
`
`1348984
`
`1349072
`
`1586134
`
`1844424
`
`1738250
`
`454_2
`
`454_1
`
`Illuminav2_2
`
`Illuminav2_1
`
`SOLiD4_2
`
`SOLiD4_1
`
SOLiD5500_2
`
`SOLiD5500_1
`
`Nextera 4µg_2
`
`Nextera 4µg_1
`
`Nextera 1µg_2
`
`Nextera 1µg_1
`
`Sanger_2
`
`Sanger_1
`
`454_2
`
`454_1
`
`Illuminav2_2
`
`Illuminav2_1
`
`SOLiD4_2
`
`SOLiD4_1
`
SOLiD5500_2
`
`SOLiD5500_1
`
`Nextera 4µg_2
`
`Nextera 4µg_1
`
`Nextera 1µg_2
`
`Nextera 1µg_1
`
`Sanger_2
`
`Sanger_1
`
`Method
`
`All reads
`
`
`
`
`
Table 2. Post-sequencing analysis metrics for 3 Kb E. coli and pf3D7 mate-paired libraries prepared using commercial methods alongside the Sanger method. Libraries were indexed, multiplexed and sequenced on the Illumina MiSeq. Singletons are defined as individual reads that do not have a corresponding mate pair. Chimeras are defined as incorrect mate pairs formed during circularisation when ligation occurs between unrelated DNA molecules.
`
`137129(8.87)
`
`857836(55.47)
`
`1005179(65.00)
`
`111472(8.40)
`
`657592(49.56)
`
`776177(58.50)
`
`233605(2.49)
`
`8175824(87.01)
`
`8637064(91.92)
`
`304631(2.50)
`
`10652296(87.41)
`
`-
`
`-
`
`18314(1.82)
`
`18714(1.75)
`
`64545(4.09)
`
`43751(4.18)
`
`47012(4.98)
`
`89371(4.92)
`
`44706(3.94)
`
`31129(4.56)
`
`-
`
`-
`
`-
`
`-
`
`763222(75.83)
`
`791843(78.67)
`
`865328(80.98)
`
`895871(83.84)
`
`1310604(83.12)
`
`1389665(88.14)
`
`866396(82.76)
`
`920122(87.90)
`
`753156(79.72)
`
`811459(85.89)
`
`1434318(78.90)
`
`878902(77.40)
`
`960130(84.55)
`
`536014(78.58)
`
`588694(86.30)
`
`381225(23.62)
`
`686376(42.53)
`
`399508(23.01)
`
`621090(35.78)
`
`62561(5.72)
`
`97663(5.82)
`
`38687(2.81)
`
`41563(2.89)
`
`34991(2.94)
`
`45338(3.36)
`
`921338(84.29)
`
`998143(91.32)
`
`1416994(84.39)
`
`481644(35.04)
`
`522669(38.02)
`
`1225494(85.12)
`
`1015338(85.41)
`
`957798(71.00)
`
`186088(11.38)
`
`1169260(71.51)
`
`169252(10.67)
`
`1124720(70.91)
`
`205162(11.15)
`
`1318192(71.63)
`
`213900(11.60)
`
`1224990(66.42)
`
`75641(5.61)
`
`83148(4.78)
`
`1051838(77.97)
`
`1452332(83.55)
`
Singleton reads (%)

Proper pairs (%)

Mapped reads (%)
of the Sanger libraries in other quality metrics, the Sanger method of circularisation is an attractive option for avoiding chimeric reads.
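The categories discussed in this section (proper pairs, singletons, inward-facing reads and chimeras) can be assigned with simple rules once pairs are mapped. The sketch below is hypothetical: the distance cut-offs are illustrative values for a 3 Kb library, not the criteria used for Table 2, and percentages are computed over pairs, whereas Table 2 reports read counts and may use different denominators for individual columns.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical cut-offs for a 3 Kb library; in practice they would be taken
# from the observed insert-size distribution.
MAX_INWARD_INSERT = 1_000
MIN_MATE_PAIR_INSERT = 1_500
MAX_MATE_PAIR_INSERT = 6_000

Mate = Tuple[str, int, str]  # (chromosome, leftmost position, strand '+' or '-')


def classify_pair(m1: Optional[Mate], m2: Optional[Mate]) -> str:
    """Assign one QC category to a read pair (None = mate unmapped)."""
    if m1 is None and m2 is None:
        return "unmapped"
    if m1 is None or m2 is None:
        return "singleton"  # only one mate aligned
    if m1[0] != m2[0]:
        return "chimera"    # mates on different chromosomes
    left, right = sorted([m1, m2], key=lambda m: m[1])
    distance = right[1] - left[1]
    if distance > MAX_MATE_PAIR_INSERT:
        return "chimera"    # too far apart to come from one size-selected fragment
    if left[2] == "-" and right[2] == "+" and distance >= MIN_MATE_PAIR_INSERT:
        return "proper"     # outward-facing at the expected separation
    if left[2] == "+" and right[2] == "-" and distance <= MAX_INWARD_INSERT:
        return "inward"     # short, inward-facing pair
    return "other"          # remaining orientation/distance combinations


def tally(labels: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Count each category and express it as a percentage of all pairs."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    pairs = [
        (("chr1", 1_000, "-"), ("chr1", 4_000, "+")),
        (("chr1", 2_000, "+"), ("chr1", 2_300, "-")),
        (("chr1", 5_000, "-"), ("chr2", 9_000, "+")),
        (("chr1", 7_000, "-"), None),
    ]
    print(tally(classify_pair(a, b) for a, b in pairs))
```

Distance-based chimera calls of this kind cannot distinguish a true chimera from a genuine structural difference between sample and reference, which is why a small residual chimera rate is expected even from a clean library.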
The Nextera libraries were by far the most diverse (highest estimated library size) for both genomes. The method's low input requirement (1-4 µg without/with gel size selection), high circularisation efficiency and good performance in other quality metrics make it a promising option for generating mate-pair libraries. However, its ~2% chimera rate, its requirement for high-quality, accurately quantified starting material and the sensitivity of transposome-mediated fragmentation to GC content are all limitations. Illumina v2 libraries showed the second-highest diversity, but the method's historical sensitivity to poor-quality starting material, lack of an internal adapter sequence and high chimera rate are limitations.
SOLiD 4 libraries generated good data when sufficient library was obtained, but the method is limited by its circularisation efficiency, which restricts library diversity. The E. coli SOLiD 5500 libraries performed very well, with library diversity second only to that of the Illumina methods and excellent scores in other quality metrics, apart from a 0.7% chimera rate (compared with 0.06% for the Sanger libraries). However, the pf3D7 SOLiD 5500 libraries did not perform as well as the E. coli libraries, showing a higher duplicate rate and a much smaller estimated library size. This suggests that the 70°C heating and snap cooling used for circularisation may be detrimental for low-GC genomes. We postulate that this method of circularisation may also pose problems for partially degraded samples.
In our method, biotin is incorporated into the mate-pair junction via an adapter sequence and cannot be incorporated into nicked sites of damaged DNA, as may happen with Illumina v2 libraries. In addition, mechanical shearing is less affected by degraded material than Nextera fragmentation. We therefore consider our method preferable for partially degraded starting material (<40 Kb but >5 Kb).
The GC profiles of the E. coli and pf3D7 mate-pair libraries are shown in Figure 3. None of the circularisation methods shows marked GC bias, which may be attributable to the standardised use of KAPA HiFi polymerase (Kapa Biosystems; Woburn, MA, USA) for all library amplifications [11].
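A GC profile like Figure 3 compares the distribution of per-read GC content with the distribution expected from the reference genome (50.8% GC for E. coli and 19.3% for pf3D7). The sketch below is one minimal, hypothetical way to compute such curves: it bins per-read GC fractions into percentage bins, and the same binning applied to read-length windows of the reference gives the expected curve. How the authors generated their curves is not described here.

```python
from collections import Counter
from typing import Iterable, List


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if there are none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def gc_histogram(reads: Iterable[str], bin_width: int = 5) -> List[float]:
    """Percentage of reads in each GC bin of `bin_width` percentage points."""
    n_bins = 100 // bin_width + 1  # the last bin holds reads with exactly 100% GC
    counts = Counter(round(100 * gc_fraction(r)) // bin_width for r in reads)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * n_bins
    return [100.0 * counts.get(b, 0) / total for b in range(n_bins)]


def reference_windows(genome: str, window: int, step: int) -> List[str]:
    """Read-length windows along the reference, for the expected GC curve."""
    return [genome[i:i + window] for i in range(0, len(genome) - window + 1, step)]
```

Passing `reference_windows(genome, read_length, step)` to `gc_histogram` yields the expected curve to plot alongside the observed one; a library without GC bias should track it closely.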
`
Mus musculus Sanger Mate-Pair Libraries
To demonstrate the robustness and reproducibility of our mate-pair method, we prepared 3 Kb and 6 Kb mate-paired libraries from seven mouse strains. Genomic DNA was sheared with a red miniTUBE for each 3 Kb library and a g-TUBE (Covaris Inc.; Woburn, MA, USA) for each 6 Kb library. All libraries were size selected using the Blue Pippin to target the peak size maximum as determined by the Bioanalyzer. Fragment sizes after size selection and after mapping of the reversed reads are listed in Table 3, and an example trace for each library size is presented in Figure 4. We noted an average discrepancy between the peak fragment size observed on the Agilent Bioanalyzer and that observed after mapping of 1.5 Kb for the 3 Kb libraries and 1.2 Kb for the 6 Kb libraries.
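Both size measurements reduce to simple summaries. The sketch below assumes a list of mapped insert sizes (outer distances in bp for properly oriented pairs) is already available. It reports quartiles like those in Table 2 using Python's `statistics.quantiles` (our choice; its default "exclusive" method may differ slightly from the tool the authors used), and it takes the median as the mapped size, which is also our assumption. For example, Table 3 lists a 6.1 Kb post-size-selection peak and a 4.6 Kb mapped size for the AKR/J 3 Kb library, a 1.5 Kb difference.

```python
import statistics
from typing import List, Sequence


def insert_size_quartiles(spans_bp: Sequence[int]) -> List[float]:
    """25th, 50th and 75th percentiles of mapped insert sizes (bp)."""
    return statistics.quantiles(spans_bp, n=4)


def size_discrepancy_kb(bioanalyzer_peak_kb: float, spans_bp: Sequence[int]) -> float:
    """Bioanalyzer peak minus the median mapped insert size, in Kb."""
    return bioanalyzer_peak_kb - statistics.median(spans_bp) / 1000.0


if __name__ == "__main__":
    spans = [2_500, 3_000, 3_500, 4_000, 4_500]  # toy data, not from this study
    print(insert_size_quartiles(spans))
    print(size_discrepancy_kb(5.0, spans))  # 1.5
```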
`
Library diversity
The diversity of the mate-pair libraries depends on the amount of adapter-ligated DNA going into the circularisation reaction, the circularisation yield and the number of PCR cycles used. Starting with 10 µg (3 Kb) or 20 µg (6 Kb) of Mus musculus genomic DNA, we obtained 0.7-1.8 µg (3 Kb) or 1.7-4.4 µg (6 Kb) of adapter-ligated material for input into the circularisation reaction. This recovery could be increased or decreased by widening or narrowing, respectively, the size range collected by the Blue Pippin. The Blue Pippin size-range parameters (Supplementary Tables 1 and 2) were chosen to give good recovery at the cost of a broader size distribution. Table 3 shows that the circularisation yield for our method ranged from 11.2-17.1% (3 Kb) and 7.3-21.3% (6 Kb), giving a post-PCR final library yield of 0.088-1.5 pMol (3 Kb) and 0.091-0.64 pMol (6 Kb). In addition to varying input amounts for the circularisation reaction
Figure 3. GC profile analysis of E. coli and pf3D7 sequence data. The GC content distribution for each method is shown for a) E. coli and b) pf3D7, alongside the theoretical distribution for the reference genome (red trace).
Table 3. Library quality metrics for 3 Kb and 6 Kb Mus musculus mate-pair libraries, as determined after size selection (Bioanalyzer 7500 kit) and after mapping of reversed reads post sequencing. All libraries underwent 13 cycles of PCR.

3 Kb libraries
Sample | Fragment length post size selection (Kb) | Fragment length post mapping (Kb) | Difference (Kb) | Circularisation input (ng) | Output post Plasmid Safe digestion (ng) | Circularisation yield (%) | Library yield (pMol) | Library yield (nmol/l)
AKR/J | 6.1 | 4.6 | 1.5 | 725 | 82 | 11.2 | 0.088 | 4.4
SPRET/EiJ | 5.1 | 3.2 | 1.9 | 840 | 128 | 15.2 | 0.239 | 12
PWK/PhJ | 5.9 | 4.5 | 1.4 | 870 | 112 | 12.9 | 0.091 | 4.5
C57BL/6NJ | 6.2 | 4.6 | 1.6 | 1065 | 119 | 11.2 | 0.111 | 5.6
NOD/ShiLtJ | 3.9 | 3 | 0.9 | 1770 | 302 | 17.1 | 1.504 | 75.2
WSB/EiJ | 4.5 | 3.1 | 1.4 | 1578 | 188 | 11.9 | 0.937 | 46.9
FVB/NJ | 5.3 | 3.2 | 2.1 | 1185 | 161 | 13.6 | 0.39 | 19.5
Mean ± stdev | 5.3 ±0.9 | 3.7 ±0.8 | 1.5 ±0.4 | 1147.6 ±393.9 | 156 ±72.9 | 13.3 ±2.2 | 0.48 ±0.5 | 24 ±27.1

6 Kb libraries
Sample | Fragment length post size selection (Kb) | Fragment length post mapping (Kb) | Difference (Kb) | Circularisation input (ng) | Output post Plasmid Safe digestion (ng) | Circularisation yield (%) | Library yield (pMol) | Library yield (nmol/l)
AKR/J | 6.2 | 6.2 | 1.1 | 2500 | 280 | 11.2 | 0.272 | 13.6
SPRET/EiJ | 6.3 | 6.3 | 0.7 | 2200 | 413 | 18.8 | 0.236 | 11.8
PWK/PhJ | 6.5 | 6.5 | 1.2 | 3700 | 790 | 21.3 | 0.638 | 43.1
C57BL/6NJ | 6.5 | 6.5 | 0.9 | 1700 | 221 | 13 | 0.09 | 4.5
NOD/ShiLtJ | 6.5 | 6.5 | 1.6 | 4400 | 497 | 11.3 | 0.177 | 8.8
WSB/EiJ | 6.9 | 6.9 | 1.1 | 1700 | 124 | 7.3 | 0.088 | 4.4
FVB/NJ | 7.2 | 7.2 | 1.7 | 2900 | 258 | 8.9 | 0.161 | 8.1
Mean ± stdev | 6.6 ±0.3 | 6.6 ±0.3 | 1.2 ±0.4 | 2728.6 ±1017.7 | 369 ±222.8 | 13.1 ±5.1 | 0.24 ±0.2 | 13.5 ±13.5
Figure 4. Example Mus musculus mate-pair library Bioanalyzer traces after size selection, alongside the library insert size measured by reverse mapping after sequencing: a) 3 Kb and b) 6 Kb. Each library maps approximately 1 Kb smaller than the Bioanalyzer peak maximum.

and fluctuating circularisation yield, the post-PCR final library yield is subject to random variation in sample loss during each reaction clean-up.
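The "Mean ± stdev" rows of Table 3 can be reproduced with the sample standard deviation (n - 1 denominator). For example, the seven 3 Kb circularisation yields give 13.3 ± 2.2%, matching the table; the population formula would give about 2.0 instead, so the use of the sample formula is inferred from this agreement rather than stated in the text.

```python
import statistics
from typing import Iterable, Tuple

# Circularisation yields (%) of the seven 3 Kb Mus musculus libraries, Table 3.
YIELDS_3KB = {
    "AKR/J": 11.2, "SPRET/EiJ": 15.2, "PWK/PhJ": 12.9, "C57BL/6NJ": 11.2,
    "NOD/ShiLtJ": 17.1, "WSB/EiJ": 11.9, "FVB/NJ": 13.6,
}


def mean_sd(values: Iterable[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation (n - 1 denominator)."""
    vals = list(values)
    return statistics.mean(vals), statistics.stdev(vals)


mean, sd = mean_sd(YIELDS_3KB.values())
print(f"{mean:.1f} ± {sd:.1f}")  # 13.3 ± 2.2
```

The same function applied to the 6 Kb yields reproduces the 13.1 ± 5.1% reported in Table 3.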
`
Library quality
Library quality statistics for reads mapped to the Mus musculus genome are given in Table 4. A high proportion of reads mapped (86-96%), of which the majority were proper pairs (70-91%). Chimeric reads were not observed at any significant level (below 1%), and those detected may reflect genuine genomic differences between the sequenced strains and the C57BL/6J reference genome (NCBIm37) rather than true chimeras [9]. The majority of sequenced libraries have a high estimated library size and a low duplicate rate; those which have worse statist
