(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
`(19) World Intellectual Property Organization
`International Bureau
`(10) International Publication Number: WO 2013/142389 A1
`(43) International Publication Date: 26 September 2013 (26.09.2013)
`
`(51) International Patent Classification: C04B 20/04 (2006.01)
`(21) International Application Number: PCT/US2013/032665
`(22) International Filing Date: 15 March 2013 (15.03.2013)
`(25) Filing Language: English
`(26) Publication Language: English
`(30) Priority Data:
`61/613,413   20 March 2012 (20.03.2012)   US
`61/625,623   17 April 2012 (17.04.2012)   US
`61/625,319   17 April 2012 (17.04.2012)   US
`(71) Applicant: UNIVERSITY OF WASHINGTON THROUGH ITS CENTER FOR COMMERCIALIZATION [US/US]; 4311 11th Avenue NE, Seattle, WA 98105-4608 (US).
`
`(72) Inventors: SCHMITT, Michael; Seattle, WA 98105 (US). SALK, Jesse; Seattle, WA 98105 (US). LOEB, Lawrence, A.; Bellevue, WA (US).
`
`(74) Agent: DUEPPEN, Lara, J.; Perkins Coie LLP, P.O. Box
`1208, Seattle, WA 98111-1208 (US).
`
`(81) Designated States (unless otherwise indicated, for every
`kind of national protection available): AE, AG, AL, AM,
`AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY,
`BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM,
`DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT,
`HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP,
`KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD,
`ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI,
`NO, NZ, OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU,
`RW, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ,
`TM, TN, TR, TT, TZ, VA, VG, US, UZ, VC, VN, ZA,
`ZM, ZW.
`(84) Designated States (unless otherwise indicated, for every
`kind of regional protection available): ARIPO (BW, GH,
`GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ,
`VG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ,
`TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK,
`EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV,
`MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM,
`TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW,
`ML, MR, NE, SN, TD, TG).
`
`(54) Title: METHODS OF LOWERING THE ERROR RATE OF MASSIVELY PARALLEL DNA SEQUENCING USING
`DUPLEX CONSENSUS SEQUENCING
`
`(57) Abstract: Next Generation DNA sequencing promises to revolutionize
`clinical medicine and basic research. However, while this technology has
`the capacity to generate hundreds of billions of nucleotides of DNA sequence
`in a single experiment, the error rate of approximately 1% results in
`hundreds of millions of sequencing mistakes. These scattered errors can be
`tolerated in some applications but become extremely problematic when
`"deep sequencing" genetically heterogeneous mixtures, such as tumors or
`mixed microbial populations. To overcome limitations in sequencing accuracy,
`a method termed Duplex Consensus Sequencing (DCS) is provided. This approach
`greatly reduces errors by independently tagging and sequencing each
`of the two strands of a DNA duplex. As the two strands are complementary,
`true mutations are found at the same position in both strands. In contrast,
`PCR or sequencing errors will result in errors in only one strand.
`
`Figure 1
`
`[Figure 1 schematic. Recoverable labels: SMI sequences; T-tailed DNA fragment; ligation and size selection; PCR with flow-cell adaptor sequences; two SMI families; capture target regions; second round PCR; massively parallel sequencing.]
`
`Published:
`- with international search report (Art. 21(3))
`- with sequence listing part of description (Rule 5.2(a))
`
`WO 2013/142389
`
`PCT/US2013/032665
`
`METHODS OF LOWERING THE ERROR RATE OF MASSIVELY PARALLEL DNA
`
`SEQUENCING USING DUPLEX CONSENSUS SEQUENCING
`
`PRIORITY CLAIM
`
`[0001]
`
`This application claims priority to U.S. Provisional Patent Application No.
`
`61/613,413, filed March 20, 2012; U.S. Provisional Patent Application No. 61/625,623,
`
`filed April 17, 2012; and U.S. Provisional Patent Application No. 61/625,319, filed April
`
`17, 2012; the subject matter of all of which are hereby incorporated by reference as if
`
`fully set forth herein.
`
`STATEMENT OF GOVERNMENT INTEREST
`
`[0002]
`
`The present invention was made with government support under Grant
`
`Nos. R01 CA115802 and R01 CA102029 awarded by the National Institutes of Health.
`
`The Government has certain rights in the invention.
`
`BACKGROUND
`
`[0003]
`
`The advent of massively parallel DNA sequencing has ushered in a new
`
`era of genomic exploration by making simultaneous genotyping of hundreds of billions
`
`of base-pairs possible at a small fraction of the time and cost of traditional Sanger
`
`methods [1]. Because these technologies digitally tabulate the sequence of many
`
`individual DNA fragments, unlike conventional techniques which simply report the
`
`average genotype of an aggregate collection of molecules, they offer the unique ability
`
`to detect minor variants within heterogeneous mixtures [2].
`
`[0004]
`
`This concept of "deep sequencing" has been implemented in a variety of
`
`fields including metagenomics [3, 4], paleogenomics [5], forensics [6], and human
`
`genetics [7, 8] to disentangle subpopulations in complex biological samples. Clinical
`
`applications, such as prenatal screening for fetal aneuploidy [9, 10], early detection of
`
`cancer [11] and monitoring its response to therapy [12, 13] with nucleic acid-based
`
`serum biomarkers, are rapidly being developed. Exceptional diversity within microbial
`
`[14, 15], viral [16-18] and tumor cell populations [19, 20] has been characterized through
`
`next-generation sequencing, and many low-frequency, drug-resistant variants of
`
`therapeutic importance have been so identified [12, 21, 22]. Previously unappreciated
`
`intra-organismal mosaicism in both the nuclear [23] and mitochondrial [24, 25] genome
`
`has been revealed by these technologies, and such somatic heterogeneity, along with
`
`that arising within the adaptive immune system [13], may be an important factor in
`
`phenotypic variability of disease.
`
`[0005]
`
`Deep sequencing, however, has limitations. Although, in theory, DNA
`
`subpopulations of any size should be detectable when deep sequencing a sufficient
`
`number of molecules, a practical limit of detection is imposed by errors introduced
`
`during sample preparation and sequencing. PCR amplification of heterogeneous
`
`mixtures can result in population skewing due to stochastic and non-stochastic
`
`amplification biases and lead to over- or under-representation of particular variants [26].
`
`Polymerase mistakes during pre-amplification generate point mutations resulting from
`
`base mis-incorporations and rearrangements due to template switching [26, 27].
`
`Combined with the additional errors that arise during cluster amplification, cycle
`
`sequencing and image analysis, approximately 1% of bases are incorrectly identified,
`
`depending on the specific platform and sequence context [2, 28]. This background level
`
`of artifactual heterogeneity establishes a limit below which the presence of true rare
`
`variants is obscured [29].
`
`[0006]
`
`A variety of improvements at the level of biochemistry [30-32] and data
`
`processing [19, 21, 28, 32, 33] have been developed to improve sequencing accuracy.
`
`The ability to resolve subpopulations below 0.1%, however, has remained elusive.
`
`Although several groups have attempted to increase sensitivity of sequencing, several
`
`limitations remain. For example, techniques whereby DNA fragments to be sequenced
`
`are each uniquely tagged [34, 35] prior to amplification [36-41] have been reported.
`
`Because all amplicons derived from a particular starting molecule will bear its specific
`
`tag, any variation in the sequence or copy number of identically tagged sequencing
`
`reads can be discounted as technical error. This approach has been used to improve
`
`counting accuracy of DNA [38, 39, 41] and RNA templates [37, 38, 40] and to correct
`
`base errors arising during PCR or sequencing [36, 37, 39]. Kinde et al. reported a
`
`reduction in error frequency of approximately 20-fold with a tagging method that is
`
`based on labeling single-stranded DNA fragments with a primer containing a 14 bp
`
`degenerate sequence. This allowed for an observed mutation frequency of ~0.001%
`
`mutations/bp in normal human genomic DNA [36]. Nevertheless, a number of highly
`
`sensitive genetic assays have indicated that the true mutation frequency in normal cells
`
`is likely to be far lower, with estimates of per-nucleotide mutation frequencies generally
`ranging from 10^-9 to 10^-11 [42]. Thus, the mutations seen in normal human genomic
`
`DNA by Kinde et al. are likely the result of significant technical artifacts.
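The tag-based correction described in this paragraph can be sketched in Python as follows. This is a minimal illustration rather than the method of any cited reference: the function name `single_strand_consensus`, the minimum family size of 3, and the agreement thresholds are assumptions chosen for the example, and reads are assumed to be already aligned and of equal length.

```python
from collections import Counter, defaultdict


def single_strand_consensus(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads that share a tag into one consensus sequence per tag.

    reads: iterable of (tag, sequence) pairs. Sequences within a tag family
    are assumed to be aligned and of equal length (an assumption of this sketch).
    Positions without sufficient agreement are reported as 'N'.
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few PCR duplicates to vote on a consensus
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[tag] = "".join(bases)
    return consensus


reads = [
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGTAC"),
    ("AAGT", "ACGAAC"),  # one duplicate disagrees at position 3
    ("CCTA", "ACGTAC"),  # singleton family, dropped
]
print(single_strand_consensus(reads, min_agreement=0.6))
```

With the looser threshold the disagreeing read is outvoted; with the default threshold the same position is masked as `N` instead. Neither setting can catch an error copied into every duplicate of a family, which is the limitation of single-strand tagging.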
`
`[0007]
`
`Traditionally, next-generation sequencing platforms rely upon generation
`
`of sequence data from a single strand of DNA. As a consequence, artifactual mutations
`
`introduced during the initial rounds of PCR amplification are undetectable as errors -
`
`even with tagging techniques - if the base change is propagated to all subsequent PCR
`
`duplicates. Several types of DNA damage are highly mutagenic and may lead to this
`
`scenario. Spontaneous DNA damage arising from normal metabolic processes results
`
`in thousands of damaging events per cell per day [43].
`
`In addition to damage from
`
`oxidative cellular processes, further DNA damage is generated ex vivo during tissue
`
`processing and DNA extraction [44]. These damage events can result in frequent
`
`copying errors by DNA polymerases: for example, a common DNA lesion arising from
`
`oxidative damage, 8-oxo-guanine, has the propensity to incorrectly pair with adenine
`
`during complementary strand extension with an overall efficiency greater than that of
`
`correct pairing with cytosine, and thus can contribute a large frequency of artifactual
`
`G→T mutations [45]. Likewise, deamination of cytosine to form uracil is a particularly
`
`common event which leads to the inappropriate insertion of adenine during PCR, thus
`producing artifactual C→T mutations with a frequency approaching 100% [46].
`
`[0008]
`
`It would be desirable to develop an approach for tag-based error
`
`correction, which reduces or eliminates artifactual mutations arising from DNA damage,
`
`PCR errors, and sequencing errors; allows rare variants in heterogeneous populations
`
`to be detected with unprecedented sensitivity; and which capitalizes on the redundant
`
`information stored in complexed double-stranded DNA.
`
`SUMMARY
`
`[0009]
`
`In one embodiment, a single molecule identifier (SMI) adaptor molecule
`
`for use in sequencing a double-stranded target nucleic acid molecule is provided. Said
`
`SMI adaptor molecule includes a single molecule identifier (SMI) sequence which
`
`comprises a degenerate or semi-degenerate DNA sequence; and an SMI ligation
`
`adaptor that allows the SMI adaptor molecule to be ligated to the double-stranded target
`
`nucleic acid sequence. The SMI sequence may be single-stranded or double-stranded.
`
`In some embodiments, the double-stranded target nucleic acid molecule is a
`
`double-stranded DNA or RNA molecule.
`
`[0010]
`
`In another embodiment, a method of obtaining the sequence of a double-stranded
`
`target nucleic acid (also known as Duplex Consensus Sequencing
`
`or DCS) is provided. Such a method may include steps of ligating a double-stranded
`
`target nucleic acid molecule to at least one SMI adaptor molecule to form a
`
`double-stranded SMI-target nucleic acid complex; amplifying the double-stranded SMI-target
`
`nucleic acid complex, resulting in a set of amplified SMI-target nucleic acid products;
`
`and sequencing the amplified SMI-target nucleic acid products.
`
`[0011]
`
`In some embodiments, the method may additionally include generating an
`
`error-corrected double-stranded consensus sequence by (i) grouping the sequenced
`
`SMI-target nucleic acid products into families of paired target nucleic acid strands based
`
`on a common set of SMI sequences; and (ii) removing paired target nucleic acid strands
`
`having one or more nucleotide positions where the paired target nucleic acid strands
`
`are non-complementary (or alternatively removing individual nucleotide positions in
`
`cases where the sequence at the nucleotide position under consideration disagrees
`
`among the two strands). In further embodiments, the method confirms the presence of
`
`a true mutation by (i) identifying a mutation present in the paired target nucleic acid
`
`strands having one or more nucleotide positions that disagree; (ii) comparing the
`
`mutation present in the paired target nucleic acid strands to the error corrected
`
`double-stranded consensus sequence; and (iii) confirming the presence of a true mutation
`
`when the mutation is present on both of the target nucleic acid strands and appears in
`
`all members of a paired target nucleic acid family.
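The strand-comparison step of this paragraph can be illustrated with a short Python sketch. It assumes, for simplicity, that the two per-strand consensus sequences of a family pair have already been expressed in the same orientation, so that a true mutation shows up as the same base in both. The function name `duplex_consensus` and the use of `N` to mask a disagreeing position are assumptions of the example.

```python
def duplex_consensus(strand_ab, strand_ba, drop_pair_on_mismatch=False):
    """Combine the two strand consensus sequences of one family pair.

    Both inputs are assumed to be in the same orientation and of equal length.
    By default, positions where the strands disagree are masked with 'N'
    (removal of individual positions). With drop_pair_on_mismatch=True the
    whole pair is discarded instead (removal of the paired strands).
    """
    if len(strand_ab) != len(strand_ba):
        raise ValueError("strand consensus sequences must have equal length")

    has_mismatch = any(a != b for a, b in zip(strand_ab, strand_ba))
    if has_mismatch and drop_pair_on_mismatch:
        return None
    return "".join(a if a == b else "N" for a, b in zip(strand_ab, strand_ba))


# A change seen on both strands is kept; a change on one strand only is masked.
print(duplex_consensus("ACGTTC", "ACGTTC"))
print(duplex_consensus("ACGTTC", "ACTTTC"))
```

Masking single positions keeps the rest of the fragment usable, while dropping the pair is the stricter of the two alternatives named in the text.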
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0012]
`
`Figure 1 illustrates an overview of Duplex Consensus Sequencing.
`
`Sheared double-stranded DNA that has been end-repaired and T-tailed is combined
`
`with A-tailed SMI adaptors and ligated according to one embodiment. Because every
`
`adaptor contains a unique, double-stranded, complementary n-mer random tag on each
`
`end (n-mer = 12 bp according to one embodiment), every DNA fragment becomes
`
`labeled with two distinct SMI sequences (arbitrarily designated α and β in the single
`
`capture event shown). After size-selecting for appropriate length fragments, PCR
`
`amplification with primers containing Illumina flow-cell-compatible tails is carried out to
`
`generate families of PCR duplicates. By virtue of the asymmetric nature of adapted
`
`fragments, two types of PCR products are produced from each capture event. Those
`
`derived from one strand will have the α SMI sequence adjacent to flow-cell sequence 1
`
`and the β SMI sequence adjacent to flow-cell sequence 2. PCR products originating
`
`from the complementary strand are labeled reciprocally.
`
`[0013]
`
`Figure 2 illustrates Single Molecule Identifier (SMI) adaptor synthesis
`
`according to one embodiment. Oligonucleotides are annealed and the complement of
`
`the degenerate lower arm sequence (N's) plus adjacent fixed bases is produced by
`
`polymerase extension of the upper strand in the presence of all four dNTPs. After
`
`reaction cleanup, complete adaptor A-tailing is ensured by extended incubation with
`
`polymerase and dATP.
`
`[0014]
`
`Figure 3 illustrates error correction through Duplex Consensus
`
`Sequencing (DCS) analysis according to one embodiment. (a-c) shows that sequence
`
`reads (brown) sharing a unique set of SMI tags are grouped into paired families with
`members having strand identifiers in either the αβ or βα orientation. Each family pair
`reflects one double-stranded DNA fragment. (a) shows mutations (spots) present in
`
`only one or a few family members representing sequencing mistakes or PCR-introduced
`
`errors occurring late in amplification.
`
`(b) shows mutations occurring in many or all
`
`members of one family in a pair representing mutations scored on only one of the two
`
`strands, which can be due to PCR errors arising during the first round of amplification
`
`such as might occur when copying across sites of mutagenic DNA damage. (c) shows
`
`true mutations (* arrow) present on both strands of a captured fragment appear in all
`
`members of a family pair. While artifactual mutations may co-occur in a family pair with
`
`a true mutation, these can be independently identified and discounted when producing
`
`(d) an error-corrected consensus sequence (i.e., single stranded consensus sequence)
`
`(+ arrow) for each duplex.
`
`(e) shows consensus sequences from all independently
`
`captured, randomly sheared fragments containing a particular genomic site are
`
`identified and (f) compared to determine the frequency of genetic variants at this locus
`
`within the sampled population.
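Steps (e) and (f) amount to counting base calls at one genomic site across independent consensus sequences. A minimal Python sketch, assuming the consensus base at the site has already been extracted for each fragment and that masked positions are written as `N` (both assumptions of this example, as is the function name):

```python
from collections import Counter


def variant_frequencies(bases_at_site, reference_base):
    """Return the frequency of each non-reference base at one site.

    bases_at_site: one consensus base call per independently captured fragment.
    Masked calls ('N') do not count toward depth.
    """
    counts = Counter(base for base in bases_at_site if base != "N")
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {
        base: count / depth
        for base, count in counts.items()
        if base != reference_base
    }


# Ten informative consensus calls, one of which carries an A instead of G.
calls = list("GGGGGGGGGA") + ["N"]
print(variant_frequencies(calls, reference_base="G"))
```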
`
`[0015]
`
`Figure 4 illustrates an example of how a SMI sequence with n-mers of 4
`
`nucleotides in length (4-mers) are read by Duplex Consensus Sequencing (DCS)
`
`according to some embodiments. (A) shows the 4-mers with the PCR primer binding
`
`sites (or flow cell sequences) 1 and 2 indicated at each end.
`
`(B) shows the same
`
`molecules as in (A) but with the strands separated and the lower strand now written in
`
`the 5'-3' direction. When these molecules are amplified with PCR and sequenced, they
`
`will yield the following sequence reads: The top strand will give a read 1 file of TAAC----
`
`and a read 2 file of CGGA---. Combining the read 1 and read 2 tags will give
`
`TAACCGGA as the SMI for the top strand. The bottom strand will give a read 1 file of
`
`CGGA---- and a read 2 file of TAAC---. Combining the read 1 and read 2 tags will give
`
`CGGATAAC as the SMI for the bottom strand. (C) illustrates the orientation of paired
`
`strand mutations in DCS.
`
`In the initial DNA duplex shown in Figures 4A and 4B, a
`
`mutation "x" (which is paired to a complementary nucleotide "y") is shown on the left
`
`side of the DNA duplex. The "x" will appear in read 1, and the complementary mutation
`
`on the opposite strand, "y," will appear in read 2. Specifically, this would appear as "x"
`
`in both read 1 and read 2 data, because "y" in read 2 is read out as "x" by the
`
`sequencer owing to the nature of the sequencing primers, which generate the
`
`complementary sequence during read 2.
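The tag bookkeeping in the Figure 4 description can be sketched in Python. The combined tags TAACCGGA and CGGATAAC are taken from the text; the function names `smi_tag`, `partner_tag` and `pair_families` are hypothetical, and the sketch assumes both n-mers have the same length.

```python
def smi_tag(read1_tag, read2_tag):
    """Concatenate the read 1 and read 2 n-mers into one SMI tag."""
    return read1_tag + read2_tag


def partner_tag(tag):
    """Tag expected for the opposite strand: the two halves swapped."""
    half = len(tag) // 2
    return tag[half:] + tag[:half]


def pair_families(tags):
    """Return (tag, partner) pairs for which both strands were observed.

    A tag whose halves are identical is its own partner and cannot be
    assigned to a strand, so it is not reported.
    """
    observed = set(tags)
    pairs = []
    for tag in sorted(observed):
        mate = partner_tag(tag)
        if mate in observed and tag < mate:
            pairs.append((tag, mate))
    return pairs


top = smi_tag("TAAC", "CGGA")      # TAACCGGA, top strand
bottom = partner_tag(top)          # CGGATAAC, bottom strand
print(top, bottom)
print(pair_families([top, bottom, "AAAATTTT"]))
```

Swapping halves is what lets the two strands of one duplex be recognized as a pair while still being told apart.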
`
`[0016]
`
`Figure 5 illustrates duplex sequencing of human mitochondrial DNA. (A)
`
`Overall mutation frequency as measured by a standard sequencing approach, SSCS,
`
`and DCS. (B) Pattern of mutation in human mitochondrial DNA by a standard
`
`sequencing approach. The mutation frequency (vertical axis) is plotted for every position
`
`in the ~16-kb mitochondrial genome. Due to the substantial background of technical
`
`error, no obvious mutational pattern is discernible by this method. (C) DCS analysis
`
`eliminates sequencing artifacts and reveals the true distribution of mitochondrial
`
`mutations to include a striking excess adjacent to the mtDNA origin of replication. (D)
`
`SSCS analysis yields a large excess of G→T mutations relative to complementary C→A
`
`mutations, consistent with artifacts from damage-induced 8-oxo-G lesions during PCR.
`
`All significant (P < 0.05) differences between paired reciprocal mutation frequencies are
`
`noted. (E) DCS analysis removes the SSCS strand bias and reveals the true mtDNA
`
`mutational spectrum to be characterized by an excess of transitions.
`
`[0017]
`
`Figure 6 shows that consensus sequencing removes artifactual
`
`sequencing errors as compared to Raw Reads. Duplex Consensus Sequencing (DCS)
`
`results in an approximately equal number of mutations as the reference and single
`
`strand consensus sequencing (SSCS).
`
`[0018]
`
`Figure 7 illustrates duplex sequencing of M13mp2 DNA. (A) Single-strand
`consensus sequences (SSCSs) reveal a large excess of G→A/C→T and G→T/C→A
`mutations, whereas duplex consensus sequences (DCSs) yield a balanced spectrum.
`
`Mutation frequencies are grouped into reciprocal mispairs, as DCS analysis only scores
`
`mutations present in both strands of duplex DNA. All significant (P < 0.05) differences
`
`between DCS analysis and the literature reference values are noted. (B)
`
`Complementary types of mutations should occur at approximately equal frequencies
`
`within a DNA fragment population derived from duplex molecules. However, SSCS
`
`analysis yields a 15-fold excess of G→T mutations relative to C→A mutations and an
`
`11-fold excess of C→T mutations relative to G→A mutations. All significant (P < 0.05)
`
`differences between paired reciprocal mutation frequencies are noted.
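The reciprocal-pair comparison in (B) can be expressed as a small Python sketch. Mutations are written here in ASCII as `G>T` (G to T). The counts below are invented solely to reproduce the 15-fold and 11-fold ratios quoted in the text and are not data from the figure; the function names are likewise hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reciprocal(mutation):
    """Return the same event as read on the opposite strand, e.g. G>T -> C>A."""
    ref, alt = mutation.split(">")
    return f"{ref.translate(COMPLEMENT)}>{alt.translate(COMPLEMENT)}"


def fold_excess(counts, mutation):
    """Ratio of a mutation's count to the count of its reciprocal mutation."""
    return counts[mutation] / counts[reciprocal(mutation)]


# Hypothetical counts, chosen only to match the ratios stated in the text.
sscs_counts = {"G>T": 150, "C>A": 10, "C>T": 110, "G>A": 10}
print(fold_excess(sscs_counts, "G>T"))
print(fold_excess(sscs_counts, "C>T"))
```

For molecules derived from duplex DNA the ratio is expected to be close to 1, so a large value flags strand-specific artifacts.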
`
`[0019]
`
`Figure 8 shows the effect of DNA damage on the mutation spectrum. DNA
`
`damage was induced by incubating purified M13mp2 DNA with hydrogen peroxide and
`
`FeSO4. (A) SSCS analysis reveals a further elevation from baseline of G→T mutations,
`
`indicating these events to be the artifactual consequence of nucleotide oxidation. All
`
`significant (P < 0.05) changes from baseline mutation frequencies are noted. (B)
`
`Induced DNA damage had no effect on the overall frequency or spectrum of DCS
`
`mutations.
`
`[0020]
`
`Figure 9 shows duplex sequencing results in accurate recovery of spiked-
`
`control mutations. A series of variants of M13mp2 DNA, each harboring a known
`
`single-nucleotide substitution, were mixed together at known ratios and the mixture
`
`was sequenced to ~20,000-fold final depth. Standard sequencing analysis cannot
`
`accurately distinguish mutants present at a ratio of less than 1/100, because artifactual
`
`mutations occurring at every position obscure the presence of less abundant true
`
`mutations, rendering apparent recovery greater than 100%. Duplex consensus
`
`sequences, in contrast, accurately identify spiked-in mutations down to the lowest
`
`tested ratio of 1/10,000.
`
`[0021]
`
`Figure 10 is Python code that may be used to carry out methods described
`
`herein according to one embodiment.
`
`DETAILED DESCRIPTION
`
`[0022]
`
`Single molecule identifier adaptors and methods for their use are provided
`
`herein. According to the embodiments described herein, a single molecule identifier
`
`(SMI) adaptor molecule is provided. Said SMI adaptor molecule is double stranded,
`
`and may include a single molecule identifier (SMI) sequence, and an SMI ligation
`
`adaptor (Figure 2). Optionally, the SMI adaptor molecule further includes at least two
`
`PCR primer binding sites, at least two sequencing primer binding sites, or both.
`
`[0023]
`
`The SMI adaptor molecule may form a "Y-shape" or a "hairpin shape." In
`
`some embodiments, the SMI adaptor molecule is a "Y-shaped" adaptor, which allows
`
`both strands to be independently amplified by a PCR method prior to sequencing
`
`because both the top and bottom strands have binding sites for PCR primers FC1 and
`
`FC2 as shown in the examples below. A schematic of a Y-shaped SMI adaptor
`
`molecule is also shown in Figure 2. A Y-shaped SMI adaptor requires successful
`
`amplification and recovery of both strands of the SMI adaptor molecule.
`
`In one
`
`embodiment, a modification that would simplify consistent recovery of both strands
`
`entails ligation of a Y-shaped SMI adaptor molecule to one end of a DNA duplex
`
`molecule, and ligation of a "U-shaped" linker to the other end of the molecule. PCR
`
`amplification of the hairpin-shaped product will then yield a linear fragment with flow cell
`
`sequences on either end. Distinct PCR primer binding sites (or flow cell sequences
`
`FC1 and FC2) will flank the DNA sequence corresponding to each of the two SMI
`
`adaptor molecule strands, and a given sequence seen in Read 1 will then have the
`
`sequence corresponding to the complementary DNA duplex strand seen in Read 2.
`
`Mutations are scored only if they are seen on both ends of the molecule (corresponding
`
`to each strand of the original double-stranded fragment), i.e. at the same position in
`
`both Read 1 and Read 2. This design may be accomplished as described in the
`
`examples relating to double stranded SMI sequence tags.
`
`[0024]
`
`In other embodiments, the SMI adaptor molecule is a "hairpin" shaped (or
`
`"U-shaped") adaptor. A hairpin DNA product can be used for error correction, as this
`
`product contains both of the two DNA strands. Such an approach allows for reduction
`
`of a given sequencing error rate N to a lower rate of N*N*(1/3), as independent
`
`sequencing errors would need to occur on both strands, and the same error among all
`
`three possible base substitutions would need to occur on both strands. For example,
`
`the error rate of 1/100 in the case of Illumina sequencing [32] would be reduced to
`
`(1/100)*(1/100)*(1/3) = 1/30,000.
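The arithmetic above can be checked with exact fractions in Python. The function name `paired_strand_error_rate` is an assumption of this sketch; the formula N*N*(1/3) is the one stated in the text, which treats errors on the two strands as independent and requires the same one of three possible substitutions on both.

```python
from fractions import Fraction


def paired_strand_error_rate(n):
    """Error rate after requiring the same substitution on both strands."""
    return n * n * Fraction(1, 3)


rate = paired_strand_error_rate(Fraction(1, 100))
print(rate)  # 1/30000
```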
`
`[0025]
`
`An additional, more remarkable reduction in errors can be obtained by
`
`inclusion of a single-stranded SMI in either the hairpin adaptor or the "Y-shaped"
`
`adaptor, which will also function to label both of the two DNA strands. Amplification of
`
`hairpin-shaped DNA may be difficult, as the polymerase must synthesize through a product
`
`containing significant regions of self-complementarity; however, amplification of
`
`hairpin-shaped structures has already been established in the technique of hairpin PCR, as
`
`described below. Amplification using hairpin PCR is further described in detail in U.S.
`
`Patent No. 7,452,699, the subject matter of which is hereby incorporated by reference
`
`as if fully set forth herein.
`
`[0026]
`
`According to the embodiments described herein, the SMI sequence (or
`
`"tag") may be a double-stranded, complementary SMI sequence or a single-stranded
`
`SMI sequence.
`
`In some embodiments, the SMI adaptor molecule includes an SMI
`
`sequence (or "tag") of nucleotides that is degenerate or semi-degenerate.
`
`In some
`
`embodiments, the degenerate or semi-degenerate SMI sequence may be a random
`
`degenerate sequence. A double-stranded SMI sequence includes a first degenerate or
`
`semi-degenerate nucleotide n-mer sequence and a second n-mer sequence that is
`
`complementary to the first degenerate or semi-degenerate nucleotide n-mer sequence,
`
`while a single-stranded SMI sequence includes a first degenerate or semi-degenerate
`
`nucleotide n-mer sequence. The first and/or second degenerate or semi-degenerate
`
`nucleotide n-mer sequences may be any suitable length to produce a sufficiently large
`
`number of unique tags to label a set of sheared DNA fragments from a segment of DNA.
`
`Each n-mer sequence may be between approximately 3 and 20 nucleotides in length.
`
`Therefore, each n-mer sequence may be approximately 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
`
`14, 15, 16, 17, 18, 19, 20 nucleotides in length. In one embodiment, the SMI sequence
`
`is a random degenerate nucleotide n-mer sequence which is 12 nucleotides in length.
`
`A 12 nucleotide SMI n-mer sequence that is ligated to each end of a target nucleic acid
`molecule, as described in the Example below, results in generation of up to 4^24 (i.e.,
`2.8 x 10^14) distinct tag sequences.
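The tag count follows from four possible bases at each of the 24 random positions (12 at each end of the fragment). A short Python check, with `distinct_tags` as a hypothetical helper name:

```python
def distinct_tags(n_mer_length, ends=2, alphabet_size=4):
    """Number of distinct tag combinations for a random n-mer at each end."""
    return alphabet_size ** (n_mer_length * ends)


total = distinct_tags(12)
print(total)           # 281474976710656
print(f"{total:.1e}")  # 2.8e+14
```

The same helper shows how quickly the tag space shrinks for shorter n-mers: a 4-mer at each end gives only 4^8 = 65,536 combinations.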
`
`[0027]
`
`In some embodiments, the SMI tag nucleotide sequence may be
`
`completely random and degenerate, wherein each sequence position may be any
`
`nucleotide (i.e., each position, represented by "X," is not limited, and may be an
`
`adenine (A), cytosine (C), guanine (G), thymine (T), or uracil (U)) or any other natural or
`
`non-natural DNA or RNA nucleotide or nucleotide-like substance or analog with
`
`base-pairing properties (e.g., xanthosine, inosine, hypoxanthine, xanthine, 7-methylguanine,
`
`7-methylguanosine, 5,6-dihydrouracil, 5-methylcytosine, dihydrouridine, isocytosine,
`
`isoguanine, deoxynucleosides, nucleosides, peptide nucleic acids, locked nucleic acids,
`
`glycol nucleic acids and threose nucleic acids). The term "nucleotide" as described
`
`herein, refers to any and all nucleotide or any suitable natural or non-natural DNA or
`
`RNA nucleotide or nucleotide-like substance or analog with base pairing properties as
`
`described above.
`
`In other embodiments, the sequences need not contain all possible
`
`bases at each position. The degenerate or semi-degenerate n-mer sequences may be
`
`generated by a polymerase-mediated method described in the Example below, or may
`
`be generated by preparing and annealing a library of individual oligonucleotides of
`
`known sequence. Alternatively, any degenerate or semi-degenerate n-mer sequences
`
`may be a randomly or non-randomly fragmented double stranded DNA molecule from
`
`any alternative source that differs from the target DNA source.
`
`In some embodiments,
`
`the alternative source is a genome or plasmid derived from bacteria, an organism other
`
`than that of the target DNA, or a combination of such alternative organisms or sources.
`
`The random or non-random fragmented DNA may be introduced into SMI adaptors to
`
`serve as variable tags. This may be accomplished through enzymatic ligation or any
`
`other method known in the art.
`
`[0028]
`
`In some embodiments, the SMI adaptor molecules are ligated to both
`
`ends of a target nucleic acid molecule, and then this complex is used according to the
`
`methods described below.
`
`In certain embodiments, it is not necessary to include
`
`n-mers on both adaptor ends; however, it is more convenient because it means that one
`
`does not have to use two different types of adaptors and then select for ligated
`
`fragments that have one of each type rather than two of one type. The ability to
`
`determine which strand is which is still possible in the situation wherein only one of the
`
`two adaptors has a double-stranded SMI sequence.
`
`[0029]
`
`In some embodiments, the SMI adaptor molecule may optionally include a
`
`double-stranded fixed reference sequence downstream of the n-mer sequences to help
`
`make ligation more uniform and help computationally filter out errors due to ligation
`
`problems with improperly synthesized adaptors. Each strand of the double-stranded
`
`fixed reference sequence may be 4 or 5 nucleotides in length; however, the
`
`fixed reference sequence may be any suitable length including, but not limited to 3, 4, 5
`
`or 6 nucleotides in length.
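The computational filtering that a fixed reference sequence makes possible can be sketched in Python. The read layout (n-mer tag, then fixed sequence, then insert), the 5-nucleotide sequence `TGACT`, and the function name `split_tag` are all hypothetical choices for this illustration and are not taken from the text.

```python
def split_tag(read, tag_length=12, fixed_sequence="TGACT"):
    """Split a read into (tag, insert), or return None if it fails the filter.

    A read is rejected when it is too short, when the bases after the tag do
    not match the expected fixed sequence, or when the tag contains an
    uncalled base ('N').
    """
    tag = read[:tag_length]
    spacer_end = tag_length + len(fixed_sequence)
    spacer = read[tag_length:spacer_end]
    if len(tag) < tag_length or spacer != fixed_sequence or "N" in tag:
        return None
    return tag, read[spacer_end:]


good = "ACGTACGTACGT" + "TGACT" + "GGGCCC"
bad = "ACGTACGTACGT" + "TGAAT" + "GGGCCC"  # fixed sequence is wrong
print(split_tag(good))
print(split_tag(bad))
```

Reads from improperly synthesized or mis-ligated adaptors fail the fixed-sequence check and never enter a tag family.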
`
`[0030]
`
`The SMI ligation adaptor may be any suitable ligation adaptor that is
`
`complementary to a ligation adaptor added to a double-stranded target nucleic acid
`
`sequence including, but not limited to a T-overhang, an A-overhang, a CG overhang, a
`
`blunt end, or any other ligatable sequence.
`
`In some embodiments, the SMI ligation
`
`adaptor may be made using a method for A-tailing or T-tailing with polymerase
`
`extension; creating an overhang with a different enzyme; using a restriction enzyme to
`
`create a single or multiple nucleotide overhang, or any other method known in the art.
`
`[0031]
`
`According to the embodiments described herein, the SMI adaptor
`
`molecule may include at least two PCR primer or "flow cell" binding sites: a forward
`
`PCR primer binding site (or a "flow cell 1" (FC1) binding site); and a reverse PCR primer
`
`binding site (or a "flow cell 2" (FC2) binding site). The SMI adaptor molecule may also
`
`include at least two sequencing primer binding sites, each corresponding to a
`
`sequencing read. Alternatively, the sequencing primer binding sites may be added in a
`
`separate step by inclusion of the necessary sequences as tails to the PCR primers, or
`
`by ligation of the needed sequences. Therefore, if a double-stranded target nucleic acid
`
`molecule has an SMI adaptor molecule ligated to each end, each sequenced strand will
`
`have two reads - a forward and a reverse read.
`
`Double-stranded SMI sequences
`
`[0032]
`
`Adaptor 1 (shown below) is a Y-shaped SMI adaptor as described above
`
`(the SMI sequence is shown as X's in the top strand (a 4-mer), with the complementary
`
`bottom strand sequence shown as Y's):
`
`[Schematic of Adaptor 1: Y-shaped adaptor with top strand -----XXXX----- and complementary bottom strand -----YYYY-----]
`
`(Adaptor 1)
`
`[0033]
`
`Adaptor 2 (shown below) is a hairpin (or "U-shaped") linker:
`
`(Adaptor 2)
`
`[0034]
`Following ligation of both adaptors to a double-stranded target nucleic acid,
`the following structure is obtained:
`[Schematic of the ligated product: top strand -----XXXX-----DNA-----, bottom strand -----YYYY-----DNA'-----]
