`
` WIPO
`
`WORLD
`INTELLECTUAL PROPERTY
`ORGANIZATION
`
`DOCUMENT MADE AVAILABLE UNDER THE
`PATENT COOPERATION TREATY (PCT)
`International application number: PCT/US2013/032665
`
`International filing date: 15 March 2013 (15.03.2013)
`
`Document type: Certified copy of priority document
`
`Document details:
`Country/Office: US
`Number: 61/625,623
`Filing date: 17 April 2012 (17.04.2012)
`
`Date of receipt at the International Bureau:
`
`01 April 2013 (01.04.2013)
`
`Remark: Priority document submitted or transmitted to the International Bureau in compliance with Rule
`17.1(a),(b) or (b-bis)
`
`34, chemin des Colombettes
`1211 Geneva 20, Switzerland
`
`www.wipo.int
`
`
`EX1083
`
`

`


`
`UNITED STATES DEPARTMENT OF COMMERCE
`
`United States Patent and Trademark Office
`
`March 31, 2013
`
`THIS IS TO CERTIFY THAT ANNEXED HERETO IS A TRUE COPY FROM
`THE RECORDS OF THE UNITED STATES PATENT AND TRADEMARK
`OFFICE OF THOSE PAPERS OF THE BELOW IDENTIFIED PATENT
`APPLICATION THAT MET THE REQUIREMENTS TO BE GRANTED A
`FILING DATE.
`
`APPLICATION NUMBER: 61/625,623
`FILING DATE: April 17, 2012
`RELATED PCT APPLICATION NUMBER: PCT/US13/32665
`
`THE COUNTRY CODE AND NUMBER OF YOUR PRIORITY
`APPLICATION, TO BE USED FOR FILING ABROAD UNDER THE PARIS
`CONVENTION, IS US61/625,623
`
`Certified by
`
`Under Secretary of Commerce
`for Intellectual Property
`and Director of the United States
`Patent and Trademark Office
`
`

`

`Doc Code: TR.PROV
`Document Description: Provisional Cover Sheet (SB16)
`
`PTO/SB/16 (04-07)
`Approved for use through 06/30/2010. OMB 0651-0032
`U.S. Patent and Trademark Office: U.S. DEPARTMENT OF COMMERCE
`Under the Paperwork Reduction Act of 1995, no persons are required to respond to a collection of information unless it displays a valid OMB control number
`Provisional Application for Patent Cover Sheet
`This is a request for filing a PROVISIONAL APPLICATION FOR PATENT under 37 CFR 1.53(c)
`
`Inventor(s)
`
`Inventor 1: Given Name: Michael; Family Name: Schmitt; City: Seattle; State: WA; Country: US
`Inventor 2: Given Name: Jesse; Family Name: Salk; City: Seattle; State: WA; Country: US
`Inventor 3: Given Name: Lawrence; Middle Name: A.; Family Name: Loeb; City: Bellevue; State: WA; Country: US
`
`All Inventors Must Be Listed - Additional Inventor Information blocks may be
`generated within this form by selecting the Add button.
`
`
`Title of Invention
`
`METHODS OF LOWERING THE ERROR RATE OF MASSIVELY PARALLEL
`DNA SEQUENCING USING DUPLEX CONSENSUS SEQUENCING
`
`Attorney Docket Number (if applicable)
`
`72227.8043.US01
`
`Correspondence Address
`
`Direct all correspondence to (select one):
`
`(X) The address corresponding to Customer Number   ( ) Firm or Individual Name
`
`Customer Number: 94991
`
`The invention was made by an agency of the United States Government or under a contract with an agency of the United
`States Government.
`( ) No.
`(X) Yes, the name of the U.S. Government agency and the Government contract number are:
`NIH R01 CA115802; NIH R01 CA102029
`
`EFS - Web 1.0.1
`
`
`Entity Status
`Applicant claims small entity status under 37 CFR 1.27
`
`(X) Yes, applicant qualifies for small entity status under 37 CFR 1.27
`( ) No
`Warning
`
`Petitioner/applicant is cautioned to avoid submitting personal information in documents filed in a patent application that may
`contribute to identity theft. Personal information such as social security numbers, bank account numbers, or credit card
`numbers (other than a check or credit card authorization form PTO-2038 submitted for payment purposes) is never required
`by the USPTO to support a petition or an application. If this type of personal information is included in documents submitted
`to the USPTO, petitioners/applicants should consider redacting such personal information from the documents before
`submitting them to USPTO. Petitioner/applicant is advised that the record of a patent application is available to the public
`after publication of the application (unless a non-publication request in compliance with 37 CFR 1.213(a) is made in the
`application) or issuance of a patent. Furthermore, the record from an abandoned application may also be available to the
`public if the application is referenced in a published application or an issued patent (see 37 CFR 1.14). Checks and credit
`card authorization forms PTO-2038 submitted for payment purposes are not retained in the application file and therefore are
`not publicly available.
`
`Signature
`
`Please see 37 CFR 1.4(d) for the form of the signature.
`
`Signature: /Lara J. Dueppen/
`Date (YYYY-MM-DD): Apr 17, 2012
`First Name: Lara J.
`Last Name: Dueppen
`Registration Number (if appropriate): 65002
`
`This collection of information is required by 37 CFR 1.51. The information is required to obtain or retain a benefit by the public which is to
`file (and by the USPTO to process) an application. Confidentiality is governed by 35 U.S.C. 122 and 37 CFR 1.11 and 1.14. This collection
`is estimated to take 8 hours to complete, including gathering, preparing, and submitting the completed application form to the USPTO.
`Time will vary depending upon the individual case. Any comments on the amount of time you require to complete this form and/or
`suggestions for reducing this burden, should be sent to the Chief Information Officer, U.S. Patent and Trademark Office, U.S. Department
`of Commerce, P.O. Box 1450, Alexandria, VA 22313-1450. DO NOT SEND FEES OR COMPLETED FORMS TO THIS ADDRESS. This
`form can only be used when in conjunction with EFS-Web. If this form is mailed to the USPTO, it may cause delays in handling
`the provisional application.
`
`
`

`

`Privacy Act Statement
`
`The Privacy Act of 1974 (P.L. 93-579) requires that you be given certain information in connection with your submission of
`the attached form related to a patent application or patent. Accordingly, pursuant to the requirements of the Act, please be
`advised that: (1) the general authority for the collection of this information is 35 U.S.C. 2(b)(2); (2) furnishing of the
`information solicited is voluntary; and (3) the principal purpose for which the information is used by the U.S. Patent and
`Trademark Office is to process and/or examine your submission related to a patent application or patent. If you do not
`furnish the requested information, the U.S. Patent and Trademark Office may not be able to process and/or examine your
`submission, which may result in termination of proceedings or abandonment of the application or expiration of the patent.
`
`The information provided by you in this form will be subject to the following routine uses:
`
`1. The information on this form will be treated confidentially to the extent allowed under the Freedom of Information
`Act (5 U.S.C. 552) and the Privacy Act (5 U.S.C. 552a). Records from this system of records may be disclosed to the
`Department of Justice to determine whether disclosure of these records is required by the Freedom of Information
`Act.
`2. A record from this system of records may be disclosed, as a routine use, in the course of presenting evidence to
`a court, magistrate, or administrative tribunal, including disclosures to opposing counsel in the course of settlement
`negotiations.
`3. A record in this system of records may be disclosed, as a routine use, to a Member of Congress submitting a
`request involving an individual, to whom the record pertains, when the individual has requested assistance from the
`Member with respect to the subject matter of the record.
`4. A record in this system of records may be disclosed, as a routine use, to a contractor of the Agency having need
`for the information in order to perform a contract. Recipients of information shall be required to comply with the
`requirements of the Privacy Act of 1974, as amended, pursuant to 5 U.S.C. 552a(m).
`5. A record related to an International Application filed under the Patent Cooperation Treaty in this system of
`records may be disclosed, as a routine use, to the International Bureau of the World Intellectual Property
`Organization, pursuant to the Patent Cooperation Treaty.
`6. A record in this system of records may be disclosed, as a routine use, to another federal agency for purposes
`of National Security review (35 U.S.C. 181) and for review pursuant to the Atomic Energy Act (42 U.S.C. 218(c)).
`7. A record from this system of records may be disclosed, as a routine use, to the Administrator, General Services,
`or his/her designee, during an inspection of records conducted by GSA as part of that agency's responsibility to
`recommend improvements in records management practices and programs, under authority of 44 U.S.C. 2904 and
`2906. Such disclosure shall be made in accordance with the GSA regulations governing inspection of records for this
`purpose, and any other relevant (i.e., GSA or Commerce) directive. Such disclosure shall not be used to make
`determinations about individuals.
`8. A record from this system of records may be disclosed, as a routine use, to the public after either publication of
`the application pursuant to 35 U.S.C. 122(b) or issuance of a patent pursuant to 35 U.S.C. 151. Further, a record
`may be disclosed, subject to the limitations of 37 CFR 1.14, as a routine use, to the public if the record was filed in an
`application which became abandoned or in which the proceedings were terminated and which application is
`referenced by either a published application, an application open to public inspection or an
`issued patent.
`9. A record from this system of records may be disclosed, as a routine use, to a Federal, State, or local law
`enforcement agency, if the USPTO becomes aware of a violation or potential violation of law or regulation.
`
`
`

`

`Attorney Docket No. 72227.8043.US00
`
`METHODS OF LOWERING THE ERROR RATE OF MASSIVELY PARALLEL DNA
`SEQUENCING USING DUPLEX CONSENSUS SEQUENCING
`
`STATEMENT OF GOVERNMENT INTEREST
`
`[0001] The present invention was made with government support under Grant Nos. R01 CA115802 and R01 CA102029 awarded by the National Institutes of Health. The Government has certain rights in the invention.
`
`BACKGROUND
`
`[0002] The advent of massively parallel DNA sequencing has ushered in a new era of genomic exploration by making simultaneous genotyping of hundreds of billions of base-pairs possible at a small fraction of the time and cost of traditional Sanger methods [1]. Because these technologies digitally tabulate the sequence of many individual DNA fragments, unlike conventional techniques which simply report the average genotype of an aggregate collection of molecules, they offer the unique ability to detect minor variants within heterogeneous mixtures [2].
`
`[0003] This concept of "deep sequencing" has been implemented in a variety of fields including metagenomics [3, 4], paleogenomics [5], forensics [6], and human genetics [7, 8] to disentangle subpopulations in complex biological samples. Clinical applications, such as prenatal screening for fetal aneuploidy [9, 10], early detection of cancer [11] and monitoring its response to therapy [12, 13] with nucleic acid-based serum biomarkers, are rapidly being developed. Exceptional diversity within microbial [14, 15], viral [16-18] and tumor cell populations [19, 20] has been characterized through next-generation sequencing, and many low-frequency, drug-resistant variants of therapeutic importance have been so identified [12, 21, 22]. Previously unappreciated intra-organismal mosaicism in both the nuclear [23] and mitochondrial [24, 25] genome has been revealed by these technologies, and such somatic heterogeneity, along with that arising within the adaptive immune system [13], may be an important factor in phenotypic variability of disease.
`
`72227-8043/LEGAL23158670.1
`
`[0004] Deep sequencing, however, has limitations. Although, in theory, DNA subpopulations of any size should be detectable when deep sequencing a sufficient number of molecules, a practical limit of detection is imposed by errors introduced during sample preparation and sequencing. PCR amplification of heterogeneous mixtures can result in population skewing due to stochastic and non-stochastic amplification biases and lead to over- or under-representation of particular variants [26]. Polymerase mistakes during pre-amplification generate point mutations resulting from base mis-incorporations and rearrangements due to template switching [26, 27]. Combined with the additional errors that arise during cluster amplification, cycle sequencing and image analysis, approximately 1% of bases are incorrectly identified, depending on the specific platform and sequence context [2, 28]. This background level of artifactual heterogeneity establishes a limit below which the presence of true rare variants is obscured [29].
`
`[0005] A variety of improvements at the level of biochemistry [30-32] and data processing [19, 21, 28, 32, 33] have been developed to improve sequencing accuracy. The ability to resolve subpopulations below 0.1%, however, has remained elusive. Although several groups have attempted to increase sensitivity of sequencing, several limitations remain. For example, techniques whereby DNA fragments to be sequenced are each uniquely tagged [34, 35] prior to amplification [36-41] have been reported. Because all amplicons derived from a particular starting molecule will bear its specific tag, any variation in the sequence or copy number of identically tagged sequencing reads can be discounted as technical error. This approach has been used to improve counting accuracy of DNA [38, 39, 41] and RNA templates [37, 38, 40] and to correct base errors arising during PCR or sequencing [36, 37, 39]. Kinde et al. reported a reduction in error frequency of approximately 20-fold with a tagging method that is based on labeling single-stranded DNA fragments with a primer containing a 14 bp degenerate sequence. This allowed for an observed mutation frequency of ~0.001% mutations/bp in normal human genomic DNA [36]. Nevertheless, a number of highly sensitive genetic assays have indicated that the true mutation frequency in normal cells is likely to be far lower, with estimates of per-nucleotide mutation frequencies generally ranging from 10^-9 to 10^-11 [42]. Thus, the mutations seen in normal human genomic DNA by Kinde et al. are likely the result of significant technical artifacts.
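The tag-based correction described above can be sketched in a few lines. This is a minimal illustration only, not the method of any cited group: the function name `tag_consensus`, the `min_fraction` threshold, and the use of "N" for unresolved positions are all assumptions made for the example, and reads sharing a tag are assumed to be already aligned and of equal length.

```python
from collections import Counter, defaultdict


def tag_consensus(reads, min_fraction=0.9):
    """Collapse reads that share a tag into one consensus sequence per tag.

    reads: iterable of (tag, sequence) pairs (hypothetical input format).
    min_fraction: assumed agreement threshold; positions below it become "N".
    """
    families = defaultdict(list)
    for tag, seq in reads:
        families[tag].append(seq)

    consensus = {}
    for tag, seqs in families.items():
        bases = []
        # zip(*seqs) walks the family column by column.
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(column) >= min_fraction else "N")
        consensus[tag] = "".join(bases)
    return consensus
```

A variant seen in only some reads of a family is masked or outvoted, but an error copied into every read of the family (for example, one introduced in the first PCR cycle) would still survive this single-strand consensus.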
`
`[0006] Traditionally, next-generation sequencing platforms rely upon generation of sequence data from a single strand of DNA. As a consequence, artifactual mutations introduced during the initial rounds of PCR amplification are undetectable as errors, even with tagging techniques, if the base change is propagated to all subsequent PCR duplicates. Several types of DNA damage are highly mutagenic and may lead to this scenario. Spontaneous DNA damage arising from normal metabolic processes results in thousands of damaging events per cell per day [43]. In addition to damage from oxidative cellular processes, further DNA damage is generated ex vivo during tissue processing and DNA extraction [44]. These damage events can result in frequent copying errors by DNA polymerases: for example, a common DNA lesion arising from oxidative damage, 8-oxo-guanine, has the propensity to incorrectly pair with adenine during complementary strand extension with an overall efficiency greater than that of correct pairing with cytosine, and thus can contribute a large frequency of artifactual G→T mutations [45]. Likewise, deamination of cytosine to form uracil is a particularly common event which leads to the inappropriate insertion of adenine during PCR, thus producing artifactual C→T mutations with a frequency approaching 100% [46].
`
`[0007] It would be desirable to develop an approach for tag-based error correction, which reduces or eliminates artifactual mutations arising from DNA damage, PCR errors, and sequencing errors; allows rare variants in heterogeneous populations to be detected with unprecedented sensitivity; and which capitalizes on the redundant information stored in complexed double-stranded DNA.
`
`SUMMARY
`
`[0008] In one embodiment, a single molecule identifier (SMI) adaptor molecule for use in sequencing a double-stranded target nucleic acid molecule is provided. Said SMI adaptor molecule includes a double-stranded single molecule identifier (SMI) sequence which comprises a double-stranded degenerate or semi-degenerate DNA sequence; and an SMI ligation adaptor that allows the SMI adaptor molecule to be ligated to the double-stranded target nucleic acid sequence. In some embodiments, the double-stranded target nucleic acid molecule is a double-stranded DNA or RNA molecule.
`
`[0009] In another embodiment, a method of obtaining the sequence of a double-stranded target nucleic acid (also known as Duplex Consensus Sequencing or DCS) is provided. Such a method may include steps of ligating a double-stranded target nucleic acid molecule to at least one SMI adaptor molecule to form a double-stranded SMI-target nucleic acid complex; amplifying the double-stranded SMI-target nucleic acid complex, resulting in a set of amplified SMI-target nucleic acid products; and sequencing the amplified SMI-target nucleic acid products.
`
`[0010] In some embodiments, the method may additionally include generating an error-corrected double-stranded consensus sequence by (i) grouping the sequenced SMI-target nucleic acid products into families of paired target nucleic acid strands based on a common set of SMI sequences; and (ii) removing paired target nucleic acid strands having one or more nucleotide positions where the paired target nucleic acid strands are non-complementary (or alternatively removing individual nucleotide positions in cases where the sequence at the nucleotide position under consideration disagrees among the two strands). In further embodiments, the method confirms the presence of a true mutation by (i) identifying a mutation present in the paired target nucleic acid strands having one or more nucleotide positions that disagree; (ii) comparing the mutation present in the paired target nucleic acid strands to the error corrected double-stranded consensus sequence; and (iii) confirming the presence of a true mutation when the mutation is present on both of the target nucleic acid strands and appears in all members of a paired target nucleic acid family.
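The pairing and comparison steps in this paragraph can be illustrated with a short sketch. It is written under stated assumptions rather than as the claimed implementation: each strand family is assumed to have already been collapsed to one single-strand consensus, both consensus sequences are assumed to be reported in the same reference orientation and at equal length, and the partner of a tag is assumed to be the same two n-mers in swapped order. The name `duplex_consensus` and the "N" masking convention are hypothetical. The sketch implements the alternative of removing individual disagreeing positions rather than discarding the whole pair.

```python
def duplex_consensus(strand_consensus, half):
    """Pair single-strand consensus sequences and keep only agreeing bases.

    strand_consensus: dict mapping SMI tag -> single-strand consensus string.
    half: length of each n-mer; the partner tag is assumed to be the two
          halves of the tag in swapped order.
    """
    duplexes = {}
    for tag, seq in strand_consensus.items():
        partner = tag[half:] + tag[:half]
        # Skip unpaired families, and visit each pair once (from its
        # lexicographically smaller tag). A tag equal to its own partner
        # cannot distinguish the two strands, so it is skipped as well.
        if partner not in strand_consensus or tag >= partner:
            continue
        other = strand_consensus[partner]
        duplexes[tag] = "".join(a if a == b else "N" for a, b in zip(seq, other))
    return duplexes
```

A base change present in only one strand family is masked, whereas a change present in both families is retained in the duplex consensus.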
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[0011] Figure 1 illustrates an overview of Duplex Consensus Sequencing. Sheared double-stranded DNA that has been end-repaired and T-tailed is combined with A-tailed SMI adaptors and ligated according to one embodiment. Because every adaptor contains a unique, double-stranded, complementary n-mer random tag on each end (n-mer = 12 bp according to one embodiment), every DNA fragment becomes labeled with two distinct SMI sequences (arbitrarily designated α and β in the single capture event shown). After size-selecting for appropriate length fragments, PCR amplification with primers containing Illumina flow-cell-compatible tails is carried out to generate families of PCR duplicates. By virtue of the asymmetric nature of adapted fragments, two types of PCR products are produced from each capture event. Those derived from one strand will have the α SMI sequence adjacent to flow-cell sequence 1 and the β SMI sequence adjacent to flow-cell sequence 2. PCR products originating from the complementary strand are labeled reciprocally.
`
`[0012] Figure 2 illustrates Single Molecule Identifier (SMI) adaptor synthesis according to one embodiment. Oligonucleotides are annealed and the complement of the degenerate lower arm sequence (N's) plus adjacent fixed bases is produced by polymerase extension of the upper strand in the presence of all four dNTPs. After reaction cleanup, complete adaptor A-tailing is ensured by extended incubation with polymerase and dATP.
`
`[0013] Figure 3 illustrates error correction through Duplex Consensus Sequencing (DCS) analysis according to one embodiment. (a-c) show sequence reads (brown) sharing a unique set of SMI tags grouped into paired families with members having strand identifiers in either the αβ or βα orientation. Each family pair reflects one double-stranded DNA fragment. (a) shows mutations (spots) present in only one or a few family members, representing sequencing mistakes or PCR-introduced errors occurring late in amplification. (b) shows mutations occurring in many or all members of one family in a pair, representing mutations scored on only one of the two strands, which can be due to PCR errors arising during the first round of amplification such as might occur when copying across sites of mutagenic DNA damage. (c) shows true mutations (* arrow) present on both strands of a captured fragment, which appear in all members of a family pair. While artifactual mutations may co-occur in a family pair with a true mutation, these can be independently identified and discounted when producing (d) an error-corrected consensus sequence (+ arrow) for each duplex. (e) shows that consensus sequences from all independently captured, randomly sheared fragments containing a particular genomic site are identified and (f) compared to determine the frequency of genetic variants at this locus within the sampled population.
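The final comparison step, determining variant frequency at one locus across independent duplex consensus sequences, reduces to simple counting. The sketch below is illustrative only; the function name `variant_frequencies` and the convention that "N" marks a position with no confident call are assumptions for this example.

```python
from collections import Counter


def variant_frequencies(bases, reference):
    """Frequency of each non-reference base among called consensus bases.

    bases: consensus base at one genomic site, one entry per independently
           captured duplex; "N" (assumed convention) means no call.
    reference: the reference base at that site.
    """
    called = [b for b in bases if b != "N"]
    if not called:
        return {}
    counts = Counter(called)
    return {b: c / len(called) for b, c in counts.items() if b != reference}
```

For example, three duplexes reporting "A", one reporting "T", and one uncalled duplex give a "T" frequency of one in four at a site whose reference base is "A".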
`
`[0014] Figure 4 illustrates an example of how an SMI sequence with n-mers of 4 nucleotides in length (4-mers) is read by Duplex Consensus Sequencing (DCS) according to some embodiments. (A) shows the 4-mers with the PCR primer binding sites (or flow cell sequences) 1 and 2 indicated at each end. (B) shows the same molecules as in (A) but with the strands separated and the lower strand now written in the 5'-3' direction. When these molecules are amplified with PCR and sequenced, they will yield the following sequence reads: The top strand will give a read 1 file of TAAC--- and a read 2 file of CGGA---. Combining the read 1 and read 2 tags will give TAACCGGA as the SMI for the top strand. The bottom strand will give a read 1 file of CGGA--- and a read 2 file of TAAC---. Combining the read 1 and read 2 tags will give CGGATAAC as the SMI for the bottom strand. (C) illustrates the orientation of paired strand mutations in DCS. In the initial DNA duplex shown in Figures 4A and 4B, a mutation "x" (which is paired to a complementary nucleotide "y") is shown on the left side of the DNA duplex. The "x" will appear in read 1, and the complementary mutation on the opposite strand, "y," will appear in read 2. Specifically, this would appear as "x" in both read 1 and read 2 data, because "y" in read 2 is read out as "x" by the sequencer owing to the nature of the sequencing primers, which generate the complementary sequence during read 2.
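The combined tags in this example (TAACCGGA for the top strand, CGGATAAC for the bottom strand) differ only in the order of their two 4-mers. A minimal sketch of that bookkeeping follows; the helper names `smi_tag` and `partner_tag` are hypothetical, and the read strings in the usage note are made-up placeholders apart from their leading 4-mers.

```python
def smi_tag(read1, read2, n):
    """Combine the first n bases of read 1 and read 2 into one SMI tag."""
    return read1[:n] + read2[:n]


def partner_tag(tag):
    """Tag expected for the opposite strand: the same two n-mers, swapped."""
    half = len(tag) // 2
    return tag[half:] + tag[:half]
```

With these helpers, `smi_tag("TAACGGGT", "CGGATTTT", 4)` yields `"TAACCGGA"`, and `partner_tag("TAACCGGA")` yields `"CGGATAAC"`, matching the two strand tags given in the text.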
`
`DETAILED DESCRIPTION
`
`[0015] Single molecule identifier adaptors and methods for their use are provided herein. According to the embodiments described herein, a single molecule identifier (SMI) adaptor molecule is provided. Said SMI adaptor molecule may include a double-stranded single molecule identifier (SMI) sequence, and an SMI ligation adaptor (Figure 2). Optionally, the SMI adaptor molecule further includes at least two PCR primer binding sites, at least two sequencing primer binding sites, or both.
`
`[0016] In some embodiments, the SMI adaptor molecule includes a double-stranded, complementary SMI sequence (or "tag") of nucleotides that is degenerate or semi-degenerate. In some embodiments, the degenerate or semi-degenerate SMI sequence may be a random degenerate sequence. The double-stranded SMI sequence includes a first degenerate or semi-degenerate nucleotide n-mer sequence and a second n-mer sequence that is complementary to the first degenerate or semi-degenerate nucleotide n-mer sequence. The first and second degenerate or semi-degenerate nucleotide n-mer sequences may be any suitable length to produce a sufficiently large number of unique tags to label a set of sheared DNA fragments from a segment of DNA. Each n-mer sequence may be between approximately 4 and 20 nucleotides in length. Therefore, each n-mer sequence may be approximately 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 nucleotides in length. In one embodiment, the SMI sequence is a random degenerate nucleotide n-mer sequence which is 12 nucleotides in length. A 12 nucleotide SMI n-mer sequence that is ligated to each end of a target nucleic acid molecule, as described in the Example below, results in generation of up to 4^24 (i.e., 2.8 x 10^14) distinct tag sequences.
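The tag-space figure quoted above follows from four possible bases at each of 12 positions on each of two ends. A short check of that arithmetic is shown below; the function name `distinct_tags` and its parameters are illustrative assumptions, and a four-letter alphabet with fully degenerate positions is assumed.

```python
def distinct_tags(n_mer_length, ends=2, alphabet_size=4):
    """Upper bound on distinct combined tags for fully degenerate n-mers."""
    return alphabet_size ** (n_mer_length * ends)


# 12-mers on both ends: 4 ** 24 = 281,474,976,710,656, about 2.8 x 10^14.
print(f"{distinct_tags(12):,}")
```

A semi-degenerate design, in which some positions allow fewer than four bases, would give a correspondingly smaller bound.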
`
`[0017] In some embodiments, the SMI tag nucleotide sequence may be completely random and degenerate, wherein each sequence position may be any nucleotide (i.e., each position, represented by "X," is not limited, and may be an adenine (A), cytosine (C), guanine (G), thymine (T), or uracil (U)) or any other natural or non-natural DNA or RNA nucleotide or nucleotide-like substance or analog with base-pairing properties (e.g., xanthosine, inosine, hypoxanthine, xanthine, 7-methylguanine, 7-methylguanosine, 5,6-dihydrouracil, 5-methylcytosine, dihydrouridine, isocytosine, isoguanine, deoxynucleosides, nucleosides, peptide nucleic acids, locked nucleic acids, glycol nucleic acids and threose nucleic acids). The term "nucleotide" as described herein refers to any and all nucleotides or any suitable natural or non-natural DNA or RNA nucleotide or nucleotide-like substance or analog with base-pairing properties as described above. In other embodiments, the sequences need not contain all possible bases at each position. The degenerate or semi-degenerate n-mer sequences may be generated by a polymerase-mediated method described in the Example below, or may be generated by preparing and annealing a library of individual oligonucleotides of known sequence. Alternatively, any degenerate or semi-degenerate n-mer sequences may be a randomly or non-randomly fragmented double-stranded DNA molecule from any alternative source that differs from the target DNA source. In some embodiments, the alternative source is a genome or plasmid derived from bacteria, an organism other than that of the target DNA, or a combination of such alternative organisms or sources. The random or non-random fragmented DNA may be introduced into SMI adaptors to serve as variable tags. This may be accomplished through enzymatic ligation or any other method known in the art.
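For intuition, a degenerate or semi-degenerate double-stranded tag can be simulated in software. The sketch below only models the sequences; it says nothing about the wet-lab synthesis. It assumes a plain A/C/G/T alphabet, and the name `random_smi` and its `alphabet` and `rng` parameters are hypothetical.

```python
import random

# Watson-Crick complement table for the four standard DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def random_smi(n=12, alphabet="ACGT", rng=random):
    """Return (top, bottom) strands of a simulated double-stranded n-mer tag.

    alphabet: bases allowed at every position; "ACGT" models a fully
              degenerate tag, a subset models a semi-degenerate one.
    rng: source of randomness exposing .choice (the random module by default).
    """
    top = "".join(rng.choice(alphabet) for _ in range(n))
    # The bottom strand is the reverse complement, written 5' to 3'.
    bottom = top.translate(COMPLEMENT)[::-1]
    return top, bottom
```

Passing `rng=random.Random(0)` makes the simulated tag reproducible, and `alphabet="AC"` restricts every position to two bases.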
`
`[0018] In some embodiments, the SMI adaptor molecules are ligated to both ends of a target nucleic acid molecule, and then this complex is used according to the methods described below. In certain embodiments, it is not necessary to include n-mers on both adaptor ends; however, it is more convenient because it means that one does not have to use two different types of adaptors and then select for ligated fragments that have one of each type rather than two of one type. Determining which strand is which is still possible in the situation wherein only one of the two adaptors has a double-stranded SMI sequence.
`
`[0019] In some embodiments, the SMI adaptor molecule may optionally include a double-stranded fixed reference sequence downstream of the n-mer sequences to help make ligation more uniform and help computationally filter out errors due to ligation problems with improperly synthesized adaptors. Each strand of the double-stranded fixed reference sequence may be 4 or 5 nucleotides in length; however, the fixed reference sequence may be any suitable length including, but not limited to, 3, 4, 5 or 6 nucleotides in length.
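The computational filtering mentioned above can be pictured as a check that the expected fixed bases sit immediately after the n-mer tag in each read. The sketch below is one possible reading, not the documented procedure: the function name `split_tag` is hypothetical, and the 5-base sequence "TGACT" used in the usage note is a placeholder, since this section does not state the actual fixed sequence.

```python
def split_tag(read, n, fixed):
    """Split a read into (tag, insert) if the fixed sequence follows the tag.

    read: raw read starting with the n-mer tag.
    n: length of the n-mer tag.
    fixed: expected fixed reference sequence (placeholder value in examples).
    Returns None when the fixed sequence is absent, so the read can be dropped.
    """
    tag = read[:n]
    spacer = read[n:n + len(fixed)]
    if spacer != fixed:
        return None
    return tag, read[n + len(fixed):]
```

For instance, `split_tag("TAACTGACTACGT", 4, "TGACT")` returns `("TAAC", "ACGT")`, while a read whose spacer is mutated or shifted returns `None`.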
`
`[0020] The SMI ligation adaptor may be any suitable ligation adaptor that is complementary to a ligation adaptor added to a double-stranded target nucleic acid sequence including, but not limited to, a T-overhang, an A-overhang, a CG overhang, a blunt end, or any other ligatable sequence. In some embodiments, the SMI ligation adaptor may be made using a method for A-tailing or T-tailing with polymerase extension; creating an overhang with a different enzyme; using a restriction enzyme to create a single or multiple nucleotide overhang; or any other method known in the art.
`
`[0021] According to the embodiments described herein, the SMI adaptor molecule may include at least two PCR primer or "flow cell" binding sites: a forward PCR primer binding site (or a "flow cell 1" (FC1) binding site); and a reverse PCR primer binding site (or a "flow cell 2" (FC2) binding site). The SMI adaptor molecule may also include at least two sequencing primer binding sites, each corresponding to a sequencing read. Alternatively, the sequencing primer binding sites may be added in a separate step by inclusion of the necessary sequences as tails to the PCR primers, or by ligation of the needed sequences. Therefore, if a double-stranded target nucleic acid molecule has an SMI adaptor molecule ligated to each end, each sequenced strand will have two reads: a forward and a reverse read.
`
`[0022] In some embodiments, the SMI adaptor molecule is a "Y-shaped" adaptor, which allows both strands to be independently amplified by a PCR method prior to sequencing because both the top and bottom strands have binding sites for PCR primers FC1 and FC2 as shown below. A schematic of a Y-shaped SMI adaptor molecule is also shown in Figure 2.
`
`[0023] A Y-shaped SMI adaptor requires successful amplification and recovery of both strands. In one embodiment, a modification that would simplify consistent recovery of both strands entails ligation of a Y-shaped SMI adaptor molecule to one end of a DNA duplex molecule, and ligation of a "U-shaped" linker to the other end of the molecule. PCR amplification of the hairpin-shaped product will then yield a linear fragment with flow cell sequences on either end. Distinct PCR primer binding sites (or flow cell sequences FC1 and FC2) will flank the DNA sequence corresponding to each of the two strands, and a given sequence seen in Read 1 will then have the sequence corresponding to the complementary DNA duplex strand seen in Read 2. Mutations can be scored only if they are seen on both ends of the molecule (corresponding to each strand of the original double-stranded fragment), i.e., at the same position in both Read 1 and Read 2. This design may be accomplished as follows.
`
`[0024] Adaptor 1 (shown below) is a Y-shaped SMI adaptor as described above (the SMI sequence is shown as X's in the top strand (a 4-mer), with the complementary bottom strand sequence shown as Y's):
`
`[Schematic of Adaptor 1: Y-shaped SMI adaptor with SMI positions marked X (top strand) and Y (bottom strand) and an FC2 arm; remainder of drawing truncated in source]
