`
United States Patent [19]
`Jones
`
[54] ITERATIVE AND REGENERATIVE DNA SEQUENCING METHOD

[75] Inventor: Douglas H. Jones, Iowa City, Iowa

[73] Assignee: The University of Iowa Research Foundation, Iowa City, Iowa

[21] Appl. No.: 742,755

[22] Filed: Nov. 1, 1996

[51] Int. Cl.6 .............. C12Q 1/68; C12P 19/34
[52] U.S. Cl. ............... 435/6; 435/91.1; 435/91.2
[58] Field of Search ........ 435/6, 91.1, 91.2

[56] References Cited
`
`U.S. PATENT DOCUMENTS
`
5,102,785    4/1992   Livak et al. ............ 435/6
5,403,708    4/1995   Brennan et al. .......... 435/6
5,508,169    4/1996   Deugau et al. ........... 435/6
5,552,278    9/1996   Brenner ................. 435/6
5,599,675    2/1997   Brenner ................. 435/6
5,604,097    2/1997   Brenner ................. 435/6
5,695,934   12/1997   Brenner ................. 435/6
5,714,330    2/1998   Brenner et al. .......... 435/6
5,763,175    6/1998   Brenner ................. 435/6
`
`FOREIGN PATENT DOCUMENTS
`
WO 91/02796    3/1991   WIPO.
WO 92/15711    9/1992   WIPO.
WO 93/21340   10/1993   WIPO.
WO 95/27080   10/1995   WIPO.
WO 96/12039    4/1996   WIPO.
`
OTHER PUBLICATIONS
`
`Straus et al., Proc. Natl. Acad. Sci. USA 87, 1889-1893
`(1990).
`Jones, BioTechniques 22, 938-946 (1997).
Baxter, G. et al., "Microfabrication in Silicon
Microphysiometry," Clin. Chem., vol. 40, No. 9, 1800-1804 (1994).
`Beattie, K. et al., "Advances in Genosensor Research," Clin.
`Chem., vol. 41, No. 5, 700-706 (1995).
`Brenner, S. and Livak, K., "DNA Fingerprinting by Sampled
`Sequencing," Proc. Natl. Acad. Sci. USA, vol. 86,
`8902-8906 (1989).
`
US005858671A

[11] Patent Number: 5,858,671
[45] Date of Patent: Jan. 12, 1999
`
Broude, N. et al., "Enhanced DNA Sequencing by
Hybridization," Proc. Natl. Acad. Sci. USA, vol. 91, 3072-3076
`(1994).
`Burns, M. et al., "Microfabricated Structures for Integrated
`DNA Analysis," Proc. Natl. Acad. Sci. USA, vol. 93,
`5556-5561 (1996).
Caetano-Anolles, G. et al., "Primer-Template Interactions
During DNA Amplification Fingerprinting with Single
Arbitrary Oligonucleotides," Mol. Gen. Genet., vol. 235,
157-165 (1992).
`Canard, B. and Sarfati, R.S., "DNA Polymerase Fluorescent
`Substrates with Reversible 3'-tags," Gene, vol. 148, 1-6
`(1994).
Carrano, A.V. et al., "A High-Resolution,
Fluorescence-Based, Semiautomated Method for DNA Fingerprinting,"
`Genomics, vol. 4, 129-136 (1989).
`Cheng, J. et al., "Chip PCR. II. Investigation of Different
PCR Amplification Systems in Microfabricated
Silicon-Glass Chips," Nucleic Acids Research, vol. 24, No. 2,
`380-385 (1996).
Chetverin, A. and Kramer, F., "Oligonucleotide Arrays: New
`Concepts and Possibilities," BioTechnology, vol. 12,
`1093-1099 (1994).
`Davis, L. et al., "Rapid DNA Sequencing Based Upon
`Single Molecule Detection," Genetic Analysis, Techniques,
`and Applications, vol. 8, No. 1, 1-7 (1991).
Drmanac, R. et al., "DNA Sequence Determination by
Hybridization: A Strategy for Efficient Large-Scale
Sequencing," Science, vol. 260, 1649-1652 (1993).
Eggers, M. and Ehrlich, D., "A Review of Microfabricated
Devices for Gene-Based Diagnostics," Hematologic
Pathology, vol. 9, No. 1, 1-15 (1995).
`Eggers, M. et al., "A Microchip for Quantitative Detection
`of Molecules Utilizing Luminescent and Radioisotope
`Reporter Groups," BioTechniques, vol. 17, No. 3, 516-524
`(1994).
`Gibbs, R. et al., "Identification of Mutations Leading to the
Lesch-Nyhan Syndrome Automated Direct DNA Sequencing
of In Vitro Amplified cDNA," Proc. Natl. Acad. Sci.
`USA, vol. 86, 1919-1923 (1989).
`Green, E. and Green, P., "Sequence-tagged Site (STS)
`Content Mapping of Human Chromosomes: Theoretical
`Considerations and Early Experiences," PCR Methods and
`Applications, vol. 1, 77-90 (1991).
`
`
`
`
`Gyllensten, U. and Allen, M., "PCR-based HLA Class II
`Typing," PCR Methods and Applications, vol. 1, 91-98
`(1991).
Gyllensten, U. and Erlich, H., "Generation of
Single-Stranded DNA by the Polymerase Chain Reaction and
`its Application to Direct Sequencing of the HLA-DQA
`Locus," Proc. Natl. Acad. Sci. USA, vol. 85, 7652-7656
`(1988).
Han, J. and Rutter, W., "λgt22S, a Phage Expression Vector
for the Directional Cloning of cDNA by the use of a Single
Restriction Enzyme SfiI," Nucleic Acids Research, vol. 16,
`No. 24, 11837 (1988).
`Jacobs, K. et al., "The Thermal Stability of Oligonucleotide
`Duplexes is Sequence Independent in Tetraalkylammonium
`Salt Solutions: Application to Identifying Recombinant
`DNA Clones," Nucleic Acids Res., vol. 16, 4637-4650
`(1988).
`Kikuchi, Y. et al., "Optically Accessible Microchannels
`Formed in a Single-Crystal Silicon Substrate for Studies of
`Blood Rheology," Microvascular Research, vol. 44,
`226-240, (1992).
Kobayashi, M. et al., "Fluorescence-based DNA
Minisequence Analysis for Detection of Known Single-base
`Changes in Genomic DNA," Molecular and Cellular
`Probes, vol. 9, 175-182 (1995).
`Kohsaka, H. and Carson, D., "Solid-Phase Polymerase
`Chain Reaction," Journal of Clinical Laboratory Analysis,
`vol. 8, 452-455 (1994).
Kricka, L. et al., "Imaging of the Chemiluminescent
Reactions in Mesoscale Silicon-Glass Microstructures,"
J Biolumin Chemilumin, vol. 9, 135-138 (1994).
Kuppuswamy, M. et al., "Single Nucleotide Primer
Extension to Detect Genetic Diseases: Experimental Application
`to Hemophilia B (Factor IX) and Cystic Fibrosis Genes,"
`Proc. Natl. Acad. Sci. USA, vol. 88, 1143-1147 (1991).
Lagerkvist, A. et al., "Manifold Sequencing: Efficient
Processing of Large Sets of Sequencing Reactions," Proc. Natl.
`Acad. Sci. USA, vol. 91, 2245-2249 (1994).
Lamture, J. et al., "Direct Detection of Nucleic Acid
Hybridization on the Surface of a Charge Coupled Device," Nucleic
`Acids Research, vol. 22, No. 11, 2121-2125 (1994).
`Mauro, J. et al., "Fiber-Optic Fluorometric Sensing of
`Polymerase Chain Reaction-Amplified DNA Using an
Immobilized DNA Capture Protein," Analytical
Biochemistry, vol. 235, 61-72 (1996).
`Maxam, A. and Gilbert, W., "A New Method for Sequencing
DNA," Proc. Natl. Acad. Sci. USA, vol. 74, No. 2, 560-564
`(1977).
`Metzker, M. et al., "Termination of DNA Synthesis by Novel
3'-Modified-Deoxyribonucleoside 5'-Triphosphates,"
`Nucleic Acids Res., vol. 22, No. 20, 4259-4267 (1994).
`Nikiforov, T. et al., "Genetic Bit Analysis: A Solid Phase
`Method for Typing Single Nucleotide Polymorphisms,"
`Nucleic Acids Res., vol. 22, No. 20, 4167-4175 (1994).
`Riccelli, P. and Benight, A., "Tetramethylammonium Does
not Universally Neutralize Sequence Dependent DNA
Stability," Nucleic Acids Res., vol. 21, No. 16, 3786-3788
`(1993).
`
`Rosenthal, A. et al., "Large-Scale Production of DNA
`Sequencing Templates by Microtitre Format PCR," Nucleic
`Acids Research, vol. 21, No. 1, 173-174 (1993).
`Sanger, F. et al., "DNA Sequencing with Chain-Terminating
`Inhibitors," Proc. Natl. Acad. Sci. USA, vol. 74, No. 12,
`5463-5467 (1977).
`Shoffner, M. et al., "Chip PCR. I. Surface Passivation of
`Microfabricated Silicon-Glass Chips for PCR," Nucleic
`Acids Research, vol. 24, No. 2, 375-379 (1996).
`Sokolov, B., "Primer Extension Technique for the Detection
`of Single Nucleotide in Genomic DNA," Nucleic Acids Res.,
`vol. 18, No. 12, 3671 (1989).
`Strezoska, Z. et al., "DNA Sequencing by Hybridization:
`100 Bases Read by a Non-Gel-Based Method," Proc. Natl.
`Acad. Sci. USA, vol. 88, 10089-10093 (1991).
Syvänen, A. et al., "Convenient and Quantitative
Determination of the Frequency of a Mutant Allele Using
Solid-Phase Minisequencing: Application to
Aspartylglucosaminuria in Finland," Genomics, vol. 12, 590-595
`(1992).
`Versalovic, J. et al., "Distribution of Repetitive DNA
`Sequences in Eubacteria and Application to Fingerprinting
of Bacterial Genomes," Nucleic Acids Res., vol. 19, No. 24,
`6823-6831 (1991).
`Warren S., "The Expanding World of Trinucleotide
`Repeats," Science, vol. 271, 1374-1375 (1996).
`Wilding, P. et al., "Manipulation and Flow of Biological
`Fluids in Straight Channels Micromachined in Silicon,"
`Clin. Chem., vol. 40, No. 1, 43-47 (1994).
`Williams, J., et al., "Studies of Oligonucleotide Interactions
`by Hybridisation to Arrays: The Influence of Dangling Ends
`on Duplex Yield," Nucleic Acids Res., vol. 22, No. 8,
`1365-1367 (1994).
`Woolley, A and Mathies, R., "Ultra-High-Speed DNA
Fragment Separations Using Microfabricated Capillary
`Array Electrophoresis Chips," Proc. Natl. Acad. Sci. USA,
`vol. 91, 11348-11352 (1994).
`
`Primary Examiner-Kenneth R. Horlick
Attorney, Agent, or Firm-Lahive & Cockfield, LLP;
Elizabeth A. Hanley
`
`[57]
`
`ABSTRACT
`
`An iterative and regenerative method for sequencing DNA is
`described. This method sequences DNA in discrete intervals
`starting at one end of a double stranded DNA segment. This
`method overcomes problems inherent in other sequencing
`methods, including the need for gel resolution of DNA
fragments and the generation of artifacts caused by
single-stranded DNA secondary structures. A particular advantage
of this invention is that it can create offset collections of
DNA segments and sequence the segments in parallel to
provide continuous sequence information over long
intervals. This method is also suitable for automation and
multiplex automation to sequence large sets of segments.
`
`118 Claims, 9 Drawing Sheets
`
`
`
`U.S. Patent
`
`Jan. 12, 1999
`
`Sheet 1 of 9
`
`5,858,671
`
`FIG. 1
`
TEMPLATE PRECURSOR: FokI recognition site; FokI cut site.

1. Digest with FokI. Wash.

TEMPLATE: 5'P; four nucleotide overhang.

ADAPTORS (during each cycle this region differs): FokI recognition
site; three degenerate nucleotides with a fixed G, A, T, or C 5' end;
fluorescent label identifying the 5' terminus of this strand; P5'.

2. Ligate. Wash. (The bottom strand undergoes template-directed ligation
if its 5' end nucleotide is complementary to the template at the ligation
junction; this strand's 3' region differs with each sequencing cycle. The
upper strand facilitates ligation of the lower strand.)

3. Identify the ligated adaptor by fluorometry.

4. PCR with a different primer during each sequencing cycle,
preventing amplification of template precursors from previous cycles.
(This primer is complementary to the adaptor's lower strand region that
varies with each sequencing cycle.)

5. Bind the PCR product to a solid matrix. Wash.

Other labels: Biotin; cycle.
`
`
`
`U.S. Patent
`
`Jan. 12, 1999
`
`Sheet 2 of 9
`
`5,858,671
`
`FIG. 2
`
TEMPLATE PRECURSOR: BseRI recognition site; BseRI cut site.

1. Digest with BseRI. Bind to a solid matrix. Wash.

TEMPLATE: 5'P; two nucleotide overhang.

ADAPTORS (during each cycle this region differs): BseRI recognition
site; one degenerate nucleotide followed by a fixed G, A, T, or C 3' end;
fluorescent label identifying the 3' terminus of this strand.

2. Ligate. Wash. (The upper strand undergoes template-directed ligation
if its 3' end nucleotide is complementary to the template at the ligation
junction; this strand's 5' region differs with each sequencing cycle. The
bottom strand improves ligation efficiency.)

3. Identify the ligated adaptor by fluorometry.

4. PCR with a different primer during each sequencing cycle,
preventing amplification of template precursors from previous cycles.
(This primer is homologous to the adaptor's upper strand region
that varies with each sequencing cycle.)

Other labels: Biotin; cycle.
`
`
`
`U.S. Patent
`
`Jan. 12, 1999
`
`Sheet 3 of 9
`
`5,858,671
`
`FIG. 3
`
TEMPLATE PRECURSOR: FokI recognition site; FokI cut site.

1. Digest with FokI. Wash.

TEMPLATE: 5'P; four nucleotide overhang.

2. Polymerize with 4 ddNTPs, each labeled with a distinct
fluorescent tag. Wash. (Labels: 5'P; three nucleotide overhang;
fluorescently labelled ddNTP.)

3. Identify incorporated ddNTP by fluorometry.

ADAPTOR (during each cycle this region differs): FokI recognition
site; three degenerate nucleotides.

4. Ligate the adaptor's upper strand; this strand's 5' region differs
with each sequencing cycle. Wash.

5. PCR with a different primer during each sequencing cycle,
preventing amplification of template precursors from previous cycles.
(This primer is homologous to the adaptor's upper strand region
that varies with each sequencing cycle.)

6. Bind the PCR product to a solid matrix. Wash.

Other labels: Biotin; cycle.
`
`
`
`U.S. Patent
`
`Jan. 12, 1999
`
`Sheet 4 of 9
`
`5,858,671
`
`FIG. 4
`
TEMPLATE PRECURSOR: Fok I recognition site; Fok I cut site.

1. Digest with Fok I. Remove and save supernatant.

TEMPLATE: P5'; four nucleotide overhang.

2. Polymerize with 4 ddNTPs, each labelled with a distinct
fluorescent tag. Wash. (Labels: P5'; three nucleotide overhang;
fluorescently labelled ddNTP.)

3. Identify incorporated ddNTP by fluorometry.

ADAPTOR (during each cycle this region differs): Fok I recognition
site; four degenerate nucleotides.

4. Ligate this adaptor to the trimmed DNA segment in the supernatant
from Step 1. Wash. Either or both strands can be ligated.

5. PCR using a different biotinylated primer with each sequencing
cycle. (This primer is targeted to the adaptor's region that varies with
each sequencing cycle, preventing amplification of template
precursors from previous cycles.)

6. Bind the PCR product to a solid matrix. Wash.

Other labels: Biotin; cycle.
`
`
`
U.S. Patent

Jan. 12, 1999

Sheet 5 of 9

5,858,671
`
`
`
`
`
`U.S. Patent
`
`Jan. 12, 1999
`
`Sheet 6 of 9
`
`5,858,671
`
`FIG. 6
`
`Hemi-methylated DpnI
`recognition domain
`Methyl
`I
`CCATCCGTAAGATGATCTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAG
`GGTAGGCATTCTACTAGAAGACACTGACCACTCATGAGTTGGTTCAGTAAGACTC
`55 bp PCR product
`
↓ DpnI
`
`TCTTCTGTGACTGGTGAGTACTCAACCAAGTCATTCTGAG
`AGAAGACACTGACCACTCATGAGTTGGTTCAGTAAGACTC
`40 bp product
`
`
`
U.S. Patent

Jan. 12, 1999

Sheet 7 of 9

5,858,671

FIG. 7
`
`
`
U.S. Patent

Jan. 12, 1999

Sheet 8 of 9

5,858,671

FIG. 8

Labels: Adaptor, or Polymerase/ddNTPs; TEMPLATES-MAGNETIC PARTICLES;
CHARGE COUPLED DEVICE; PRINTOUT OF SEQUENCES; CONTROL; MAGNET;
PCR mix + Primer set 1, 2, 3, or 4; AIR OVEN THERMOCYCLER.
Reference numerals as extracted: 100, 10, 59, 40, 70.
`
`
`
`U.S. Patent
`
`Jan. 12, 1999
`
`Sheet 9 of 9
`
`5,858,671
`
`FIG. 9
`
Labels: IIS Restriction Endonuclease; Wash buffer;
Polymerase/ddNTPs; Adaptor/Ligase;
PCR mix/Primer set 1; PCR mix/Primer set 2;
PCR mix/Primer set 3; PCR mix/Primer set 4.
Reference numeral as extracted: 11b.
`
`ITERATIVE AND REGENERATIVE DNA
`SEQUENCING METHOD
`
`BACKGROUND OF THE INVENTION
`
Analysis of DNA with currently available techniques
provides a spectrum of information ranging from the
confirmation that a test DNA is the same as or different from a
standard sequence or an isolated fragment, to the express
identification and ordering of each nucleotide of the test
DNA. Not only are such techniques crucial for understanding
the function and control of genes and for applying many
of the basic techniques of molecular biology, but they have
also become increasingly important as tools in genomic
analysis and a great many non-research applications, such as
genetic identification, forensic analysis, genetic counseling,
medical diagnostics and many others. In these latter
applications, both techniques providing partial sequence
information, such as fingerprinting and sequence
comparisons, and techniques providing full sequence
determination have been employed (Gibbs et al., Proc. Natl.
Acad. Sci. USA 1989; 86:1919-1923; Gyllensten et al., Proc.
Natl. Acad. Sci. USA 1988; 85:7652-7656; Carrano et al.,
Genomics 1989; 4:129-136; Caetano-Anolles et al., Mol.
Gen. Genet. 1992; 235:157-165; Brenner and Livak, Proc.
Natl. Acad. Sci. USA 1989; 86:8902-8906; Green et al., PCR
Methods and Applications 1991; 1:77-90; and Versalovic et
al., Nucleic Acids Res. 1991; 19:6823-6831).
`DNA sequencing methods currently available require the
`generation of a set of DNA fragments that are ordered by
`length according to nucleotide composition. The generation
`of this set of ordered fragments occurs in one of two ways:
chemical degradation at specific nucleotides using the
Maxam-Gilbert method (Maxam A M and W Gilbert, Proc
Natl Acad Sci USA 1977; 74:560-564) or dideoxy nucleotide
incorporation using the Sanger method (Sanger F, S Nicklen,
and A R Coulson, Proc Natl Acad Sci USA 1977;
74:5463-5467). In both cases, the type and number of required
steps inherently limit both the number of DNA segments that can
be sequenced in parallel and the number of operations
which may be carried out in sequence. Furthermore, both
`methods are prone to error due to the anomalous migration
`of DNA fragments in denaturing gels. Time and space
`limitations inherent in these gel-based methods have fueled
`the search for alternative methods.
Several methods are under development that are designed
to sequence DNA in a solid state format without a gel
resolution step. The method that has generated the most
interest is sequencing by hybridization. In sequencing by
hybridization, the DNA sequence is read by determining the
overlaps between the sequences of hybridized
oligonucleotides. This strategy is possible because a long sequence can
be deduced by matching up distinctive overlaps between its
constituent oligomers (Strezoska Z, T Paunesku, D
Radosavljevic, I Labat, R Drmanac, R Crkvenjakov, Proc
Natl Acad Sci USA 1991; 88:10089-10093; Drmanac R, S
Drmanac, Z Strezoska, T Paunesku, I Labat, M Zeremski, J
Snoddy, W K Funkhouser, B Koop, L Hood, R Crkvenjakov,
Science 1993; 260:1649-1652). This method uses
hybridization conditions for oligonucleotide probes that
distinguish between complete complementarity with the target
sequence and a single nucleotide mismatch, and does not
require resolution of fragments on polyacrylamide gels
(Jacobs K A, R Rudersdorf, S D Neill, J P Dougherty, E L
Brown, and E F Fritsch, Nucleic Acids Res. 1988;
16:4637-4650). Recent versions of sequencing by
hybridization add a DNA ligation step in order to increase the
ability of this method to discriminate between mismatches,
and to decrease the length of the oligonucleotides necessary
to sequence a given length of DNA (Broude N E, T Sano, C
L Smith, C R Cantor, Proc. Natl. Acad. Sci. USA
1994; 91:3072-3076; Drmanac R T, International Business
Communications, Southborough, Mass.). Significant
obstacles with this method are its inability to accurately
position repetitive sequences in DNA fragments, inhibition
of probe annealing by the formation of internal duplexes in
the DNA fragments, and the influence of nearest neighbor
nucleotides within and adjacent to an annealing domain on
the melting temperature for hybridization (Riccelli P V, A S
Benight, Nucleic Acids Res 1993; 21:3785-3788; Williams J
C, S C Case-Green, K U Mir, E M Southern, Nucleic Acids
Res 1994; 22:1365-1367). Furthermore, sequencing by
hybridization cannot determine the length of tandem short
repeats, which are associated with several human genetic
diseases (Warren S T, Science 1996; 271:1374-1375). These
limitations have prevented its use as a primary sequencing
method.
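
The overlap logic behind sequencing by hybridization, and the tandem repeat limitation noted above, can be illustrated with a small toy model. The following Python sketch is illustrative only and is not taken from the cited methods: the function names `kmer_spectrum` and `unique_extension`, the probe length, and the example sequences are all assumptions. It idealizes the hybridization readout as the set of k-mers present in the target and extends a known starting k-mer only while exactly one overlapping k-mer fits.

```python
def kmer_spectrum(seq, k):
    """Return the set of all length-k substrings of seq.

    This stands in for an idealized, error-free hybridization readout:
    it records which k-mers are present, not how many times.
    """
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_extension(spectrum, start, k):
    """Extend `start` base by base using (k-1)-base overlaps.

    Extension continues only while exactly one unused k-mer in the
    spectrum overlaps the current 3' end; it stops at the first dead
    end or ambiguity and returns the sequence built so far.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    seq = start
    used = {start}
    while True:
        suffix = seq[-(k - 1):]
        candidates = [suffix + base for base in "ACGT"
                      if suffix + base in spectrum and suffix + base not in used]
        if len(candidates) != 1:
            return seq
        used.add(candidates[0])
        seq += candidates[0][-1]
```

Under these assumptions, a sequence with no repeated overlaps such as `ATGCGTAC` is recovered in full from its 3-mers, whereas `ATCACACAG` and `ATCACACACAG` yield identical 3-mer sets, so the number of `CA` repeats cannot be recovered from the spectrum alone.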
The base addition DNA sequencing scheme uses fluorescently
labeled reversible terminators of polymerase
extension, with a distinct and removable fluorescent label
for each of the four nucleotide analogs (Metzker M L,
Raghavachari R, Richards S, Jacutin S E, Civitello A,
Burgess K and R A Gibbs, Nucleic Acids Res. 1994;
22:4259-4267; Canard B and R S Sarfati, Gene 1994;
148:1-6). Incorporation of one of these base analogs into the
growing primer strand allows identification of the
incorporated nucleotide by its fluorescent label. This is followed by
removal of the protecting fluorescent group, creating a new
substrate for template-directed polymerase extension.
Iteration of these steps is designed to permit sequencing of a
multitude of templates in a solid state format. Technical
obstacles include a relatively low efficiency of extension and
deprotection, and interference with primer extension caused
by single-strand DNA secondary structure. A fundamental
limitation to this approach is inherent in iterative methods
that sequence consecutive nucleotides. That is, in order to
sequence more than a handful of nucleotides, each cycle of
analog incorporation and deprotection must approach 100%
efficiency. Even if the base addition sequencing scheme is
refined so that each cycle occurs at 95% efficiency, one will
have <75% of the product of interest after only 6 cycles
(0.95^6 = 0.735). This will severely limit the ability of this
method to sequence anything but very short DNA
sequences. Only one cycle of template-directed analog
incorporation and deprotection appears to have been
demonstrated so far (Metzker M L, Raghavachari R, Richards S,
Jacutin S E, Civitello A, Burgess K and R A Gibbs, Nucleic
Acids Res. 1994; 22:4259-4267; Canard B and R S Sarfati,
Gene 1994; 148:1-6). A related earlier method, which is
designed to sequence only one nucleotide per template, uses
radiolabeled nucleotides or conventional non-reversible
terminators attached to a variety of labels (Sokolov B P,
Nucleic Acids Research 1989; 18:3671; Kuppuswamy M N,
J W Hoffman, C K Kasper, S G Spitzer, S L Groce, and S P
Bajaj, Proc. Natl. Acad. Sci. USA 1991; 88:1143-1147).
Recently, this method has been called solid-phase
minisequencing (Syvanen A C, E Ikonen, T Manninen, M
Bengstrom, H Soderlund, P Aula, and L Peltonen, Genomics
1992; 12:590-595; Kobayashi M, Rappaport E, Blasband A,
Semeraro A, Sartore M, Surrey S, Fortina P., Molecular and
Cellular Probes 1995; 9:175-182) or genetic bit analysis
(Nikiforov T T, R B Rendle, P Goelet, Y H Rogers, M L
Kotewicz, S Anderson, G L Trainor, and M R Knapp,
Nucleic Acids Research 1994; 22:4167-4175), and it has
been used to verify the parentage of thoroughbred horses
(Nikiforov T T, R B Rendle, P Goelet, Y H Rogers, M L
Kotewicz, S Anderson, G L Trainor, and M R Knapp,
Nucleic Acids Research 1994; 22:4167-4175).
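
The yield arithmetic for consecutive-nucleotide methods given above (95% per-cycle efficiency leaving about 73.5% of the product after 6 cycles) can be checked with a short calculation. The Python sketch below is a minimal illustration assuming that every cycle has the same, independent efficiency; the function names `cumulative_yield` and `max_cycles` and the 75% threshold in the usage note are hypothetical and chosen only for this example.

```python
def cumulative_yield(per_cycle_efficiency, cycles):
    """Fraction of templates that completed every one of `cycles` steps,
    assuming each cycle succeeds independently with the same efficiency."""
    return per_cycle_efficiency ** cycles


def max_cycles(per_cycle_efficiency, min_yield):
    """Largest number of consecutive cycles whose cumulative yield
    stays at or above `min_yield`."""
    if not 0.0 < per_cycle_efficiency < 1.0:
        raise ValueError("per_cycle_efficiency must be between 0 and 1 (exclusive)")
    if not 0.0 < min_yield <= 1.0:
        raise ValueError("min_yield must be in the interval (0, 1]")
    cycles = 0
    while cumulative_yield(per_cycle_efficiency, cycles + 1) >= min_yield:
        cycles += 1
    return cycles
```

With these assumptions, `cumulative_yield(0.95, 6)` evaluates to approximately 0.735, matching the figure in the text, and `max_cycles(0.95, 0.75)` returns 5, which is the last cycle at which at least 75% of the product remains.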
`An alternative method for DNA sequencing that remains
`in the development phase entails the use of flow cytometry
`to detect single molecules. In this method, one strand of a
`DNA molecule is synthesized using fluorescently labeled
`nucleotides, and the labeled DNA molecule is then digested
`by a processive exonuclease, with identification of the
`released nucleotides over real time using flow cytometry.
`Technical obstacles to the implementation of this method
`include the fidelity of incorporation of the fluorescently
`labeled nucleotides and turbulence created around the
`microbead to which the single molecule of DNA is attached
(Davis L M, F R Fairfield, C A Harger, J H Jett, R A Keller,
J H Hahn, L A Krakowski, B L Marrone, J C Martin, H L
Nutter, R L Ratliff, E B Shera, D J Simpson, S A Soper,
Genetic Analysis, Techniques, and Applications 1991;
8:1-7). Furthermore, this method is not amenable to
sequencing numerous DNA segments in parallel.
Another DNA sequencing method has recently been
developed that uses class-IIS restriction endonuclease
digestion and adaptor ligation to sequence at least some
nucleotides offset from a terminal nucleotide. Using this method,
four adjacent nucleotides have reportedly been sequenced
and read following the gel resolution of DNA fragments.
However, a limitation of this sequencing method is that it
has built-in product losses, and requires many iterative
cycles (International Application PCT/US95/03678).
Another problem exists with currently available
technologies in the area of diagnostic sequencing. An ever widening
array of disorders, susceptibilities to disorders, prognoses of
disease conditions, and the like, have been correlated with
the presence of particular DNA sequences, or the degree of
variation (or mutation) in DNA sequences, at one or more
genetic loci. Examples of such phenomena include human
leukocyte antigen (HLA) typing, cystic fibrosis, tumor
progression and heterogeneity, p53 proto-oncogene mutations,
and ras proto-oncogene mutations (Gyllensten et al., PCR
Methods and Applications, 1:91-98 (1991); International
application PCT/US92/01675; and International application
PCT/CA90/00267). A difficulty in determining DNA
sequences associated with such conditions to obtain
diagnostic or prognostic information is the frequent presence of
multiple subpopulations of DNA, e.g., allelic variants,
multiple mutant forms, and the like. Distinguishing the presence
and identity of multiple sequences with current sequencing
technology is impractical due to the amount of DNA
sequencing required.
`
`SUMMARY OF THE INVENTION
`
`The present invention provides an alternative approach
`for sequencing DNA that does not require high resolution
`separations and that generates signals more amenable to
`analysis. The methods of the present invention can also be
easily automated. This provides a means for readily analyzing
DNA from many genetic loci. Furthermore, the DNA
sequencing method of the present invention does not require
the gel resolution of DNA fragments, which allows for the
simultaneous sequencing of cDNA or genomic DNA library
inserts. Therefore, the full length transcribed sequences or
genomes can be obtained very rapidly with the methods of
the present invention. The method of the present invention
further provides a means for the rapid sequencing of
previously uncharacterized viral, bacterial or protozoan human
pathogens, as well as the sequencing of plants and animals
of interest to agriculture, conservation, and/or science.
The present invention pertains to methods which can
sequence multiple DNA segments in parallel, without
running a gel. Each DNA sequence is determined without
ambiguity, as this novel method sequences DNA in discrete
intervals that start at one end of each DNA segment. The
method of the present invention is carried out on DNA that
is almost entirely double-stranded, thus preventing the
formation of secondary structures that complicate the known
sequencing methods that rely on hybridization to
single-stranded templates (e.g., sequencing by hybridization), and
overcoming obstacles posed by microsatellite repeats, other
direct repeats, and inverted repeats, in a given DNA
segment. The iterative and regenerative DNA sequencing
method described herein also overcomes the obstacles to
sequencing several thousand distinct DNA segments
attached to addressable sites on a matrix or a chip, because
it is carried out in iterative steps and in various embodiments
effectively preserves the sample through a multitude of
sequencing steps, or creates a nested set of DNA segments
to which a few steps are applied in common. It is, therefore,
highly suitable for automation. Furthermore, the present
invention particularly addresses the problem of increasing
throughput in DNA sequencing, both in number of steps and
parallelism of analyses, and it will facilitate the
identification of disease-associated gene polymorphisms, with
particular value for sequencing entire genomes and for
characterizing the multiple gene mutations underlying polygenic
traits. Thus, the invention pertains to novel methods for
generating staggered templates and for iterative and
regenerative DNA sequencing as well as to methods for
automated DNA sequencing.
Accordingly, the invention features a method for
identifying a first nucleotide n and a second nucleotide n+x in a
double stranded nucleic acid segment. The method includes
(a) digesting the double stranded nucleic acid segment with
a restriction enzyme to produce a double stranded molecule
having a single stranded overhang sequence corresponding
to an enzyme cut site; (b) providing an adaptor having a
cycle identification tag, a restriction enzyme recognition
domain, a sequence identification region, and a detectable
label; (c) hybridizing the adaptor to the double stranded
nucleic acid having the single-stranded overhang sequence
to form a ligated molecule; (d) identifying the nucleotide n
by identifying the ligated molecule; (e) amplifying the
ligated molecule from step (d) with a primer specific for the
cycle identification tag of the adaptor; and (f) repeating steps
(a) through (d) on the amplified molecule from step (e) to
yield the identity of the nucleotide n+x, wherein x is less
than or equal to the number of nucleotides between a
recognition domain for a restriction enzyme and an enzyme
cut site.
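
The positions visited by repeated cycles of steps (a) through (f) can be pictured with a simple index calculation. The Python sketch below is a conceptual illustration only: it abstracts away the digestion, ligation, detection, and amplification steps, uses zero-based positions, and the function name `interval_reads` and the example segment are assumptions rather than anything defined in this document.

```python
def interval_reads(segment, x, start=0):
    """Simulate identifying one nucleotide per cycle at positions
    start, start + x, start + 2x, ... along `segment`.

    `x` models the per-cycle advance, which in the method described
    above is bounded by the distance between the restriction enzyme
    recognition domain and its cut site.
    Returns a list of (position, nucleotide) pairs.
    """
    if x < 1:
        raise ValueError("x must be a positive integer")
    return [(pos, segment[pos]) for pos in range(start, len(segment), x)]
```

For example, with a hypothetical segment `"ACGTACGTAC"` and `x = 4`, the cycles would report positions 0, 4, and 8, leaving the intervening nucleotides unread by this single template.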
In another aspect, the invention features a method for
sequencing an interval within a double stranded nucleic acid
segment by identifying a first nucleotide n and a second
nucleotide n+x in a plurality of staggered double stranded
molecules produced from the double stranded nucleic acid
segment. The method includes (a) attaching an enzyme
recognition domain to different positions along the double
stranded nucleic acid segment within an interval no greater
than the distance between a recognition domain for a
restriction enzyme and an enzyme cut site, such attachment
occurring at one end of the double stranded nucleic acid
segment; (b) digesting the double stranded nucleic acid
segment with a restriction enzyme to produce a plurality of
staggered double stranded molecules each having a single
stranded overhang sequence corresponding to the cut site;
(c) providing an adaptor having a restriction enzyme
recognition domain, a sequence identification region, and a
detectable label; (d) hybridizing the adaptor to the double stranded
nucleic acid having the single-stranded overhang sequence
to form a ligated molecule; (e) identifying a nucleotide n
within a staggered double stranded molecule by identifying
the ligated molecule; and (f) repeating steps (b) through (e) to
yield the identity of the nucleotide n+x in each of the
staggered double stranded molecules having the single
strand overhang sequence, thereby sequencing an interval
within the double stranded nucleic acid segment, wherein x
is greater than one and no greater than the number of
nucleotides between a recognition domain for a restriction
enzyme and an enzyme cut site.
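
How reads from staggered molecules could combine into continuous sequence can be sketched as an interleaving step. The Python example below is a hedged illustration under simplifying assumptions: there is exactly one staggered template for each offset 0 through x-1, every template is read without error at intervals of x, and positions are zero-based. The function name `merge_staggered_reads` and the input layout are hypothetical.

```python
def merge_staggered_reads(reads_by_offset, x):
    """Interleave interval reads from x staggered templates.

    reads_by_offset[k] is the string of nucleotides identified, one per
    cycle, from the template whose start is staggered by k positions,
    so it covers positions k, k + x, k + 2x, ... of the segment.
    """
    if sorted(reads_by_offset) != list(range(x)):
        raise ValueError("expected exactly one read set for each offset 0..x-1")
    total = sum(len(reads) for reads in reads_by_offset.values())
    merged = []
    for pos in range(total):
        offset, cycle = pos % x, pos // x  # which template, which cycle
        merged.append(reads_by_offset[offset][cycle])
    return "".join(merged)
```

As a usage sketch, if a hypothetical segment `s` is read at interval `x = 4` from four staggered templates, so that `reads = {k: s[k::4] for k in range(4)}`, then `merge_staggered_reads(reads, 4)` reassembles `s`.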
`In another aspect, the invention features a method for
`identifying a first nucleotide n and a second nucleotide n+x
`in a double stranded nucleic acid