`
`(19) World Intellectual Property Organization
`International Bureau
`
`(43) International Publication Date: 30 June 2005 (30.06.2005)
`
`(10) International Publication Number: WO 2005/059097 A2
`
`(51) International Patent Classification⁷: C12N
`
`(21) International Application Number: PCT/US2004/041478
`
`(22) International Filing Date: 10 December 2004 (10.12.2004)
`
`(25) Filing Language: English
`
`(26) Publication Language: English
`
`(30) Priority Data: 10/733,847, 10 December 2003 (10.12.2003), US
`
`(71) Applicant (for all designated States except US): MASSACHUSETTS
`INSTITUTE OF TECHNOLOGY [US/US]; 77 Massachusetts Avenue, Cambridge, MA
`02139 (US).
`
`(72) Inventors; and
`(75) Inventors/Applicants (for US only): CARR, Peter, A.
`[US/US]; 33 Woburn Street, Medford, MA 02155 (US).
`CHOW, Brian, Y. [US/US]; 12 Chatham Street, Apt. 2,
`Cambridge, MA 02139 (US). JACOBSON, Joseph, M.
`[US/US]; 233 Grant Avenue, Newton, MA 02459 (US).
`
`(74) Agent: HENDERSON, Norma, E.; Hinckley, Allen, &
`Snyder L.L.P., 2nd Floor, 43 North Main Street, Concord,
`NH 03301-4934 (US).
`
`(81) Designated States (unless otherwise indicated, for every
`kind of national protection available): AE, AG, AL, AM,
`AT, AU, AZ, BA, BB, BG, BR, BW, BY, BZ, CA, CH, CN,
`CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, EG, ES, FI,
`GB, GD, GE, GH, GM, HR, HU, ID, IL, IN, IS, JP, KE,
`KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD,
`MG, MK, MN, MW, MX, MZ, NA, NI, NO, NZ, OM, PG,
`PH, PL, PT, RO, RU, SC, SD, SE, SG, SK, SL, SY, TJ, TM,
`TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, YU, ZA, ZM,
`ZW.
`
`(84) Designated States (unless otherwise indicated, for every
`kind of regional protection available): ARIPO (BW, GH,
`GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM,
`ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
`European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI,
`FR, GB, GR, HU, IE, IS, IT, LT, LU, MC, NL, PL, PT, RO,
`SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN,
`GQ, GW, ML, MR, NE, SN, TD, TG).
`
`Published:
`without international search report and to be republished
`upon receipt of that report
`
`For two-letter codes and other abbreviations, refer to the "Guidance
`Notes on Codes and Abbreviations" appearing at the beginning
`of each regular issue of the PCT Gazette.
`
`(54) Title: METHODS FOR HIGH FIDELITY PRODUCTION OF LONG NUCLEIC ACID MOLECULES
`
`(57) Abstract: This invention generally relates to nucleic acid synthesis, in particular DNA synthesis. More particularly, the invention
`relates to the production of long nucleic acid molecules with precise user control over sequence content. This invention also
`relates to the prevention and/or removal of errors within nucleic acid molecules.
`
`
`
`
`
`
`WO 2005/059097
`
`PCT/US2004/041478
`
`METHODS FOR HIGH FIDELITY PRODUCTION OF LONG NUCLEIC
`
`ACID MOLECULES
`
`CROSS-REFERENCE TO RELATED PATENT APPLICATIONS
`
`[0001]
`
`This application is related to the copending application titled “Methods for High
`
`Fidelity Production of Long Nucleic Acid Molecules with Error Control” by Carr et al., and
`
`filed concurrently herewith and U.S. Patent Application Serial Number 10/733,855, filed
`
`12/10/2003.
`
`[0002]
`
`This application claims priority to U.S. Patent Application Serial Number
`
`10/733,847, filed 12/10/2003.
`
`FIELD OF THE INVENTION
`
`[0003]
`
`This invention generally relates to nucleic acid synthesis, in particular DNA
`
`synthesis. More particularly, the invention relates to the production of long nucleic acid
`
`molecules with precise user control over sequence content. This invention also relates to the
`
`prevention and/or removal of errors within nucleic acid molecules.
`
`BACKGROUND OF THE INVENTION
`
`[0004]
`
`The availability of synthetic DNA sequences has fueled major revolutions in genetic
`
`engineering and the understanding of human genes, making possible such techniques as site-
`
`directed mutagenesis, the polymerase chain reaction (PCR), high-throughput DNA
`
`sequencing, gene synthesis, and gene expression analysis using DNA microarrays.
`
`[0005] DNA produced from a user-specified sequence is typically synthesized chemically
`
`in the form of short oligonucleotides, often ranging in length from 20 to 70 bases. For
`
`methods and materials known in the art related to the chemical synthesis of nucleic acids see,
`
`e.g., Beaucage, S.L., Caruthers, M.H., The Chemical Synthesis of DNA/RNA, which is hereby
`
`incorporated by reference. Syntheses of longer
`
`oligonucleotides are possible, but the intrinsic error rate of each coupling step (typically
`
`1-2%) is such that preparations of longer oligonucleotides are increasingly likely to be
`
`riddled with errors, and that the pure desired product will be numerically overwhelmed
`
`by sequences containing errors. Thus to produce longer DNA sequences, the molecule is
`
`not synthesized as a single long piece. Rather, current methods involve combining many
`
`shorter oligonucleotides to build the larger desired sequence, a process often referred to
`
`as “gene synthesis” (though the product need not be confined to a single gene).
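
By way of illustration only, the effect of the per-step coupling error on longer oligonucleotides can be sketched numerically. The sketch below assumes that every coupling step fails independently with the same probability, which is a simplification not stated in the text; the function name and the example lengths are hypothetical.

```python
def error_free_fraction(per_step_error: float, n_couplings: int) -> float:
    """Expected fraction of strands containing no coupling error.

    ASSUMPTION: each coupling step fails independently with the same
    probability, so the error-free fraction is (1 - p) ** n.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if n_couplings < 0:
        raise ValueError("n_couplings must be non-negative")
    return (1.0 - per_step_error) ** n_couplings


if __name__ == "__main__":
    # Illustrative lengths (in coupling steps) and the 1-2% error range.
    for length in (20, 70, 200):
        for rate in (0.01, 0.02):
            fraction = error_free_fraction(rate, length)
            print(f"{length:>4} steps, {rate:.0%} error: {fraction:.3f} error-free")
```

Under these assumptions, a 1% per-step error leaves roughly half of 70-step products error-free, and the fraction falls off geometrically with length, which is consistent with the preference for assembling long sequences from short oligonucleotides.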
`
`[0006] Linear synthesis of nucleic acids may be accomplished using biological
`
`molecules and protecting groups. The most common linear synthesis techniques are based
`
`on solid-phase phosphoramidite chemistry. The 3’-phosphate is affixed to a solid-phase
`
`support (typically controlled-pore glass beads, silicon substrates, or glass substrates), and
`
`an individual nucleotide of choice is added to a chain growing in the 3’-5’ direction by
`
`means of a 5’-protecting group (typically an acid-labile or photo-cleavable protecting
`
`group). In linear syntheses based on phosphoramidite chemistry, there are many potential
`
`sources of sequence error and oligonucleotide damage that are well documented. Most
`
`notably, the removal of the 5’-protecting group usually involves an acidic treatment that
`
`can remove the base or, in the case of photo-labile 5’-protecting groups, requires ultraviolet
`
`irradiation that can damage the nucleotide. The nucleotide may fail to incorporate into
`
`the growing strand because of insufficient reaction time. Nearly all organic and inorganic
`
`solvents and reagents employed in the process can chemically damage the growing
`
`nucleotide. Such sources of error ultimately limit the fidelity and length of the
`
`oligonucleotide, and furthermore, limit the fidelity and length of larger nucleic acids
`
`assembled from linearly synthesized strands. For methods and materials known in the art
`
`related to phosphoramidite nucleic acid synthesis see, e.g., Sierzchala, A.B., Dellinger,
`
`D.J., Betley, J.R., Wyrzykiewicz, Yamada, C.M., Caruthers, M.H., Solid-Phase
`
`Oligodeoxynucleotide Synthesis: A Two-Step Cycle Using Peroxy Anion Deprotection, J.
`
`AM. CHEM. SOC., 125, 13427-13441 (2003), which is hereby incorporated by reference.
`
`[0007] Errors in gene synthesis are typically controlled in two ways: 1) the individual
`
`oligonucleotides can each be purified to remove error sequences; 2) the final cloned
`
`products are sequenced to discover if errors are present.
`
`In this latter case, the errors are
`
`dealt with by either sequencing many clones until an error-free sequence is found, using
`
`mutagenesis to specifically fix an error, or choosing and combining specific error-free
`
`sub-sequences to build an error-free full-length sequence.
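
As a rough illustration of the cost of sequencing many clones until an error-free sequence is found, the number of clones needed can be estimated under simple assumptions. The per-base error rate, the independence of errors, and the confidence level below are all hypothetical inputs chosen for the sketch; none of them is taken from the text.

```python
import math


def clones_needed(per_base_error: float, length_bp: int,
                  confidence: float = 0.95) -> int:
    """Clones to sequence so that at least one is error-free.

    ASSUMPTIONS (illustrative only): errors occur independently at each base
    with probability `per_base_error`, and clones are independent draws.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    p_clean = (1.0 - per_base_error) ** length_bp  # chance one clone is perfect
    if p_clean <= 0.0:
        raise ValueError("an error-free clone is effectively impossible")
    if p_clean >= 1.0:
        return 1
    # Solve 1 - (1 - p_clean) ** n >= confidence for the smallest integer n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_clean))


if __name__ == "__main__":
    for length in (500, 1000, 5000):
        print(length, "bp:", clones_needed(0.001, length), "clones")
```

The required number of clones grows quickly with sequence length under this model, which is one way to see why post hoc sequencing alone scales poorly.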
`
`[0008]
`
`Synthesizing a single gene has become commonplace enough that many
`
`companies exist to perform this task for a researcher. Single genes up to about 1000 base
`
`pairs (bp) are typically offered, and larger sequences are feasible, up to about 10,000 bp,
`
`for the construction of a single large gene, or a set of genes together. A recent
`
`benchmark was the production of the entire poliovirus genome, 7500 bp, capable of
`
`producing functional viral particles. These syntheses of long DNA products employ the
`
`methods described above, often aided by the large-scale production of oligonucleotides,
`
`such as with multiplexed 48-, 96-, or 384-column synthesizers, and using sample-handling
`
`robots to speed manipulations. For methods and materials known in the art related to
`
`gene synthesis, see e.g., Au, L., Yang, W., Lo, S., Kao, C., Gene Synthesis by a LCR-
`
`Based Approach: High-Level Production of Leptin-L45 Using Synthetic Gene in
`
`Escherichia coli, BIOCHEM. & BIOPHYS. RESEARCH COMM., 248, 200-203 (1998);
`
`Baedeker, M., Schulz, G.E., Overexpression of a Designed 2.2 kb Gene of Eukaryotic
`
`Phenylalanine Ammonia-Lyase in Escherichia coli, FEBS LETTERS 475, 57-60 (1999);
`
`Casimiro, D.R., Wright, P.E., Dyson, H.J., PCR-based Gene Synthesis and Protein NMR
`
`Spectroscopy, STRUCTURE, Vol. 5, No. 11, 1407-1412 (1997); Cello, J., Paul, A.V.,
`
`Wimmer, E., Chemical Synthesis of Poliovirus cDNA: Generation of Infectious Virus in
`
`the Absence of Natural Template, SCIENCE, 297, 1016-1018 (2002); Kneidinger, B.,
`
`Graninger, M., Messner, P., Scaling Up the Ligase Chain Reaction-Based Approach to
`
`Gene Synthesis, BIOTECHNIQUES, 30, 249-252 (2001); Dietrich, R., Wirsching, F., Opitz,
`
`T., Schwienhorst, A., Gene Assembly Based on Blunt-Ended Double-Stranded DNA-
`
`Molecules, BIOTECH. TECHNIQUES, Vol. 12, No. 1, 49-54 (1998); Hoover, D.M.,
`
`Lubkowski, J., DNA Works: An Automated Method for Designing Oligonucleotides for
`
`PCR-based Gene Synthesis, NUCLEIC ACIDS RESEARCH, Vol. 30, No. 10, 1-7 (2002);
`
`Stemmer, W.P.C., Crameri, A., Ha, K.D., Brennan, T.M., Heyneker, H.L., Single-Step
`
`Assembly of a Gene and Entire Plasmid from Large Numbers of
`
`Oligodeoxyribonucleotides, GENE, 164, 49-53 (1995); Withers-Martinez, C., Carpenter,
`
`E.P., Hackett, F., Ely, B., Sajid, M., Grainger, M., Blackman, M.J., PCR-Based Gene
`
`Synthesis as an Efficient Approach for Expression of the A+T-Rich Malaria Genome,
`
`PROTEIN ENG., Vol. 12, No. 12, 1113-1120 (1999); and Venter Cooks Up a Synthetic
`
`Genome in Record Time, SCIENCE, 302, 1307 (2003), all of which are hereby incorporated
`
`by reference. For patents and patent applications related to gene synthesis, see e.g., U.S.
`
`Pat. 6,521,453 and 6,521,427, and U.S. Pat. App. Pub. Nos. 20030165946, 20030138782,
`
`and 20030087238, all hereby incorporated by reference.
`
`[0009] As the goals of genetic engineers become more complex and larger in scale,
`
`these methods become prohibitive in terms of the cost, time, and effort involved to
`
`produce longer sequences and correct the subsequent errors. For example, a fee may be
`
`$5 per bp for a 500 bp sequence, with a waiting time of 2-4 weeks, whereas even the
`
`most rapid portion of the poliovirus synthesis required several months and tens of
`
`thousands of dollars (the project overall required two years and over $100,000). A
`
`technology which makes this process both faster and more affordable would be a
`
`tremendous aid to researchers in need of very long DNA molecules.
`
`[0010] Some examples of work which would benefit:
`
`[0011]
`
`1) Vaccine trials (modest DNA length, but many variants): in producing proteins
`
`for use in vaccine trials, a large number of variant protein sequences are often examined.
`
`The number of options explored is typically limited by the number of variants that can be
`
`produced. The lengths of the DNA molecules encoding such proteins might be in the
`
`range of about 100 bp to about 2000 bp, or longer, depending on the protein. One of
`
`ordinary skill in the art will understand that the length of a DNA molecule may vary
`
`greatly depending on the protein product desired.
`
`[0012]
`
`2) Gene therapy (intermediate DNA length): retroviral vectors used for gene
`
`therapy might range from about 20,000 to about 50,000 bp. The process of constructing
`
`these vectors also limits the number and complexity of variants which can be tested in
`
`the laboratory.
`
`[0013]
`
`3) Bacterial engineering (greatest DNA length, genomic synthesis): currently,
`changes made to a bacterial organism are attempted one gene at a time, a painstaking
`process when several changes are desired.
`In the case of engineering a bacterium to
`perform a task, such as waste detoxification or protein production, a large number of
`intricate changes may be required. If the complete genome of the desired bacterium
`could be generated easily de novo, a great deal of time and effort could be saved, and
`
`new areas of research would be made possible. Bacterial genomes range from several
`
`hundred kilobases to many megabases. One of ordinary skill in the art will understand
`
`that the size of bacterial genomes varies greatly depending on the bacterium in question.
`
`[0014] The fundamental challenges of the current technology:
`[0015]
`1) Scaling: as the size of the desired sequence grows, the production time and
`costs involved grow linearly, or worse. An ideal method would involve smaller amounts
`
`of reagents, shorter cycle times for oligonucleotide synthesis, a greatly improved
`parallelization of the synthesis process used to provide the oligonucleotides, and/or an
`improved process for the assembly of oligonucleotides into larger molecules.
`[0016]
`2) Errors: with the production of larger DNA sequences, expected per base error
`rates will essentially guarantee that conventional methods will yield sequences containing
`
`errors. These errors will require more effective techniques than the current control
`
`procedures described above.
`
`SUMMARY OF THE INVENTION
`
`[0017] The present invention provides methods for the error-free production of long
`nucleic acid molecules with precise user control over sequence content. In a preferred
`
`embodiment of the invention, long error-free nucleic acid molecules can be generated in
`
`parallel from oligonucleotides immobilized on a surface, such as an oligonucleotide
`microarray. The movement of the growing nucleic acid molecule can be controlled
`through the stepwise repositioning of the growing molecule. Stepwise repositioning
`refers to the position of the growing molecule as it interacts with the oligonucleotides
`immobilized on the surface. One aspect of the invention allows for the synthesis of
`
`nucleic acids in a parallel format through the use of a ligase or polymerase reaction. In
`
`another aspect of the invention, the oligonucleotides may also be detached from their
`
`support and manipulated by, for example, a microfluidic device for the purpose of
`
`assembly into larger molecules. Regarding parallel DNA arrays, it is important to note
`
`that a single nucleotide may be synthesized using the parallel arrays, and then amplified
`
`by techniques well known in the art, such as but not limited to, polymerase chain
`
`reaction.
`
`[0018]
`
`In another aspect of the invention, the synthesis of a long nucleotide chain may
`
`be accomplished in parallel starting from a set of many redundantly overlapped
`
`oligonucleotides. Synthesis relies on annealing complementary pairs of oligonucleotides
`
`and extending them to produce longer oligonucleotide segments, until the full-length
`
`sequence is produced. The majority of the oligonucleotide sequence is used to generate
`
`the complementary overlap, improving the chance of the two strands annealing. This
`
`approach guards against the failed synthesis of any one distinct oligonucleotide sequence,
`
`as a less complementary pair of oligonucleotides may still anneal under the appropriate
`
`conditions and produce a full length nucleotide sequence. In another aspect of the
`
`invention, long nucleotide sequences may contain one or more regions containing sites
`
`specifically designed to facilitate the joining of separate molecules. These sequences can
`
`be sites for specific endonuclease restriction and subsequent ligation, homologous
`
`recombination, site-specific recombination, or transposition.
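
The redundantly overlapped oligonucleotide set described above can be sketched, purely for illustration, as a tiling of the target sequence with alternating strands. The oligonucleotide length, the overlap, the alternating top/bottom layout, and the function names below are assumptions made for this sketch and are not specified by the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(target: str, oligo_len: int, overlap: int):
    """Split `target` into overlapping oligos on alternating strands.

    Returns a list of (start, strand, sequence) tuples. Adjacent oligos share
    `overlap` bases of the target, so a top-strand oligo can anneal to its
    bottom-strand neighbour. Hypothetical layout, for illustration only.
    """
    target = target.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("target must be a non-empty string over A, C, G, T")
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    oligos, start, index = [], 0, 0
    while True:
        end = min(start + oligo_len, len(target))
        window = target[start:end]
        strand = "top" if index % 2 == 0 else "bottom"
        seq = window if strand == "top" else reverse_complement(window)
        oligos.append((start, strand, seq))
        if end == len(target):
            return oligos
        start += step
        index += 1


if __name__ == "__main__":
    # Overlap (8) is the majority of each 12-base oligo, as in the text.
    for oligo in tile_oligos("ATGCGTACGTTAGCCTAGGATCCGATTACA", 12, 8):
        print(oligo)
```

Choosing an overlap larger than half the oligonucleotide length means every position of the target is covered by more than one oligonucleotide, which reflects the redundancy the paragraph relies on to tolerate a failed synthesis.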
`
`[0019] A preferred embodiment of the invention provides a method for the synthesis of
`
`single-stranded DNA with various 3’-phosphate protecting groups, such as but not
`
`limited to, peptide, carbohydrate, diphosphate, or phosphate derivative 3’-phosphate
`
`protecting groups. After an addition to the nascent DNA strand by a capped nucleotide or
`
`oligonucleotide, a protease or phosphatase cleaves the bond between the capping group
`
`and the most recently added nucleotide. DNA polymerase or nucleotide ligase can be
`
`used to add a 3’ capped nucleotide or oligonucleotide to the 3’ end of the nascent strand.
`
`DNA ligase can also be used to add a 5’ capped nucleotide or oligonucleotide to the 5’
`
`end of the nascent strand.
`
`[0020] Another preferred embodiment of the invention provides a method for the
`
`synthesis of a double-stranded DNA with an oligonucleotide capping group. The
`
`capping group is comprised of a nucleotide or short oligonucleotide that can be cleaved
`
`from the nascent strand by a restriction enzyme. After the addition of a capped
`
`nucleotide or oligonucleotide, a restriction enzyme which recognizes the capping
`
`nucleotide sequence will cleave the fragment 3’ to the newly added nucleotide. Thus, the
`
`desired nucleotide will remain on the nascent strand. This procedure is repeated to create
`
`a specific oligonucleotide sequence. Different restriction enzymes and corresponding
`
`capping nucleotides or sequence redesign may be required for the creation of desired
`
`oligonucleotides in order to prevent sequence recognition in the nascent strand.
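
The need to avoid recognition sequences within the nascent strand can be illustrated with a simple screen. The sketch below scans a desired sequence, on both strands, for each candidate recognition site and reports the enzymes whose site is absent. The enzyme table is a small illustrative example using well-known recognition sequences; the text does not name particular enzymes, and a fuller check would also consider sites created at the junction with the capping group.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def site_occurs(sequence: str, site: str) -> bool:
    """True if `site` is present on either strand of `sequence`."""
    sequence, site = sequence.upper(), site.upper()
    return site in sequence or reverse_complement(site) in sequence


def usable_enzymes(target: str, enzymes: dict) -> list:
    """Names of enzymes whose recognition site is absent from `target`."""
    return [name for name, site in enzymes.items()
            if not site_occurs(target, site)]


if __name__ == "__main__":
    # Illustrative table only; not an enzyme list given in the text.
    candidates = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}
    print(usable_enzymes("ATGGAATTCCGTAAGC", candidates))
```

If no candidate enzyme passes the screen, the paragraph's alternative applies: the sequence itself may be redesigned so that the recognition site no longer occurs.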
`
`[0021] Yet another preferred embodiment of the invention provides a method for the
`
`synthesis of single-stranded and/or double-stranded DNA using oligonucleotide hairpin-
`
`loops as heat-removable protecting groups and/or PCR primers. Oligonucleotides with
`
`secondary conformational structures, such as DNA hairpin-loops (also termed stem-
`
`loops, and molecular beacons), can also be used as protecting groups. Gentle heating
`
`improves on enzymatic deprotection in two respects: heat distributes more quickly and
`
`uniformly than enzymes, whose removal rate is diffusion-limited, and gentle heating is a
`
`lower-cost resource than restriction enzymes.
`
`[0022]
`
`[0023] The present invention also provides methods for detecting and correcting errors
`
`that arise in the process of constructing long nucleic acid molecules. A preferred
`
`embodiment of the invention utilizes a force-feedback system using magnetic and/or
`
`optical tweezers, either separately or in combination. Using this system, double or
`
`single-stranded DNA is grown off a solid-phase support using one or a combination of
`
`the aforementioned DNA synthesis methods. The solid-phase support is magnetic in
`
`nature and held in a fixed equilibrium position by applying an electric field and magnetic
`
`field gradient created by the magnetic tweezers that opposes the electrophoretic force. As
`
`oligonucleotides are annealed to the growing strand, the negatively charged phosphate
`
`backbone adds charge to the bead-strand complex. However, the added oligonucleotide
`
`adds essentially no mass or surface area to the complex. Assuming the zeta-potential of
`
`the dielectric bead is constant, the addition of an oligonucleotide strand is the only
`
`contribution to the increase in electrophoretic force felt by the particle. The increased
`
`electrophoretic force moves the bead from its equilibrium position, and the magnetic field
`
`gradient must be increased to restore the bead to its equilibrium position. Optically
`
`determined bead velocity and restoration force correspond to the number of bases added.
`
`Therefore, the length of the added strand can be ensured to be correct. Optical detection
`
`can be by way of a CCD or split-photodiode. This scheme can also be modified to
`
`employ optical tweezers to apply an optical force rather than a magnetic force.
`
`Furthermore, this method can utilize coupled magneto-optical tweezers. The optical and
`
`magnetic forces can be created simultaneously or independently of one another.
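
The feedback step, in which the restoring force is related to the number of bases added, might be sketched as follows. This is a simplified illustration only: it assumes the restoring force grows linearly with the number of added bases and that a calibration constant (force per base) has been measured beforehand. Neither assumption, nor any of the names or units below, is given in the text.

```python
def estimate_bases_added(force_before: float, force_after: float,
                         force_per_base: float) -> int:
    """Estimate the number of bases added from the change in restoring force.

    ASSUMPTION: force scales linearly with added bases; `force_per_base` is a
    hypothetical calibration constant in the same unit as the two forces.
    """
    if force_per_base <= 0:
        raise ValueError("force_per_base must be positive")
    return round((force_after - force_before) / force_per_base)


def addition_is_correct(force_before: float, force_after: float,
                        force_per_base: float, expected_bases: int,
                        tolerance: int = 0) -> bool:
    """Compare the estimated addition with the expected oligonucleotide length."""
    added = estimate_bases_added(force_before, force_after, force_per_base)
    return abs(added - expected_bases) <= tolerance


if __name__ == "__main__":
    # Arbitrary illustrative numbers: a 20-base oligonucleotide was expected.
    print(addition_is_correct(10.0, 12.0, 0.1, expected_bases=20))
```

In practice the achievable resolution would depend on measurement noise, which this sketch ignores apart from the optional tolerance argument.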
`
`[0024] Another preferred embodiment of the invention also provides methods for
`
`detecting and correcting errors that arise in the process of constructing long nucleic acid
`
`molecules. A preferred embodiment of the invention utilizes electrophoresis as a force-
`
`feedback system. In this scheme, a single strand of DNA is synthesized on a fluorescent
`
`bead functionalized with a single phosphate group, and electrophoretically passed
`
`through a medium with excess ATP, kinase, and ligase. The rate of motion of the bead is
`
`monitored and used as the feedback mechanism. First, excess ATP is passed through the
`
`medium simultaneously (with the bead). Excess ATP will pass through the medium
`
`much faster than the bead. The kinase will catalyze the formation of a triphosphate on
`
`the bead using ATP. When this occurs, the rate of motion of the bead will change, due to
`
`a change in the charge/mass ratio. The measurement of this change thus serves to
`
`indicate a successful reaction. Once the triphosphate has formed on the bead, excess free
`
`nucleotide is passed through the medium. These small molecules will pass through the
`
`medium much faster than the bead. DNA ligase will catalyze the addition of the
`
`nucleotide, releasing a diphosphate. The rate of motion of the bead is reduced because
`
`the loss of the diphosphate decreases the charge/mass ratio. This serves as feedback for
`
`base addition. Multiple-nucleotide addition in this step should not occur because after
`
`one addition, there is no triphosphate present in the system, which DNA ligase needs to
`
`add the base. Once a successful nucleotide addition is detected, more ATP is introduced
`
`into the system and the described cycle repeats.
`
`[0025] Another preferred embodiment of the invention uses heat as an additional
`
`feedback and error correction mechanism in force feedback systems. Prior to enzymatic
`
`ligation, the melting point of the small oligonucleotide in contact with the growing
`
`nucleic acid strand will be lowered if base-pair mismatches occur. The controlled
`
`application of heat after detected annealing can provide additional feedback about base-
`
`pair mismatches. If the oligonucleotide dehybridizes from the growing strand as the
`
`melting point is approached, but not reached, a base-pair mismatch is detected when a
`
`decrease in magnetophoretic force, or increase in electrophoretic force is required to keep
`
`the bead in equilibrium. Because the erroneous strand is removed by heat, this feedback
`
`process is also an error-correction mechanism.
`
`[0026] Another preferred embodiment of the invention utilizes exonuclease activity for
`
`nucleotide removal for error-correction in force-feedback systems. This type of error-
`
`correction is particularly useful for correcting errors after enzymatic ligation of an
`
`erroneous strand. Whereas it would be extremely difficult to control the exact number of
`
`nucleotides that exonuclease removes from the 3’-end of a growing strand of nucleic
`
`acid, that level of control is not required in the methods reported herein because the
`
`feedback systems allow for the length of the strand to be determined after the error-
`
`correction steps. Therefore, if too many nucleotides are initially removed, they may be
`
`added back later.
`
`[0027] A novel aspect of the invention accounts for the potential that an error may
`
`occur that cannot be detected or corrected by the use of parallel detection. The
`
`parallelization of single-molecule systems is desirable to ensure that the process is
`
`successful and also allows for various nucleic acids of different sequences to be
`
`synthesized simultaneously. Parallel single-molecule systems may use arrays of light
`
`sources and detectors. Parallel single-molecule systems using only one light source and
`
`detector are also possible.
`
`[0028]
`
`Parallel detection may also be performed without the use of arrays. Single-
`
`molecule systems in which the solid-phase supports have negligible interactions can be
`
`parallelized without the use of arrays. For example, optical tweezers may be employed in
`
`the single-molecule system as described in figure 9B. Multiple beads in the same
`
`microscope field of view are trapped by rastering the laser beam using an acoustical-
`
`optical modulator (AOM).
`
`In another example, multiple beads may be tracked using only
`
`one CCD camera. The ability to control beads independently is not available in this
`
`system. However, beads with erroneous nucleic acids can be tracked and discarded after
`
`the entire process is complete.
`
`[0029] Another novel aspect of this invention provides methods for the
`
`microfabrication of electromagnet arrays. The area density of electromagnet arrays is
`
`maximized if the electromagnets are fabricated by bulk-microfabrication techniques.
`
`First, a layer of diagonal metal wires is lithographically defined and deposited on a
`
`silicon substrate. Bond pads are also defined in this first step. Then, a film of soft
`
`magnetic material is lithographically designed and deposited over a portion of the metal
`
`lines. A second layer of metal lines is lithographically defined and deposited over the
`
`magnetic film layer to complete the microfabrication of in-plane microelectromagnets.
`
`[0030] A preferred embodiment of the invention provides a method for error detection
`
`and correction using a nanopore device for single-molecule synthesis with feedback using
`
`fluorescent 5’ protecting groups. DNA is synthesized on a non-fluorescent solid support
`
`and passes through a sub-micron size opening, known as a nanopore, with a fluorescence
`
`detector. The bead can be directed to one of two channels by a switch, depending on
`
`whether a successful addition has occurred. After the coupling step and removal of
`
`excess reagents, the bead is passed through the pore. If no fluorescence is detected, either
`
`the coupling reaction was unsuccessful, or it was successful but not detected. The bead is
`
`directed back into the device for another coupling step. Because the 5’ end of the
`
`growing strand is protected, a redundant coupling step will not result in multiple-base
`
`addition.
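
The routing logic of this embodiment amounts to a retry loop: the bead is sent back for another coupling step until fluorescence is detected. A minimal sketch follows, in which the two callables stand in for the coupling step and the fluorescence detector. Both callables, the function name, and the attempt limit are hypothetical placeholders, not part of the described device.

```python
from typing import Callable


def couple_until_detected(couple: Callable[[], None],
                          fluorescence_detected: Callable[[], bool],
                          max_attempts: int = 10) -> int:
    """Repeat the coupling step until the detector reports fluorescence.

    Returns the number of coupling attempts used. Repeating the step is safe
    in the described scheme because the protected 5' end prevents
    multiple-base addition. `max_attempts` is an assumed safety limit.
    """
    for attempt in range(1, max_attempts + 1):
        couple()                      # coupling step, then excess reagent removal
        if fluorescence_detected():   # bead passes the nanopore detector
            return attempt            # switch routes the bead onward
        # otherwise the switch routes the bead back for another coupling step
    raise RuntimeError("no fluorescence detected within max_attempts")


if __name__ == "__main__":
    readings = iter([False, False, True])  # simulated detector output
    print(couple_until_detected(lambda: None, lambda: next(readings)))
```

The attempt limit is a design choice added for the sketch so that a bead that never fluoresces is eventually flagged rather than recycled indefinitely.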
`
`[0031] Another preferred embodiment of the invention provides a method for error
`
`detection and correction using a nanopore device for single-molecule synthesis with
`
`feedback using fluorescent 5’ protecting groups. Monitoring the deprotection of the 5’
`
`group is necessary to eliminate deletion errors. In this device, the growing strand is
`
`deprotected, and the wash is flowed through the nanopore, not the bead, and the nanopore
`
`only leads to one channel. If no fluorescence is detected in the wash, then the strand was
`
`not deprotected, or it was successfully deprotected but the fluorescent protecting group
`
`was not detected. The wash is constantly recycled until a fluorescent group is detected.
`
`Because there are no free nucleotides (only the growing strand) in this device, no addition
`
`error can occur by redundant 5’ deprotection steps.
`
`[0032] A novel aspect of the invention allows for independent control of a cluster of
`
`superparamagnetic beads by an electric field and opposing magnetic field gradient. The
`
`electrophoretic force moves the beads in one direction, and the magnetic field gradient
`
`moves the beads in the opposite direction.
`
`[0033] The present invention provides methods utilizing biological molecules for
`
`detecting and correcting errors that arise in the process of constructing long nucleic acid
`
`molecules. In one preferred embodiment of the invention, mismatch recognition can be
`
`used to control the errors generated during oligonucleotide synthesis, gene assembly, and
`
`the construction of nucleic acids of different sizes. One of ordinary skill in the art will
`
`understand mismatch to mean a single error at the sequence position on one strand which
`
`gives rise to a base mismatch (non-complementary bases aligned opposite one another in
`
`the oligonucleotide), causing a distortion in the molecular structure of the molecule. In
`
`one aspect of the invention, mismatch recognition is achieved through the use of
`
`mismatch binding proteins (MMBP). The MMBP binds to a mismatch in a DNA duplex;
`
`the MMBP-bound DNA complex is then removed using methods of protein purification
`
`well known to those having ordinary skill in the art. Another aspect of the invention
`allows for separation of the MMBP-bound DNA complex using a difference in mobility,
`
`such as by size-exclusion column chromatography or gel electrophoresis. For methods
`
`and materials known in the art related to DNA mismatch detection, see e.g., Biswas, I.,
`
`Hsieh, P., Interaction of MutS Protein with the Major and Minor Grooves of a
`
`Heteroduplex DNA, JOURNAL OF BIO. CHEMISTRY, Vol. 272, No. 20, 13355-13364
`
`(1997); Eisen, J.A., A Phylogenomic Study of the MutS Family of Proteins, NUCLEIC
`
`ACIDS RESEARCH, Vol. 26, No. 18, 4291-4300 (1998); Beaulieu, M., Larson, G.P., Geller,
`
`L., Flanagan, S.D., Krontiris, T.G., PCR Candidate Region Mismatch Scanning:
`
`Adaption to Quantitative, High-Throughput Genotyping, NUCLEIC ACIDS RESEARCH, Vol.
`
`29, No. 5, 1114-1124 (2001); Smith, J., Modrich, P., Removal of Polymerase-Produced
`
`Mutant Sequences from PCR Products, PROC. NATL. ACAD. SCI., 94, 6847-6850 (1997);
`
`Smith, J., Modrich, P., Mutation Detection with MutH, MutL, and MutS Mismatch Repair
`
`Proteins, PROC. NATL. ACAD. SCI., 93, 4374-4379 (1996); and Bjornson, K.P., Modrich,
`
`P., Differential and Simultaneous Adenosine Di- and Triphosphate Binding by MutS,
`
`JOURNAL OF BIO. CHEMISTRY, Vol. 278, No. 20, 18557-18562 (2003), all of which are
`
`hereby incorporated by reference. For patents relating to DNA mismatch repair systems,
`
`see e.g., U.S. Pat. 6,008,031, 5,922,539, 5,861,482, 5,858,754, 5,702,894, 5,679,522,
`
`5,556,750, 5,459,039, all hereby incorporated by reference.
`
`[0034]
`
`In another aspect of the invention, a MMBP can be irreversibly complexed to an
`
`error containing DNA sequence by the action of a chemical crosslinking agent. The pool
`
`of DNA sequences is then amplified, but those containing errors are blocked from
`
`amplification, and quickly become outnumbered by the increasing error-free sequences.
`
`In another aspect of the invention, DNA methylation may be used for strand-specific
`
`error correction. Methylation and site-specific demethylation are employed to produce
`
`DNA strands that are selectively hemi-methylated. A methylase is used to uniformly
`
`methylate all potential target sites on each strand, which are then dissociated and allowed
`
`to re-anneal with new partner strands. A MMBP with demethylase complex is applied,
`
`which binds only to the mismatch. The demethylase portion of the complex removes
`
`methyl groups only near the site of the mismatch. A subsequent cycle of dissociation and
`
`annealing allows the demethylated error-containing strand to associate with a methylated
`
`error free strand. The hemi-methylated DNA duplex now contains all the information
`
`needed to direct the repair of the error, employing the components of a DNA mismatch
`
`repair system.
`
`[0035]
`
`In another aspect of the invention, local DNA on both strands at the site of a
`
`mismatch may be removed and resynthesized to replace the mismatch error. For
`
`example, a MMBP fusion to a non-specific nuclease (N) can bind to a mismatch site on
`
`DNA, forming a MMBP-nuclease DNA complex. The complex can then direct the
`
`action of the nuclease to the mismatch site, and cleave both strands. Once the break is
`
`generated, homologous recombination can be employed to use other, error-free strands as
`
`template to replace the excised DNA. Other mechanisms of DNA synthesis well known
`
`in the art, such as strand invasion and branch migration, may also be used to replace the
`
`excised DNA. Alternatively, a polymerase can be employed to allow broken strands to
`
`reassociate with new full-length partner strands, synthesizing new DNA to replace the
`
`error.
`
`In another aspect of the invention, the MMBP-nuclease-excised DNA complex can
`
`be physically separated from the remaining, error free DNA using various techniques
`
`well known in the art. For methods and materials known in the art related to nucleases
`
`and fusion proteins, see e.g., Kim, Y., Chandrasegaran, S., Chimeric Restriction
`
`Endonucleases, PROC. NATL. ACAD. SCI., 91, 883-887 (1994); Kim, Y., Shi, Y., Berg,
`
`J.M., Chandrasegaran, S., Site-Specific Cleavage of DNA-RNA Hybrids by Zinc
`
`Finger/FokI Cleavage Domain Fusions, GENE, 203, 43-49 (1997); Li, L., Wu, L.P.,
`
`Chandrasegaran,