`
(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date: 15 November 2012 (15.11.2012)

(10) International Publication Number: WO 2012/154201 A1

(51) International Patent Classification: C12Q 1/68 (2006.01)

(21) International Application Number: PCT/US2011/057075

(22) International Filing Date: 20 October 2011 (20.10.2011)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 61/405,801, 22 October 2010 (22.10.2010), US

(71) Applicant (for all designated States except US): PRESIDENT AND FELLOWS OF HARVARD COLLEGE [US/US]; 17 Quincy Street, Cambridge, Massachusetts 02138 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): CHURCH, George M. [US/US]; 218 Kent Street, Brookline, Massachusetts 02446 (US). KOSURI, Sriram [US/US]; 92 Sciarappa Street, Cambridge, Massachusetts 02141 (US). EROSHENKO, Nikolai [US/US]; 90 Kilsyth Road, Boston, Massachusetts 02135 (US).

(74) Agent: IWANICKI, John P.; Banner & Witcoff, Ltd., 28 State Street, Suite 1800, Boston, Massachusetts 02109 (US).

(81) Designated States (unless otherwise indicated, for every
kind of national protection available): AE, AG, AL, AM,
AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ,

[Continued on next page]

(54) Title: ORTHOGONAL AMPLIFICATION AND ASSEMBLY OF NUCLEIC ACID SEQUENCES

(57) Abstract: Methods and compositions for synthesizing nucleic acid sequences of interest from heterogeneous mixtures of oligonucleotide sequences are provided.
`
`
`
[Figure 1]
`
`
`
`
`CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO,
`DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT,
`HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP,
KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD,
ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI,
NO, NZ, OM, PE, PG, PH, PL, PT, QA, RO, RS, RU,
RW, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ,
TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA,
ZM, ZW.
`
(84) Designated States (unless otherwise indicated, for every
kind of regional protection available): ARIPO (BW, GH,
GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ,
UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD,
RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ,
DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT,
LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS,
SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM,
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
Published:

with international search report (Art. 21(3))

before the expiration of the time limit for amending the claims and to be republished in the event of receipt of amendments (Rule 48.2(h))

with sequence listing part of description (Rule 5.2(a))
`
`
`
`WO 2012/154201
`
PCT/US2011/057075
`
ORTHOGONAL AMPLIFICATION AND ASSEMBLY OF NUCLEIC ACID SEQUENCES
`
`RELATED APPLICATION DATA
`
[001]

This application claims priority to U.S. Provisional Patent Application No.
61/405,801, filed on October 22, 2010, which is hereby incorporated herein by reference
in its entirety for all purposes.
`
`STATEMENT OF GOVERNMENT INTERESTS
`
[002]

This invention was made with government support under N000141010144 awarded
by the Office of Naval Research, FG02-02ER63445 awarded by the Department of
Energy, W911NF-08-1-0254 awarded by the Defense Advanced Research Projects
Agency, and HG003170 awarded by the National Institutes of Health. The
government has certain rights in the invention.
`
`BACKGROUND
`
`Field of the Invention
`
`[003]
`
`Embodiments of the present invention relate in general to methods and compositions
`
`for amplifying and assembling nucleic acid sequences.
`
`Description of Related Art
`
`[004]
`
`The development of inexpensive, high-throughput and reliable gene synthesis
`
`methods will broadly stimulate progress in biology and biotechnology (Carr & Church
`
(2009) Nat. Biotechnol. 27:1151). Currently, the reliance on column-synthesized
oligonucleotides as a source of DNA limits further cost reductions in gene synthesis
(Tian et al. (2009) Mol. BioSyst. 5:714). Oligonucleotides from DNA microchips can
reduce costs by at least an order of magnitude, yet efforts to scale microchip use have
been largely unsuccessful due to the high error rates and complexity of the
oligonucleotide mixtures (Tian et al. (2004) Nature 432:1050; Richmond et al. (2004)
Nucleic Acids Res. 32:5011; Zhou et al. (2004) Nucleic Acids Res. 32:5409).
`
`[005]
`
`The synthesis of novel DNA encoding regulatory elements, genes, pathways, and
`
entire genomes provides powerful ways both to test biological hypotheses and to
harness biology for humankind’s use. For example, since the initial use of
oligonucleotides in deciphering the genetic code, DNA synthesis has engendered
tremendous progress in biology with the recent complete synthesis of a viable
bacterial genome (Nirenberg et al. (1961) Proc. Natl. Acad. Sci. USA 47:1588; Söll et
al. (1965) Proc. Natl. Acad. Sci. USA 54:1378; Gibson et al. (2010) Science 329:52).
`
`Currently, almost all DNA synthesis relies on the use of phosphoramidite chemistry
`
`on controlled-pore glass (CPG) substrates. CPG oligonucleotides synthesized in this
`
`manner are effectively limited to approximately 100 bases by the yield and accuracy
`
`of the process. Thus, the synthesis of gene-sized fragments relies on assembling many
`
`oligonucleotides together using a variety of techniques termed gene synthesis (Tian
`
(2009) (supra); Gibson (supra); Gibson (2009) Nucleic Acids Res. 37:6984; Li &
Elledge (2007) Nat. Methods 4:251; Bang & Church (2008) Nat. Methods 5:37; Shao
et al. (2009) Nucleic Acids Res. 37:e16).
`
`[006]
`
The price of gene synthesis has fallen drastically over the last decade as the process
has become increasingly industrialized. However, the current commercial price of
gene synthesis, approximately $0.40-1.00/bp, has begun to approach the relatively
stable cost of the CPG oligonucleotide precursors (approximately $0.10-0.20/bp)
(Carr (supra)). At these prices, the construction of large gene libraries and synthetic
genomes is out of reach to most. To achieve further cost reductions, many current
efforts focus on smaller volume synthesis of oligonucleotides in order to minimize
reagent costs. For example, microfluidic oligonucleotide synthesis can reduce reagent
cost by an order of magnitude (Lee et al. (2010) Nucleic Acids Res. 38:2514).
`
`[007]
`
`Another route is to harness existing DNA microchips, which can produce up to a
`
`million different oligonucleotides on a single chip, as a source of DNA for gene
`
synthesis. Previous efforts have demonstrated the ability to synthesize genes from
DNA microchips. Tian et al. described the assembly of 14.6 kb of novel DNA from
`
`292 oligonucleotides synthesized on an Atactic/Xeotron chip (Tian (2004) (supra)).
`
`The process involved using 584 short oligonucleotides synthesized on the same chip
`
`for hybridization-based error correction. The resulting error rates were approximately
`
1/160 basepairs (bp) before error correction and approximately 1/1400 bp after. Using
similar chips, Zhou et al. constructed approximately 12 genes with an error rate as low
as 1/625 bp (Zhou (supra)). Richmond et al. showed the assembly of a 180 bp
construct from eight oligonucleotides synthesized on a microarray using maskless
photolithographic deprotection (Nimblegen) (Richmond (supra)). Though the error
rates were not determined in that study, a follow-up construction of a 742 bp green
fluorescent protein (GFP) sequence using the same process showed an error rate of
1/20 bp to 1/70 bp (Kim et al. (2006) Microelectronic Eng. 83:1613). These
approaches have thus far failed to scale for at least two reasons. First, the error rates
of oligonucleotides from DNA microchips are higher than those of traditional
column-synthesized oligonucleotides. Second, the assembly of gene fragments
becomes increasingly difficult as the diversity of the oligonucleotide mixture becomes
larger.
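The error rates quoted above can be related to the fraction of error-free constructs with a short calculation. The following is a minimal sketch, assuming errors occur independently at each base (a simplifying assumption that is not stated in the text); the function name `fraction_perfect` is hypothetical. Under that assumption, a construct of length L built at one error per k bases is error-free with probability (1 - 1/k)^L.

```python
def fraction_perfect(length_bp: int, bp_per_error: float) -> float:
    """Probability that a construct of `length_bp` bases contains no errors,
    assuming independent errors at a rate of one per `bp_per_error` bases."""
    if bp_per_error <= 1:
        raise ValueError("bp_per_error must be greater than 1")
    return (1.0 - 1.0 / bp_per_error) ** length_bp


if __name__ == "__main__":
    # Error rates quoted in the text, applied to a 742 bp (GFP-sized) construct.
    for label, rate in [("1/20", 20), ("1/70", 70), ("1/160", 160),
                        ("1/625", 625), ("1/1400", 1400)]:
        print(f"{label} per bp: {fraction_perfect(742, rate):.4f} expected perfect")
```

Under this simple model, lowering the per-base error rate from 1/160 to 1/1400 raises the expected fraction of perfect 742 bp constructs from roughly 1% to more than half, which illustrates why the per-base error rate governs how many clones must be screened.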
`
`SUMMARY
`
`[008]
`
`The present invention provides methods and compositions to enrich one or more
`
`oligonucleotide sequences (e.g., DNA and/or RNA sequences) and assemble large
`
`nucleic acid sequences of interest (e.g., DNA and/or RNA sequences (e.g., genes,
`
`genomes and the like)) from complex mixtures of oligonucleotide sequences. The
`
present invention further provides methods for generating oligonucleotide primers
`
`(e.g., orthogonal primers) that are useful for synthesizing one or more nucleic acid
`
`sequences of interest (e.g., gene(s), genome(s) and the like).
`
`[009]
`
In certain exemplary embodiments, microarrays including at least 5,000 different
oligonucleotide sequences are provided. Each oligonucleotide sequence of the
`
`microarray is a member of one of a plurality of oligonucleotide sets, and each
`
`oligonucleotide set is specific for a nucleic acid sequence of interest (e.g., a single
`
nucleic acid sequence of interest). Each oligonucleotide sequence that is a member of
a particular oligonucleotide set includes a pair of orthogonal primer binding sites
having a sequence that is unique to said oligonucleotide set. The nucleic acid
sequence of interest is at least 500 nucleotides in length. In certain aspects, at least
50, at least 100, or more oligonucleotide sets are provided wherein each set is specific
for a unique nucleic acid sequence of interest. In other aspects, the nucleic acid
sequence of interest is at least 1,000, at least 2,500, at least 5,000, or more nucleotides
in length. In still other aspects, the nucleic acid sequence of interest is a DNA
`
`sequence, e.g., a regulatory element, a gene, a pathway and/or a genome. In still other
`
`aspects, the microarray includes at least 10,000 different oligonucleotide sequences
`
`attached thereto.
`
`[010]
`
`In certain exemplary embodiments, a microarray comprising at least 10,000 different
`
`oligonucleotide sequences attached thereto is provided.
`
`Each oligonucleotide
`
`sequence of the microarray is a member of one of at least 50 oligonucleotide sets, and
`
`each oligonucleotide set is specific for a nucleic acid sequence of interest. Each
`
`oligonucleotide sequence that is a member of a particular oligonucleotide set includes
`
`a pair of orthogonal primer binding sites having a sequence that is unique to said
`
oligonucleotide set. Each nucleic acid sequence of interest is at least 2,500
nucleotides in length.
`
`[011]
`
`In certain exemplary embodiments, methods of synthesizing a nucleic acid sequence
`
`of interest are provided. The methods include the steps of providing at least 5,000
`
`different oligonucleotide sequences, wherein each oligonucleotide sequence is a
`
`member of one of a plurality of oligonucleotide sets, and each oligonucleotide set is
`
specific for a nucleic acid sequence of interest. Each oligonucleotide sequence
includes a pair of orthogonal primer binding sites having a sequence that is unique to a
single oligonucleotide set. The methods include the step of amplifying an
`
`oligonucleotide set using orthogonal primers that hybridize to the orthogonal primer
`
`binding sites unique to the set, and removing the orthogonal primer binding sites from
`
`the amplified oligonucleotide set. The methods further include the step of assembling
`
`the amplified oligonucleotide set into a nucleic acid sequence of interest that is at least
`
500 nucleotides in length. In certain aspects, the nucleic acid sequence of interest is at
least 1,000, at least 2,500, at least 5,000, or more nucleotides in length. In other
aspects, the nucleic acid sequence of interest is a DNA sequence, e.g., a regulatory
element, a gene, a pathway and/or a genome. In yet other aspects, 50, 100, 500, 750,
1,000 or more oligonucleotide sets are provided, wherein each set is specific for a
unique nucleic acid sequence of interest. In still other aspects, the 5,000 different
oligonucleotide sequences are provided on a microarray and, optionally, the 5,000
different oligonucleotide sequences can be removed from the microarray prior to the
step of amplifying.
`
`[012]
`
`Further features and advantages of certain embodiments of the present invention will
`
`become more fully apparent in the following description of the embodiments and
`
drawings thereof, and from the claims.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`[013]
`
`The patent or application file contains at least one drawing executed in color. Copies
`
`of this patent or patent application publication with color drawing(s) will be provided
`
`by the Office upon request and payment of the necessary fee. The foregoing and other
`
`features and advantages of the present invention will be more fully understood from
`
`the following detailed description of illustrative embodiments taken in conjunction
`
`with the accompanying drawings in which:
`
`[014]
`
Figures 1A-1F schematically depict a scalable gene synthesis platform for
`
`OLS Pool 2. Pre-designed oligonucleotides (no distinction is made between dsDNA
`
`and ssDNA in the figure) are synthesized on a DNA microchip (A) and then cleaved
`
`to make a pool of oligonucleotides (B). Plate-specific primer sequences (shades of
`
`yellow) are used to amplify separate plate subpools (C) (only two are shown), which
`
`contain DNA to assemble different genes (only three are shown for each plate
`
`subpool). Assembly specific sequences (shades of blue) are used to amplify assembly
`
`subpools (D) that contain only the DNA required to make a single gene. The primer
`
sequences are cleaved (E) either with Type IIS restriction enzymes (resulting in
dsDNA) or by DpnII/USER/λ exonuclease processing (producing ssDNA).
Construction primers (shown as white and black sites flanking the full assembly) are
then used in an assembly PCR reaction to build a gene from each assembly subpool
`
`(F). Depending on the downstream application the assembled products are then
`
`cloned either before or after an enzymatic error correction step.
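The nested selection described for Figures 1A-1F (plate subpool, then assembly subpool, then removal of the priming sites) can be sketched in code. This is an illustrative toy model only, not the procedure used in the work: the site sequences, payloads, and function names below are hypothetical, priming sites are written as they appear on one strand, and "amplification" is modeled as simple selection of oligonucleotides that carry both flanking sites.

```python
def amplify(pool, fwd_site, rev_site):
    """Keep only oligos flanked by both sites; return the span the sites bound."""
    products = []
    for seq in pool:
        start = seq.find(fwd_site)
        end = seq.rfind(rev_site)
        if start != -1 and end != -1 and start + len(fwd_site) <= end:
            products.append(seq[start:end + len(rev_site)])
    return products


def remove_sites(pool, fwd_site, rev_site):
    """Trim the flanking priming sites from products of amplify()."""
    return [seq[len(fwd_site):len(seq) - len(rev_site)] for seq in pool]


def make_oligo(plate, assembly, payload):
    """Layout: plate site, assembly site, payload, assembly site, plate site."""
    return plate[0] + assembly[0] + payload + assembly[1] + plate[1]


# Hypothetical toy priming sites (real sites would be longer, e.g. ~20 bp).
PLATE_1 = ("ACGTACGT", "TGCATGCA")
PLATE_2 = ("GGAATTCC", "CCTTAAGG")
ASSEMBLY_A = ("AGAGAGAG", "TCTCTCTC")
ASSEMBLY_B = ("GACTGACT", "CAGTCAGT")
ASSEMBLY_C = ("CTCTGAGA", "TCAGAGAG")

POOL = [
    make_oligo(PLATE_1, ASSEMBLY_A, "ATGAAATTTGGG"),
    make_oligo(PLATE_1, ASSEMBLY_A, "GGGCCCTTTAAA"),
    make_oligo(PLATE_1, ASSEMBLY_B, "ATGCCCGGGTTT"),
    make_oligo(PLATE_2, ASSEMBLY_C, "ATGTTTCCCAAA"),
]

if __name__ == "__main__":
    plate_subpool = amplify(POOL, *PLATE_1)                  # step (C)
    assembly_subpool = amplify(plate_subpool, *ASSEMBLY_A)   # step (D)
    payloads = remove_sites(assembly_subpool, *ASSEMBLY_A)   # step (E)
    print(payloads)
```

The two-level layout means that each round of selection only has to discriminate among a limited number of primer pairs, which is one plausible reading of why plate-specific and assembly-specific sites are nested in the figure.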
`
`[015]
`
`Figures 2A-2D depict gene synthesis products. GFPmut3 was PCR assembled (A)
`
`from two different assembly subpools (GFP42 and GFP35) that were amplified from
`
OLS Pool 1. Because the majority of the products were of the wrong size, the
full-length assemblies were gel purified and re-amplified (B). Using the longer
oligonucleotides in OLS Pool 2, a PCR assembly protocol was developed that did not
require gel-isolation. This protocol was used to build three different fluorescent
proteins (C). The building of 42 scFv regions that contained challenging GC-rich
linkers was then attempted. Of the 42 assemblies (D), 40 resulted in strong bands of
the correct size. The two that did not assemble (7 and 24) were gel isolated and
re-amplified, resulting in bands of the correct size (see Supplementary Fig. 8b online).
`
`The antibody that corresponds to each number is given in Table 3. The sequences
`
`above each assembly represent the amino acid linker sequence used to link heavy and
`
`light chains in the scFv fragments.
`
`[016]
`
Figures 3A-3B graphically depict products obtained from OLS Pool 1 and OLS Pool
2. The percentage of fluorescent cells resulting from synthesis products derived from
`
`column-synthesized oligonucleotides (black), OLS Chip 1 subpools GFP43 and
`
`GFP35 (green) and the three fluorescent proteins produced on OLS Chip 2 with and
`
`without ErrASE treatment (blue, yellow, and orange) are shown (A). The error bars
`
`correspond to the range of replicates from separate ligations. The error rates (average
`
`bp of correct sequence per error) from various synthesis products are shown (B).
`
Error bars show the expected Poisson error based on the number of errors found
(±√n). Deletions of more than 2 consecutive bases are counted as a single error (no
`
`such errors were found in OLS Pool 1).
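As a rough illustration of how a Poisson error bar of this kind can be attached to an error rate expressed as base pairs per error, the sketch below treats the observed error count n as having an uncertainty of √n and propagates it to the rate. The function name and the example numbers are hypothetical, and the exact way the bars were computed for the figure is not specified in the text.

```python
import math


def bp_per_error_with_bars(total_bp: int, n_errors: int):
    """Return (low, center, high) estimates of bp per error.

    The count of errors is treated as Poisson distributed, so its
    uncertainty is taken to be sqrt(n_errors).
    """
    if n_errors <= 0:
        raise ValueError("at least one observed error is required")
    sigma = math.sqrt(n_errors)
    center = total_bp / n_errors
    low = total_bp / (n_errors + sigma)
    # With a single observed error the upper bound is unbounded.
    high = total_bp / (n_errors - sigma) if n_errors > sigma else math.inf
    return low, center, high


if __name__ == "__main__":
    # Hypothetical example: 25 errors found in 10,000 bp of sequenced clones.
    low, center, high = bp_per_error_with_bars(10_000, 25)
    print(f"1 error per {center:.0f} bp (range {low:.0f} to {high:.0f} bp)")
```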
`
`[017]
`
Figures 4A-4B depict the amplification and processing of OLS Pool 1
oligonucleotides. Two assembly subpools and two control subpools were amplified
from OLS Pool 1, which contained a total of 13,000 oligonucleotides (A). Because the
oligonucleotides in the two GFP subpools had sizes distinct from all other oligonucleotides
on the chip, and since no oligonucleotides of the incorrect size were detected, these
data indicate that the oligonucleotides from other subpools did not amplify. The
dsDNA subpools were then processed using DpnII/USER/lambda exonuclease (B).
`
`After processing, three types of products were obtained. First, there were the products
`
`of the expected size. Second, there were the high molecular weight products that
`
`corresponded to oligonucleotides that retained one or both of the assembly-specific
`
primer sites. Last, there were the low molecular weight products that, without
intending to be bound by scientific theory, were likely produced by DpnII cleavage at
double-stranded recognition sites formed by the overlapping regions of the
`
`oligonucleotides. The same dsDNA ladder (Low Molecular Weight, New England
`
`Biolabs, Ipswich, MA) was used in both the non-denaturing (A) and the denaturing
`
`(B) 10% PAGE gels, where the denaturing agent produced the extra bands in the
`
`ladder (B).
`
`[018]
`
Figure 5 depicts GFP assembly from OLS Pool 1. The two OLS Pool 1 GFP
assembly subpools were amplified, processed and PCR assembled (See Figure 3A).
`
`The bands corresponding to full-length assembly products were then gel-isolated and
`
`re-amplified. The re-amplification products shown contained low molecular weight
`
`products that, without intending to be bound by scientific theory, likely remained in
`
trace amounts after gel isolation. These greatly increased the number of
clones that needed to be sequenced in order to identify a perfect GFPmut3 construct.
`
`The control GFP was amplified from a cloned GFP. GFP20 was an assembly made
`
`from a hand mixed pool of oligonucleotides synthesized using a column-based
`
`method. GFP20 was not gel isolated or re-amplified.
`
`[019]
`
`Figures 6A-6C graphically depict screening error rates of GFP assemblies. Error
`
`rates from the first set (gel-isolated and re-amplified) (A), the second set (gel-isolated
`
`without re-amplification) (B), and the error-corrected second set of GFP assemblies
`
`from OLS Pool 1 (C) were determined using flow cytometry, by counting colonies on
`
agar plates, and by sequencing individual clones. Error bars give the range of the
observed values. n corresponds to the number of electroporated cultures from one or
more ligation reactions performed on a single assembly reaction, with n = 3-4 in (A), n
= 3 in (B), and n = 2 in (C).
`
`[020]
`
Figure 7 graphically depicts the dynamic range of the flow cytometry screen. The
fluorescent fraction observed with flow cytometry is shown
as a function of the fraction of perfect assemblies predicted from the sequencing data,
with each data point corresponding to an individual construct built from
OLS Pool 1 (the same data are shown in Figure 6). The black line indicates the result
`
`expected if the sequencing and fluorescent data predicted each other perfectly.
`
`[021]
`
Figures 8A-8C depict processing of OLS Pool 2 assembly subpools. Assembly-specific
`
`primers were used to amplify the subpools that were designed to build three different
`
`fluorescent proteins (A). A BtsI restriction enzyme was used to remove the priming
`
`sites (B). The same protocol was followed in processing the antibody assembly
`
subpools, with (C) depicting the subpools after the BtsI digest. The gel in (C) depicts
`
`only 38 subpools because four antibody subpools evaporated from the reaction tubes
`
`during PCR, and had to be re-amplified in a separate experiment.
`
`[022]
`
`Figures 9A-9B graphically depict optimization of enzymatic synthesis error removal
`
`with ErrASE (Novici Biotech, Vacaville, CA). mCitrine synthesized from OLS Pool
`
`2 was treated with ErrASE, and the fluorescent fraction was quantified with flow
`
`cytometry (A). The different ErrASE reactions corresponded to varying quantities of
`
`error-removing enzymes, with ErrASE 1 having the most and ErrASE 6 the least.
`
`Error bars give the range of the data points, with n = 2 or 4 for the control and the
`
`mCitrine constructs, respectively.
`
Increasing the length of ErrASE treatment from 1 to 2 hours did not lead to a major
decrease in error rates (B). “NO PRODUCT” indicates that the post-ErrASE
amplification did not produce a product
`
`of the correct size. Without intending to be bound by scientific theory, this was most
`
`likely because the ErrASE error removing enzymes over-digested the assembly. Each
`
`value is an average of independent flow cytometry runs performed on five (A) or three
`
`(B) aliquots of the cloned assemblies.
`
`
`
`
`[023]
`
Figures 10A-10I depict optimization of the antibody assembly protocol. First, each
antibody assembly subpool was subjected to 15 PCR cycles in the presence of KOD
DNA polymerase, but in the absence of construction primers. Next, the construction
primers and each assembly were diluted in another PCR mix. Shown are the 2%
agarose gels of the following assembly protocols: (A) KOD1; (B) KOD2; (C)
KODXL60; (D) KODXL65; (E) Phusion62; (F) Phusion67; (G) Phusion72; (H)
Phusion62B; (I) Phusion67B. A 1 kb Plus DNA Ladder (Invitrogen, Carlsbad, CA)
`
`was used as a size marker in all experiments.
`
`[024]
`
Figure 11 depicts antibody assemblies that were screened. Here, eight of the 42
assembled scFv fragments were error-corrected with ErrASE, gel isolated, and
re-amplified, generating the products shown. The constructs were subsequently cloned
`
`and sequenced (Table 3).
`
`[025]
`
Figures 12A-12B depict gels showing antibody assemblies. (A) The first assembly
`
`reaction resulted in 29 out of 42 antibody assembly reactions yielding products of the
`
`correct size. The antibody that corresponds to each number is listed in Table 3.
`
`Increasing the assembly subpool concentration used in the assembly reaction increased
`
`the number of successful assemblies to 40 (see Figure 2D). The two failures from the
`
`second set of assembly reactions were gel-isolated and re-amplified, yielding products
`
`of the correct size (B).
`
`[026]
`
Figures 13A-13B graphically depict the use of betaine during the ErrASE melt and
re-anneal step. A set of synthesized antibodies (synthesis products shown in Figure 2D)
was treated with ErrASE, with betaine either included or left out of the melting and
re-annealing step. The resulting error rate (A) and the probability of a synthesized
molecule being either misassembled or having a large (3+ consecutive bp) deletion
(B) were quantified. Error bars indicate the expected Poisson error (±√n, with n being
`
`the number of errors observed).
`
`[027]
`
`Figure 14 schematically depicts a full synthesis workflow according to certain aspects
`
of the invention. The workflow was dependent on whether USER/DpnII processing
(left branch after oligo synthesis) or Type IIS enzymes (right branch) were used for
removing the amplification sites. The process outlines the final form of the
optimized protocols. The times given in parentheses are estimates that account for
`
`both the time involved in setting up reactions and the time to complete the reaction.
`
`[028]
`
Figure 15 schematically depicts OLS Pool 1 assembly subpool amplification and
USER/DpnII processing. Assembly subpools were amplified from OLS Pool 1 using
20 bp priming sites that were shared by all primers in any particular assembly. A PCR
reaction resulted in a pool of dsDNA with the oligos from other assemblies still in
ssDNA form and at trace concentrations. The forward subpool amplification primer
incorporated two sequential phosphorothioate linkages at its 5' end and a
deoxyuridine at its 3' end, while the reverse primer had a phosphate at its 5' end.
Lambda exonuclease is a 5' to 3' exonuclease that degrades 5' phosphorylated DNA
and is blocked by phosphorothioate. This property was used to selectively degrade the
reverse strand of the amplified products. USER (Uracil-Specific Excision Reagent)
Enzyme (New England Biolabs, Ipswich, MA) removed the 5' priming site by
excising the uracil and cutting 3' and 5' of the resulting apyrimidinic site. Next, the 3'
end was annealed to the reverse amplification primer, forming a double-stranded
DpnII recognition site (5' GATC). The 3' priming site was then removed with a DpnII
digest.
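The net effect of the USER and DpnII steps on the surviving forward strand can be sketched as simple string operations. This is a simplified illustration under stated assumptions, not a model of the enzymology: the strand is written 5' to 3' with the primer-derived deoxyuridine shown as `U`, the DpnII site is assumed to sit at the 5' boundary of the 3' priming site, DpnII is taken to cut immediately 5' of GATC, and the function name and example sequence are hypothetical.

```python
DPNII_SITE = "GATC"  # DpnII is taken to cut immediately 5' of this sequence


def process_forward_strand(strand: str) -> str:
    """Return the payload left after simulated USER and DpnII processing.

    `strand` is the phosphorothioate-protected forward strand that remains
    after lambda exonuclease has degraded the 5'-phosphorylated strand.
    """
    # USER step: everything up to and including the deoxyuridine is removed.
    u_index = strand.find("U")
    if u_index == -1:
        raise ValueError("no deoxyuridine found in strand")
    after_user = strand[u_index + 1:]

    # DpnII step: cut 5' of the GATC that begins the 3' priming site.
    # The last GATC is used so that only the 3' site is trimmed here.
    cut = after_user.rfind(DPNII_SITE)
    if cut == -1:
        raise ValueError("no DpnII site found in the 3' priming region")
    return after_user[:cut]


if __name__ == "__main__":
    # Hypothetical strand: 20 nt 5' site ending in U, payload, 20 nt 3' site.
    example = "ACGTACGTACGTACGTACGU" + "ATGAAACCC" + "GATCTTTTGGGGCCCCAAAA"
    print(process_forward_strand(example))
```

The sketch trims only at the last GATC. In an actual digest, any internal GATC that becomes double-stranded could also be cut, which is consistent with the low molecular weight products attributed to DpnII cleavage in the description of Figure 4.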
`
`DETAILED DESCRIPTION
`
`[029]
`
The present invention is based in part on the discovery that high-fidelity DNA
microchips, selective oligonucleotide amplification, optimized gene assembly
protocols, and enzymatic error correction can be used to develop a highly parallel
nucleic acid sequence (e.g., gene) synthesis platform. Assembly of 47 genes,
including 42 challenging therapeutic antibody sequences, encoding a total of
approximately 35 kilobasepairs of DNA has been surprisingly achieved using the
compositions and methods described herein. These assemblies were created from a
complex background containing 13,000 oligonucleotides encoding approximately 2.5
megabases of DNA, which is at least 50 times larger than previous attempts known in
the art. A number of features were discovered to play an important role in the
functionality of the nucleic acid synthesis platform described herein, including the use of
`
`low-error starting material, well-chosen orthogonal primers, subpool amplification of
`
`individual assemblies, optimized assembly methods, and enzymatic error correction.
`
`[030]
`
The present invention provides methods and compositions for the assembly of one or
more nucleic acid sequences of interest from a large pool of oligonucleotide
sequences. In certain exemplary embodiments, a nucleic acid sequence of interest is
at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,500, 2,000, 2,500,
3,000, 3,500, 4,000, 4,500, 5,000, 5,500, 6,000, 6,500, 7,000, 7,500, 8,000, 8,500,
9,000, 9,500, 1,000,000, 2,000,000, 3,000,000, 4,000,000, 5,000,000, 6,000,000,
7,000,000, 8,000,000, 9,000,000, 10,000,000 or more nucleotides in length. In other
exemplary embodiments, a nucleic acid sequence of interest is between 100 and
10,000,000 nucleotides in length, including any ranges therein. In yet other
exemplary embodiments, a nucleic acid sequence of interest is between 100 and
20,000 nucleotides in length, including any ranges therein. In still other exemplary
embodiments, a nucleic acid sequence of interest is between 100 and 25,000
nucleotides in length, including any ranges therein. In other aspects, a nucleic acid sequence
`
`of interest is a DNA sequence such as, e.g., a regulatory element (e.g., a promoter
`
`region, an enhancer region, a coding region, a non-coding region and the like), a gene,
`
`a genome, a pathway (e.g., a metabolic pathway (e.g., nucleotide metabolism,
`
`carbohydrate metabolism, amino acid metabolism,
`
`lipid metabolism, co-factor
`
`metabolism, vitamin metabolism, energy metabolism and the like), a signaling
`
`pathway, a biosynthetic pathway, an immunological pathway, a developmental
`
pathway and the like) and the like. In yet other aspects, a nucleic acid sequence of
interest is the length of a gene, e.g., between about 500 nucleotides and 5,000
nucleotides in length. In still other aspects, a nucleic acid sequence of interest is the
length of a genome (e.g., a phage genome, a viral genome, a bacterial genome, a
fungal genome, a plant genome, an animal genome or the like).
`
`[031]
`
Embodiments of the present invention are directed to oligonucleotide sequences
having two or more orthogonal primer binding sites that each hybridize to an
orthogonal primer. As used herein, the term “orthogonal primer binding site” is
intended to include, but is not limited to, a nucleic acid sequence located at the 5'
and/or 3' end of the oligonucleotide sequences of the present invention which
hybridizes to a complementary orthogonal primer. An “orthogonal primer pair” refers to
`
`a set of two primers of identical sequence that bind to both orthogonal primer binding
`
`sites at the 5' and 3' ends of each oligonucleotide sequence of an oligonucleotide set.
`
`Orthogonal primer pairs are designed to be mutually non-hybridizing to other
`
`orthogonal primer pairs, to have a low potential to cross-hybridize with one another or
`
`with oligonucleotide sequences, to have a low potential to form secondary structures,
`
and to have similar melting temperatures (Tms) to one another. Orthogonal primer
pair design and software useful for designing orthogonal primer pairs are discussed
further herein.
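The design criteria listed above (similar melting temperatures and low potential for cross-hybridization) can be illustrated with a simple screening routine. This is a minimal sketch under stated assumptions and is not the design software referred to in the text: melting temperature is approximated with the Wallace rule (2 °C per A/T, 4 °C per G/C), potential cross-hybridization is approximated by shared k-mers between a primer and the reverse complement of another, secondary structure is checked only through self-complementarity, and all function names, thresholds, and candidate sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def shares_kmer(a: str, b: str, k: int) -> bool:
    """True if sequences a and b have any length-k substring in common."""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))


def may_cross_hybridize(p: str, q: str, k: int) -> bool:
    """True if p contains a stretch of k bases complementary to part of q."""
    return shares_kmer(p, reverse_complement(q), k)


def select_orthogonal(candidates, tm_target, tm_tolerance=2, k=6):
    """Greedily keep candidates that pass the Tm and k-mer screens."""
    chosen = []
    for cand in candidates:
        if abs(wallace_tm(cand) - tm_target) > tm_tolerance:
            continue  # melting temperature too far from the target
        if may_cross_hybridize(cand, cand, k):
            continue  # self-complementary stretch (hairpin or self-dimer risk)
        if any(may_cross_hybridize(cand, c, k) or shares_kmer(cand, c, k)
               for c in chosen):
            continue  # could anneal to a chosen primer or to its binding site
        chosen.append(cand)
    return chosen


if __name__ == "__main__":
    # Hypothetical low-complexity candidates chosen for readability only.
    candidates = [
        "AAAAAAAACCCCCCCCAAAA",
        "TTTTGGGGGGGGTTTTTTTT",
        "AAAACCCCCCCCAAAAAAAA",
        "GGGGGGGGGGGGGGGGGGGG",
        "ACACACACACACACAGAGAG",
    ]
    print(select_orthogonal(candidates, tm_target=58))
```

A production design would more likely rely on thermodynamic (nearest-neighbor) melting temperature estimates and alignment against the full oligonucleotide pool; the k-mer screen here is only a coarse stand-in for those checks.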
`
`[032]
`
`As used herein, the term “oligonucleotide set” refers to a set of oligonucleotide
`
`sequences that has identical orthogonal pair primer sites and is specific for a nucleic
`
`acid sequence of interest.
`
`In certain aspects, a nucleic acid sequence of interest is
`
`synthesized from a plurality of oligonucleotide sequences
`
`that make up an
`
`oligonucleotide set.
`
`In other aspects, the plurality of oligonucleotide sequences that
`
`make up an oligonucleotide set are retrieved from a large pool of heterogeneous
`
oligonucleotide sequences via common orthogonal primer binding sites. In certain
`
`aspects, an article of manufacture (e.g., a microchip, a test tube, a kit or the like) is
`
`provided that includes a plurality of oligonucleotide sequences encoding a mixture of
`
`oligonucleotide sets.
`
`[033]
`
`In certain exemplary embodiments, at least 100, 200, 300, 400, 500, 600, 700, 800,
`
`900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 11,000,
`
`12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 21,000,
`
22,000, 23,000, 24,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 55,000,
60,000, 65,000, 70,000, 75,000, 80,000, 85,000, 90,000, 95,000, 100,000 or more
different oligonucleotide sequences are provided. In certain aspects, between about
2,000 and about 80,000 different oligonucleotide sequences are provided. In other
aspects, between about 5,000 and about 60,000 different oligonucleotide sequences
are provided. In still other aspects, about 55,000 different oligonucleotide sequences
are provided.
`
`[034]
`
`In certain exemplary embodiments, the oligonucleotide sequences are at least 20, 30,
`
`40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210,
`
`220, 230, 240, 250, 260, 270, 280, 290, 300, 350, 400, 450, 500, 550, 600, 650, 700,
`
750, 800, 850, 900, 950, 1000 or more nucleotides in length. In certain aspects, the
oligonucleotide sequences are between about 50 and about 500 nucleotides in length.
In other aspects, the oligonucleotide sequences are between about 100 and about 300
nucleotides in length. In other aspects, the oligonucleotide sequences are about 130
nucleotides in length. In still other aspects, the oligonucleotide sequences are about
200 nucleotides in length.
`
`[035]
`
`In certain exemplary embodiments, the oligonucleotide sequences encode at least 5,
`
`10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300,
`
`400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000,
`
`9,000, 10,000 or more different oligonucleotide sets.
`
`[036]
`
`In certain exemplary embodiments, at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55,
`
`60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000,
`
`2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000 different orthogonal
`
`primer pairs are provided.
`
`[037]
`
`In certain exemplary embodiments, assembly PCR is used to produce a nucleic acid
`
`sequence of interest from a plurality of oligonucleotide sequences that are members of
`
a particular oligonucleotide set. “Assembly PCR” refers to the synthesis of long,
double-stranded nucleic acid sequences by performing PCR on a pool of
oligonucleotides having overlapping segments. Assembly PCR is discussed further in
`
`Stemmer et al. (1995) Gene 164: