(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(19) World Intellectual Property Organization, International Bureau

(43) International Publication Date: 15 November 2012 (15.11.2012)

(10) International Publication Number: WO 2012/154201 A1

(51) International Patent Classification: C12Q 1/68 (2006.01)

(21) International Application Number: PCT/US2011/057075

(22) International Filing Date: 20 October 2011 (20.10.2011)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 61/405,801, 22 October 2010 (22.10.2010), US

(71) Applicant (for all designated States except US): PRESIDENT AND FELLOWS OF HARVARD COLLEGE [US/US]; 17 Quincy Street, Cambridge, Massachusetts 02138 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): CHURCH, George M. [US/US]; 218 Kent Street, Brookline, Massachusetts 02446 (US). KOSURI, Sriram [US/US]; 92 Sciarappa Street, Cambridge, Massachusetts 02141 (US). EROSHENKO, Nikolai [US/US]; 90 Kilsyth Road, Boston, Massachusetts 02135 (US).

(74) Agent: IWANICKI, John P.; Banner & Witcoff, Ltd., 28 State Street, Suite 1800, Boston, Massachusetts 02109 (US).

(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ,

[Continued on next page]
`(54) Title: ORTHOGONAL AMPLIFICATION AND ASSEMBLY OF NUCLEIC ACID SEQUENCES
`
(57) Abstract: Methods and compositions for synthesizing nucleic acid sequences of interest from heterogeneous mixtures of oligonucleotide sequences are provided.
`
`
`
[Figure 1]
`
`
`
`
CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.

(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV, MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published:

with international search report (Art. 21(3))

before the expiration of the time limit for amending the claims and to be republished in the event of receipt of amendments (Rule 48.2(h))

with sequence listing part of description (Rule 5.2(a))
`
`

`

`WO 2012/154201
`
PCT/US2011/057075
`
ORTHOGONAL AMPLIFICATION AND ASSEMBLY OF NUCLEIC ACID SEQUENCES
`
`RELATED APPLICATION DATA
`
[001] This application claims priority to U.S. Provisional Patent Application No. 61/405,801, filed on October 22, 2010, which is hereby incorporated herein by reference in its entirety for all purposes.
`
`STATEMENT OF GOVERNMENT INTERESTS
`
[002] This invention was made with government support under N000141010144 awarded by the Office of Naval Research, FG02-02ER63445 awarded by the Department of Energy, W911NF-08-1-0254 awarded by the Defense Advanced Research Projects Agency, and HG003170 awarded by the National Institutes of Health. The government has certain rights in the invention.
`
`BACKGROUND
`
`Field of the Invention
`
[003] Embodiments of the present invention relate in general to methods and compositions for amplifying and assembling nucleic acid sequences.
`
`Description of Related Art
`
[004] The development of inexpensive, high-throughput and reliable gene synthesis methods will broadly stimulate progress in biology and biotechnology (Carr & Church (2009) Nat. Biotechnol. 27:1151). Currently, the reliance on column-synthesized oligonucleotides as a source of DNA limits further cost reductions in gene synthesis (Tian et al. (2009) Mol. BioSyst. 5:714). Oligonucleotides from DNA microchips can reduce costs by at least an order of magnitude, yet efforts to scale microchip use have been largely unsuccessful due to the high error rates and complexity of the oligonucleotide mixtures (Tian et al. (2004) Nature 432:1050; Richmond et al. (2004) Nucleic Acids Res. 32:5011; Zhou et al. (2004) Nucleic Acids Res. 32:5409).
`
[005] The synthesis of novel DNA encoding regulatory elements, genes, pathways, and entire genomes provides powerful ways both to test biological hypotheses and to harness biology for humankind's use. For example, since the initial use of oligonucleotides in deciphering the genetic code, DNA synthesis has engendered tremendous progress in biology, with the recent complete synthesis of a viable bacterial genome (Nirenberg et al. (1961) Proc. Natl. Acad. Sci. USA 47:1588; Söll et al. (1965) Proc. Natl. Acad. Sci. USA 54:1378; Gibson et al. (2010) Science 329:52). Currently, almost all DNA synthesis relies on the use of phosphoramidite chemistry on controlled-pore glass (CPG) substrates. CPG oligonucleotides synthesized in this manner are effectively limited to approximately 100 bases by the yield and accuracy of the process. Thus, the synthesis of gene-sized fragments relies on assembling many oligonucleotides together using a variety of techniques termed gene synthesis (Tian (2009) (supra); Gibson (supra); Gibson (2009) Nucleic Acids Res. 37:6984; Li & Elledge (2007) Nat. Methods 4:251; Bang & Church (2008) Nat. Methods 5:37; Shao et al. (2009) Nucleic Acids Res. 37:e16).
`
[006] The price of gene synthesis has fallen drastically over the last decade as the process has become increasingly industrialized. However, the current commercial price of gene synthesis, approximately $0.40-1.00/bp, has begun to approach the relatively stable cost of the CPG oligonucleotide precursors (approximately $0.10-0.20/bp) (Carr (supra)). At these prices, the construction of large gene libraries and synthetic genomes is out of reach to most. To achieve further cost reductions, many current efforts focus on smaller volume synthesis of oligonucleotides in order to minimize reagent costs. For example, microfluidic oligonucleotide synthesis can reduce reagent cost by an order of magnitude (Lee et al. (2010) Nucleic Acids Res. 38:2514).
`
[007] Another route is to harness existing DNA microchips, which can produce up to a million different oligonucleotides on a single chip, as a source of DNA for gene synthesis. Previous efforts have demonstrated the ability to synthesize genes from DNA microchips. Tian et al. described the assembly of 14.6 kb of novel DNA from 292 oligonucleotides synthesized on an Atactic/Xeotron chip (Tian (2004) (supra)). The process involved using 584 short oligonucleotides synthesized on the same chip for hybridization-based error correction. The resulting error rates were approximately 1/160 basepairs (bp) before error correction and approximately 1/1400 bp after. Using similar chips, Zhou et al. constructed approximately 12 genes with an error rate as low as 1/625 bp (Zhou (supra)). Richmond et al. showed the assembly of a 180 bp construct from eight oligonucleotides synthesized on a microarray using maskless photolithographic deprotection (Nimblegen) (Richmond (supra)). Though the error rates were not determined in that study, a follow-up construction of a 742 bp green fluorescent protein (GFP) sequence using the same process showed an error rate of 1/20 bp to 1/70 bp (Kim et al. (2006) Microelectronic Eng. 83:1613). These approaches have thus far failed to scale for at least two reasons. First, the error rates of oligonucleotides from DNA microchips are higher than those of traditional column-synthesized oligonucleotides. Second, the assembly of gene fragments becomes increasingly difficult as the diversity of the oligonucleotide mixture becomes larger.
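As a rough illustration of why the error rates quoted above limit scale, the following sketch estimates the fraction of constructs expected to be error-free. It assumes that errors occur independently at every base with a uniform per-base rate, which is a simplifying assumption not stated in the text; the function name `perfect_fraction` is hypothetical.

```python
def perfect_fraction(length_bp: int, bp_per_error: float) -> float:
    """Probability that a construct of length_bp contains no errors.

    Assumes (for illustration only) independent errors at a uniform
    rate of one error per bp_per_error bases.
    """
    if length_bp < 0 or bp_per_error <= 1:
        raise ValueError("length must be >= 0 and bp_per_error must be > 1")
    return (1.0 - 1.0 / bp_per_error) ** length_bp


if __name__ == "__main__":
    # Error rates quoted in the text, applied to a 742 bp construct.
    for bp_per_error in (20, 70, 160, 625, 1400):
        frac = perfect_fraction(742, bp_per_error)
        print(f"1/{bp_per_error} bp -> perfect fraction {frac:.3g}")
```

Under this simple model, the expected number of clones that must be screened to find one perfect construct is the reciprocal of the returned fraction, so it grows very quickly as the error rate rises or the construct lengthens.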
`
`SUMMARY
`
[008] The present invention provides methods and compositions to enrich one or more oligonucleotide sequences (e.g., DNA and/or RNA sequences) and assemble large nucleic acid sequences of interest (e.g., DNA and/or RNA sequences (e.g., genes, genomes and the like)) from complex mixtures of oligonucleotide sequences. The present invention further provides methods for generating oligonucleotide primers (e.g., orthogonal primers) that are useful for synthesizing one or more nucleic acid sequences of interest (e.g., gene(s), genome(s) and the like).
`
[009] In certain exemplary embodiments, microarrays including at least 5,000 different oligonucleotide sequences are provided. Each oligonucleotide sequence of the microarray is a member of one of a plurality of oligonucleotide sets, and each oligonucleotide set is specific for a nucleic acid sequence of interest (e.g., a single nucleic acid sequence of interest). Each oligonucleotide sequence that is a member of a particular oligonucleotide set includes a pair of orthogonal primer binding sites having a sequence that is unique to said oligonucleotide set. The nucleic acid sequence of interest is at least 500 nucleotides in length. In certain aspects, at least 50, at least 100, or more oligonucleotide sets are provided wherein each set is specific for a unique nucleic acid sequence of interest. In other aspects, the nucleic acid sequence of interest is at least 1,000, at least 2,500, at least 5,000, or more nucleotides in length. In still other aspects, the nucleic acid sequence of interest is a DNA sequence, e.g., a regulatory element, a gene, a pathway and/or a genome. In still other aspects, the microarray includes at least 10,000 different oligonucleotide sequences attached thereto.
`
[010] In certain exemplary embodiments, a microarray comprising at least 10,000 different oligonucleotide sequences attached thereto is provided. Each oligonucleotide sequence of the microarray is a member of one of at least 50 oligonucleotide sets, and each oligonucleotide set is specific for a nucleic acid sequence of interest. Each oligonucleotide sequence that is a member of a particular oligonucleotide set includes a pair of orthogonal primer binding sites having a sequence that is unique to said oligonucleotide set. Each nucleic acid sequence of interest is at least 2,500 nucleotides in length.
`
[011] In certain exemplary embodiments, methods of synthesizing a nucleic acid sequence of interest are provided. The methods include the steps of providing at least 5,000 different oligonucleotide sequences, wherein each oligonucleotide sequence is a member of one of a plurality of oligonucleotide sets, and each oligonucleotide set is specific for a nucleic acid sequence of interest. Each oligonucleotide sequence includes a pair of orthogonal primer binding sites having a sequence that is unique to a single oligonucleotide set. The methods include the step of amplifying an oligonucleotide set using orthogonal primers that hybridize to the orthogonal primer binding sites unique to the set, and removing the orthogonal primer binding sites from the amplified oligonucleotide set. The methods further include the step of assembling the amplified oligonucleotide set into a nucleic acid sequence of interest that is at least 500 nucleotides in length. In certain aspects, the nucleic acid sequence of interest is at least 1,000, at least 2,500, at least 5,000, or more nucleotides in length. In other aspects, the nucleic acid sequence of interest is a DNA sequence, e.g., a regulatory element, a gene, a pathway and/or a genome. In yet other aspects, 50, 100, 500, 750, 1,000 or more oligonucleotide sets are provided, wherein each set is specific for a unique nucleic acid sequence of interest. In still other aspects, the 5,000 different oligonucleotide sequences are provided on a microarray and, optionally, the 5,000 different oligonucleotide sequences can be removed from the microarray prior to the step of amplifying.
`
[012] Further features and advantages of certain embodiments of the present invention will become more fully apparent in the following description of the embodiments and drawings thereof, and from the claims.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
[013] The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee. The foregoing and other features and advantages of the present invention will be more fully understood from the following detailed description of illustrative embodiments taken in conjunction with the accompanying drawings in which:
`
[014] Figures 1A-1F schematically depict a scalable gene synthesis platform for OLS Pool 2. Pre-designed oligonucleotides (no distinction is made between dsDNA and ssDNA in the figure) are synthesized on a DNA microchip (A) and then cleaved to make a pool of oligonucleotides (B). Plate-specific primer sequences (shades of yellow) are used to amplify separate plate subpools (C) (only two are shown), which contain DNA to assemble different genes (only three are shown for each plate subpool). Assembly-specific sequences (shades of blue) are used to amplify assembly subpools (D) that contain only the DNA required to make a single gene. The primer sequences are cleaved (E) using either Type IIS restriction enzymes (resulting in dsDNA) or DpnII/USER/λ exonuclease processing (producing ssDNA). Construction primers (shown as white and black sites flanking the full assembly) are then used in an assembly PCR reaction to build a gene from each assembly subpool (F). Depending on the downstream application, the assembled products are then cloned either before or after an enzymatic error correction step.
`
[015] Figures 2A-2D depict gene synthesis products. GFPmut3 was PCR assembled (A) from two different assembly subpools (GFP42 and GFP35) that were amplified from OLS Pool 1. Because the majority of the products were of the wrong size, the full-length assemblies were gel purified and re-amplified (B). Using the longer oligonucleotides in OLS Pool 2, a PCR assembly protocol was developed that did not require gel-isolation. This protocol was used to build three different fluorescent proteins (C). The building of 42 scFv regions that contained challenging GC-rich linkers was then attempted. Of the 42 assemblies (D), 40 resulted in strong bands of the correct size. The two that did not assemble (7 and 24) were gel isolated and re-amplified, resulting in bands of the correct size (see Supplementary Fig. 8b online). The antibody that corresponds to each number is given in Table 3. The sequences above each assembly represent the amino acid linker sequence used to link heavy and light chains in the scFv fragments.
`
[016] Figures 3A-3B graphically depict products obtained from OLS Pool 1 and OLS Pool 2. The percentage of fluorescent cells resulting from synthesis products derived from column-synthesized oligonucleotides (black), OLS Chip 1 subpools GFP43 and GFP35 (green) and the three fluorescent proteins produced on OLS Chip 2 with and without ErrASE treatment (blue, yellow, and orange) are shown (A). The error bars correspond to the range of replicates from separate ligations. The error rates (average bp of correct sequence per error) from various synthesis products are shown (B). Error bars show the expected Poisson error based on the number of errors found (±√n). Deletions of more than 2 consecutive bases are counted as a single error (no such errors were found in OLS Pool 1).
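The error-rate metric and Poisson error bars described for panel (B) can be sketched as follows. This is a minimal illustration, assuming the bars are obtained by shifting the observed error count n by ±√n before converting to bp per error; how the bars were actually computed is not stated here, and the function name and example counts are hypothetical.

```python
import math


def bp_per_error_with_bars(total_bp: int, n_errors: int):
    """Return (estimate, low, high) in bp of correct sequence per error.

    The interval shifts the error count by +/- sqrt(n), the Poisson
    standard deviation of a count of n (an assumed convention).
    """
    if total_bp <= 0 or n_errors <= 0:
        raise ValueError("need a positive sequenced length and at least one error")
    spread = math.sqrt(n_errors)
    estimate = total_bp / n_errors
    low = total_bp / (n_errors + spread)
    # With a single observed error, n - sqrt(n) is zero: no finite upper bound.
    high = total_bp / (n_errors - spread) if n_errors > spread else math.inf
    return estimate, low, high


if __name__ == "__main__":
    # Hypothetical counts, not data from this application.
    print(bp_per_error_with_bars(10_000, 25))
```

Because the relative Poisson uncertainty is 1/√n, constructs with very few observed errors carry wide error bars, which is why the number of errors found matters as much as the total length sequenced.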
`
[017] Figures 4A-4B depict the amplification and processing of OLS Pool 1 oligonucleotides. Two assembly subpools and two control subpools were amplified from OLS Pool 1, which contained a total of 13,000 oligonucleotides (A). Because the oligonucleotides in the two GFP subpools had sizes distinct from all other oligonucleotides on the chip, and since no oligonucleotides of the incorrect size were detected, these data indicate that the oligonucleotides from other subpools did not amplify. The dsDNA subpools were then processed using DpnII/USER/lambda exonuclease (B). After processing, three types of products were obtained. First, there were the products of the expected size. Second, there were the high molecular weight products that corresponded to oligonucleotides that retained one or both of the assembly-specific primer sites. Last, there were the low molecular weight products that, without intending to be bound by scientific theory, were likely produced by DpnII cleavage at double-stranded recognition sites formed by the overlapping regions of the oligonucleotides. The same dsDNA ladder (Low Molecular Weight, New England Biolabs, Ipswich, MA) was used in both the non-denaturing (A) and the denaturing (B) 10% PAGE gels, where the denaturing agent produced the extra bands in the ladder (B).
`
[018] Figure 5 depicts GFP assembly from OLS Pool 1. The two OLS Pool 1 GFP assembly subpools were amplified, processed and PCR assembled (see Figure 3A). The bands corresponding to full-length assembly products were then gel-isolated and re-amplified. The re-amplification products shown contained low molecular weight products that, without intending to be bound by scientific theory, likely remained in trace amounts after gel isolation. These greatly increased the number of clones that needed to be sequenced in order to identify a perfect GFPmut3 construct. The control GFP was amplified from a cloned GFP. GFP20 was an assembly made from a hand-mixed pool of oligonucleotides synthesized using a column-based method. GFP20 was not gel isolated or re-amplified.
`
[019] Figures 6A-6C graphically depict screening error rates of GFP assemblies. Error rates from the first set (gel-isolated and re-amplified) (A), the second set (gel-isolated without re-amplification) (B), and the error-corrected second set of GFP assemblies from OLS Pool 1 (C) were determined using flow cytometry, by counting colonies on agar plates, and by sequencing individual clones. Error bars give the range of the observed values. n corresponds to the number of electroporated cultures from one or more ligation reactions performed on a single assembly reaction, with n = 3-4 in (A), n = 3 in (B), and n = 2 in (C).
`
[020] Figure 7 graphically depicts the dynamic range of the flow cytometry screen. The fluorescent fraction observed with flow cytometry is shown as a function of the fraction of perfect assemblies predicted from the sequencing data, with each data point corresponding to an individual construct built from OLS Pool 1 (the same data are shown in Figure 6). The black line indicates the result expected if the sequencing and fluorescence data predicted each other perfectly.
`
[021] Figures 8A-8C depict processing of OLS Pool 2 assembly subpools. Assembly-specific primers were used to amplify the subpools that were designed to build three different fluorescent proteins (A). A BtsI restriction enzyme was used to remove the priming sites (B). The same protocol was followed in processing the antibody assembly subpools, with (C) depicting the subpools after the BtsI digest. The gel in (C) depicts only 38 subpools because four antibody subpools evaporated from the reaction tubes during PCR, and had to be re-amplified in a separate experiment.
`
[022] Figures 9A-9B graphically depict optimization of enzymatic synthesis error removal with ErrASE (Novici Biotech, Vacaville, CA). mCitrine synthesized from OLS Pool 2 was treated with ErrASE, and the fluorescent fraction was quantified with flow cytometry (A). The different ErrASE reactions corresponded to varying quantities of error-removing enzymes, with ErrASE 1 having the most and ErrASE 6 the least. Error bars give the range of the data points, with n = 2 or 4 for the control and the mCitrine constructs, respectively. Increasing the length of ErrASE treatment from 1 to 2 hours did not lead to a major decrease in error rates (B). “NO PRODUCT” indicates that the post-ErrASE amplification did not produce a product of the correct size. Without intending to be bound by scientific theory, this was most likely because the ErrASE error-removing enzymes over-digested the assembly. Each value is an average of independent flow cytometry runs performed on five (A) or three (B) aliquots of the cloned assemblies.
`
`

`

`
[023] Figures 10A-10I depict optimization of the antibody assembly protocol. First, each antibody assembly subpool was subjected to 15 PCR cycles in the presence of KOD DNA polymerase, but in the absence of construction primers. Next, the construction primers and each assembly were diluted in another PCR mix. Shown are the 2% agarose gels of the following assembly protocols: (A) KOD1; (B) KOD2; (C) KODXL60; (D) KODXL65; (E) Phusion62; (F) Phusion67; (G) Phusion72; (H) Phusion62B; (I) Phusion67B. A 1 kb Plus DNA Ladder (Invitrogen, Carlsbad, CA) was used as a size marker in all experiments.
`
[024] Figure 11 depicts antibody assemblies that were screened. Here, eight of the 42 assembled scFv fragments were error-corrected with ErrASE, gel isolated, and re-amplified, generating the products shown. The constructs were subsequently cloned and sequenced (Table 3).
`
[025] Figures 12A-12B depict gels showing antibody assemblies. (A) The first assembly reaction resulted in 29 out of 42 antibody assembly reactions yielding products of the correct size. The antibody that corresponds to each number is listed in Table 3. Increasing the assembly subpool concentration used in the assembly reaction increased the number of successful assemblies to 40 (see Figure 2D). The two failures from the second set of assembly reactions were gel-isolated and re-amplified, yielding products of the correct size (B).
`
[026] Figures 13A-13B graphically depict the use of betaine during the ErrASE melt and re-anneal step. A set of synthesized antibodies (synthesis products shown in Figure 2D) was treated with ErrASE, with betaine either included or left out of the melting and re-annealing step. The resulting error rate (A) and the probability of a synthesized molecule being either misassembled or having a large (3+ consecutive bp) deletion (B) were quantified. Error bars indicate the expected Poisson error (√n, with n being the number of errors observed).
`
[027] Figure 14 schematically depicts a full synthesis workflow according to certain aspects of the invention. The workflow was dependent on whether USER/DpnII processing (left branch after oligo synthesis) or Type IIS enzymes (right branch) was used for removing the amplification sites. The process outlines the final form of the optimized protocols. The times given in parentheses are estimates that account for both the time involved in setting up reactions and the time to complete the reaction.
`
[028] Figure 15 schematically depicts OLS Pool 1 assembly subpool amplification and USER/DpnII processing. Assembly subpools were amplified from OLS Pool 1 using 20 bp priming sites that were shared by all primers in any particular assembly. A PCR reaction resulted in a pool of dsDNA with the oligos from other assemblies still in ssDNA form and at trace concentrations. The forward subpool amplification primer incorporated two sequential phosphorothioate linkages on its 5' end and a deoxyuridine at its 3' end, while the reverse primer had a phosphate at its 5' end. Lambda exonuclease is a 5' to 3' exonuclease that degrades 5' phosphorylated DNA and is blocked by phosphorothioate. This property was used to selectively degrade the reverse strand of the amplified products. USER (Uracil-Specific Excision Reagent) Enzyme (New England Biolabs, Ipswich, MA) removed the 5' priming site by excising the uracil and cutting 3' and 5' of the resulting apyrimidinic site. Next, the 3' end was annealed to the reverse amplification primer, forming a double-stranded DpnII recognition site (5' GATC). The 3' priming site was then removed with a DpnII digest.
`
`DETAILED DESCRIPTION
`
[029] The present invention is based in part on the discovery that high-fidelity DNA microchips, selective oligonucleotide amplification, optimized gene assembly protocols, and enzymatic error correction can be used to develop a highly parallel nucleic acid sequence (e.g., gene) synthesis platform. Assembly of 47 genes, including 42 challenging therapeutic antibody sequences, encoding a total of approximately 35 kilobasepairs of DNA has been surprisingly achieved using the compositions and methods described herein. These assemblies were created from a complex background containing 13,000 oligonucleotides encoding approximately 2.5 megabases of DNA, which is at least 50 times larger than previous attempts known in the art. A number of features were discovered to play an important role in the functionality of the nucleic acid synthesis platform described herein, including the use of low-error starting material, well-chosen orthogonal primers, subpool amplification of individual assemblies, optimized assembly methods, and enzymatic error correction.
`
[030] The present invention provides methods and compositions for the assembly of one or more nucleic acid sequences of interest from a large pool of oligonucleotide sequences. In certain exemplary embodiments, a nucleic acid sequence of interest is at least about 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 4,500, 5,000, 5,500, 6,000, 6,500, 7,000, 7,500, 8,000, 8,500, 9,000, 9,500, 1,000,000, 2,000,000, 3,000,000, 4,000,000, 5,000,000, 6,000,000, 7,000,000, 8,000,000, 9,000,000, 10,000,000 or more nucleic acids in length. In other exemplary embodiments, a nucleic acid sequence of interest is between 100 and 10,000,000 nucleic acids in length, including any ranges therein. In yet other exemplary embodiments, a nucleic acid sequence of interest is between 100 and 20,000 nucleic acids in length, including any ranges therein. In still other exemplary embodiments, a nucleic acid sequence of interest is between 100 and 25,000 nucleic acids in length, including any ranges therein. In other aspects, a nucleic acid sequence of interest is a DNA sequence such as, e.g., a regulatory element (e.g., a promoter region, an enhancer region, a coding region, a non-coding region and the like), a gene, a genome, a pathway (e.g., a metabolic pathway (e.g., nucleotide metabolism, carbohydrate metabolism, amino acid metabolism, lipid metabolism, co-factor metabolism, vitamin metabolism, energy metabolism and the like), a signaling pathway, a biosynthetic pathway, an immunological pathway, a developmental pathway and the like) and the like. In yet other aspects, a nucleic acid sequence of interest is the length of a gene, e.g., between about 500 nucleotides and 5,000 nucleotides in length. In still other aspects, a nucleic acid sequence of interest is the length of a genome (e.g., a phage genome, a viral genome, a bacterial genome, a fungal genome, a plant genome, an animal genome or the like).
`
[031] Embodiments of the present invention are directed to oligonucleotide sequences having two or more orthogonal primer binding sites that each hybridizes to an orthogonal primer. As used herein, the term “orthogonal primer binding site” is intended to include, but is not limited to, a nucleic acid sequence located at the 5' and/or 3' end of the oligonucleotide sequences of the present invention which hybridizes to a complementary orthogonal primer. An “orthogonal primer pair” refers to a set of two primers of identical sequence that bind to both orthogonal primer binding sites at the 5' and 3' ends of each oligonucleotide sequence of an oligonucleotide set. Orthogonal primer pairs are designed to be mutually non-hybridizing to other orthogonal primer pairs, to have a low potential to cross-hybridize with one another or with oligonucleotide sequences, to have a low potential to form secondary structures, and to have similar melting temperatures (Tms) to one another. Orthogonal primer pair design and software useful for designing orthogonal primer pairs are discussed further herein.
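Two of the design criteria listed above (similar melting temperatures and low cross-hybridization potential) can be sketched in a few lines. This is a minimal illustration and not the design software referred to in the text. It assumes the Wallace rule as a crude Tm estimate, uses the longest run of perfectly matching bases as a stand-in for hybridization potential, and does not check secondary structure; all function names and thresholds are hypothetical.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Wallace-rule Tm estimate in deg C: 2*(A+T) + 4*(G+C)."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest substring common to a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def are_orthogonal(primers, max_run: int = 8, max_tm_spread: float = 4.0) -> bool:
    """True if all primers have similar Tm and no pair shares a long
    identical or complementary stretch (assumed proxy criteria)."""
    tms = [wallace_tm(p) for p in primers]
    if max(tms) - min(tms) > max_tm_spread:
        return False
    for p, q in combinations(primers, 2):
        # Complementary stretch: the two primers could anneal to each other.
        if longest_shared_run(p, reverse_complement(q)) > max_run:
            return False
        # Identical stretch: one primer could prime the other's binding site.
        if longest_shared_run(p, q) > max_run:
            return False
    return True
```

A production design tool would more plausibly use nearest-neighbor thermodynamics and also screen candidates against the payload sequences; the thresholds here are placeholders chosen only to make the checks concrete.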
`
[032] As used herein, the term “oligonucleotide set” refers to a set of oligonucleotide sequences that has identical orthogonal pair primer sites and is specific for a nucleic acid sequence of interest. In certain aspects, a nucleic acid sequence of interest is synthesized from a plurality of oligonucleotide sequences that make up an oligonucleotide set. In other aspects, the plurality of oligonucleotide sequences that make up an oligonucleotide set are retrieved from a large pool of heterogeneous oligonucleotide sequences via common orthogonal primer binding sites. In certain aspects, an article of manufacture (e.g., a microchip, a test tube, a kit or the like) is provided that includes a plurality of oligonucleotide sequences encoding a mixture of oligonucleotide sets.
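The retrieval of one oligonucleotide set from a mixed pool via its shared primer binding sites can be modeled in silico as below. This is a sketch under simplifying assumptions: each oligonucleotide is written as a single 5' to 3' string, exact string matching stands in for selective PCR amplification, and slicing stands in for enzymatic removal of the priming sites. The function name and the example sequences are hypothetical.

```python
def retrieve_set(pool, fwd_site: str, rev_site: str):
    """Return the payloads of all oligos flanked by fwd_site at the 5' end
    and rev_site at the 3' end, with both flanking sites removed."""
    payloads = []
    flank_len = len(fwd_site) + len(rev_site)
    for oligo in pool:
        if (
            len(oligo) > flank_len
            and oligo.startswith(fwd_site)
            and oligo.endswith(rev_site)
        ):
            payloads.append(oligo[len(fwd_site):len(oligo) - len(rev_site)])
    return payloads


if __name__ == "__main__":
    # Toy pool with two sets; real priming sites would be about 20 nt long.
    pool = [
        "AAAA" + "GATTACA" + "CCCC",  # set 1
        "TTTT" + "GGG" + "GGTT",      # set 2
        "AAAA" + "CAT" + "CCCC",      # set 1
    ]
    print(retrieve_set(pool, "AAAA", "CCCC"))
```

Keying each set to its own pair of flanking sites is what allows a single heterogeneous pool to hold many sets at once: only the members of the addressed set are selected, and the rest of the pool is left behind.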
`
[033] In certain exemplary embodiments, at least 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 21,000, 22,000, 23,000, 24,000, 25,000, 30,000, 35,000, 40,000, 45,000, 50,000, 55,000, 60,000, 65,000, 70,000, 75,000, 80,000, 85,000, 90,000, 95,000, 100,000 or more different oligonucleotide sequences are provided. In certain aspects, between about 2,000 and about 80,000 different oligonucleotide sequences are provided. In other aspects, between about 5,000 and about 60,000 different oligonucleotide sequences are provided. In still other aspects, about 55,000 different oligonucleotide sequences are provided.
`
[034] In certain exemplary embodiments, the oligonucleotide sequences are at least 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250, 260, 270, 280, 290, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000 or more nucleotides in length. In certain aspects, the oligonucleotide sequences are between about 50 and about 500 nucleotides in length. In other aspects, the oligonucleotide sequences are between about 100 and about 300 nucleotides in length. In other aspects, the oligonucleotide sequences are about 130 nucleotides in length. In still other aspects, the oligonucleotide sequences are about 200 nucleotides in length.
`
[035] In certain exemplary embodiments, the oligonucleotide sequences encode at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000 or more different oligonucleotide sets.
`
[036] In certain exemplary embodiments, at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1,000, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000 different orthogonal primer pairs are provided.
`
[037] In certain exemplary embodiments, assembly PCR is used to produce a nucleic acid sequence of interest from a plurality of oligonucleotide sequences that are members of a particular oligonucleotide set. “Assembly PCR” refers to the synthesis of long, double stranded nucleic acid sequences by performing PCR on a pool of oligonucleotides having overlapping segments. Assembly PCR is discussed further in Stemmer et al. (1995) Gene 164:
