`(19) World Intellectual Property
`Organization
`International Bureau
`
(10) International Publication Number: WO 2015/017527 A2

(43) International Publication Date: 5 February 2015 (05.02.2015)

(51) International Patent Classification: E21B 34/08 (2006.01)

(21) International Application Number: PCT/US2014/048867

(22) International Filing Date: 30 July 2014 (30.07.2014)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
61/859,946    30 July 2013 (30.07.2013)    US
61/909,526    27 November 2013 (27.11.2013)    US

(71) Applicant: GEN9, INC. [US/US]; 840 Memorial Drive,
Cambridge, Massachusetts 02139 (US).

(72) Inventors: JACOBSON, Joseph; 223 Grant Avenue,
Newton, Massachusetts 02459 (US). HUDSON, Michael
E.; 21 Crestwood Drive, Framingham, Massachusetts
01701 (US). KUNG, Li-yun; 4 Knowles Farm Road,
Arlington, Massachusetts 02474 (US).

(74) Agent: SALEM, Natalie; Greenberg Traurig, LLP, One
International Place, Boston, Massachusetts 02110 (US).
`
(81) Designated States (unless otherwise indicated, for every
kind of national protection available): AE, AG, AL, AM,
AO, AT, AU, AZ, BA, BB, BG, BH, BN, BR, BW, BY,
BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM,
DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT,
HN, HR, HU, ID, IL, IN, IR, IS, JP, KE, KG, KN, KP, KR,
KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME,
MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ,
OM, PA, PE, PG, PH, PL, PT, QA, RO, RS, RU, RW, SA,
SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TH, TJ, TM,
TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM,
ZW.

(84) Designated States (unless otherwise indicated, for every
kind of regional protection available): ARIPO (BW, GH,
GM, KE, LR, LS, MW, MZ, NA, RW, SD, SL, SZ, TZ,
UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, RU, TJ,
TM), European (AL, AT, BE, BG, CH, CY, CZ, DE, DK,
EE, ES, FI, FR, GB, GR, HR, HU, IE, IS, IT, LT, LU, LV,
MC, MK, MT, NL, NO, PL, PT, RO, RS, SE, SI, SK, SM,
TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW,
KM, ML, MR, NE, SN, TD, TG).

Published:

without international search report and to be republished
upon receipt of that report (Rule 48.2(g))

with sequence listing part of description (Rule 5.2(a))
`
`(54) Title: METHODS FOR THE PRODUCTION OF LONG LENGTH CLONAL SEQUENCE VERIFIED NUCLEIC ACID
`CONSTRUCTS
`
[Front-page drawing not reproduced: FIG. 1A]
`
(57) Abstract: Methods and compositions relate to the production of high fidelity nucleic acids using high throughput sequencing.
`
`
`
`
`
`
`
`
`WO 2015/017527
`
`PCT/US2014/048867
`
`METHODS FOR THE PRODUCTION OF LONG LENGTH CLONAL SEQUENCE
`
`VERIFIED NUCLEIC ACID CONSTRUCTS
`
`RELATED APPLICATIONS
`
`[0001]
`
`This application claims the benefit of and priority to U.S. provisional application
`
`serial number 61/859,946, filed July 30, 2013 and U.S. provisional application serial number
`
`61/909,526, filed November 27, 2013, each of which is incorporated herein by reference in its
`
`entirety.
`
`FIELD OF THE INVENTION
`
`[0002]
`
`Methods and compositions of the invention relate to nucleic acid assembly, and
`
`particularly to methods for sorting and cloning nucleic acids having a predetermined sequence.
`
`BACKGROUND
`
`[0003]
`
`Recombinant and synthetic nucleic acids have many applications in research,
`
`industry, agriculture, and medicine. Recombinant and synthetic nucleic acids can be used to
`
`express and obtain large amounts of polypeptides, including enzymes, antibodies, growth factors,
`
receptors, and other polypeptides that may be used for a variety of medical, industrial, or

agricultural purposes. Recombinant and synthetic nucleic acids also can be used to produce

genetically modified organisms including modified bacteria, yeast, mammals, plants, and other
`
`organisms. Genetically modified organisms may be used in research (e.g., as animal models of
`
`disease, as tools for understanding biological processes, etc.), in industry (e.g., as host organisms
`
for protein expression, as bioreactors for generating industrial products, as tools for

environmental remediation, for isolating or modifying natural compounds with industrial

applications, etc.), in agriculture (e.g., modified crops with increased yield or increased
`
`resistance to disease or environmental stress, etc.), and for other applications. Recombinant and
`
`synthetic nucleic acids also may be used as therapeutic compositions (e.g., for modifying gene
`
`expression, for gene therapy, etc.) or as diagnostic tools (e.g., as probes for disease conditions,
`
`etc.).
`
`
`
`
`[0004]
`
`Numerous techniques have been developed for modifying existing nucleic acids
`
`(e.g., naturally occurring nucleic acids) to generate recombinant nucleic acids. For example,
`
`combinations of nucleic acid amplification, mutagenesis, nuclease digestion, ligation, cloning
`
`and other techniques may be used to produce many different recombinant nucleic acids.
`
`Chemically synthesized polynucleotides are often used as primers or adaptors for nucleic acid
`
`amplification, mutagenesis, and cloning.
`
`[0005]
`
`Techniques also are being developed for de novo nucleic acid assembly whereby
`
`nucleic acids are made (e.g., chemically synthesized) and assembled to produce longer target
`
`nucleic acids of interest. For example, different multiplex assembly techniques are being
`
`developed for assembling oligonucleotides into larger synthetic nucleic acids that can be used in
`
`research, industry, agriculture, and/or medicine. However, one limitation of currently available
`
`assembly techniques is the relatively high error rate. As such, low cost production methods of
`
`long length high fidelity nucleic acids are needed.
`
`SUMMARY OF THE INVENTION
`
`[0006]
`
`Aspects of the invention relate to methods and compositions for the production of
`
`nucleic acid molecules having a predetermined sequence. In some embodiments, methods and
`
compositions for the production of nucleic acid molecules having a length of 1 kbase or more are
`
`provided.
`
`[0007]
`
`In some aspects of the invention, the methods comprise providing a pool of
`
`nucleic acid molecules comprising at least two populations of nucleic acid molecules, each
`
population of nucleic acid molecules having at least one unique target nucleic acid sequence, the

target nucleic acid sequence having an oligonucleotide tag sequence at its 5’ end and at its 3’ end.
`
`The oligonucleotide tag sequence can comprise a unique nucleotide tag. The nucleic acid
`
`molecules can be subjected to fragmentation to generate nucleic acid fragments, wherein the
`
nucleic acid fragments comprise oligonucleotide tag sequences at their 5’ end and at their 3’ end,
`
`the oligonucleotide tag sequence comprising a unique nucleotide tag. The sequence of the
`
`tagged nucleic acid fragments can be determined.
`
`[0008]
`
`In some embodiments, the pool of nucleic acid molecules comprises error-free
`
and error-containing nucleic acid molecules. In some embodiments, the method further

comprises isolating the nucleic acid molecules having the predetermined sequence.
`
`
`[0009]
`
`In some embodiments, a pool of nucleic acid molecules comprising at least two
`
`populations of nucleic acid molecules is provided. Each population of nucleic acid molecules
`
can have a unique target nucleic acid sequence, the target nucleic acid sequence having a 5’ end

and a 3’ end. The 5’ end and the 3’ end of the target nucleic acid molecules can be tagged with
`
`an oligonucleotide tag sequence, wherein the oligonucleotide tag sequence comprises a unique
`
`nucleotide tag. In some embodiments, the nucleic acid molecules can be assembled de novo.
`
`[0010]
`
`In some embodiments, the step of determining the sequence comprises producing
`
a sequence read using a next generation sequencing platform. In some embodiments, the nucleic

acid molecules can have a length greater than a sequence read length limit Lmax imposed by the

next generation sequencing platform. In some embodiments, each nucleic acid fragment

generated can have on average a finite probability of being less than the length Lmax.
`
`[0011]
`
`In some embodiments, fragmentation results in the generation of a plurality of
`
`junction breaks wherein each side of junction breaks is tagged with correlated oligonucleotide
`
`barcodes sufficient to identify an upstream side and downstream side of the junction break.
`
`[0012]
`
In some embodiments, the nucleic acid molecules have a length greater than 1

kbase or greater than 2 kbases.
`
`[0013]
`
`In some aspects of the invention, the method for producing nucleic acid molecules
`
`having a predetermined sequence comprises providing a pool of nucleic acid molecules
`
comprising at least two populations of nucleic acid molecules, each population of nucleic acid

molecules having a unique target nucleic acid sequence, and one or more transpososomes,

wherein each transpososome has a different unique double-stranded oligonucleotide barcode. In

some embodiments, the method further comprises allowing the transpososomes to generate one
`
`or more nucleic acid junction breaks thereby generating a plurality of nucleic acid fragments
`
`comprising an oligonucleotide tag sequence at their 5’ end and at their 3’ end, wherein the
`
`oligonucleotide tag sequence comprises a unique nucleotide tag. The method can further
`
comprise determining the sequence of the tagged nucleic acid fragments. In some embodiments,

each transpososome can introduce separate correlated barcodes upstream and downstream of the
`
`junction break.
`
`[0014]
`
`In some embodiments, the method comprises contacting a pool of nucleic acid
`
molecules with at least one transpososome, wherein the transpososome introduces a unique

double-stranded oligonucleotide sequence comprising two correlated barcodes separated by one

or more cleavage sites to the nucleic acid molecules and cleaving the nucleic acid molecules. The

cleavage sites can be, for example without limitation, restriction sites for restriction nucleases or

meganucleases, or CRISPR sites, and the nucleic acid molecules can be cleaved with a nuclease.
`
`[0015]
`
`In some embodiments, the method comprises contacting a pool of nucleic acid
`
`molecules with at least one transpososome, wherein the transpososome introduces a unique
`
`double-stranded oligonucleotide sequence comprising two correlated barcodes separated by one
`
`or more dU bases to the nucleic acid molecules and cleaving the nucleic acid molecules. The
`
`nucleic acid molecules can be cleaved with a Uracil-Specific Excision Reagent.
`
`[0016]
`
`In some embodiments, the method comprises providing a pool of nucleic acid
`
`molecules comprising at least two populations of nucleic acid molecules, each population of
`
`nucleic acid molecules having a unique target nucleic acid sequence, the target nucleic acid
`
sequence having a 5’ end and a 3’ end, and tagging the 5’ end and the 3’ end of the target nucleic
`
`acid molecules with an oligonucleotide tag sequence, wherein the oligonucleotide tag sequence
`
`comprises a unique nucleotide tag.
`
`[0017]
`
In some embodiments, the nucleic acid molecules are assembled de novo. In

some embodiments, the nucleic acid molecules are synthetic nucleic acid molecules.
`
`[0018]
`
`In some embodiments, the step of determining the sequence comprises producing
`
a sequence read using a next generation sequencing platform. In some embodiments, the nucleic

acid molecules can have a length greater than a sequence read length limit Lmax imposed by the

next generation sequencing platform. In some embodiments, each fragment can have on average

a finite probability of being less than the length Lmax.
`
`[0019]
`
`In some embodiments, the method comprises generating a plurality of junction
`
`breaks wherein each side of junction breaks is tagged with correlated oligonucleotide barcodes
`
`sufficient to identify an upstream and downstream side of the junction breaks.
`
`[0020]
`
In some embodiments, the nucleic acid molecules have a length greater than 1

kbase or greater than 2 kbases.
`
`[0021]
`
`In some embodiments, the pool of nucleic acid molecules can comprise error-free
`
and error-containing nucleic acid molecules. In some embodiments, the method comprises

isolating the error-free nucleic acid molecules having the predetermined sequence.
`
`[0022]
`
In some embodiments, the method comprises amplifying the nucleic acid fragments.
`
`
`
`
`[0023]
`
`In some embodiments, the method comprises amplifying error-free nucleic acid
`
`molecules having the predetermined sequence using primers having a sequence complementary
`
to a sequence of the 5’ end and the 3’ end oligonucleotide tags. In some embodiments, the

methods further comprise isolating error-free nucleic acid molecules having the predetermined
`
`sequence.
`
`[0024]
`
`In some embodiments, the method for producing nucleic acid molecules having a
`
`predetermined sequence comprises the steps of providing a pool of nucleic acid molecules
`
`comprising at least two populations of nucleic acid molecules, the pool of nucleic acid molecules
`
`comprising error-free and error-containing nucleic acid molecules and wherein each population
`
of nucleic acid molecules has a unique target nucleic acid sequence having a 5’ end and a 3’ end;
`
`tagging the 5’ end and the 3’ end of the target nucleic acid molecules with an oligonucleotide tag
`
`sequence, wherein the oligonucleotide tag sequence comprises a unique nucleotide tag, thereby
`
`forming tagged target nucleic acid molecules; and diluting the tagged target nucleic acid
`
molecules to generate a pool of diluted tagged target molecules comprising an error-free tagged

target nucleic acid molecule. The method can further comprise providing one or more

transpososomes, wherein each transpososome has a different unique double-stranded

oligonucleotide barcode, adding the transpososomes to the pool of tagged nucleic acid molecules

and allowing the transpososomes to generate one or more nucleic acid junction breaks thereby

generating a plurality of nucleic acid fragments comprising an oligonucleotide tag sequence at
`
`their 5’ end and at their 3’ end, wherein the oligonucleotide tag sequence comprises a unique
`
`nucleotide tag. The sequence of the tagged nucleic acid fragments can then be determined, and
`
`the error-free nucleic acid molecules having the predetermined sequence can be isolated.
`
`[0025]
`
`In some embodiments, following the diluting step, the tagged target nucleic acid
`
molecules can be amplified. In some embodiments, the tagged target nucleic acid molecules can

be diluted and re-amplified.
`
`[0026]
`
`Aspects of the invention relate to methods for preparing nucleic acid molecules.
`
`In some embodiments, the method comprises the step of providing one or more transpososomes
`
`and a pool of different synthetic nucleic acid molecules, each synthetic nucleic acid molecule
`
`having a unique target nucleic acid sequence, each transpososome having a different unique
`
`double-stranded oligonucleotide barcode.
`
`
`
`
`[0027]
`
`The transpososomes and synthetic nucleic acid molecules can be contacted under
`
conditions sufficient to generate one or more nucleic acid junction breaks, wherein each

transpososome introduces separate correlated barcodes upstream and downstream of the junction

break, thereby generating a plurality of nucleic acid fragments comprising a barcode at the 5' end,

the 3' end or the 5' end and the 3' end. The sequence of the barcoded nucleic acid fragments can

then be determined. In some embodiments, the tagged nucleic acid of interest having the

predetermined sequence can be isolated.
`
`[0028]
`
`In some embodiments, each target nucleic acid sequence has an oligonucleotide
`
tag sequence at the 5' end, the 3' end, or the 5' end and 3' end, the oligonucleotide tag sequence

comprising a unique nucleotide tag. In some embodiments, the method generates a plurality of
`
`nucleic acid fragments comprising a barcode or an oligonucleotide tag sequence at the 5' end and
`
`the 3' end of the fragments.
`
`[0029]
`
`In some aspects of the invention, the method comprises (a) providing a pool of
`
`synthetic nucleic acid molecules comprising at least two different nucleic acid molecules, the
`
pool of nucleic acid molecules comprising error-free and error-containing nucleic acid molecules

and wherein each population of nucleic acid molecules has a unique target nucleic acid sequence

having a 5' end and a 3' end, (b) tagging the 5' end and the 3' end of the target nucleic acid

molecules with an oligonucleotide tag sequence, wherein the oligonucleotide tag sequence

comprises a unique nucleotide tag, thereby forming tagged target nucleic acid molecules, (c)

diluting the tagged target nucleic acid molecules to generate a pool of diluted tagged target

molecules comprising at least one error-free tagged target nucleic acid molecule, (d) providing

one or more transpososomes, wherein each transpososome has a different unique

double-stranded oligonucleotide barcode, (e) adding the one or more transpososomes to the pool of

tagged nucleic acid molecules, (f) allowing the one or more transpososomes to generate one or

more nucleic acid junction breaks thereby generating a plurality of nucleic acid fragments

comprising a barcode or an oligonucleotide tag sequence at the 5' end and at the 3' end, and (g)

determining the sequence of the tagged nucleic acid fragments. In some embodiments, the

tagged nucleic acid of interest having the predetermined sequence can be isolated.
`
`[0030]
`
In some aspects of the invention, the method for preparing nucleic acid molecules
`
`comprises providing a pool of different synthetic nucleic acid molecules, each synthetic nucleic
`
acid molecule having a unique target nucleic acid sequence, wherein each target nucleic acid

sequence has an oligonucleotide tag sequence at the 5' end and the 3' end, and wherein the
`
`oligonucleotide tag sequence comprises a unique nucleotide tag, subjecting the synthetic nucleic
`
`acid molecules to fragmentation to generate nucleic acid fragments, wherein each nucleic acid
`
fragment comprises an oligonucleotide tag sequence at the 5' end, the 3' end, or the 5' end and the

3' end; and determining the sequence of the tagged nucleic acid fragments. In some

embodiments, the tagged nucleic acid of interest having the predetermined sequence can be
`
`isolated.
`
`BRIEF DESCRIPTION OF THE FIGURES
`
`[0031]
`
FIGS. 1A-1D illustrate a schematic representation of a non-limiting exemplary

method for production of long sequence verified polynucleotide constructs using a

transpososome with two unconnected polynucleotide barcodes to create and tag break junctions
`
`in a long polynucleotide construct. The “x” designates incorrect or undesired sequence site. FIG.
`
`1A illustrates the addition of transpososomes (10, 30) to tagged nucleic acids (50, 51, 52)
`
`according to some embodiments of the invention. FIG. 1B illustrates the mixture of tagged
`
`nucleic acids and transpososomes according to some embodiments of the invention. FIG. 1C
`
`illustrates the fragmentation of the nucleic acids according to some embodiments of the
`
`invention. FIG. 1D illustrates the tagged nucleic acid fragments according to some embodiments
`
`of the invention.
`
`[0032]
`
`FIGS. 2A-2D illustrate a schematic representation of a non-limiting exemplary
`
method for production of long sequence verified polynucleotide constructs using a

transpososome with a single polynucleotide construct comprising two co-joined barcodes with a

cleavage site (RS) in between the two barcodes. The “x” designates incorrect or undesired
`
`sequence site. FIG. 2A illustrates the addition of transpososomes (110, 130) to tagged nucleic
`
`acids (150, 151, 152) according to some embodiments of the invention. FIG. 2B illustrates the
`
mixture of tagged nucleic acids and transpososomes according to some embodiments of the

invention. FIG. 2C illustrates the fragmentation of the nucleic acids according to some

embodiments of the invention. FIG. 2D illustrates the tagged nucleic acid fragments according

to some embodiments of the invention.
`
`[0033]
`
`FIGS. 3A-3D illustrate a schematic representation of a non-limiting exemplary
`
method for production of long sequence verified polynucleotide constructs using random

cutting of polynucleotides that have been labeled with 5’ end (left) and 3’ end (right) barcodes

such that the random cuts act as unique or semi-unique identifying markers for the
`
`polynucleotide. FIG. 3A illustrates a first polynucleotide (SEQ ID NO: 1) with the position of
`
`the cut sites. FIG. 3B illustrates a second polynucleotide (SEQ ID NO: 2) with the position of
`
`the cut sites. FIG. 3C illustrates a third polynucleotide (SEQ ID NO: 3) with the position of the
`
`cut sites. FIG. 3D illustrates a fourth polynucleotide (SEQ ID NO: 4) with the position of the cut
`
`sites.
`
`[0034]
`
`FIG. 4 illustrates a non-limiting representation of a process flow according to
`
`some embodiments.
`
`[0035]
`
FIGS. 5A-5E are a non-limiting schematic representation of steps of the process

flow. The symbol “x” in a sequence denotes a sequence error in the nucleic acid molecule. FIG.

5A illustrates the barcoding of constructs 1, 2, ..., N using random endcap barcodes (bc). FIG.
`
`5B illustrates the dilution step to an average of M clones. FIG. 5C illustrates the amplification
`
`step and the split out of an aliquot for fishout. FIG. 5D illustrates the barcoding of constructs
`
using transpososome-loaded barcodes. FIG. 5E illustrates the sequencing step.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`[0036]
`
`Techniques have been developed for de novo nucleic acid assembly whereby
`
nucleic acids are made (e.g., chemically synthesized) and assembled to produce longer target

nucleic acids of interest. For example, different multiplex assembly techniques are being
`
`developed for assembling oligonucleotides into larger synthetic nucleic acids. Currently there is
`
significant interest in the synthesis of long polynucleotides in the range of more than 1 Kb, 2 Kb

or greater. However, one limitation of currently available assembly techniques is the relatively
`
`high error rate. Once synthesized there is a need to verify that the final nucleic acid construct
`
`has the correct sequence and in many cases to guarantee that the final construct is clonal. There
`
is therefore a need to isolate error-free nucleic acid constructs having a predetermined sequence

and to discard constructs having nucleic acid errors.
`
`[0037]
`
`Conventional methods for such verification comprise cloning the construct
`
`followed by sequencing. Recently methods have been described for producing sequence verified
`
clonal short polynucleotide constructs (< ~1 Kb) without the need for cloning. The methods

described in U.S. Application Serial Numbers 13/986,366 and 13/986,368 (which are

incorporated herein by reference in their entirety) use unique barcodes at the 5’ and/or 3’ ends
`
`of multiple candidate constructs. The pool of candidate constructs is then amplified and
`
`sequenced using next generation sequencing (NGS). Constructs which are verified to be
`
`sequence perfect can then be amplified up out of the pool based on their unique barcodes. This
`
technique is efficient and low cost but may be limited to constructs which are of a length shorter than

or equal to the upper limit Lmax of the amplification technique employed by the next generation
`
`sequencing technique being used to sequence the candidate construct pool. As an example a
`
`leading NGS platform (Illumina) is based on bridge amplification and is limited to constructs of
`
less than Lmax ~ 1 Kb.
`
`Preparative in vitro cloning (IVC) methods
`
`[0038]
`
Provided herein are preparative in vitro cloning methods or strategies for de novo

high fidelity nucleic acid synthesis. In some embodiments, the in vitro cloning methods can use
`
`oligonucleotide tags. Yet in other embodiments, the in vitro cloning methods do not necessitate
`
`the use of oligonucleotide tags.
`
`[0039]
`
`In some embodiments, the methods described herein allow for the cloning of
`
`nucleic acid sequences having a desired or predetermined sequence from a pool of synthetic
`
nucleic acid molecules. In some embodiments, the methods may include analyzing the sequence
`
`of target nucleic acids for parallel preparative cloning of a plurality of target nucleic acids. For
`
`example, the methods described herein can include a quality control step and/or quality control
`
`readout to identify the nucleic acid molecules having the correct sequence.
`
`[0040]
`
One skilled in the art will appreciate that the methods described herein can bypass
`
`the need for cloning via the transformation of cells with nucleic acid constructs in propagatable
`
`vectors (i.e. in vivo cloning). In addition, the methods described herein can eliminate the need to
`
`amplify candidate constructs separately before identifying the target nucleic acids having the
`
`desired sequences.
`
`[0041]
`
`It should be appreciated that after oligonucleotide assembly, the assembly product
`
`may contain a pool of sequences containing correct and incorrect assembly products. The errors
`
`may result from sequence errors introduced during the oligonucleotide synthesis, or during the
`
assembly of oligonucleotides into longer nucleic acids. In some instances, up to 90% of the

nucleic acid sequences may contain sequence errors and be unwanted sequences. Devices and

methods to selectively isolate nucleic acids having a correct predetermined sequence from

nucleic acids having an incorrect sequence are provided herein. The nucleic acids having a
`
correct sequence may be separated from the nucleic acids having the incorrect

sequence(s), for example by selectively moving or
`
`transferring the desired assembled polynucleotide of predefined sequence to a different feature of
`
`the support, or to another support (e.g. plate). Alternatively, polynucleotides having an incorrect
`
`sequence can be selectively removed from the feature comprising the polynucleotide of interest
`
`having the correct sequence.
`
`[0042]
`
`In some embodiments, each nucleic acid molecule can be tagged by adding a
`
unique barcode or pair of unique barcodes to each end of the molecule. In some embodiments,

diluting the nucleic acid molecules prior to attaching the oligonucleotide tags can allow for a

reduction of the complexity of the pool of nucleic acid molecules thereby enabling the use of a

library of barcodes of reduced complexity. In some embodiments, the tagged molecules can be

amplified before fragmentation. Yet in other embodiments, the tagged molecules are amplified

after fragmentation. In some embodiments, the oligonucleotide tag sequence can comprise a

primer binding site for amplification.
`
`[0043]
`
In some embodiments, the oligonucleotide tag sequence can be used as a primer-binding

site. Amplified tagged molecules can be subjected to fragmentation and subjected to
`
`paired-read sequencing to associate barcodes with the desired target sequence. The barcodes can
`
`be used as primers to recover the sequence clones having the desired sequence. Amplification
`
`methods are well known in the art. Examples of enzymes with polymerase activity which can be
`
used for amplification by PCR are DNA polymerase (Klenow fragment, T4 DNA polymerase),

heat stable DNA polymerases from a variety of thermostable bacteria (Taq, VENT, Pfu or Tfl

DNA polymerases) as well as their genetically modified derivatives (TaqGold, VENTexo, Pfu

exo), or KOD Hifi DNA polymerases. In some embodiments, amplification by chimeric PCR

can be used to reduce signal to noise of barcode association.
`
`[0044]
`
In some embodiments, the methods further comprise fragmenting the tagged

source molecules and sequencing using MiSeq®, HiSeq® or higher throughput next generation
`
`sequencing platforms. With high throughput sequencing, enough coverage can be generated to
`
`reconstruct the consensus sequence of each tag pair construct and determine if the sequence is
`
`correct (i.e. error-free sequence).
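
The consensus check described above can be illustrated with a minimal Python sketch. It assumes that the reads belonging to one tag pair have already been placed at known offsets on the construct; the function names, the majority-vote rule, and the `min_coverage` threshold are hypothetical choices made for illustration and are not taken from the specification.

```python
from collections import Counter


def consensus_sequence(placed_reads, length):
    """Majority-vote consensus from reads already placed on the construct.

    placed_reads: iterable of (start, bases) tuples with 0-based start offsets.
    Returns (consensus, min_depth); uncovered positions are reported as 'N'.
    """
    columns = [Counter() for _ in range(length)]
    for start, bases in placed_reads:
        for offset, base in enumerate(bases):
            pos = start + offset
            if 0 <= pos < length:
                columns[pos][base] += 1
    consensus = "".join(
        col.most_common(1)[0][0] if col else "N" for col in columns
    )
    min_depth = min(sum(col.values()) for col in columns) if columns else 0
    return consensus, min_depth


def is_error_free(placed_reads, target, min_coverage=3):
    """True when every position is covered at least min_coverage times
    (an assumed threshold) and the consensus equals the target sequence."""
    consensus, depth = consensus_sequence(placed_reads, len(target))
    return depth >= min_coverage and consensus == target


if __name__ == "__main__":
    target = "ACGTACGT"
    reads = [(0, "ACGTA"), (0, "ACGTA"), (0, "ACCTA"),  # one read carries a sequencing error
             (3, "TACGT"), (3, "TACGT"), (3, "TACGT")]
    print(is_error_free(reads, target))
```

In this sketch a single erroneous read is outvoted by the other reads covering the same position, which is the sense in which sufficient coverage allows the sequence of each tag pair construct to be judged.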
`
`
`[0045]
`
In some embodiments, one read of each read pair is used for sequencing the barcoded

end. The read pairs without any barcodes can be filtered out. Sequencing errors can be

removed by consensus calling. Nucleic acid molecules having the desired sequence can be
`
`isolated, for example, using the barcodes as primers.
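
The read-pair filtering step can be sketched as follows. This is an illustrative Python example only: it assumes that barcodes have a fixed length and sit at the start of the barcoded read, and the function name `group_reads_by_barcode` and its parameters are hypothetical.

```python
from collections import defaultdict


def group_reads_by_barcode(read_pairs, known_barcodes, barcode_len):
    """Keep read pairs in which one read starts with a known barcode.

    read_pairs: iterable of (read1, read2) strings.
    Returns {barcode: [(rest_of_barcoded_read, mate_read), ...]}.
    Pairs in which neither read starts with a known barcode are dropped.
    """
    groups = defaultdict(list)
    for read1, read2 in read_pairs:
        for tag_read, mate in ((read1, read2), (read2, read1)):
            barcode = tag_read[:barcode_len]
            if barcode in known_barcodes:
                groups[barcode].append((tag_read[barcode_len:], mate))
                break  # assign each pair to at most one barcode
    return dict(groups)


if __name__ == "__main__":
    pairs = [("AAAAGGT", "TTG"), ("TTG", "CCCCGA"), ("GGGGTT", "TTTT")]
    print(group_reads_by_barcode(pairs, {"AAAA", "CCCC"}, 4))
```

Grouping by barcode in this way yields, for each barcode, the set of reads from which a consensus could subsequently be called.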
`
`[0046]
`
As used herein, the terms “clonal nucleic acid”, “clonal population” and “clonal

polynucleotide” are used interchangeably and refer to a clonal molecular population of nucleic

acids, i.e. to nucleic acids that are substantially or completely identical to each other. In some

embodiments, the nucleic acid sequences (construction oligonucleotides, polynucleotide

constructs, assembly intermediates or assembled nucleic acid of interest) may first be diluted in

order to obtain a clonal population of target nucleic acids (i.e. a population containing a single

target nucleic acid sequence). Accordingly, the dilution based protocol provides a population of

nucleic acid molecules being substantially identical or identical to each other. In some

embodiments, the polynucleotides can be diluted serially. The concentration and the number of
`
`molecules can be assessed prior to the dilution step and a dilution ratio can be calculated in order
`
`to produce a clonal population.
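
By way of illustration, the calculation of a dilution ratio from a measured concentration might look like the Python sketch below. It assumes double-stranded DNA with an average mass of about 650 g/mol per base pair, which is a common approximation and not a value stated in this specification; the function names and the 100-fold serial step are likewise hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
AVG_BP_MASS = 650.0          # g/mol per base pair of dsDNA (assumed approximation)


def molecules_per_microliter(conc_ng_per_ul, length_bp):
    """Estimate the number of dsDNA molecules per microliter."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    moles_per_ul = grams_per_ul / (length_bp * AVG_BP_MASS)
    return moles_per_ul * AVOGADRO


def dilution_factor(conc_ng_per_ul, length_bp, target_molecules_per_ul):
    """Fold dilution needed to reach the target number of molecules per microliter."""
    return molecules_per_microliter(conc_ng_per_ul, length_bp) / target_molecules_per_ul


def serial_steps(total_factor, step_factor=100.0):
    """Number of equal serial dilution steps needed to reach at least total_factor."""
    steps = 0
    achieved = 1.0
    while achieved < total_factor:
        achieved *= step_factor
        steps += 1
    return steps


if __name__ == "__main__":
    factor = dilution_factor(1.0, 1000, 1.0)  # 1 ng/uL of a 1 kb construct, 1 molecule/uL target
    print(f"{factor:.3g}-fold dilution in {serial_steps(factor)} steps of 100-fold")
```

Under these assumptions, 1 ng/uL of a 1 kb construct corresponds to roughly 9 x 10^8 molecules per microliter, so reaching single-molecule concentrations requires several serial dilution steps.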
`
`[0047]
`
In some embodiments, the tagged molecules are diluted down to an average

number of clones for each construct. The diluted tagged molecules can be amplified. In some

embodiments, the amplified tagged molecules are diluted prior to being subjected to internal

barcoding, as described herein.
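
The choice of an average number of clones per construct can be reasoned about with a simple probability estimate. The Python sketch below assumes that each retained clone is an independent draw and is error-free with a fixed probability; this independence model, the function names, and the 95% confidence level are assumptions made for illustration and are not part of the specification.

```python
import math


def p_at_least_one_error_free(error_free_fraction, clones):
    """Probability that at least one of `clones` independent clones is error-free."""
    return 1.0 - (1.0 - error_free_fraction) ** clones


def clones_needed(error_free_fraction, confidence=0.95):
    """Smallest number of clones giving at least `confidence` probability
    of retaining one or more error-free clones."""
    if not 0.0 < error_free_fraction < 1.0:
        raise ValueError("error_free_fraction must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - error_free_fraction))


if __name__ == "__main__":
    # If 90% of molecules contain errors, 10% are error-free.
    print(clones_needed(0.10))
```

With 10% of molecules error-free, this model suggests retaining on the order of a few tens of clones per construct; a real protocol would also need to account for sampling variation in the dilution itself.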
`
`Production of long sequence verified polynucleotide constructs
`
`[0048]
`
`Aspects of the invention can be used for the clone free production of clonal
`
`sequence verified long length (> 1 Kb) polynucleotides.
`
`[0049]
`
`In some aspects of the invention, methods for the clone free production of clonal
`
sequence verified long length (> 1 Kb) polynucleotides are provided. In some embodiments,

polynucleotide constructs of length greater than Lmax can be assembled. In some embodiments,

the methods comprise labeling the ends of each assembled polynucleotide or construct with

unique barcodes. In some embodiments, the barcoded constructs can be fragmented into

fragments having a size smaller than Lmax in a manner in which each side of the break junctions is

labeled with additional correlated polynucleotide barcodes. One of skill in the art will appreciate

that using such an approach, it is possible to determine the sequence of the original long sequence.

In some embodiments, the polynucleotides which are determined to have the correct sequence

may be amplified out of a pool by means of their unique 5’ and 3’ barcodes.
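
One way to picture how correlated junction barcodes allow the original long sequence to be determined is the following Python sketch. It is a simplified illustration: each fragment is modeled as a (left tag, sequence, right tag) tuple, the table `junction_pairs` mapping the barcode on the upstream side of a break to the barcode on the downstream side is assumed to be known, and all names are hypothetical.

```python
def reconstruct(fragments, junction_pairs, five_prime_tag, three_prime_tag):
    """Chain fragments from the 5' end barcode to the 3' end barcode.

    fragments: iterable of (left_tag, sequence, right_tag).
    junction_pairs: {upstream_side_barcode: downstream_side_barcode}.
    Returns the joined sequence, or None if the chain is broken or loops.
    """
    by_left = {left: (seq, right) for left, seq, right in fragments}
    pieces = []
    seen = set()
    tag = five_prime_tag
    while True:
        if tag not in by_left or tag in seen:
            return None
        seen.add(tag)
        seq, right = by_left[tag]
        pieces.append(seq)
        if right == three_prime_tag:
            return "".join(pieces)
        if right not in junction_pairs:
            return None
        tag = junction_pairs[right]  # hop across the junction break


if __name__ == "__main__":
    frags = [("J1b", "GGG", "J2a"), ("BC5", "AAA", "J1a"), ("J2b", "TTT", "BC3")]
    pairs = {"J1a": "J1b", "J2a": "J2b"}
    print(reconstruct(frags, pairs, "BC5", "BC3"))
```

The sketch ignores details such as barcode trimming and any bases duplicated at the insertion site; it only shows that correlated barcodes on the two sides of each break are sufficient to order the fragments.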
`
`[0050]
`
`Yet in some embodiments, after assembling polynucleotide constructs of length
`
`greater than Lmax, the ends of each assembled polynucleotide can be labeled with unique
`
barcodes. In some embodiments, the barcoded long construct can be fragmented into fragments

having a size smaller than Lmax in which the internal break points are at random locations. The

particular random point of breakage can act as a unique (or semi-unique) label for a particular

molecule such that the fragments can be sequenced starting from either the left or right barcode.

Using such an approach, it is possible to determine the sequence of the original long sequence. In a
`
`subsequent step, the polynucleotides which are determined to have the correct sequence may be
`
`amplified out of a pool by means of their unique 5’ end (left) and 3’ end (right) barcodes.
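
The use of a random break point as a label can be illustrated with the Python sketch below. It reflects one possible reading of the passage above: a molecule of known length cut once yields a left-barcoded fragment and a right-barcoded fragment whose lengths sum to the construct length, so matching break positions links the two barcodes. The function name, the single-cut model, and the convention that fragment lengths exclude the barcodes are assumptions made for illustration.

```python
from collections import defaultdict


def pair_by_break_point(left_fragments, right_fragments, construct_length):
    """Link left and right barcodes that share a break position.

    left_fragments: list of (left_barcode, length measured from the 5' end).
    right_fragments: list of (right_barcode, length measured from the 3' end).
    Returns [(left_barcode, right_barcode, break_position), ...] for break
    positions seen exactly once on each side; ambiguous positions are skipped.
    """
    lefts = defaultdict(list)
    for barcode, length in left_fragments:
        lefts[length].append(barcode)
    rights = defaultdict(list)
    for barcode, length in right_fragments:
        rights[construct_length - length].append(barcode)
    pairs = []
    for position in sorted(lefts):
        if len(lefts[position]) == 1 and len(rights.get(position, [])) == 1:
            pairs.append((lefts[position][0], rights[position][0], position))
    return pairs


if __name__ == "__main__":
    left = [("L1", 300), ("L2", 450), ("L3", 700), ("L4", 700)]
    right = [("R1", 700), ("R2", 550), ("R3", 300), ("R4", 300)]
    print(pair_by_break_point(left, right, 1000))
```

Break positions shared by more than one molecule are skipped in this sketch, which corresponds to the label being only semi-unique.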
`
`[0051]
`
`Aspects of the invention can be used to isolate nucleic acid molecules from large
`
`numbers of nucleic acid fragments efficiently, and/or to reduce the number of steps required to
`
`generate large nucleic acid products, while reducing error rate. Aspects of the invention can be
`
`incorporated into nucleic assembly procedures to increase assembly fidelity, throughput and/or
`
`efficiency, decrease cost, and/or reduce assembly time.
`
`In some embodiments, aspects of the
`
`invention may be automated and/or i