(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date: 8 April 2004 (08.04.2004)

(10) International Publication Number: WO 2004/029220 A2

(51) International Patent Classification: C12N

(21) International Application Number: PCT/US2003/030940

(22) International Filing Date: 26 September 2003 (26.09.2003)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data: 60/414,085; 26 September 2002 (26.09.2002); US

(71) Applicant (for all designated States except US): KOSAN BIOSCIENCES, INC. [US/US];
3832 Bay Center Place, Hayward, CA 94545 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): SANTI, Daniel, V. [IN/US]; 211 Belgrave Avenue,
San Francisco, CA 94117 (US). REID, Ralph, C. [US/US]; 600 Galerita Way, San Rafael,
CA 94903 (US). KODUMAL, Sarah, J. [US/US]; 3933 Harrison Street, Apartment # 102, Oakland,
CA 94611 (US). JAYARAJ, Sebastian [IN/US]; 1709 Shattuck Avenue, Apartment # 214,
Berkeley, CA 94709 (US).

(74) Agents: APPLE, Randolph, Ted et al.; Morrison & Foerster LLP, 755 Page Mill Road,
Palo Alto, CA 94304 (US).

(81) Designated States (national): AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, BZ, CA,
CH, CN, CO, CR, CU, CZ, DE, DK, DM, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, HR, HU, ID,
IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW, MX,
MZ, NI, NO, NZ, OM, PG, PH, PL, PT, RO, RU, SC, SD, SE, SG, SK, SL, SY, TJ, TM, TN, TR, TT,
TZ, UA, UG, US, UZ, VC, VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM, KE, LS, MW, MZ, SD, SL, SZ, TZ, UG,
ZM, ZW), Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE, BG,
CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HU, IE, IT, LU, MC, NL, PT, RO, SE, SI, SK, TR),
OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published:
without international search report and to be republished upon receipt of that report

For two-letter codes and other abbreviations, refer to the "Guidance Notes on Codes and
Abbreviations" appearing at the beginning of each regular issue of the PCT Gazette.

(54) Title: SYNTHETIC GENES
`
(57) Abstract: The invention provides strategies, methods, vectors, reagents, and systems for
production of synthetic genes, production of libraries of such genes, and manipulation and
characterization of the genes and corresponding encoded polypeptides. In one aspect, the
synthetic genes can encode polyketide synthase polypeptides and facilitate production of
therapeutically or commercially important polyketide compounds.
`
`
`WO 2004/029220
`
`PCT/US2003/030940
`
`SYNTHETIC GENES
`
`STATEMENT CONCERNING GOVERNMENT SUPPORT
`
[0001] Subject matter disclosed in this application was made, in part, with government
support under National Institute of Standards and Technology ATP Grant No. 70NANB2H3014.
As such, the United States government may have certain rights in this invention.
`
`CROSS-REFERENCE TO RELATED APPLICATIONS
`
[0002] This application claims benefit under 35 U.S.C. § 119(e) of provisional application
No. 60/414,085, filed 26 September 2002, the contents of which are incorporated herein by
reference.
`
`FIELD OF THE INVENTION
`
[0003] The invention provides strategies, methods, vectors, reagents, and systems for
production of synthetic genes, production of libraries of such genes, and manipulation and
characterization of the genes and corresponding encoded polypeptides. In one aspect, the
synthetic genes can encode polyketide synthase polypeptides and facilitate production of
therapeutically or commercially important polyketide compounds. The invention finds
application in the fields of human and veterinary medicine, pharmacology, agriculture, and
molecular biology.
`
`BACKGROUND
`
[0004] Polyketides represent a large family of compounds produced by fungi, mycelial
bacteria, and other organisms. Numerous polyketides have therapeutically relevant and/or
commercially valuable activities. Examples of useful polyketides include erythromycin, FK-506,
FK-520, megalomycin, narbomycin, oleandomycin, picromycin, rapamycin, spinocyn, and
tylosin.
[0005] Polyketides are synthesized in nature from 2-carbon units through a series of
condensations and subsequent modifications by polyketide synthases (PKSs). Polyketide
synthases are multifunctional enzyme complexes composed of multiple large polypeptides. Each
of the polypeptide components of the complex is encoded by a separate open reading frame, with
the open reading frames corresponding to a particular PKS typically being clustered together on
the chromosome. The structure of PKSs and the mechanisms of polyketide synthesis are
reviewed in Cane et al., 1998, “Harnessing the biosynthetic code: combinations, permutations,
and mutations” Science 282:63-8.
`
[0006] PKS polypeptides comprise numerous enzymatic and carrier domains, including
acyltransferase (AT), acyl carrier protein (ACP), and beta-ketoacylsynthase (KS) activities,
involved in loading and condensation steps; ketoreductase (KR), dehydratase (DH), and
enoylreductase (ER) activities, involved in modification at β-carbon positions of the growing
chain; and thioesterase (TE) activities involved in release of the polyketide from the PKS.
Various combinations of these domains are organized in units called “modules.” For example,
the 6-deoxyerythronolide B synthase ("DEBS"), which is involved in the production of
erythromycin, comprises 6 modules on three separate polypeptides (2 modules per polypeptide).
The number, sequence, and domain content of the modules of a PKS determine the structure of
the polyketide product of the PKS.
[0007] Given the importance of polyketides, the difficulty in producing polyketide
compounds by traditional chemical methods, and the typically low production of polyketides in
wild-type cells, there has been considerable interest in finding improved or alternate means for
producing polyketide compounds. This interest has resulted in the cloning, analysis and
manipulation by recombinant DNA technology of genes that encode PKS enzymes. The resulting
technology allows one to manipulate a known PKS gene cluster to produce the polyketide
synthesized by that PKS at higher levels than occur in nature, or in hosts that otherwise do not
produce the polyketide. The technology also allows one to produce molecules that are
structurally related to, but distinct from, the polyketides produced from known PKS gene clusters
by inactivating a domain in the PKS and/or by adding a domain not normally found in the PKS
through manipulation of the PKS gene.
[0008] While the detailed understanding of the mechanisms by which PKS enzymes function
and the development of methods for manipulating PKS genes have facilitated the creation of
novel polyketides, there are presently limits to the creation of novel polyketides by genetic
engineering. One such limit is the availability of PKS genes. Many polyketides are known but
only a relatively small portion of the corresponding PKS genes have been cloned and are
available for manipulation. Moreover, in many instances the organism producing an interesting
polyketide is obtainable only with great difficulty and expense, and techniques for its growth in
the laboratory and production of the polyketide it produces are unknown or difficult or
time-consuming to practice. Also, even if the PKS genes for a desired polyketide have been cloned,
those genes may not serve to drive the level of production desired in a particular host cell.
[0009] If there were a method to produce a desired polyketide without having to access the
genes that encode the PKS that produces the polyketide, then many of these difficulties could be
ameliorated or avoided altogether. The present invention meets this and other needs.
`
`BRIEF SUMMARY OF THE INVENTION
[0010] In one aspect, the invention provides a synthetic gene encoding a polypeptide
segment that corresponds to a reference polypeptide segment encoded by a naturally occurring
gene. The polypeptide segment-encoding sequence of the synthetic gene is different from the
polypeptide segment-encoding sequence of the naturally occurring gene. In one aspect, the
polypeptide segment-encoding sequence of the synthetic gene is less than about 90% identical to
the polypeptide segment-encoding sequence of the naturally occurring gene, or in some
embodiments, less than about 85% or less than about 80% identical. In one aspect, the
polypeptide segment-encoding sequence of the synthetic gene comprises at least one (and in
other embodiments, more than one, e.g., at least two, at least three, or at least four) unique
restriction sites that are not present or are not unique in the polypeptide segment-encoding
sequence of the naturally occurring gene. In an aspect, the polypeptide segment-encoding
sequence of the synthetic gene is free from at least one restriction site that is present in the
polypeptide segment-encoding sequence of the naturally occurring gene. In an embodiment of
the invention, the polypeptide segment encoded by the synthetic gene corresponds to at least 50
contiguous amino acid residues encoded by the naturally occurring gene.
[0011] In an embodiment, the polypeptide segment is from a polyketide synthase (PKS) and
may be or include a PKS domain (e.g., AT, ACP, KS, KR, DH, ER, and/or TE) or one or more
PKS modules. In some embodiments, the synthetic PKS gene has, at most, one copy per
module-encoding sequence of a restriction enzyme recognition site selected from the group
consisting of Spe I, Mfe I, Afl II, Bsi WI, Sac II, Ngo MIV, Nhe I, Kpn I, Msc I, Bgl I, Bss HII,
Sac II, Age I, Pst I, Kas I, Mlu I, Xba I, Sph I, Bsp EI, and Ngo MIV recognition sites. In an
embodiment, the polypeptide segment-encoding sequence of the synthetic gene is free from at
least one Type IIS enzyme restriction site (e.g., Bci VI, Bmr I, Bpm I, Bpu EI, Bse RI, Bsg I,
Bsr DI, Bts I, Eci I, Ear I, Sap I, Bsm BI, Bsp MI, Bsa I, Bbs I, Bfu AI, Fok I and Alw I) present in
the polypeptide segment-encoding sequence of the naturally occurring gene.
[0012] In a related embodiment, the invention provides a synthetic gene encoding a
polypeptide segment that corresponds to a reference polypeptide segment encoded by a naturally
occurring PKS gene, where the polypeptide segment-encoding sequence of the synthetic gene is
different from the polypeptide segment-encoding sequence of the naturally occurring PKS gene
and comprises at least two of (a) a Spe I site near the sequence encoding the amino-terminus of
the module; (b) a Mfe I site near the sequence encoding the amino-terminus of a KS domain; (c)
a Kpn I site near the sequence encoding the carboxy-terminus of a KS domain; (d) a Msc I site
near the sequence encoding the amino-terminus of an AT domain; (e) a Pst I site near the
sequence encoding the carboxy-terminus of an AT domain; (f) a Bsr BI site near the sequence
encoding the amino-terminus of an ER domain; (g) an Age I site near the sequence encoding the
amino-terminus of a KR domain; and (h) an Xba I site near the sequence encoding the
amino-terminus of an ACP domain.
`
[0013] In related aspects, the invention provides a vector (e.g., cloning or expression vector)
comprising a synthetic gene of the invention. In an embodiment, the vector comprises an open
reading frame encoding a first PKS module and one or more of (a) a PKS extension module; (b)
a PKS loading module; (c) a releasing (e.g., thioesterase) domain; and (d) an interpolypeptide
linker.
`
[0014] Cells that comprise or express a gene or vector of the invention are provided, as well
as a cell comprising a polypeptide encoded by the vector or a functional polyketide synthase,
wherein the PKS comprises a polypeptide encoded by the vector. In one aspect, a PKS
polypeptide having a non-natural amino acid sequence is provided, such as a polypeptide
characterized by a KS domain comprising the dipeptide Leu-Gln at the carboxy-terminal edge of
the domain; and/or an ACP domain comprising the dipeptide Ser-Ser at the carboxy-terminal
edge of the domain. A method is provided for making a polyketide comprising culturing a cell
comprising a synthetic DNA of the invention under conditions in which a polyketide is
produced, wherein the polyketide would not be produced by the cell in the absence of the vector.
`
[0015] In one aspect, the invention provides a method for high throughput synthesis of a
plurality of different DNA units comprising different polypeptide encoding sequences
comprising: for each DNA unit, performing polymerase chain reaction (PCR) amplification of a
plurality of overlapping oligonucleotides to generate a DNA unit encoding a polypeptide
segment and adding UDG-containing linkers to the 5’ and 3’ ends of the DNA unit by PCR
amplification, thereby generating a linkered DNA unit, wherein the same UDG-containing
linkers are added to said different DNA units. In embodiments, the plurality comprises more
than 50 different DNA units, more than 100 different DNA units, or more than 500 different
DNA units (synthons). In a related aspect, the invention provides a method for producing a
vector comprising a polypeptide encoding sequence comprising cloning the linkered DNA unit
into a vector using a ligation-independent-cloning method.
`
[0016] The invention provides gene libraries. In one embodiment, a gene library is provided
that contains a plurality of different PKS module-encoding genes, where the module-encoding
genes in the library have at least one (or more than one, such as at least 3, at least 4, at least 5 or
at least 6) restriction site(s) in common, the restriction site is found no more than one time in
each module, and the modules encoded in the library correspond to modules from five or more
different polyketide synthase proteins. Vectors for gene libraries include cloning and expression
vectors. In some embodiments, a library includes open reading frames that contain an extension
module and at least one of a second PKS extension module, a PKS loading module, a
thioesterase domain, and an interpolypeptide linker.
`
[0017] In a related aspect, the invention provides a method for synthesis of an expression
library of PKS module-encoding genes by making a plurality of different PKS module-encoding
genes as described above and cloning each gene into an expression vector. The library may
include, for example, at least about 50 or at least about 100 different module-encoding genes.
[0018] The invention provides a variety of cloning vectors useful for stitching (e.g., a vector
comprising, in the order shown, SM4 - SIS - SM2 - R1 or L - SIS - SM2 - R1, where SIS is a
synthon insertion site, SM2 is a sequence encoding a first selectable marker, SM4 is a sequence
encoding a second selectable marker different from the first, R1 is a recognition site for a
restriction enzyme, and L is a recognition site for a different restriction enzyme). The invention
further provides vectors comprising synthon sequences, e.g., comprising, in the order shown,
SM4 - 2S1 - Sy1 - 2S2 - SM2 - R1 or L - 2S1 - Sy2 - 2S2 - SM2 - R1, where 2S1 is a
recognition site for a first Type IIS restriction enzyme, 2S2 is a recognition site for a different Type
IIS restriction enzyme, and Sy is a synthon coding region. Also provided are compositions of a
vector and a Type IIS or other restriction enzyme that recognizes a site on the vector,
compositions comprising cognate pairs of vectors, kits, and the like.
[0019] In one embodiment, the invention provides a vector comprising a first selectable
marker, a restriction site (R1) recognized by a first restriction enzyme, and a synthon coding
region that is flanked by a restriction site recognized by a first Type IIS restriction enzyme and a
restriction site recognized by a second Type IIS restriction enzyme, wherein digestion of the
vector with the first restriction enzyme and the first Type IIS restriction enzyme produces a
fragment comprising the first selectable marker and the synthon coding region, and digestion of
the vector with the first restriction enzyme and the second Type IIS restriction enzyme produces
a fragment comprising the synthon coding region and not comprising the first selectable marker.
In an embodiment, the vector comprises a second selectable marker, wherein digestion of the
vector with the first restriction enzyme and the first Type IIS restriction enzyme produces a
fragment comprising the first selectable marker and the synthon coding region, and not
comprising the second selectable marker, and digestion of the vector with the first restriction enzyme
and the second Type IIS restriction enzyme produces a fragment comprising the second
selectable marker and the synthon coding region, and not comprising the first selectable marker.
The invention provides methods of stitching adjacent DNA units (synthons) to synthesize a
larger unit. For example, the invention provides a method for making a synthetic gene encoding
a PKS module by producing a plurality (i.e., at least 3) of DNA units by assembly PCR, wherein
each DNA unit encodes a portion of the PKS module, and combining the plurality of DNA units
in a predetermined sequence to produce a PKS module-encoding gene. In an embodiment, the
method includes combining the module-encoding gene in-frame with a nucleotide sequence
encoding a PKS extension module, a PKS loading module, a thioesterase domain, or a PKS
interpolypeptide linker, to produce a PKS open reading frame.
[0020] In a related embodiment, the invention provides a method for joining a series of
DNA units using a vector pair by a) providing a first set of DNA units, each in a first-type
selectable vector comprising a first selectable marker and providing a second set of DNA units,
each in a second-type selectable vector comprising a second selectable marker different from the
first, wherein the first-type and second-type selectable vectors can be selected based on the
different selectable markers, b) recombinantly joining a DNA unit from the first set with an
adjacent DNA unit from the second set to generate a first-type selectable vector comprising a
third DNA unit, and obtaining a desired clone by selecting for the first selectable marker, c)
recombinantly joining the third DNA unit with an adjacent DNA unit from the second set to
generate a first-type selectable vector comprising a fourth DNA unit, and obtaining a desired
clone by selecting for the first selectable marker, or recombinantly joining the third DNA unit
with an adjacent DNA unit from the second set to generate a second-type selectable vector
comprising a fourth DNA unit, and obtaining a desired clone by selecting for the second
selectable marker. In an embodiment, step (c) comprises recombinantly joining the third
DNA unit with an adjacent DNA unit from the second set to generate a first-type selectable
vector comprising a fourth DNA unit, and obtaining a desired clone by selecting for the first
selectable marker, the method further comprising recombinantly combining the fourth DNA unit
with an adjacent DNA unit from the second set to generate a first-type selectable vector
comprising a fifth DNA unit, and obtaining a desired clone by selecting for the first selection
marker, or recombinantly combining the third DNA unit with an adjacent DNA unit from the
second set to generate a second-type selectable vector comprising a fifth DNA unit, and
obtaining a desired clone by selecting for the second selection marker. In an embodiment, step
(c) comprises recombinantly joining the third DNA unit with an adjacent DNA unit from the
second series to generate a second-type selectable vector comprising a fourth DNA unit, and
obtaining a desired clone by selecting for the second selectable marker, the method further
comprising recombinantly joining the fourth DNA unit with an adjacent DNA unit from the first
set to generate a first-type selectable vector comprising a fifth DNA unit, and obtaining a desired
clone by selecting for the first selection marker, or recombinantly joining the third DNA unit
with an adjacent DNA unit from the second set to generate a first-type selectable vector
comprising a fifth DNA unit and obtaining a desired clone by selecting for the first selection
marker.
`
[0021] In a related aspect, the invention provides a method for joining a series of DNA units
to generate a DNA construct by (a) providing a first plurality of vectors, each comprising a DNA
unit and a first selectable marker; (b) providing a second plurality of vectors, each comprising a
DNA unit and a second selectable marker; (c) digesting a vector from (a) to produce a first
fragment containing a DNA unit and at least one additional fragment not containing the DNA
unit; (d) digesting a DNA from (b) to produce a second fragment containing a DNA unit and at
least one additional fragment not containing the DNA unit, where only one of the first and
second fragments contains an origin of replication; ligating the fragments to generate a product
vector comprising a DNA unit from (c) ligated to a DNA unit from (d); selecting the product
vector by selecting for either the first or second selectable marker; (e) digesting the product
vector to produce a third fragment containing a DNA unit and at least one additional fragment
not containing the DNA unit; (d) digesting a DNA from (a) or (b) to produce a fourth fragment
containing a DNA unit and at least one additional fragment not containing the DNA unit, where
only one of the third and fourth fragments contains an origin of replication; (f) ligating the third
and fourth fragments to generate a product vector comprising a DNA unit from (e) ligated to a
DNA unit from (d) and selecting the product vector by selecting for either the first or second
selectable marker.
`
[0022] In another aspect, an open reading frame vector is provided, which has an internal
type {4-[7-*]-[*-8]-3}, left-edge type {4-[7-1]-[*-8]-3} or right-edge type {4-[7-*]-[6-8]-3}
architecture, where 7 and 8 are recognition sites for Type IIS restriction enzymes which cut to
produce compatible overhangs “*”; 1 and 6 are Type II restriction enzyme sites that are
optionally present; and 3 and 4 are recognition sites for restriction enzymes with 8-base pair
recognition sites. In various embodiments, 1 is Nde I and/or 6 is Eco RI and/or 4 is Not I and/or
3 is Pac I.
[0023] In another aspect, a method for identifying restriction enzyme recognition sites useful
for design of synthetic genes is provided. The method includes the steps of obtaining amino acid
sequences for a plurality of functionally related polypeptide segments; reverse-translating the
amino acid sequences to produce multiple polypeptide segment-encoding nucleic acid sequences
for each polypeptide segment; and identifying restriction enzyme recognition sites that are found
in at least one polypeptide segment-encoding nucleic acid sequence of at least about 50% of the
polypeptide segments. In certain embodiments, the functionally related polypeptide segments
are polyketide synthase modules or domains, such as regions of high homology in PKS modules
or domains.
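
The site-identification steps of paragraph [0023] can be illustrated with a minimal Python sketch. It is not taken from the application: instead of enumerating every reverse translation, it assumes a shortcut that asks, codon by codon, whether some synonymous codon choice could spell out the recognition site. The helper names (`codon_fits`, `can_encode`, `shared_sites`), the example peptides, and the 0.5 default threshold (standing in for "at least about 50%") are all hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
AA_TO_CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    AA_TO_CODONS.setdefault(aa, []).append("".join(codon))


def codon_fits(codon, codon_start, site, site_start):
    """True if `codon` (at nucleotide `codon_start`) agrees with `site`
    (at nucleotide `site_start`) at every position where they overlap."""
    for k, base in enumerate(codon):
        j = codon_start + k - site_start
        if 0 <= j < len(site) and site[j] != base:
            return False
    return True


def can_encode(protein, site):
    """True if at least one reverse translation of `protein` contains `site`."""
    n = len(site)
    for site_start in range(3 * len(protein) - n + 1):
        first = site_start // 3
        last = (site_start + n - 1) // 3
        # Codon choices are independent per residue, so each can be checked alone.
        if all(
            any(codon_fits(c, 3 * i, site, site_start) for c in AA_TO_CODONS[protein[i]])
            for i in range(first, last + 1)
        ):
            return True
    return False


def shared_sites(proteins, sites, min_fraction=0.5):
    """Map site name -> fraction of segments that could carry the site,
    keeping only sites at or above `min_fraction` (an assumed threshold)."""
    shared = {}
    for name, site in sites.items():
        fraction = sum(can_encode(p, site) for p in proteins) / len(proteins)
        if fraction >= min_fraction:
            shared[name] = fraction
    return shared


# Hypothetical toy segments; real inputs would be aligned PKS domain regions.
segments = ["MEFK", "AEFG", "MKKK"]
candidates = {"EcoRI": "GAATTC", "KpnI": "GGTACC"}
print(shared_sites(segments, candidates))
```

Because the check works residue by residue, it avoids the combinatorial cost of listing all reverse translations, at the price of ignoring codon-usage preferences.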
`
[0024] In a method for designing a synthetic gene in accordance with the present invention, a
reference amino acid sequence is provided and reverse translated to a randomized nucleotide
sequence which encodes the amino acid sequence using a random selection of codons which,
optionally, have been optimized for a codon preference of a host organism. One or more
parameters for positions of restriction sites on a sequence of the synthetic gene are provided and
occurrences of one or more selected restriction sites from the randomized nucleotide sequence
are removed. One or more selected restriction sites are inserted at selected positions in the
randomized nucleotide sequence to generate a sequence of the synthetic gene.
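
As an illustration of the reverse-translation step in paragraph [0024], the following Python sketch picks a random synonymous codon for each residue. It is a sketch under stated assumptions rather than the application's software: the function names, the seeded `random.Random` generator, and the optional `codon_weights` dictionary (a stand-in for a host codon preference table) are hypothetical.

```python
import random
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(dna):
    """Translate an in-frame coding sequence, ignoring any trailing partial codon."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def random_reverse_translate(protein, rng, codon_weights=None):
    """Choose one codon per residue at random.

    `codon_weights` (hypothetical) maps codon -> relative preference in the
    host organism; codons missing from it get weight 1.0.
    """
    codon_weights = codon_weights or {}
    chosen = []
    for aa in protein:
        codons = AA_TO_CODONS[aa]
        weights = [codon_weights.get(c, 1.0) for c in codons]
        chosen.append(rng.choices(codons, weights=weights, k=1)[0])
    return "".join(chosen)


rng = random.Random(0)  # seeded only so the sketch is repeatable
reference = "MKLSTV"    # hypothetical reference segment
dna = random_reverse_translate(reference, rng)
assert translate(dna) == reference  # the randomized sequence still encodes the reference
```

Translating the result back and comparing it with the reference is the same round-trip check that the later site-removal and site-insertion steps rely on.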
`
[0025] In one aspect of the invention, a set of overlapping oligonucleotide sequences which
together comprise a sequence of the synthetic gene is generated.

[0026] In another aspect of the invention, one or more parameters for positions of restriction
sites on a sequence of the synthetic gene comprise one or more preselected restriction sites at
selected positions.

[0027] In another aspect of the invention, the selected position of the preselected restriction
site corresponds to a position selected from the group consisting of a synthon edge, a domain
edge and a module edge.

[0028] In another aspect of the invention, providing one or more parameters for positions of
restriction sites on a sequence of the synthetic gene is followed by predicting all possible
restriction sites that can be inserted in the randomized nucleotide sequence and, optionally,
identifying one or more unique restriction sites.

[0029] In another aspect of the invention, the sequence of the synthetic gene is divided into a
series of synthons of selected length and then a set of overlapping oligonucleotide sequences is
generated which together comprise a sequence of each synthon.

[0030] In another aspect of the invention, the set of overlapping oligonucleotide sequences
comprises (a) oligonucleotide sequences which together comprise a synthon coding region
corresponding to the synthetic gene, and (b) oligonucleotide sequences which comprise one or
more synthon flanking sequences.

[0031] In another aspect of the invention, one or more quality tests are performed on the set
of overlapping oligonucleotide sequences, wherein the tests are selected from the group
consisting of: translational errors, invalid restriction sites, incorrect positions of restriction sites,
and aberrant priming.

[0032] In another aspect of the invention, each oligonucleotide sequence is of a selected
length and comprises an overlap of a predetermined length with adjacent oligonucleotides of the
set of oligonucleotides which together comprise the sequence of the synthetic gene.
`
[0033] In another aspect of the invention, each oligonucleotide is about 40 nucleotides in
length and comprises overlaps of between about 17 and 23 nucleotides with adjacent
oligonucleotides.
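
The oligonucleotide geometry of paragraphs [0032] and [0033] can be sketched as a simple tiling. The Python below is a minimal illustration, assuming a fixed 40-nucleotide length, a fixed 20-nucleotide overlap (inside the stated 17 to 23 range), and alternating strands so that neighbouring oligonucleotides can anneal; the function names are hypothetical, and the sketch does not adjust overlaps for annealing temperature.

```python
def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq))


def tile_oligos(seq, oligo_len=40, overlap=20):
    """Split `seq` into oligos that each share `overlap` bases with the next.

    Even-numbered oligos are written as top strand, odd-numbered oligos as
    bottom strand (reverse complement). The last oligo may be shorter than
    `oligo_len` when the sequence length is not a multiple of the step.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than the oligo")
    step = oligo_len - overlap
    oligos = []
    for index, start in enumerate(range(0, len(seq) - overlap, step)):
        window = seq[start:start + oligo_len]
        oligos.append(window if index % 2 == 0 else reverse_complement(window))
    return oligos


# Hypothetical 100-nt synthon fragment, generated deterministically.
synthon = "".join("ACGT"[(i * i + i // 3) % 4] for i in range(100))
for number, oligo in enumerate(tile_oligos(synthon), start=1):
    print(number, len(oligo), oligo)
```

A production design would typically vary the overlap within the allowed range so that every junction anneals in the same temperature window, which this fixed-step sketch leaves out.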
[0034] In another aspect of the invention, a set of overlapping oligonucleotide sequences is
selected wherein each oligonucleotide anneals with its adjacent oligonucleotide within a selected
temperature range.

[0035] In another aspect of the invention, generating a set of overlapping oligonucleotide
sequences includes providing an alignment cutoff value for sequence specificity, aligning each
oligonucleotide sequence with the sequence of the synthetic gene and determining its alignment
value, and identifying and rejecting oligonucleotides comprising alignment values lower than the
alignment cutoff value.

[0036] In another aspect of the invention, a region of error in a rejected oligonucleotide is
identified and, optionally, one or more nucleotides in the region of error are substituted such that
the alignment value of the rejected oligonucleotide is raised above the alignment cutoff value.

[0037] In another aspect of the invention, an order list of oligonucleotides which comprise a
synthetic gene or a synthon is generated.
`[0038]
[0039] In another aspect of the invention, removing of restriction sites includes
identifying positions of preselected restriction sites in the randomized nucleotide
sequence, identifying an ability of one or more codons comprising the nucleotide sequence of the
restriction site for accepting a substitution in the nucleotide sequence of the restriction site
wherein such substitution will (a) remove the restriction site and (b) create a codon encoding an
amino acid identical to the codon whose sequence has been changed, and changing the sequence
of the restriction site at the identified codon.
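
The removal procedure of paragraph [0039] amounts to a synonymous codon swap inside each unwanted site. The Python sketch below illustrates that idea under simplifying assumptions: the coding sequence is in frame, uppercase, and a multiple of three in length, and the first synonymous codon that lowers the number of site occurrences is accepted. The function names are hypothetical, and the sketch does not weigh codon preference when choosing a replacement.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
AA_TO_CODONS = {}
for codon, aa in CODON_TO_AA.items():
    AA_TO_CODONS.setdefault(aa, []).append(codon)


def translate(dna):
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def count_sites(dna, site):
    """Count occurrences of `site`, including overlapping ones."""
    return sum(dna.startswith(site, i) for i in range(len(dna)))


def remove_site(dna, site):
    """Remove every occurrence of `site` by synonymous codon changes.

    Raises ValueError when no synonymous change can remove an occurrence.
    """
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    while True:
        pos = dna.find(site)
        if pos == -1:
            return dna
        before = count_sites(dna, site)
        replacement = None
        # Only codons that overlap the site can disrupt it.
        for i in range(pos // 3, (pos + len(site) - 1) // 3 + 1):
            codon = dna[3 * i:3 * i + 3]
            for alt in AA_TO_CODONS[CODON_TO_AA[codon]]:
                candidate = dna[:3 * i] + alt + dna[3 * i + 3:]
                if alt != codon and count_sites(candidate, site) < before:
                    replacement = candidate
                    break
            if replacement is not None:
                break
        if replacement is None:
            raise ValueError("site cannot be removed without changing the protein")
        dna = replacement


original = "ATGGAATTCAAA"  # hypothetical sequence carrying an Eco RI site (GAATTC)
cleaned = remove_site(original, "GAATTC")
assert translate(cleaned) == translate(original)
```

Requiring the occurrence count to fall on every accepted change guarantees that the loop terminates, even if a swap would otherwise create a new copy of the site nearby.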
[0040] In another aspect of the invention, inserting of restriction sites includes identifying
selected positions for insertion of a selected restriction site in the randomized nucleotide
sequence, performing a substitution in the nucleotide sequence at the selected position such that
the selected restriction site sequence is created at the selected position, translating the substituted
sequence to an amino acid sequence, and accepting a substitution wherein the translated amino
acid sequence is identical to the reference amino acid sequence at the selected position and
rejecting a substitution wherein the translated amino acid sequence is different from the
reference amino acid sequence at the selected position.
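
The accept-or-reject test of paragraph [0040] can be sketched as overwrite, translate, compare. The Python below is an illustrative sketch only: it compares the whole translated sequence with the reference (a stricter check than comparing only at the selected position), requires exact identity rather than allowing similar amino acids, and uses hypothetical function names and example sequences.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def try_insert_site(dna, site, pos, reference_protein):
    """Overwrite `dna` with `site` at nucleotide `pos`.

    Returns the substituted sequence if it still encodes `reference_protein`,
    otherwise None (the substitution is rejected).
    """
    if pos < 0 or pos + len(site) > len(dna):
        raise ValueError("site does not fit at the requested position")
    candidate = dna[:pos] + site + dna[pos + len(site):]
    return candidate if translate(candidate) == reference_protein else None


def insertable_positions(dna, site, reference_protein, start, end):
    """Positions in [start, end] where the site can be written silently."""
    last = min(end, len(dna) - len(site))
    return [
        pos for pos in range(start, last + 1)
        if try_insert_site(dna, site, pos, reference_protein) is not None
    ]


reference = "MEFK"         # hypothetical reference amino acid sequence
randomized = "ATGGAGTTTAAA"  # one reverse translation of the reference
print(insertable_positions(randomized, "GAATTC", reference, 0, 6))
```

Scanning a window of candidate positions, as `insertable_positions` does, is one way to locate a silent placement near a desired synthon, domain, or module edge.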
`
[0041] In another aspect of the invention, a translated amino acid sequence identical to the
reference amino acid sequence comprises substitution of an amino acid with a similar amino acid
at the selected position.

[0042] In another aspect of the invention, the synthetic gene encodes a PKS module.

[0043] In another aspect of the invention, the reference amino acid sequence is of a naturally
occurring polypeptide segment.

[0044] In another aspect of the invention, one or more steps of the method may be performed by
a programmed computer.

[0045] In another aspect of the invention, a computer readable storage medium contains
computer executable code for carrying out the method of the present invention.
`
[0046] In a method for analyzing a nucleotide sequence of a synthon in accordance with the
present invention, a sequence of a synthetic gene is provided, wherein the synthetic gene is
divided into a plurality of synthons. Sequences of a plurality of synthon samples are also
provided, wherein each synthon of the plurality of synthons is cloned in a vector. In addition, a
sequence of the vector without an insert is provided. Vector sequences from the sequence of the
cloned synthon are eliminated and a contig map of sequences of the plurality of synthons is
constructed. The contig map of sequences is aligned with the sequence of the synthetic gene,
and a measure of alignment for each of the plurality of synthons is identified.
`
[0047] In another aspect of the invention, errors in one or more synthon sequences are
identified, and one or more items of information are reported, selected from the group
consisting of: a ranking of synthon samples by degree of alignment, an error in the sequence of a
synthon sample, and identity of a synthon that can be repaired.
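
The vector trimming and ranking described in paragraphs [0046] and [0047] can be approximated in a few lines of Python. This is a rough sketch, not the application's analysis software: it assumes each read carries known vector flanks on both sides of the insert, skips contig assembly, and uses the similarity ratio from the standard-library `difflib.SequenceMatcher` as a stand-in for a proper alignment score. The function names, flank strings, and sample names are hypothetical.

```python
from difflib import SequenceMatcher


def strip_vector(read, left_flank, right_flank):
    """Return the insert between the vector flanks.

    If a flank is not found, that end of the read is kept unchanged.
    """
    start = read.find(left_flank)
    start = 0 if start == -1 else start + len(left_flank)
    end = read.rfind(right_flank)
    if end == -1 or end < start:
        end = len(read)
    return read[start:end]


def rank_samples(reference, reads, left_flank, right_flank):
    """Rank synthon samples by similarity to the designed synthon sequence.

    Returns (sample name, score) pairs, best first; a score of 1.0 means the
    trimmed read is identical to the reference.
    """
    scored = []
    for name, read in reads.items():
        insert = strip_vector(read, left_flank, right_flank)
        score = SequenceMatcher(None, reference, insert, autojunk=False).ratio()
        scored.append((name, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


design = "ATGGAATTCAAAGGG"  # hypothetical designed synthon
samples = {
    "clone_1": "CCCC" + "ATGGAATTCAAAGGG" + "TTTT",  # matches the design
    "clone_2": "CCCC" + "ATGGAATACAAAGGG" + "TTTT",  # carries one substitution
}
for name, score in rank_samples(design, samples, "CCCC", "TTTT"):
    print(name, round(score, 3))
```

For real sequencing reads a dedicated aligner would be the natural replacement for `SequenceMatcher`, since it reports the position and type of each error needed to decide whether a synthon can be repaired.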
`
[0048] In another aspect of the invention, a statistical report on a plurality of alignment
errors is prepared.
`
[0049] A system for high-throughput synthesis of synthetic genes in accordance with the
present invention includes a source microwell plate containing oligonucleotides for assembly
PCR, a first source for amplification mixture including polymerase and buffers useable for
assembly PCR, a second source for LIC extension primer mixture, and a PCR microwell plate
for amplification of oligonucleotides. A liquid handling device retrieves a plurality of
predetermined sets of oligonucleotides from the source microwell plate(s), combines the
predetermined sets and the amplification mixture in wells of the PCR microwell plate, retrieves
the LIC extension primer mixture, and combines the LIC extension primer mixture and amplicons
in a well of the PCR microwell plate. The system also includes a heat source for PCR amplification
configured to accept the at least one PCR microwell plate.
`
`BRIEF DESCRIPTION OF THE FIGURES
[0050] FIGURE 1 shows a UDG-cloning cassette (“cloning linker”) and a scheme of vector
preparation for ligation-independent cloning (LIC) using the nicking endonuclease N. BbvC IA.
FIGURE 1A. UDG-cloning cassette. Sac I and nicking enzyme sites used in vector preparation
are labeled. FIGURE 1B. Scheme of vector preparation for LIC using nicking endonuclease N.
BbvC IA.

[0051] FIGURE 2 illustrates the Method S joining method using Bbs I and Bsa I as the Type
IIS restriction enzymes.
[0052] FIGURE 3A shows the Method S joining method using Vector Pair I. FIGURE 3B
shows the Method S joining using Vector Pair II. 2S1-4 are recognition sites for Type IIS
restriction enzymes, and A, B, B and C, respectively, are the cleavage sites for the enzymes.
[0053] FIGURE 4 shows a vector pair useful for stitching. FIGURE 4A: Vector pKos293-172-2.
FIGURE 4B: Vector pKos293-172-A76. Both vectors contain a UDG-cloning cassette
with N.Bbv C IA recognit