Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips
`
Sriram Kosuri1,2,6, Nikolai Eroshenko1,3,6, Emily M LeProust4, Michael Super1, Jeffrey Way1, Jin Billy Li2,5 & George M Church1,2
`
Development of cheap, high-throughput and reliable gene synthesis methods will broadly stimulate progress in biology and biotechnology1. Currently, the reliance on column-synthesized oligonucleotides as a source of DNA limits further cost reductions in gene synthesis2. Oligonucleotides from DNA microchips can reduce costs by at least an order of magnitude3–5, yet efforts to scale their use have been largely unsuccessful owing to the high error rates and complexity of the oligonucleotide mixtures. Here we use high-fidelity DNA microchips, selective oligonucleotide pool amplification, optimized gene assembly protocols and enzymatic error correction to develop a method for highly parallel gene synthesis. We tested our approach by assembling 47 genes, including 42 challenging therapeutic antibody sequences, encoding a total of ~35 kilobase pairs of DNA. These assemblies were performed from a complex background containing 13,000 oligonucleotides encoding ~2.5 megabases of DNA, which is at least 50 times larger than in previously published attempts.
`
ongoing efforts to lower the cost of gene synthesis that focus on reducing the cost of the oligonucleotide precursors. For example, microfluidic oligonucleotide synthesis can reduce reagent cost by an order of magnitude and has been used for proof-of-concept gene synthesis.
Another promising route is to harness existing DNA microchips, which can produce up to a million different oligonucleotides on a single chip, as a source of DNA. Previous efforts have demonstrated that genes can be synthesized from DNA microchips3–5,14. Thus far it has not been possible to scale up these approaches for at least three reasons. First, the error rates of oligonucleotides from DNA microchips are higher than those of traditional column-synthesized oligonucleotides. Second, the assembly of gene fragments becomes increasingly difficult as the diversity of the oligonucleotide mixture becomes larger. Finally, the potential for cross-hybridization between individual assemblies imposes strong constraints on the sequences that can be constructed on an individual microchip.
Recently, the quality of microchip-synthesized oligonucleotides was improved by controlling depurination during the synthesis process. These arrays produce up to 55,000 200-mer oligonucleotides on a single chip and are sold as ~1–10 picomole pools of oligonucleotides, termed OLS pools (oligo library synthesis). Several groups have used OLS pools in DNA capture technologies, promoter analysis and DNA barcode development. We have previously shown that individual oligonucleotides in a 55,000 150-mer OLS pool were evenly distributed. We reanalyzed this data set to provide an estimate of the frequency of transitions, transversions, insertions and deletions in this OLS pool (Online Methods) and found the overall error rate to be ~1/500 bp both before and after PCR amplification, suggesting that OLS pools can be used for accurate large-scale gene synthesis (Supplementary Table 1).
To test whether OLS pools could be used for DNA microchip-based gene synthesis, we designed two pools (OLS pools 1 and 2) of different lengths, each containing ~13,000 130-mer or 200-mer oligonucleotides, respectively. Figure 1 is a general schematic of our methods for using OLS pools to perform gene synthesis. Briefly, we designed oligonucleotides that were then printed on DNA microchips and recovered as a mixed pool of oligonucleotides (OLS pool). Next, we took advantage of the long oligonucleotide lengths to
`
`
1Wyss Institute for Biologically Inspired Engineering, Boston, Massachusetts, USA. 2Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA. 3Harvard School of Engineering and Applied Sciences, Cambridge, Massachusetts, USA. 4Agilent Technologies, Santa Clara, California, USA. 5Present address: Department of Genetics, Stanford University, Stanford, California, USA. 6These authors contributed equally to this work. Correspondence should be addressed to S.K. (sri.kosuri@wyss.harvard.edu) or N.E. (eroshenk@wyss.harvard.edu).
`
The synthesis of DNA encoding regulatory elements, genes, pathways and entire genomes provides powerful ways to both test biological hypotheses and harness biology for our use. For example, from the use of oligonucleotides in deciphering the genetic code to the recent complete synthesis of a viable bacterial genome, DNA synthesis has engendered tremendous progress in biology. Currently, almost all DNA synthesis relies on the use of phosphoramidite chemistry on controlled-pore glass (CPG) substrates. The synthesis of gene-sized fragments (500–5,000 base pairs (bp)) relies on assembling many CPG oligonucleotides together using a variety of gene synthesis techniques. Technologies to assemble verified gene-sized fragments into much larger synthetic constructs are now fairly mature.
The price of gene synthesis has fallen drastically over the last decade. However, the current commercial price of gene synthesis, ~$0.40–1.00/bp, has begun to approach the relatively stable cost of the CPG oligonucleotide precursors (~$0.10–0.20/bp)1, suggesting that oligonucleotide cost is limiting. At these prices, the construction of large gene libraries and synthetic genomes is out of reach to most. There are many
Received 3 August; accepted 25 October; published online 28 November 2010; doi:10.1038/nbt.1716
`
NATURE BIOTECHNOLOGY VOLUME 28 NUMBER 12 DECEMBER 2010

© 2010 Nature America, Inc. All rights reserved.

`

Figure 1 Schematic for scalable gene synthesis from OLS pool 2. (a,b) Pre-designed oligonucleotides (no distinction is made between dsDNA and ssDNA in the figure) are synthesized on a DNA microchip (a) and then cleaved to make a pool of oligonucleotides (b). (c) Plate-specific primer sequences (yellow or brown) are used to amplify separate plate subpools (only two are shown), which contain DNA to assemble different genes (only three are shown for each plate subpool). (d) Assembly-specific sequences (shades of blue) are used to amplify assembly subpools that contain only the DNA required to make a single gene. (e) The primer sequences are cleaved using either type IIS restriction enzymes (resulting in dsDNA) or by DpnII/USER/λ exonuclease processing (producing ssDNA). (f) Construction primers (shown as white and black sites flanking the full assembly) are then used in an assembly PCR reaction to build a gene from each assembly subpool. Depending on the downstream application, the assembled products are then cloned either before or after an enzymatic error correction step.
`
To construct genes from the OLS pools, we developed algorithms to split the sequence into overlapping segments with matching melting temperatures such that they could be later assembled by PCR. Genes on OLS pools 1 and 2 were designed differently to test the effect of different overlap lengths. We designed genes on OLS pool 1 such that the processed ssDNA pools fully overlapped to form a complete dsDNA sequence. In OLS pool 2, the processed dsDNA fragments partially overlapped by ~20 bp and could be assembled into a contiguous gene sequence using PCR. We initially constructed a set of fluorescent proteins to test the efficacy of the gene synthesis methods on both OLS pools (Fig. 2).
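The partially overlapping design used for OLS pool 2 can be illustrated with a minimal Python sketch. This is not the authors' algorithm, which matched melting temperatures across overlaps; it simply tiles a sequence with a fixed overlap and checks that the fragments can be merged back. The function names, the 100-base fragment length and the random 717-bp stand-in sequence are assumptions made for illustration only.

```python
import random


def tile_with_overlaps(seq, oligo_len, overlap):
    """Split seq into fragments of at most oligo_len bases.

    Adjacent fragments share exactly `overlap` bases. Fixed-length tiling is
    a simplification: the paper's design matched overlap melting temperatures.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("overlap must be positive and shorter than oligo_len")
    step = oligo_len - overlap
    fragments = []
    start = 0
    while True:
        fragments.append(seq[start:start + oligo_len])
        if start + oligo_len >= len(seq):
            break
        start += step
    return fragments


def reassemble(fragments, overlap):
    """Merge fragments through their shared overlaps (in silico stand-in for PCR assembly)."""
    assembled = fragments[0]
    for frag in fragments[1:]:
        if assembled[-overlap:] != frag[:overlap]:
            raise ValueError("adjacent fragments do not overlap as expected")
        assembled += frag[overlap:]
    return assembled


# Hypothetical 717-bp gene: a random sequence with a fixed seed for repeatability.
rng = random.Random(0)
gene = "".join(rng.choice("ACGT") for _ in range(717))
fragments = tile_with_overlaps(gene, oligo_len=100, overlap=20)
print(len(fragments), "fragments; reassembly ok:", reassemble(fragments, 20) == gene)
```

A Tm-matched design would replace the fixed `overlap` with a per-junction length chosen so that every junction melts at a similar temperature.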
`
For OLS pool 1, we designed two independent 'assembly subpools' that encoded GFPmut3b plus flanking orthogonal primer sequences that were later used for PCR assembly (construction primers). The two assembly subpools, GFP43 and GFP35, differed in the average overlap length (43 and 35 bp, respectively), total length (82–90 and 64–78 bases, respectively) and number of oligonucleotides (18 and 22, respectively). We also designed two subpools, control subpools 1 and 2, containing ten and five 130-mer oligonucleotides, respectively, to test amplification efficacy. The other eight subpools, containing a total of 12,945 130-mer sequences, were constructed on the same chip but were not used in this study except to provide potential sources of cross-hybridization. Each of these 12 subpools was flanked with independent orthogonal primer pairs (assembly-specific primers). As a control, we used these same algorithms to design a set of shorter column-synthesized oligonucleotides (20 bp average overlap; 35–45 bases in length; and 39 total oligonucleotides) encoding GFPmut3b and obtained them from a commercial provider (IDT). These oligonucleotides were combined to form a third pool (GFP20) that was also tested. (All synthesized oligonucleotides used in the study can be found in Supplementary Sequences.)
Each of the four subpools (GFP43, GFP35, control 1 and control 2) was PCR amplified from the synthesized OLS pool using modified primers that facilitated downstream processing (Supplementary Figs. 1 and 2a). The oligonucleotides were then processed to remove primer sequences (Supplementary Figs. 2b and 3). Briefly, lambda exonuclease was used to make the PCR products single stranded, and then uracil DNA glycosylase, endonuclease VIII and DpnII restriction endonuclease were used to cleave off the assembly-specific primers. The resultant gel shows that although the reaction was efficient, unprocessed oligonucleotide still remained. In addition, we observed spurious cleavage by DpnII that was likely due to the extensive overlap within the subpool that is inherent in the gene synthesis process. We assembled the GFP43, GFP35 and GFP20 subpools using PCR, which resulted in GFP-sized products as well as many incorrect low molecular weight products (Fig. 2a).
`
`
`
`
independently PCR amplify and process only those oligonucleotides required for a given gene assembly. For the 200-mer OLS pool 2, we first amplified a 'plate subpool' that contained DNA to construct up to 96 genes, and then amplified individual 'assembly subpools' to separate the oligonucleotides for an individual gene. For the 130-mer OLS pool 1, we directly amplified assembly subpools, foregoing the plate subpool step. Next, the primers used for these amplification steps were removed by either type IIS restriction endonucleases to form double-stranded DNA (dsDNA) fragments (OLS pool 2), or a combination of enzymatic steps to form single-stranded DNA (ssDNA) fragments (OLS pool 1). Finally, we constructed full-length genes using PCR assembly, performed enzymatic error correction to improve error rates if necessary, and, finally, cloned and characterized the constructs.
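The two-stage selection described above (plate subpool first, then assembly subpool) can be sketched in Python. This is a toy model under stated assumptions: primer binding is treated as an exact string match rather than real hybridization, the assembly-specific sites are assumed to sit directly inside the plate-specific sites, and the 8-base primers and 4-base payloads are hypothetical placeholders, not sequences from the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def make_oligo(payload, primer_pairs):
    """Wrap a payload in primer sites, innermost (fwd, rev) pair first."""
    oligo = payload
    for fwd, rev in primer_pairs:
        oligo = fwd + oligo + revcomp(rev)
    return oligo


def amplify(pool, fwd, rev):
    """Keep only oligos flanked by this primer pair (exact match stands in for PCR)."""
    site = revcomp(rev)
    return [o for o in pool if o.startswith(fwd) and o.endswith(site)]


def strip_primers(pool, fwd, rev):
    """Remove the flanking primer sites, as the enzymatic processing step does."""
    return [o[len(fwd):len(o) - len(rev)] for o in pool]


# Hypothetical primers (real ones were orthogonal 20-mers).
plate1 = ("ACCTGAGT", "TTGCACGA")
plate2 = ("GTCAAGCT", "CAGTTCGG")
asm_a = ("TCGGATAC", "AGCCTATG")
asm_b = ("GGATTCAC", "CTTAGGCA")

pool = [
    make_oligo("AAAA", [asm_a, plate1]),
    make_oligo("CCCC", [asm_a, plate1]),
    make_oligo("GGGG", [asm_b, plate1]),
    make_oligo("TTTT", [asm_a, plate2]),  # same assembly primers, different plate
]

plate_subpool = strip_primers(amplify(pool, *plate1), *plate1)
assembly_subpool = strip_primers(amplify(plate_subpool, *asm_a), *asm_a)
print(assembly_subpool)
```

In this toy model the same assembly-specific pair can be reused on different plates without mixing the oligonucleotides, because the plate-level step has already removed the other plate's DNA.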
`
Obtaining subpools of only those DNA fragments required for any particular assembly is crucial for robust gene synthesis in very complex DNA backgrounds. In addition, isolating subpools relieves constraints on sequence similarity inherent in past approaches. To facilitate the partitioning of OLS pools into smaller subpools, we designed 20-mer PCR primer sets with low potential cross-hybridization ('orthogonal' primers) derived from a set of 240,000 25-mer orthogonal sequences developed for barcoding purposes. Two separate orthogonal primer sets were constructed for the different OLS pools because of their varying requirements for downstream processing. Both sets were screened for potential cross-hybridization, low secondary structure and matched melting temperatures to construct large sets of orthogonal PCR primer pairs.
`
`
Figure 2 Gene synthesis products. (a) Results of PCR assembly of GFPmut3 from two different assembly subpools (GFP43 and GFP35) that were amplified from OLS pool 1. Full-length GFPmut3 is expected to be 779 bp and is indicated with an asterisk (*). Other bands show lower molecular weight misassembled products. (b) Gel purification and re-amplification of the full-length assembled GFPmut3. (c) Results of assembling three fluorescent proteins using the longer oligonucleotides in [...] challenging GC-rich linkers. Of the 42 assemblies, all but two (7 and 24) resulted in strong bands of the [...] We gel isolated and re-amplified these [...] resulting in bands of the correct size (Supplementary Fig. 10b). The antibody that corresponds to each number is given in Supplementary Table 3 and the amino acid sequence of the linker region used is given above each gel with differing amino acids in red.
`
out of a 12,995 oligonucleotide background. Second, the relative quantities of the oligonucleotides in the assembly subpools were sufficient to allow PCR assembly. Third, the error rates from 130-mer OLS pools were sufficiently low to construct gene-sized fragments (717 bp) such that >50% of the sequences were perfect. In fact, the error rates from both the GFP43 and GFP35 assemblies were indistinguishable from the column-synthesized GFP20 assemblies. Fourth, our data show that the level of fluorescence of our gene assemblies correlated with the number of constructs with perfect sequence, providing a useful screen to test fluorescent gene assemblies in OLS pool 2 (Supplementary Fig. 6). Finally, although PCR assembly was able to generate full-length product, many smaller misassembled products were also formed, requiring the use of difficult-to-automate gel isolation steps.
In OLS pool 2, we designed 836 assembly subpools split into 11 plate subpools, encoding 2,456,706 bases of oligonucleotides that could potentially result in 869,125 bp of final assembled sequence. We first constructed three fluorescent proteins to test assembly protocols in OLS pool 2: mTFP1, mCitrine and mApple. We found that the PCR assembly protocols developed for ssDNA subpools in OLS pool 1 only produced short (<200 bp) misassemblies when applied to the dsDNA subpools in OLS pool 2. We tested over 1,000 assembly PCR conditions by varying parameters such as DNA concentration, annealing temperature, cycle numbers, polymerase choice and buffer conditions. Using the best protocol (Supplementary Note), we assembled the genes encoding the three proteins with no detectable misassemblies, thereby removing the need for gel
`
Figure 3 Characterization of products from OLS pools 1 and 2. (a) Percentage of fluorescent cells resulting from synthesis products derived from column-synthesized oligonucleotides (black), OLS Chip 1 subpools GFP43 and GFP35 (green) and the three fluorescent proteins produced on OLS Chip 2 with and without ErrASE treatment (blue, yellow and orange). Error bars correspond to the range of replicates from 3 (GFP20, GFP43, GFP35), 2 (GFP43 ErrASE, GFP35 ErrASE), 4 (mTFP1, mCitrine, mApple, mCitrine ErrASE) and 1 (mTFP1 ErrASE, mApple ErrASE) separate electroporations. (b) Error rates (average bp of correct sequence per error) from various synthesis products. Error bars show the expected Poisson error based on the number of errors found (±√n). Deletions of more than two consecutive bases are counted as a single error (no such errors were found in OLS pool 1
We gel isolated, digested and then cloned the assembly products into an expression vector (Fig. 2b and Supplementary Fig. 4). We used flow cytometry tests, manual colony counts and sequencing of individual clones to measure the error rates (Supplementary Fig. 5a,b). All three of the assays correlated well, and the error rates determined through sequencing were 1/1,500 bp, 1/1,130 bp and 1/1,350 bp for the GFP43, GFP35 and GFP20 synthesis reactions, respectively (Fig. 3 and Supplementary Table 2).
These results illustrate a number of important points. First, our subpool assembly primers were sufficiently well-designed to provide stringent subpool amplification of as few as 5 oligonucleotides
`
`
[Figure 3: plots for panels a and b. Legend: CPG oligos (IDT), OLS pool 1, OLS pool 2.]

misassembled genes displayed faint bands at the correct size, which were gel isolated and reamplified to produce strong bands of the correct size. We sequenced 15 antibodies including representatives from all three linker types. We performed enzymatic error correction using ErrASE, gel isolated the product and finally cloned the constructs into an expression vector. One of the 15 antibodies did not clone, and another had a deleted linker region in all 21 sequenced clones. Both of these antibodies were encoded with the highest GC content linker. The average error rate of the 14 antibodies that did clone was 1/315 bp (Fig. 3b and Supplementary Table 2); this was considerably higher than the GFP assemblies, but still sufficient for construction of genes of this size (~10% of clones should be perfect, on average).
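The "~10% of clones should be perfect" figure follows from treating errors as independent per base. The Python sketch below works through that arithmetic. The 745-bp gene length is an assumption (roughly 35,000 bp divided over 47 genes, from the abstract), and both function names are hypothetical.

```python
import math


def perfect_fraction(errors_per_bp, length_bp):
    """Probability that a clone of length_bp carries no error, assuming
    independent errors at a uniform per-base rate."""
    return (1.0 - errors_per_bp) ** length_bp


def clones_to_screen(p_perfect, confidence=0.95):
    """Clones to sequence so that at least one is perfect with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_perfect))


# Antibody assemblies: 1 error per 315 bp, assumed ~745-bp gene.
p_antibody = perfect_fraction(1 / 315, 745)
print(f"antibodies: {p_antibody:.1%} perfect, "
      f"screen about {clones_to_screen(p_antibody)} clones for 95% confidence")

# GFP43 from OLS pool 1: 1 error per 1,500 bp over the 717-bp gene.
p_gfp = perfect_fraction(1 / 1500, 717)
print(f"GFP43: {p_gfp:.1%} perfect")
```

Under the same assumption, the GFP43 rate of 1/1,500 bp over 717 bp gives a perfect fraction above 50%, which agrees with the statement made for OLS pool 1.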
`
In addition, the high levels of sequence similarity between the antibodies, combined with the successful assembly and sequencing (which showed no instances of cross-contamination), further validates that the selective amplification is at least stringent enough to make highly related protein sequences.
Our results show the assembly of gene-sized DNA fragments totaling ~35,000 bp from oligonucleotide pools of more than [illegible] kilobases. A number of key features are important to make the process work, including the use of low-error starting material, well-chosen orthogonal primers, subpool amplification of individual assemblies, optimized assembly methods and enzymatic error correction. Together, these features enabled gene assembly from oligonucleotide pools containing at least 50 times more sequences than previously reported (Supplementary Note). We describe two separate OLS pool lengths and assembly methods, which have their own advantages and disadvantages (Supplementary Fig. 1). The shorter, 130-mer OLS pool 1 assemblies have lower error rates, but because there are no plate amplifications, will be harder to scale as we begin to utilize larger OLS pools. The longer 200-mer OLS pool 2 is easier to scale, but contained higher error rates. The costs of oligonucleotides in both processes are <$0.01/bp of final synthesized sequence, and thus the dominant costs are enzymatic processing, cloning and sequence verification. Future work on reducing the cost of perfect sequence will focus on the ability to lower sequencing costs by using cheaper next-generation sequencing technologies, or by incorporating other error-correction techniques such as PAGE selection of oligonucleotide pools or mutS-based error filtration.
`
isolation (Fig. 2c and Supplementary Fig. 7a,b). Cloning followed by flow cytometry screening showed that 6.8%, 7.5% and 6.8% of the cells were fluorescent for mTFP1, mCitrine and mApple assemblies, respectively (Fig. 3a).
Assuming 6% correct sequence per construct and no selection against errors in the assembly process, the error rate was ~1/250 bp for 200-mer OLS pool 2, significantly above that of the estimates for 130-mer OLS pool 1 (~1/1,000 bp) and the sequenced 55,000 150-mer OLS pool (~1/500 bp). This is not completely unexpected, as the amount of depurination is dependent upon the number of deprotection steps during synthesis and thus the oligonucleotide length. Despite the higher error rate, there were several advantages to the 200-mer OLS pool 2. First, the extensive overlaps designed in OLS pool 1 caused spurious processing of the primers from the assembly subpools. The use of type IIS restriction endonucleases to process primers to form dsDNA resulted in more robust processing. Second, the use of two amplification steps conserves chip-eluted DNA to allow for future scaling of the gene synthesis process (Supplementary Note). Third, the assemblies of OLS pool 1 produced many smaller bands and required lower-throughput gel isolation procedures. This could be due to mispriming during PCR assembly because of the long overlap lengths used in the design process. The assemblies in OLS pool 2 used much shorter overlap lengths and resulted in no smaller molecular weight misassembled products.
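The ~1/250 bp estimate given earlier in this paragraph can be recovered with a short calculation. It assumes independent errors at a uniform per-base rate ε and a gene length L of about 700 bp; the length is an assumption, since the fluorescent protein gene lengths are not given in this passage. With f the fraction of error-free constructs:

```latex
\[
f = (1-\varepsilon)^{L}
\quad\Longrightarrow\quad
\varepsilon = 1 - f^{1/L} \approx \frac{-\ln f}{L}
\]
\[
f = 0.06,\quad L \approx 700\ \text{bp}:
\qquad
\varepsilon \approx \frac{2.81}{700} \approx 4.0\times 10^{-3}\ \text{per bp} \approx \frac{1}{250\ \text{bp}}
\]
```

Because ε scales as 1/L, a somewhat longer or shorter gene would shift the estimate proportionally, so this is an order-of-magnitude check rather than a precise figure.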
To improve the error rates of the genes assembled from OLS pool 2, we used ErrASE, a commercially available enzyme cocktail that detects and corrects mismatched base pairs, to remove errors in the assembled fluorescent proteins. For each gene, we applied ErrASE at six different stringencies, reamplified the constructs, cloned the PCR products and rescreened the cloned genes using flow cytometry. The level of fluorescence progressively increased with greater ErrASE stringency. At the highest levels of error correction, the fluorescence levels were 31%, 49% and 26% for mTFP1, mCitrine and mApple, respectively (Fig. 3a and Supplementary Fig. 8). We also performed the ErrASE procedure on our GFP43 and GFP35 pools from OLS pool 1, resulting in fluorescence levels of 89% and 92%, respectively (Fig. 3a and Supplementary Fig. 5c). We sequenced clones of GFP43 and GFP35 and found three errors in 21,510 (1/7,170 bp) and four errors in 20,076 (1/5,019 bp) sequenced bases, respectively.
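The per-base rates quoted for the ErrASE-treated GFP clones, and the Poisson uncertainty used for the error bars in Fig. 3b, can be reproduced with a few lines of Python. The function name is hypothetical; the counts are those given in the text. With only three or four observed errors, the relative Poisson uncertainty (√n/n) is large.

```python
import math


def error_rate_summary(n_errors, n_bases):
    """Return (bases of correct sequence per error, relative Poisson uncertainty).

    The relative uncertainty is sqrt(n)/n for n observed errors.
    """
    if n_errors <= 0:
        raise ValueError("need at least one observed error")
    bp_per_error = n_bases / n_errors
    relative_error = math.sqrt(n_errors) / n_errors
    return bp_per_error, relative_error


for label, n_errors, n_bases in [
    ("GFP43 + ErrASE", 3, 21510),
    ("GFP35 + ErrASE", 4, 20076),
]:
    bp, rel = error_rate_summary(n_errors, n_bases)
    print(f"{label}: 1 error per {bp:,.0f} bp (relative uncertainty {rel:.0%})")
```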
As a more challenging test for our DNA synthesis technology, we designed and synthesized oligonucleotides in OLS pool 2 for 42 genes encoding the variable regions of single-chain antibody fragments (scFv) corresponding to a number of well-known antibodies. We have previously had trouble synthesizing these genes using commercial gene synthesis companies. This might be partly due to the prototype (Gly4Ser)3 linker, which is designed to maximize flexibility and allow the heavy and light V regions to assemble22. The repetitive nature and high GC content of the linker-encoding sequences often represent a challenge for accurate DNA synthesis. We therefore tested three different linker sequences that varied in GC content and repetitive character of the linker-encoding sequence. In addition, the presence of high sequence homology in the antibody backbones and linkers represented a potential source of cross-hybridization that could interfere with assembly (61% average sequence identity).
As expected, the antibody sequences did not assemble as robustly as the fluorescent proteins, and thus we further optimized the conditions during pre- and post-assembly (Supplementary Figs. 7c, 9 and 10a). Under the best protocol, 40 of the 42 constructs assembled to the correct size (Fig. 2d and Supplementary Table 3). The two
`
`
METHODS

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.
`
ACKNOWLEDGMENTS
This work was supported by the US Office of Naval Research (N000141010144), National Human Genome Research Institute Center for Excellence in Genomics Science (P50 HG003170), Department of Energy Genomes to Life (DE-FG02-02ER63445), Defense Advanced Research Projects Agency (W911NF-08-1-0254) and the Wyss Institute for Biologically Inspired Engineering (all to G.M.C.). We thank H. Pa[illegible] for providing ErrASE and expertise during optimization and J. Boeke for advice on gene assembly protocols. We also thank S. Raman, F. Vigneault and F. Zhang for critical readings of the manuscript, G. Dantas (Washington University) for pZE21, F. Isaacs (Yale University) for pZE21G and E.S. Workman (Wyss Institute) for pSecTag2A.
`
AUTHOR CONTRIBUTIONS
S.K. and N.E. wrote the paper with contributions from all authors; S.K. and G.M.C. conceived the study; S.K. wrote all algorithms and designed all sequences; S.K. and N.E. designed and performed all experiments; E.M.L. provided the oligonucleotide libraries; M.S. and J.W. designed the single-chain versions of commercial antibodies; J.B.L. performed the OLS high-throughput sequencing experiment and provided critical advice on the processing of subpools.
`
COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

Published online at http://www.nature.com/naturebiotechnology/.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.

1. Carr, P.A. & Church, G.M. Genome engineering. Nat. Biotechnol. 27, 1151–1162 (2009).
2. Tian, J., Ma, K. & Saaem, I. Advancing high-throughput gene synthesis technology. Mol. Biosyst. 5, 714–722 (2009).
3. Tian, J. et al. Accurate multiplex gene synthesis from programmable DNA microchips. Nature 432, 1050–1054 (2004).
4. Richmond, K.E. et al. Amplification and assembly of chip-eluted DNA (AACED): a method for high-throughput gene synthesis. Nucleic Acids Res. 32, 5011–5018 (2004).
5. Zhou, X. et al. Microfluidic PicoArray synthesis of oligodeoxynucleotides and simultaneous assembling of multiple DNA sequences. Nucleic Acids Res. 32, 5409–5417 (2004).
6. Nirenberg, M.W. & Matthaei, J.H. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proc. Natl. Acad. Sci. USA 47, 1588–1602 (1961).
7. Söll, D. et al. Studies on polynucleotides, XLIX. Stimulation of the binding of aminoacyl-sRNAs to ribosomes by ribotrinucleotides and a survey of codon assignments for 20 amino acids. Proc. Natl. Acad. Sci. USA 54, 1378–1385 (1965).
8. Gibson, D.G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).
9. Gibson, D.G. Synthesis of DNA fragments in yeast by one-step assembly of overlapping oligonucleotides. Nucleic Acids Res. 37, 6984–6990 (2009).
10. Li, M.Z. & Elledge, S.J. Harnessing homologous recombination in vitro to generate recombinant DNA via SLIC. Nat. Methods 4, 251–256 (2007).
11. Bang, D. & Church, G.M. Gene synthesis by circular assembly amplification. Nat. Methods 5, 37–39 (2008).
12. Shao, Z., Zhao, H. & Zhao, H. DNA assembler, an in vivo genetic method for rapid construction of biochemical pathways. Nucleic Acids Res. 37, e16 (2009).
13. Lee, C.C., Snyder, T.M. & Quake, S.R. A microfluidic oligonucleotide synthesizer. Nucleic Acids Res. 38, 2514–2521 (2010).
14. Kim, C. et al. Progress in gene assembly from a MAS-driven DNA microarray. Microelectron. Eng. 83, 1613–1616 (2006).
15. LeProust, E.M. et al. Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process. Nucleic Acids Res. 38, 2522–2540 (2010).
16. Patwardhan, R.P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat. Biotechnol. 27, 1173–1175 (2009).
17. Schlabach, M.R. et al. Synthetic design of strong promoters. Proc. Natl. Acad. Sci. USA 107, 2538–2543 (2010).
18. Li, J.B. et al. Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res. 19, 1606–1615 (2009).
19. Li, J.B. et al. Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324, 1210–1213 (2009).
20. Porreca, G.J. et al. Multiplex amplification of large sets of human exons. Nat. Methods 4, 931–936 (2007).
21. Xu, Q. et al. Design of 240,000 orthogonal 25mer DNA barcode probes. Proc. Natl. Acad. Sci. USA 106, 2289–2294 (2009).
22. Huston, J.S. et al. Medical applications of single-chain antibodies. Int. Rev. Immunol. 10, 195–217 (1993).
23. Carr, P.A. et al. Protein-mediated error correction for de novo DNA synthesis. Nucleic Acids Res. 32, e162 (2004).
`
ONLINE METHODS
`
Reanalysis of OLS pool error rates. We reanalyzed a previously published data set for determining sequencing error rates. Briefly, the data set was derived from high-throughput sequencing using the Illumina Genome Analyzer platform of a 53,777 150-mer OLS pool. Two sequencing runs were performed, the first before any amplification, and the second after two rounds of ten cycles of PCR (20 cycles total). As our previous analyses were mostly looking for distribution effects, we reanalyzed these existing data to get an estimate of error rates before and after PCR amplification. We realigned the data set using Exonerate to allow for gapped alignments and analysis of indels24. Specifically, we used an affine local alignment model that is equivalent to the classic Smith-Waterman-Gotoh alignment, a gap extension penalty of [illegible], and used the full refine option to allow for dynamic programming-based optimization of the alignment. These reads were solely mapped on base calls by the Illumina platform. We used these alignments to count mismatches, deletions and insertions as compared to the designed sequences. However, as base-calling can be more error prone on next-generation platforms than traditional Sanger-based approaches, we filtered the results based only on high-quality base calls (Phred scores of ≥30 or >99.9% accuracy). This was accomplished by converting Illumina quality scores to Phred values using the Maq utility sol2sanger25 and only using statistics from base calls of Phred 30 or higher. All error rate analysis scripts were implemented in Python and are available upon request. Although this method provides an estimate for error rates, unmapped reads may have higher error rates, thus underestimating the total average error rate. In addition, base-calling errors might still overestimate the error rate. Finally, using only high-quality base calls, which usually occur only in the first ten bases of a read, might only reflect error rates on the 5′ end of the synthesized oligonucleotide.
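The quality filtering and error counting described above can be sketched in Python. This is not the authors' script. It assumes an alignment given as two equal-length gapped strings (`-` marks a gap), one Phred score per read base, and that deletions, which have no read base and so no quality score, are always counted. The Solexa-to-Phred conversion is the standard log-odds formula; all function names are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Phred score Q corresponds to a base-call error probability of 10^(-Q/10)."""
    return 10 ** (-q / 10)


def solexa_to_phred(q_solexa):
    """Convert an early Illumina (Solexa) log-odds quality score to a Phred score."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def count_errors(ref_aln, read_aln, quals, min_q=30):
    """Tally matches, mismatches, insertions and deletions in a gapped alignment.

    Read bases below min_q are skipped. Deletions carry no read base and are
    always counted (an assumption of this sketch).
    """
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    qi = 0  # index into quals; advances once per read base
    for ref_base, read_base in zip(ref_aln, read_aln):
        if read_base == "-":
            counts["deletion"] += 1
            continue
        q = quals[qi]
        qi += 1
        if q < min_q:
            continue
        if ref_base == "-":
            counts["insertion"] += 1
        elif ref_base == read_base:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


# Toy alignment: one mismatch, one insertion, one deletion, one low-quality base.
example = count_errors(
    ref_aln="ACGT-ACGT",
    read_aln="ACCTTAC-T",
    quals=[40, 40, 40, 40, 40, 40, 10, 40],
)
print(example)
print(phred_to_error_prob(30))
```

A Phred score of 30 corresponds to an error probability of 0.001, which is the ">99.9% accuracy" threshold quoted in the text.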
`
Design and synthesis of OLS pools. The 13,000 oligos in the first OLS library (OLS pool 1) were broken up into 12 separately amplifiable subpools (assembly subpools). Each assembly subpool was defined by unique 20 bp priming sites that flanked each of the oligos in the pool. The priming sites were designed to minimize amplification of oligos not in the particular assembly subpool. This was done by designing a set of orthogonal 20-mers (assembly-specific primers) using a set of 240,000 orthogonal 25-mers21 as a seed. From these sequences we selected 20-mers with 3′ sequence ending in thymidine or GATC for the forward and reverse primers, respectively. We screened for melting temperatures of 62–64 °C and low primer secondary structure. After the additional filtering, 12 pairs of forward and reverse primers were chosen to be the assembly-specific primers. The 13,000 oligos in the second OLS library (OLS pool 2) were broken up into 11 subpools corresponding to 11 sets of up to 96 assemblies (plate subpools), which were further divided into a total of 836 assembly subpools. A new set of orthogonal primers was designed similarly to the previous set (without the GATC and thymidine constraints) but further filtered to remove type IIS restriction sites, secondary structure, primer dimers and [illegible] dimers. The final set of primer pairs was distributed among the plate-specific primers, assembly-specific primers and construction primers. See Supplementary Methods for more detailed design information and primer sequences.
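Some of the primer filters listed above lend themselves to a short Python sketch. Several parts are assumptions rather than the authors' method: the melting temperature uses a crude GC-content formula (the paper does not state its Tm model, and a nearest-neighbor model would give different values), the two type IIS recognition sites (BsaI and BsmBI) are illustrative choices, and all function names and example primers are hypothetical. Secondary-structure and dimer screens are omitted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Illustrative type IIS recognition sites (BsaI, BsmBI); the enzymes actually
# screened for are described in the Supplementary Methods, not here.
TYPE_IIS_SITES = ("GGTCTC", "CGTCTC")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_tm(seq):
    """Crude GC-content Tm estimate in degrees C (rule of thumb for oligos over ~13 nt)."""
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def passes_pool1_rules(primer, is_forward, tm_window=(62.0, 64.0), tm=gc_tm):
    """OLS pool 1 style filter: 3' end in T (forward) or GATC (reverse), Tm in window."""
    end_ok = primer.endswith("T") if is_forward else primer.endswith("GATC")
    low, high = tm_window
    return end_ok and low <= tm(primer) <= high


def has_type_iis_site(primer, sites=TYPE_IIS_SITES):
    """OLS pool 2 style filter: flag primers carrying a site on either strand."""
    rc = revcomp(primer)
    return any(site in primer or site in rc for site in sites)


# Hypothetical 20-mers for demonstration.
fwd = "GCGCCAGGCTCGCAGACGGT"
rev = "GCGCCAGGCTCGCAGCGATC"
print(passes_pool1_rules(fwd, is_forward=True), passes_pool1_rules(rev, is_forward=False))
print(has_type_iis_site("GCGGTCTCAGGCTCGCAGCT"))
```

Passing a different `tm` callable lets the same filter run with a more accurate melting temperature model.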
OLS pools were synthesized by Agilent Technologies and are available upon signing a Collaborative Technology Development agreement with Agilent. Costs of OLS pools are a function of the number of unique oligos synthesized and of the length of the oligos (<$0.01 per final assembled base-pair for all scales used in this study). OLS pools 1 and 2 were independently synthesized, cleaved and delivered as lyophilized ~1–10 picomole pools.
`
Amplification and processing of OLS subpools. Lyophilized DNA from OLS pools 1 and 2 was resuspended in [illegible] µl TE. Assembly subpools were amplified from 1 µl of OLS pool 1 in a 50 µl qPCR reaction using the KAPA SYBR FAST qPCR kit (Kapa Biosystems). A secondary 20 ml PCR amplification using Taq polymerase was performed from the primary amplifica-
