Scalable gene synthesis by selective amplification of DNA pools from high-fidelity microchips

Sriram Kosuri1,2,6, Nikolai Eroshenko1,3,6, Emily M LeProust4, Michael Super2, Jeffrey Way2,
Jin Billy Li1,5 & George M Church1,2

Development of cheap, high-throughput and reliable gene
synthesis methods will broadly stimulate progress in biology
and biotechnology1. Currently, the reliance on column-
synthesized oligonucleotides as a source of DNA limits further
cost reductions in gene synthesis2. Oligonucleotides from
DNA microchips can reduce costs by at least an order of
magnitude3–5, yet efforts to scale their use have been largely
unsuccessful owing to the high error rates and complexity of
the oligonucleotide mixtures. Here we use high-fidelity DNA
microchips, selective oligonucleotide pool amplification,
optimized gene assembly protocols and enzymatic error
correction to develop a method for highly parallel gene
synthesis. We tested our approach by assembling 47 genes,
including 42 challenging therapeutic antibody sequences,
encoding a total of ~35 kilobase pairs of DNA. These
assemblies were performed from a complex background
containing 13,000 oligonucleotides encoding ~2.5 megabases
of DNA, which is at least 50 times larger than in previously
published attempts.

ongoing efforts to lower the cost of gene synthesis that focus on reducing the cost of the oligonucleotide precursors. For example, microfluidic oligonucleotide synthesis can reduce reagent cost by an order of magnitude and has been used for proof-of-concept gene synthesis13.
Another promising route is to harness existing DNA microchips, which can produce up to a million different oligonucleotides on a single chip, as a source of DNA. Previous efforts have demonstrated that genes can be synthesized from DNA microchips3–5,14. Thus far, it has not been possible to scale up these approaches, for at least three reasons. First, the error rates of oligonucleotides from DNA microchips are higher than those of traditional column-synthesized oligonucleotides. Second, the assembly of gene fragments becomes increasingly difficult as the diversity of the oligonucleotide mixture grows. Finally, the potential for cross-hybridization between individual assemblies imposes strong constraints on the sequences that can be constructed on an individual microchip.
Recently, the quality of microchip-synthesized oligonucleotides was improved by controlling depurination during the synthesis process15. These arrays produce up to 55,000 200-mer oligonucleotides on a single chip and are sold as ~1–10 picomole pools of oligonucleotides, termed OLS pools (oligo library synthesis). Several groups have used OLS pools in DNA capture technologies, promoter analysis and DNA barcode development16–21. We have previously shown that individual oligonucleotides in a 55,000 150-mer OLS pool were evenly distributed15. We reanalyzed this data set to estimate the frequency of transitions, transversions, insertions and deletions in this OLS pool (Online Methods) and found the overall error rate to be ~1/500 bp both before and after PCR amplification, suggesting that OLS pools can be used for accurate large-scale gene synthesis (Supplementary Table 1).
To test whether OLS pools could be used for DNA microchip-based gene synthesis, we designed two pools (OLS pools 1 and 2) of different lengths, containing ~13,000 130-mer or 200-mer oligonucleotides, respectively. Figure 1 is a general schematic of our methods for using OLS pools to perform gene synthesis. Briefly, we designed oligonucleotides that were then printed on DNA microchips and recovered as a mixed pool of oligonucleotides (OLS pool). Next, we took advantage of the long oligonucleotide lengths to

1Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA. 2Wyss Institute for Biologically Inspired Engineering, Boston, Massachusetts, USA. 3Harvard School of Engineering and Applied Sciences, Cambridge, Massachusetts, USA. 4Agilent Technologies, Santa Clara, California, USA. 5Present address: Department of Genetics, Stanford University, Stanford, California, USA. 6These authors contributed equally to this work. Correspondence should be addressed to S.K. (sri.kosuri@wyss.harvard.edu) or N.E. (eroshenk@wyss.harvard.edu).

The synthesis of DNA encoding regulatory elements, genes, pathways and entire genomes provides powerful ways both to test biological hypotheses and to harness biology for our use. For example, from the use of oligonucleotides in deciphering the genetic code6,7 to the recent complete synthesis of a viable bacterial genome8, DNA synthesis has engendered tremendous progress in biology. Currently, almost all DNA synthesis relies on phosphoramidite chemistry on controlled-pore glass (CPG) substrates. The synthesis of gene-sized fragments (500–6,000 base pairs (bp)) relies on assembling many CPG oligonucleotides using a variety of gene synthesis techniques2,3. Technologies to assemble verified gene-sized fragments into much larger synthetic constructs are now fairly mature9–12.
The price of gene synthesis has fallen drastically over the last decade. However, the current commercial price of gene synthesis, ~$0.40–1.00/bp, has begun to approach the relatively stable cost of the CPG oligonucleotide precursors (~$0.10–0.20/bp)1, suggesting that oligonucleotide cost is limiting. At these prices, the construction of large gene libraries and synthetic genomes is out of reach for most. There are many

`Received 3 August; accepted 25 October; published online 28 November 2010; doi:10.1038/nbtl71l3

NATURE BIOTECHNOLOGY VOLUME 28 NUMBER 12 DECEMBER 2010

© 2010 Nature America, Inc. All rights reserved.

Figure 1 Schematic for scalable gene synthesis from OLS pool 2. (a,b) Pre-designed oligonucleotides (no distinction is made between dsDNA and ssDNA in the figure) are synthesized on a DNA microchip (a) and then cleaved to make a pool of oligonucleotides (b). (c) Plate-specific primer sequences (yellow or brown) are used to amplify separate plate subpools (only two are shown), which contain DNA to assemble different genes (only three are shown for each plate subpool). (d) Assembly-specific sequences (shades of blue) are used to amplify assembly subpools that contain only the DNA required to make a single gene. (e) The primer sequences are cleaved using either type IIS restriction enzymes (resulting in dsDNA) or DpnII/USER/λ exonuclease processing (producing ssDNA). (f) Construction primers (shown as white and black sites flanking the full assembly) are then used in an assembly PCR reaction to build a gene from each assembly subpool. Depending on the downstream application, the assembled products are then cloned either before or after an enzymatic error correction step.

To construct genes from the OLS pools, we developed algorithms to split each sequence into overlapping segments with matching melting temperatures so that they could later be assembled by PCR. Genes in OLS pools 1 and 2 were designed differently to test the effect of different overlap lengths. We designed genes in OLS pool 1 such that the processed ssDNA pools fully overlapped to form a complete dsDNA sequence. In OLS pool 2, the processed dsDNA fragments partially overlapped by ~20 bp and could be assembled into a contiguous gene sequence using PCR. We initially constructed a set of fluorescent proteins to test the efficacy of the gene synthesis methods on both OLS pools (Fig. 2).

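The overlap-splitting step could look roughly like the following greedy sketch. It is a hypothetical reconstruction, not the authors' algorithm (which is not described in this section): the Wallace rule (2 °C per A/T, 4 °C per G/C) stands in for a proper nearest-neighbor Tm model, and the names `split_with_overlaps`, `frag_len` and `target_tm` are illustrative.

```python
def wallace_tm(seq: str) -> float:
    """Crude Wallace-rule Tm in degC: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2.0 * at + 4.0 * gc


def split_with_overlaps(gene, frag_len, target_tm, min_overlap=15, max_overlap=40):
    """Greedily cut `gene` into fragments of at most `frag_len` bases.

    Each junction with the next fragment is the shortest stretch of
    `min_overlap`..`max_overlap` bases whose Wallace Tm reaches `target_tm`,
    so junction Tms end up roughly matched. Returns (fragments, overlaps).
    """
    if frag_len <= max_overlap:
        raise ValueError("frag_len must exceed max_overlap so each step advances")
    fragments, overlaps = [], []
    start = 0
    while True:
        end = min(start + frag_len, len(gene))
        fragments.append(gene[start:end])
        if end == len(gene):
            return fragments, overlaps
        # Extend the junction leftward from `end` until its Tm reaches the target.
        overlap = min_overlap
        while overlap < max_overlap and wallace_tm(gene[end - overlap:end]) < target_tm:
            overlap += 1
        overlaps.append(overlap)
        start = end - overlap  # the next fragment re-reads the junction


# Usage on a made-up 240-bp sequence.
gene = "ATGGTGAGCAAGGGCGAGGAGCTGTTCACC" * 8
fragments, overlaps = split_with_overlaps(gene, frag_len=60, target_tm=50.0)
print(len(fragments), overlaps)
```

The sketch emits partially overlapping top-strand fragments, which is closer to the OLS pool 2 layout; a fully overlapping, OLS pool 1-style design would also need bottom-strand fragments.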
For OLS pool 1, we designed two independent ‘assembly subpools’ that encoded GFPmut3b plus flanking orthogonal primer sequences that were later used for PCR assembly (construction primers). The two assembly subpools, GFP43 and GFP35, differed in average overlap length (43 and 35 bp, respectively), oligonucleotide length (82–90 and 64–78 bases, respectively) and number of oligonucleotides (18 and 22, respectively). We also designed two subpools, control subpools 1 and 2, containing ten and five 130-mer oligonucleotides, respectively, to test amplification efficacy. The other eight subpools, containing a total of 12,945 130-mer sequences, were constructed on the same chip but were not used in this study except to provide potential sources of cross-hybridization. Each of these 12 subpools was flanked by independent orthogonal primer pairs (assembly-specific primers). As a control, we used the same algorithms to design a set of shorter column-synthesized oligonucleotides (20-bp average overlap; 35–45 bases in length; 39 oligonucleotides in total) encoding GFPmut3b and obtained them from a commercial provider (IDT). These oligonucleotides were combined to form a third pool (GFP20) that was also tested. (All synthesized oligonucleotides used in the study can be found in Supplementary Sequences.)
Each of the four subpools (GFP43, GFP35, control 1 and control 2) was PCR amplified from the synthesized OLS pool using modified primers that facilitated downstream processing (Supplementary Figs. 1 and 2a). The oligonucleotides were then processed to remove primer sequences (Supplementary Figs. 2b and 3). Briefly, lambda exonuclease was used to make the PCR products single stranded, and then uracil DNA glycosylase, endonuclease VIII and DpnII restriction endonuclease were used to cleave off the assembly-specific primers. The resulting gel shows that although the reaction was efficient, some unprocessed oligonucleotide remained. In addition, we observed spurious cleavage by DpnII that was likely due to the extensive overlap within the subpool that is inherent in the gene synthesis process. We assembled the GFP43, GFP35 and GFP20 subpools using PCR, which resulted in GFP-sized products as well as many incorrect, low-molecular-weight products (Fig. 2a).

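The spurious DpnII cleavage suggests a simple design-time check. DpnII recognizes GATC, which is its own reverse complement, so scanning one strand finds every site. The sketch below rests on a hypothetical rule, that any internal GATC could become a target once overlapping fragments anneal into duplex; the function names are illustrative.

```python
DPNII_SITE = "GATC"  # DpnII recognition sequence; its own reverse complement


def internal_sites(seq: str, site: str = DPNII_SITE) -> list:
    """Start positions of every occurrence of `site` in `seq` (overlaps allowed)."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def flag_spurious_cuts(payloads: dict) -> dict:
    """Map fragment name -> internal DpnII site positions, for fragments that have any.

    Hypothetical rule: once overlapping fragments anneal, any internal GATC
    in the duplexed region may be cut, so every internal site is flagged.
    """
    flagged = {}
    for name, seq in payloads.items():
        hits = internal_sites(seq.upper())
        if hits:
            flagged[name] = hits
    return flagged


print(flag_spurious_cuts({"frag_01": "ACGTACGTTTAC", "frag_02": "TTGATCAAGG"}))
# {'frag_02': [2]}
```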
independently PCR amplify and process only those oligonucleotides required for a given gene assembly. For the 200-mer OLS pool 2, we first amplified a ‘plate subpool’ that contained DNA to construct up to 96 genes, and then amplified individual ‘assembly subpools’ to separate the oligonucleotides for an individual gene. For the 130-mer OLS pool 1, we directly amplified assembly subpools, forgoing the plate subpool step. Next, the primers used for these amplification steps were removed either by type IIS restriction endonucleases to form double-stranded DNA (dsDNA) fragments (OLS pool 2) or by a combination of enzymatic steps to form single-stranded DNA (ssDNA) fragments (OLS pool 1). Finally, we constructed full-length genes using PCR assembly, performed enzymatic error correction to improve error rates if necessary, and cloned and characterized the constructs.

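The two-level selection can be modeled as nested in-silico PCR. This is a toy sketch under idealized assumptions (exact primer matches, no mispriming, no yield differences); the 8-nt primers and the `amplify` and `make_oligo` helpers are hypothetical, whereas the real primers are 20-mers.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def amplify(pool: list[str], fwd: str, rev: str) -> list[str]:
    """Idealized PCR over top-strand templates.

    A template is amplified only if it contains an exact `fwd` site followed
    downstream by the reverse complement of `rev`; the product spans both sites.
    """
    rev_site = revcomp(rev)
    products = []
    for oligo in pool:
        i = oligo.find(fwd)
        if i < 0:
            continue
        j = oligo.find(rev_site, i + len(fwd))
        if j < 0:
            continue
        products.append(oligo[i:j + len(rev_site)])
    return products


def make_oligo(plate: tuple[str, str], assembly: tuple[str, str], payload: str) -> str:
    """Hypothetical nested layout: plate sites outermost, assembly sites inside."""
    (pf, pr), (af, ar) = plate, assembly
    return pf + af + payload + revcomp(ar) + revcomp(pr)


PLATE_1 = ("AGCTTAGC", "TCGGATAC")
PLATE_2 = ("CTAGCATG", "GACCTTAG")
ASM_A = ("TTCAGGAC", "ACAGGTTC")
ASM_B = ("GCGATTCA", "CATTGGCA")
ASM_C = ("GTTCACAG", "CTTGACTC")

pool = [
    make_oligo(PLATE_1, ASM_A, "A" * 10),
    make_oligo(PLATE_1, ASM_B, "C" * 10),
    make_oligo(PLATE_2, ASM_C, "G" * 10),
]
plate_subpool = amplify(pool, *PLATE_1)            # both plate-1 oligos
assembly_subpool = amplify(plate_subpool, *ASM_A)  # only the gene-A oligo
print(assembly_subpool)  # ['TTCAGGACAAAAAAAAAAGAACCTGT']
```

Note that the second amplification uses internal primers, so the product no longer carries the plate primer sites.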
Obtaining subpools of only those DNA fragments required for any particular assembly is crucial for robust gene synthesis in very complex DNA backgrounds. In addition, isolating subpools relieves the constraints on sequence similarity inherent in past approaches. To facilitate the partitioning of OLS pools into smaller subpools, we designed 20-mer PCR primer sets with low potential for cross-hybridization (‘orthogonal’ primers) derived from a set of 240,000 25-mer orthogonal sequences developed for barcoding purposes21. Two separate orthogonal primer sets were constructed for the different OLS pools because of their differing requirements for downstream processing. Both sets were screened for potential cross-hybridization, low secondary structure and matched melting temperatures to construct large sets of orthogonal PCR primer pairs.

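A greedy screen in the spirit of this orthogonality filtering is sketched below. It swaps in crude proxies for the paper's criteria: a GC window instead of a Tm calculation, and shared k-mers on either strand instead of a thermodynamic cross-hybridization model. `pick_orthogonal`, `k` and `gc_range` are hypothetical names and defaults.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """All length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def pick_orthogonal(candidates: list[str], k: int = 8,
                    gc_range: tuple[float, float] = (0.4, 0.6)) -> list[str]:
    """Keep a candidate only if (1) its GC fraction lies in `gc_range`,
    (2) it shares no k-mer with its own reverse complement (a crude
    hairpin/self-dimer proxy) and (3) none of its k-mers, on either strand,
    occurs in an already accepted primer (a crude cross-hybridization proxy)."""
    chosen: list[str] = []
    used: set[str] = set()
    lo, hi = gc_range
    for p in candidates:
        if not lo <= gc_fraction(p) <= hi:
            continue
        fwd_k, rc_k = kmers(p, k), kmers(revcomp(p), k)
        if fwd_k & rc_k:
            continue
        if (fwd_k | rc_k) & used:
            continue
        chosen.append(p)
        used |= fwd_k | rc_k
    return chosen


candidates = [
    "AAAAAAAAAATTTTTTTTTT",  # fails the GC window
    "ACTGACTGACTGACTGACTG",  # accepted
    "GGAATTCCGGAATTCCGGAA",  # self-complementary (GGAATTCC is palindromic)
    "CAGTCAGTCAGTCAGTCAGT",  # reverse complement of an accepted primer
    "AGGTAGGTAGGTAGGTAGGT",  # accepted
]
print(pick_orthogonal(candidates))
```

Greedy order matters: earlier candidates claim their k-mers first, so in practice one might sort candidates by a quality score before screening.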
Figure 2 Gene synthesis products. (a) Results of PCR assembly of GFPmut3 from two different assembly subpools (GFP43 and GFP35) that were amplified from OLS pool 1. Full-length GFPmut3 is expected to be 779 bp and is indicated with an asterisk (*). Other bands show lower molecular weight misassembled products. (b) Gel purification and re-amplification of the full-length assembled GFPmut3. (c) Results of assembling three fluorescent proteins using the longer oligonucleotides in …

challenging GC-rich linkers. Of the 42 assemblies, all but two (7 and 24) resulted in strong bands of the correct size. We gel isolated and re-amplified these two, resulting in bands of the correct size (Supplementary Fig. 10b). The antibody that corresponds to each number is given in Supplementary Table 3, and the amino acid sequence of the linker region used is given above each gel, with differing amino acids in red.

out of a 12,995-oligonucleotide background. Second, the relative quantities of the oligonucleotides in the assembly subpools were sufficient to allow PCR assembly. Third, the error rates from 130-mer OLS pools were low enough to construct gene-sized fragments (717 bp) such that >50% of the sequences were perfect. In fact, the error rates from both the GFP43 and GFP35 assemblies were indistinguishable from those of the column-synthesized GFP20 assemblies. Fourth, our data show that the level of fluorescence of our gene assemblies correlated with the number of constructs with perfect sequence, providing a useful screen for testing fluorescent gene assemblies in OLS pool 2 (Supplementary Fig. 6). Finally, although PCR assembly was able to generate full-length product, many smaller misassembled products were also formed, requiring difficult-to-automate gel isolation steps.
In OLS pool 2, we designed 836 assembly subpools split into 11 plate subpools, encoding 2,456,706 bases of oligonucleotides that could potentially yield 869,125 bp of final assembled sequence. We first constructed three fluorescent proteins (mTFP1, mCitrine and mApple) to test assembly protocols in OLS pool 2. We found that the PCR assembly protocols developed for the ssDNA subpools in OLS pool 1 produced only short (<200 bp) misassemblies when applied to the dsDNA subpools in OLS pool 2. We tested over 1,000 assembly PCR conditions by varying parameters such as DNA concentration, annealing temperature, cycle number, polymerase choice and buffer conditions. Using the best protocol (Supplementary Note), we assembled the genes encoding the three proteins with no detectable misassemblies, thereby removing the need for gel

Figure 3 Characterization of products from OLS pools 1 and 2. (a) Percentage of fluorescent cells resulting from synthesis products derived from column-synthesized oligonucleotides (black), OLS Chip 1 subpools GFP43 and GFP35 (green) and the three fluorescent proteins produced on OLS Chip 2 with and without ErrASE treatment (blue, yellow and orange). Error bars correspond to the range of replicates from 3 (GFP20, GFP43, GFP35), 2 (GFP43 ErrASE, GFP35 ErrASE), 4 (mTFP1, mCitrine, mApple, mCitrine ErrASE) and 1 (mTFP1 ErrASE, mApple ErrASE) separate electroporations. (b) Error rates (average bp of correct sequence per error) from various synthesis products. Error bars show the expected Poisson error based on the number of errors found (√n). Deletions of more than two consecutive bases are counted as a single error (no such errors were found in OLS pool 1).

We gel isolated, digested and then cloned the assembly products into an expression vector (Fig. 2b and Supplementary Fig. 4). We used flow cytometry, manual colony counts and sequencing of individual clones to measure the error rates (Supplementary Fig. 5a,b). All three assays correlated well, and the error rates determined through sequencing were 1/1,500 bp, 1/1,130 bp and 1/1,350 bp for the GFP43, GFP35 and GFP20 synthesis reactions, respectively (Fig. 3 and Supplementary Table 2).
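Under the simplifying assumption that errors strike each base independently, the expected share of perfect clones for a construct of length L at per-base error rate p is (1 − p)^L. Applying that to the sequencing-derived rates above and the 717-bp GFPmut3 coding sequence shows why more than half of the clones can be expected to be perfect; the helper name is illustrative.

```python
def fraction_perfect(error_rate: float, length: int) -> float:
    """Expected share of error-free clones, (1 - p) ** L, assuming each base
    is wrong independently with probability p = error_rate."""
    return (1.0 - error_rate) ** length


# Sequencing-derived error rates from the text, applied to the 717-bp GFP
# coding sequence.
rates = {"GFP43": 1 / 1500, "GFP35": 1 / 1130, "GFP20": 1 / 1350}
for name, p in rates.items():
    print(f"{name}: {fraction_perfect(p, 717):.2f}")
# GFP43: 0.62
# GFP35: 0.53
# GFP20: 0.59
```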
These results illustrate a number of important points. First, our subpool assembly primers were sufficiently well designed to provide stringent subpool amplification of as few as 5 oligonucleotides

misassembled genes displayed faint bands at the correct size, which were gel isolated and reamplified to produce strong bands of the correct size. We sequenced 15 antibodies, including representatives of all three linker types. We performed enzymatic error correction using ErrASE, gel isolated the product and finally cloned the constructs into an expression vector. One of the 15 antibodies did not clone, and another had a deleted linker region in all 21 sequenced clones. Both of these antibodies were encoded with the highest-GC-content linker. The average error rate of the 14 antibodies that did clone was 1/315 bp (Fig. 3b and Supplementary Table 2); this was considerably higher than that of the GFP assemblies, but still sufficient for constructing genes of this size (~10% of clones should be perfect, on average).

In addition, the high level of sequence similarity between the antibodies, combined with the successful assembly and sequencing (which showed no instances of cross-contamination), further validates that the selective amplification is stringent enough to make highly related protein sequences.
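The ‘~10% perfect’ estimate can also be run in reverse. Keeping the independent-errors assumption, the sketch below solves (1 − p)^L = f for the construct length L at p = 1/315, and estimates how many clones would need sequencing to find a perfect one with 95% probability. Both helpers are illustrative, and the 95% target is an arbitrary example.

```python
import math


def max_length_for(error_rate: float, target_fraction: float) -> float:
    """Longest construct whose expected perfect share, (1 - p) ** L, still
    reaches `target_fraction`: L = ln(f) / ln(1 - p)."""
    return math.log(target_fraction) / math.log(1.0 - error_rate)


def clones_to_screen(fraction_perfect: float, confidence: float) -> int:
    """Smallest n such that P(at least one perfect clone among n) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction_perfect))


print(round(max_length_for(1 / 315, 0.10)))  # 724
print(clones_to_screen(0.10, 0.95))          # 29
```

So at 1/315 bp, roughly 10% of ~720-bp constructs are expected to be perfect, and a few dozen sequenced clones per gene would usually suffice to find one.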
Our results show the assembly of gene-sized DNA fragments totaling ~35,000 bp from oligonucleotide pools of more than … kilobases. A number of key features are important to make the process work, including the use of low-error starting material, well-chosen orthogonal primers, subpool amplification of individual assemblies, optimized assembly methods and enzymatic error correction. Together, these features enabled gene assembly from oligonucleotide pools containing at least 50 times more sequences than previously reported (Supplementary Note). We describe two separate OLS pool lengths and assembly methods, each with its own advantages and disadvantages (Supplementary Fig. 1). The shorter, 130-mer OLS pool 1 assemblies have lower error rates but, because there are no plate amplifications, will be harder to scale as we begin to use larger OLS pools. The longer 200-mer OLS pool 2 is easier to scale but had higher error rates. The cost of oligonucleotides in both processes is <$0.01/bp of final synthesized sequence, and thus the dominant costs are enzymatic processing, cloning and sequence verification. Future work on reducing the cost of perfect sequence will focus on lowering sequencing costs by using cheaper next-generation sequencing technologies, or on incorporating other error-correction techniques such as … selection of oligonucleotide pools or mutS-based error filtration23.

isolation (Fig. 2c and Supplementary Fig. 7a,b). Cloning followed by flow cytometry screening showed that 6.8%, 7.5% and 6.8% of the cells were fluorescent for the mTFP1, mCitrine and mApple assemblies, respectively (Fig. 3a).

Assuming 6% correct sequence per construct and no selection against errors in the assembly process, the error rate was ~1/250 bp for the 200-mer OLS pool 2, markedly higher than the estimates for the 130-mer OLS pool 1 (~1/1,000 bp) and the sequenced 55,000 150-mer OLS pool (~1/500 bp). This is not entirely unexpected, as the amount of depurination depends on the number of deprotection steps during synthesis and thus on oligonucleotide length.
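The ~1/250 bp figure follows from inverting the same independent-errors model: if a fraction f of constructs is perfect, the implied per-base error rate is p = 1 − f^(1/L). The construct length is not stated in this passage, so the sketch assumes a GFP-sized 717 bp and the result is approximate.

```python
def error_rate_from_perfect(fraction_perfect: float, length: int) -> float:
    """Per-base error rate p implied by (1 - p) ** L == fraction_perfect."""
    return 1.0 - fraction_perfect ** (1.0 / length)


# 6% perfect constructs, assumed GFP-sized length of 717 bp.
p = error_rate_from_perfect(0.06, 717)
print(f"~1/{1 / p:.0f} bp")  # ~1/255 bp
```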
Despite the higher error rate, the 200-mer OLS pool 2 had several advantages. First, the extensive overlaps designed into OLS pool 1 caused spurious processing of the primers from the assembly subpools; using type IIS restriction endonucleases to remove primers and form dsDNA resulted in more robust processing. Second, the use of two amplification steps conserves chip-eluted DNA, allowing future scaling of the gene synthesis process (Supplementary Note). Third, the assemblies of OLS pool 1 produced many smaller bands and required lower-throughput gel isolation procedures. This could be due to mispriming during PCR assembly caused by the long overlap lengths used in the design process. The assemblies in OLS pool 2 used much shorter overlap lengths and produced no smaller, misassembled products.
To improve the error rates of the genes assembled from OLS pool 2, we used ErrASE, a commercially available enzyme cocktail that detects and corrects mismatched base pairs, to remove errors in the assembled fluorescent proteins. For each gene, we applied ErrASE at six different stringencies, reamplified the constructs, cloned the PCR products and rescreened the cloned genes using flow cytometry. Fluorescence increased progressively with greater ErrASE stringency. At the highest level of error correction, the percentages of fluorescent cells were 31%, 49% and 26% for mTFP1, mCitrine and mApple, respectively (Fig. 3a and Supplementary Fig. 8). We also performed the ErrASE procedure on our GFP43 and GFP35 pools from OLS pool 1, resulting in fluorescence levels of 89% and 92%, respectively (Fig. 3a and Supplementary Fig. 5c). We sequenced clones of GFP43 and GFP35 and found three errors in 21,510 sequenced bases (1/7,170 bp) and four errors in 20,076 sequenced bases (1/5,019 bp), respectively.
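The Fig. 3b error bars use the expected Poisson error √n on the number of observed errors n. One way to turn that into a range for ‘bp per error’ is to divide the sequenced bases by n ± √n, as sketched below for the ErrASE-treated GFP clones; this conversion is an assumption about how the bars map onto the plotted quantity.

```python
import math


def bp_per_error(bases: int, errors: int) -> tuple:
    """Point estimate and +/- sqrt(n) Poisson band for 'bp per error'.

    Returns (bases/n, bases/(n + sqrt n), bases/(n - sqrt n)); the upper
    bound is infinite when n - sqrt n <= 0, i.e. for n <= 1.
    """
    if errors <= 0:
        raise ValueError("need at least one observed error")
    spread = math.sqrt(errors)
    low = bases / (errors + spread)
    high = bases / (errors - spread) if errors > spread else math.inf
    return bases / errors, low, high


for name, bases, n in [("GFP43 + ErrASE", 21_510, 3), ("GFP35 + ErrASE", 20_076, 4)]:
    est, lo, hi = bp_per_error(bases, n)
    print(f"{name}: 1/{est:,.0f} bp (range 1/{lo:,.0f} to 1/{hi:,.0f})")
```

With only three or four observed errors the band spans roughly a factor of three to four, which is why such estimates are best read as orders of magnitude.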
As a more challenging test of our DNA synthesis technology, we designed and synthesized oligonucleotides in OLS pool 2 for 42 genes encoding single-chain antibody fragments (scFvs) built from the variable regions of a number of well-known antibodies. We have previously had trouble synthesizing these genes through commercial gene synthesis companies. This might be partly due to the prototypical (Gly4Ser)3 linker, which is designed to maximize flexibility and allow the heavy- and light-chain V regions to assemble22. The repetitive nature and high GC content of linker-encoding sequences often present a challenge for accurate DNA synthesis. We therefore tested three different linker sequences that varied in GC content and in the repetitive character of the linker-encoding sequence. In addition, the high sequence homology among the antibody backbones and linkers (61% average sequence identity) represented a potential source of cross-hybridization that could interfere with assembly.
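GC content and repetitiveness, the two properties varied across the linkers, are easy to quantify. The encodings below are hypothetical (the linker amino acid sequences actually used are given in Fig. 2d, and the oligonucleotides in Supplementary Sequences); the sketch compares a GGT/TCT-codon and a GGC/AGC-codon version of the (Gly4Ser)3 linker by GC fraction and by the longest substring that occurs at least twice.

```python
def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_repeat(seq: str) -> int:
    """Length of the longest substring occurring at least twice (overlaps allowed).

    For each shift d, track the longest run where seq[i] == seq[i + d];
    a run of length m at shift d is exactly a repeat of length m.
    """
    best = 0
    for d in range(1, len(seq)):
        run = 0
        for i in range(len(seq) - d):
            run = run + 1 if seq[i] == seq[i + d] else 0
            best = max(best, run)
    return best


# Two hypothetical DNA encodings of the (Gly4Ser)3 linker.
LINKERS = {
    "GGT/TCT codons": "GGTGGTGGTGGTTCT" * 3,
    "GGC/AGC codons": "GGCGGCGGCGGCAGC" * 3,
}
for name, dna in LINKERS.items():
    print(f"{name}: GC = {gc_fraction(dna):.0%}, longest repeat = {longest_repeat(dna)} bp")
```

Both encodings are equally repetitive at the protein-repeat level, so varying codon usage across the three repeats (not shown) would be the lever for reducing repeat length.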
As expected, the antibody sequences did not assemble as robustly as the fluorescent proteins, so we further optimized the conditions during pre- and post-assembly (Supplementary Figs. 7c, 9 and 10a). Under the best protocol, 40 of the 42 constructs assembled to the correct size (Fig. 2d and Supplementary Table 3). The two

METHODS

Methods and any associated references are available in the online version of the paper at http://www.nature.com/naturebiotechnology/.

Note: Supplementary information is available on the Nature Biotechnology website.

ACKNOWLEDGMENTS
This work was supported by the US Office of Naval Research (N000141010144), the National Human Genome Research Institute Center of Excellence in Genomic Science (P50 HG003170), the Department of Energy Genomes to Life (DE-FG02-02ER63445), the Defense Advanced Research Projects Agency (W911NF-08-1-0254) and the Wyss Institute for Biologically Inspired Engineering (all to G.M.C.). We thank H. Pa… for providing ErrASE and expertise during optimization, and J. Boeke for advice on gene assembly protocols. We also thank S. Raman, F. Vigneault and F. Zhang for critical readings of the manuscript, G. Dantas (Washington University) for pZE21, F. I…s (Yale University) for pZE21G and E.S. Workman (Wyss Institute) for pS-Tag2A.

AUTHOR CONTRIBUTIONS
S.K. and N.E. wrote the paper with contributions from all authors; S.K. and G.M.C. conceived the study; S.K. wrote all algorithms and designed all sequences; S.K. and N.E. designed and performed all experiments; E.M.L. provided the oligonucleotide libraries; M.S. and J.W. designed the single-chain versions of commercial antibodies; J.B.L. performed the OLS high-throughput sequencing experiment and provided critical advice on the processing of subpools.

COMPETING FINANCIAL INTERESTS
The authors declare competing financial interests: details accompany the full-text HTML version of the paper at http://www.nature.com/naturebiotechnology/.

1. Carr, P.A. & Church, G.M. Genome engineering. Nat. Biotechnol. 27, 1151–1162 (2009).
2. Tian, J., Ma, K. & Saaem, I. Advancing high-throughput gene synthesis technology. Mol. Biosyst. 5, 714–722 (2009).
3. Tian, J. et al. Accurate multiplex gene synthesis from programmable DNA microchips. Nature 432, 1050–1054 (2004).
4. Richmond, K.E. et al. Amplification and assembly of chip-eluted DNA (AACED): a method for high-throughput gene synthesis. Nucleic Acids Res. 32, 5011–5018 (2004).
5. Zhou, X. et al. Microfluidic PicoArray synthesis of oligodeoxynucleotides and simultaneous assembling of multiple DNA sequences. Nucleic Acids Res. 32, 5409–5417 (2004).
6. Nirenberg, M.W. & Matthaei, J.H. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proc. Natl. Acad. Sci. USA 47, 1588–1602 (1961).
7. Söll, D. et al. Studies on polynucleotides, XLIX. Stimulation of the binding of aminoacyl-sRNAs to ribosomes by ribotrinucleotides and a survey of codon assignments for 20 amino acids. Proc. Natl. Acad. Sci. USA 54, 1378–1385 (1965).
8. Gibson, D.G. et al. Creation of a bacterial cell controlled by a chemically synthesized genome. Science 329, 52–56 (2010).
9. Gibson, D.G. Synthesis of DNA fragments in yeast by one-step assembly of overlapping oligonucleotides. Nucleic Acids Res. 37, 6984–6990 (2009).
10. Li, M.Z. & Elledge, S.J. Harnessing homologous recombination in vitro to generate recombinant DNA via SLIC. Nat. Methods 4, 251–256 (2007).
11. Bang, D. & Church, G.M. Gene synthesis by circular assembly amplification. Nat. Methods 5, 37–39 (2008).
12. Shao, Z., Zhao, H. & Zhao, H. DNA assembler, an in vivo genetic method for rapid construction of biochemical pathways. Nucleic Acids Res. 37, e16 (2009).
13. Lee, C.C., Snyder, T.M. & Quake, S.R. A microfluidic oligonucleotide synthesizer. Nucleic Acids Res. 38, 2514–2521 (2010).
14. Kim, C. et al. Progress in gene assembly from a MAS-driven DNA microarray. Microelectron. Eng. 83, 1613–1616 (2006).
15. LeProust, E.M. et al. Synthesis of high-quality libraries of long (150mer) oligonucleotides by a novel depurination controlled process. Nucleic Acids Res. 38, 2522–2540 (2010).
16. Patwardhan, R.P. et al. High-resolution analysis of DNA regulatory elements by synthetic saturation mutagenesis. Nat. Biotechnol. 27, 1173–1175 (2009).
17. Schlabach, M.R. et al. Synthetic design of strong promoters. Proc. Natl. Acad. Sci. USA 107, 2538–2543 (2010).
18. Li, J.B. et al. Multiplex padlock targeted sequencing reveals human hypermutable CpG variations. Genome Res. 19, 1606–1615 (2009).
19. Li, J.B. et al. Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science 324, 1210–1213 (2009).
20. Porreca, G.J. et al. Multiplex amplification of large sets of human exons. Nat. Methods 4, 931–936 (2007).
21. Xu, Q. et al. Design of 240,000 orthogonal 25mer DNA barcode probes. Proc. Natl. Acad. Sci. USA 106, 2289–2294 (2009).
22. Huston, J.S. et al. Medical applications of single-chain antibodies. Int. Rev. Immunol. 10, 195–217 (1993).
23. Carr, P.A. et al. Protein-mediated error correction for de novo DNA synthesis. Nucleic Acids Res. 32, e162 (2004).
ONLINE METHODS

Reanalysis of OLS pool error rates. We reanalyzed a previously published data set to estimate error rates15. Briefly, the data set was derived from high-throughput sequencing, on the Illumina Genome Analyzer platform, of a 53,777-oligonucleotide, 150-mer OLS pool. Two sequencing runs were performed: the first before any amplification, and the second after two rounds of ten cycles of PCR (20 cycles total). As our previous analyses were mostly looking for distribution effects, we reanalyzed these existing data to get an estimate of error rates before and after PCR amplification. We realigned the data set using Exonerate to allow for gapped alignments and analysis of indels24. Specifically, we used an affine local alignment model that is equivalent to the classic Smith-Waterman-Gotoh alignment, a gap extension penalty of …, and the full refine option to allow dynamic programming-based optimization of the alignment. Reads were mapped solely on the basis of base calls made by the Illumina platform. We used these alignments to count mismatches, deletions and insertions relative to the designed sequences. However, as base calling can be more error prone on next-generation platforms than in traditional Sanger-based approaches, we filtered the results to include only high-quality base calls (Phred scores ≥30, or >99.9% accuracy). This was accomplished by converting Illumina quality scores to Phred values using the Maq utility sol2sanger25 and using only statistics from base calls of Phred 30 or higher. All error rate analyses were implemented in Python and are available upon request. Although this method provides an estimate of error rates, unmapped reads may have higher error rates, so the total average error rate may be underestimated. In addition, base-calling errors might still inflate the estimated error rate. Finally, using only high-quality base calls, which usually occur only in the first ten bases of a read, might reflect error rates only at the 5′ end of the synthesized oligonucleotide.

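The quality filtering and error tally described above might look like this in Python. The authors' scripts are not reproduced here, so this is a hypothetical sketch: it assumes Solexa-scaled input qualities (as the use of sol2sanger implies), an alignment already rendered as two equal-length gapped strings, and a made-up rule for scoring deletions, which have no read base of their own.

```python
from __future__ import annotations

import math

# Transitions swap purine<->purine or pyrimidine<->pyrimidine.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def solexa_to_phred(q_solexa: float) -> float:
    """Solexa-scaled quality -> Phred quality: 10 * log10(1 + 10 ** (Qsol / 10))."""
    return 10.0 * math.log10(1.0 + 10.0 ** (q_solexa / 10.0))


def phred_to_error(q: float) -> float:
    """Phred quality -> probability that the base call is wrong."""
    return 10.0 ** (-q / 10.0)


def count_errors(ref_aln: str, read_aln: str, read_quals: list[float],
                 min_q: float = 30.0) -> dict[str, int]:
    """Tally errors in one gapped alignment using only high-quality read bases.

    `read_quals` holds one Phred score per non-gap read base. Hypothetical rule
    for deletions: count each deleted base only if the preceding read base
    passed `min_q`. Sequences are assumed to contain only A/C/G/T.
    """
    counts = {"transition": 0, "transversion": 0, "insertion": 0,
              "deletion": 0, "bases": 0}
    qi = 0         # index of the next read base in read_quals
    last_q = None  # quality of the most recent read base
    for ref_base, read_base in zip(ref_aln, read_aln):
        if read_base == "-":
            if last_q is not None and last_q >= min_q:
                counts["deletion"] += 1
            continue
        q = read_quals[qi]
        qi += 1
        last_q = q
        if q < min_q:
            continue  # low-quality call: ignore it entirely
        if ref_base == "-":
            counts["insertion"] += 1
            continue
        counts["bases"] += 1
        if ref_base != read_base:
            pair = frozenset((ref_base, read_base))
            counts["transition" if pair in TRANSITIONS else "transversion"] += 1
    return counts


quals = [solexa_to_phred(q) for q in (40, 40, 10, 40, 40, 40)]
print(count_errors("ACGTA-C", "GCT-AGC", quals))
```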
Design and synthesis of OLS pools. The 13,000 oligos in the first OLS library (OLS pool 1) were divided into 12 separately amplifiable subpools (assembly subpools). Each assembly subpool was defined by unique 20-bp priming sites that flanked each of the oligos in the pool. The priming sites were designed to minimize amplification of oligos not in the particular assembly subpool. This was done by designing a set of orthogonal 20-mers (assembly-specific primers) using a set of 240,000 orthogonal 25-mers21 as a seed. From these sequences, we selected 20-mers whose 3′ ends terminated in thymidine or GATC for the forward and reverse primers, respectively. We screened for melting temperatures of 62–64 °C and low primer secondary structure. After this additional filtering, 12 pairs of forward and reverse primers were chosen to be the assembly-specific primers. The 13,000 oligos in the second OLS library (OLS pool 2) were divided into 11 subpools corresponding to 11 sets of up to 96 assemblies (plate subpools), which were further divided into a total of 836 assembly subpools. A new set of orthogonal primers was designed similarly to the previous set (without the GATC and thymidine constraints) but further filtered to remove type IIS restriction sites, secondary structure, primer dimers and … dimers. The final set of primer pairs was distributed among the plate-specific primers, assembly-specific primers and construction primers. See Supplementary Methods for more detailed design information and primer sequences.
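The seed-trimming and 3′-end rules for OLS pool 1, and the type IIS screen added for OLS pool 2, can be sketched as simple string filters. The enzyme list is an assumption (BsaI and BsmBI sites serve purely as examples, since this section does not name the enzymes), and the Tm, secondary-structure and dimer screens are omitted.

```python
from __future__ import annotations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Illustrative type IIS recognition sites (BsaI, BsmBI), not the paper's list.
EXAMPLE_TYPE_IIS_SITES = ("GGTCTC", "CGTCTC")


def revcomp(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def windows(seed: str, length: int = 20) -> list[str]:
    """All `length`-mers contained in one seed sequence (e.g. a 25-mer)."""
    return [seed[i:i + length] for i in range(len(seed) - length + 1)]


def pool1_candidates(seeds: list[str], length: int = 20) -> tuple[list[str], list[str]]:
    """Sort seed-derived 20-mers by the OLS pool 1 3'-end rules:
    forward primers end in T, reverse primers end in GATC."""
    fwd, rev = [], []
    for seed in seeds:
        for w in windows(seed, length):
            if w.endswith("T"):
                fwd.append(w)
            if w.endswith("GATC"):
                rev.append(w)
    return fwd, rev


def lacks_sites(seq: str, sites=EXAMPLE_TYPE_IIS_SITES) -> bool:
    """True if no listed recognition site occurs on either strand of `seq`."""
    rc = revcomp(seq)
    return not any(site in seq or site in rc for site in sites)


fwd, rev = pool1_candidates(["A" * 18 + "GATCTTT"])
print(len(fwd), rev)
print(lacks_sites("AAGGTCTCAA"), lacks_sites("ACGTACGTAC"))
```

Because a window ending in GATC ends in C, the forward and reverse candidate lists produced by these rules never overlap.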
OLS pools were synthesized by Agilent Technologies and are available upon signing a Collaborative Technology Development agreement with Agilent. The cost of OLS pools is a function of the number of unique oligos synthesized and of oligo length (<$0.01 per final assembled base pair for all scales used in this study). OLS pools 1 and 2 were independently synthesized, cleaved and delivered as lyophilized ~1–10 picomole pools.

Amplification and processing of OLS subpools. Lyophilized DNA from OLS pools 1 and 2 was resuspended in …. Assembly subpools were amplified from 1 µl of OLS pool 1 in a 50 µl qPCR reaction using the KAPA SYBR FAST qPCR kit (Kapa Biosystems). A secondary 20 µl PCR amplification using Taq polymerase was performed from the primary amplifica-