(19) World Intellectual Property Organization
International Bureau

(43) International Publication Date: 17 April 2008 (17.04.2008)

(10) International Publication Number: WO 2008/045380 A2

(51) International Patent Classification: C12N 15/10 (2006.01)

(21) International Application Number: PCT/US2007/021488

(22) International Filing Date: 4 October 2007 (04.10.2007)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
60/849,558    4 October 2006 (04.10.2006)      US
60/876,641    21 December 2006 (21.12.2006)    US
60/878,331    31 December 2006 (31.12.2006)    US

(71) Applicant (for all designated States except US): CODON
DEVICES, INC. [US/US]; One Kendall Square, Building
300, Cambridge, MA 02139 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): BAYNES, Brian, M.
[US/US]; 19 Orrin Street, Cambridge, MA 02138 (US).
DANNER, John, P. [US/US]; 239 Central Avenue, Milton,
MA 02186 (US). LIPOVSEK, Dasa [SI/US]; 45 Sunset Road,
Cambridge, MA 02138 (US). BASU, Subhayu [IN/US];
1630 Worcester Road, Apartment 529c, Framingham,
MA 01702 (US).

(74) Agent: WALLER, Patrick, R., H.; Wolf, Greenfield &
Sacks, P.C., Federal Reserve Plaza, 600 Atlantic Avenue,
Boston, MA 02210-2206 (US).

(81) Designated States (unless otherwise indicated, for every
kind of national protection available): AE, AG, AL, AM,
AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH,
CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG,
ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL,
IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK,
LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW,
MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PG, PH, PL,
PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, SV, SY,
TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA,
ZM, ZW.

(54) Title: LIBRARIES AND THEIR DESIGN AND ASSEMBLY
[Front-page figure: flowchart with the labels "Predetermined sequence variants",
"Identify variable regions", "Identify constant regions", and "Identify assembly
strategy"; reference numerals 200, 210, 220, 230.]
`
(57) Abstract: Aspects of the invention relate to the design and synthesis of nucleic
acid libraries containing non-random mutations or variants. Aspects of the invention
provide methods for assembling libraries containing high densities of predetermined
variant sequences. Certain embodiments relate to the design and synthesis of nucleic
acid libraries that express a predetermined polypeptide from a library of nucleic acids
having silent sequence variants. Certain embodiments relate to the design and synthesis
of nucleic acid libraries that express predetermined RNA variants that encode the same
polypeptide sequence.
`
`
`
`
`
`
(84) Designated States (unless otherwise indicated, for every
kind of regional protection available): ARIPO (BW, GH,
GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM,
ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI,
FR, GB, GR, HU, IE, IS, IT, LT, LU, LV, MC, MT, NL, PL,
PT, RO, SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM,
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).
`
`Published:
`
`without international search report and to be republished
`upon receipt of that report
`
`
`
`WO2008/045380
`
`PCT/US2007/021488
`
`LIBRARIES AND THEIR DESIGN AND ASSEMBLY
`
`Related Applications
`
This application claims the benefit under 35 U.S.C. § 119(e) of United States
provisional patent applications serial number 60/849,558, filed October 4, 2006, serial
number 60/876,641, filed December 21, 2006, and serial number 60/878,331, filed
December 31, 2006, the contents of which are incorporated herein by reference in their
entirety.

Field of the Invention

Aspects of the application relate to nucleic acid compositions and assembly methods.
In particular, the invention relates to the design and assembly of nucleic acid libraries.
`
`Background
Nucleic acid libraries containing large numbers of random nucleic acid variants have
been used to study the functional properties of a variety of translated or non-translated
nucleic acid sequences. Smaller nucleic acid libraries that express proteins with variant
amino acid sequences have been used to analyze the structure-function relationships of
certain amino acids at specific positions in target proteins. Variant libraries also have been
used to select or screen for certain nucleic acids or polypeptides that have one or more
desired properties. For example, variant expression libraries have been screened to identify
candidate polypeptides that have one or more therapeutic properties of interest.
`
`Summary of the Invention
`
Aspects of the invention provide methods for designing and/or assembling nucleic
acid libraries that represent large numbers of non-random specified sequences of interest
(e.g., libraries of silent mutations). In some embodiments, high-density nucleic acid libraries
are provided that exclude non-specified sequences and include only or at least a high-density
of non-random specified sequences (e.g., sequence variants) of interest. In contrast, libraries
assembled from degenerate nucleic acids may include large numbers of random sequences in
addition to sequences of interest.

Assembly strategies of the invention can be used to generate very large libraries
representative of many different nucleic acid sequences of interest (e.g., libraries of silent
mutations). In contrast, current methods for assembling small numbers of variant nucleic
acids cannot be scaled up in a cost-effective manner to generate large numbers of specified
variants.
`
Aspects of the invention involve combining and assembling two or more (e.g., 2, 3, 4,
5, 6, 7, 8, 9, 10, or more) pools of nucleic acid variants, wherein each pool corresponds to a
different variable region of a target library. Each pool contains nucleic acids having variant
sequences that were selected for the corresponding variable region. By combining the pools,
the number of different variants amongst the assembled nucleic acids is the product of the
number of variants in each pool, provided that variants from the first pool are independently
assembled with variants from the second pool. By choosing appropriate numbers of variable
regions, each represented by a different pool of specified variant nucleic acids, libraries
containing large numbers of predetermined sequences may be assembled.
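
The combinatorial effect of pooling can be illustrated with a minimal in silico sketch.
The pool contents, the constant linker sequences, and the function name
`assemble_library` below are hypothetical and chosen only for illustration; the sketch
assumes every variant in one pool joins independently with every variant in the next.

```python
from itertools import product


def assemble_library(pools, constants=None):
    """Join one variant from each pool, in order, with optional constant
    segments between adjacent pools (all sequences are illustrative)."""
    if constants is None:
        constants = [""] * (len(pools) - 1)
    library = []
    for combo in product(*pools):
        pieces = [combo[0]]
        for linker, variant in zip(constants, combo[1:]):
            pieces.append(linker)
            pieces.append(variant)
        library.append("".join(pieces))
    return library


# Three hypothetical pools, one per variable region.
pools = [
    ["GCT", "GCC", "GCA"],
    ["AAA", "AAG"],
    ["TTT", "TTC"],
]
library = assemble_library(pools, constants=["GGATCC", "GAATTC"])
print(len(library))  # product of the pool sizes: 3 * 2 * 2
```

Seven synthesized variants (3 + 2 + 2) yield twelve distinct assembled sequences in this
toy case, which is the product relationship described above.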
`
Accordingly, aspects of the invention are particularly useful to produce libraries that
contain large numbers of specified sequence variants (e.g., libraries of silent mutations).
Libraries of the invention can be used to selectively screen or analyze large numbers of
different predetermined nucleic acids and/or different peptides encoded by the nucleic acids.

Aspects of the invention relate to the design and assembly of libraries that contain
variant nucleic acids having specific predetermined sequences. Aspects of the invention are
useful to prepare libraries that contain subsets of all possible sequences at particular positions
in a nucleic acid or libraries that contain all possible silent sequence variants at one or more
protein-encoding positions in a gene of interest. In some embodiments, the invention
provides methods for analyzing specific sequences of interest and designing strategies for
preparing libraries that are representative of these sequences. Aspects of the invention
involve optimizing an assembly strategy to generate a library that only represents
predetermined nucleic acid variants of interest. In some aspects, an optimized assembly
strategy is one that excludes non-specified sequence variants. For example, a library of the
invention may be assembled to include only certain predetermined sequence variants at
positions of interest and to exclude other sequence variants that would have been present if
the library were assembled to include degenerate sequences at the positions of interest. By
focusing on specified variants, a library can be designed and assembled to maximize the
number of sequence variants of interest that are represented. In contrast, if a library is
designed to be degenerate at all positions of interest in a nucleic acid, then the number of
constructs or clones required for the library to be representative will be significantly higher
than the actual number of variants of interest. This number quickly becomes impractical
when variants at a plurality of sites are contemplated.
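
A short arithmetic sketch makes the scale of this difference concrete. The numbers are
assumptions chosen for illustration: six codon positions of interest, three specified
codons at each, compared with full degeneracy (all 64 codons) at the same positions.
The function name `library_sizes` is hypothetical.

```python
from math import prod


def library_sizes(specified_per_site, degenerate_per_site=64):
    """Return (number of specified variants, size of the fully degenerate library)."""
    specified = prod(specified_per_site)
    degenerate = degenerate_per_site ** len(specified_per_site)
    return specified, degenerate


# Assumed example: 6 codon sites, 3 specified codons at each site.
specified, degenerate = library_sizes([3, 3, 3, 3, 3, 3])
fraction_of_interest = specified / degenerate
print(specified, degenerate, fraction_of_interest)
```

Under these assumptions the specified library has 729 members while the degenerate
library has 64^6 (about 6.9 x 10^10), so only about one sequence in 10^8 of the
degenerate library would be a variant of interest.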
`
Accordingly, one aspect of the invention relates to the design of assembly strategies
for preparing precise high-density nucleic acid libraries. Another aspect of the invention
relates to assembling precise high-density nucleic acid libraries. Aspects of the invention
also provide precise high-density nucleic acid libraries. A high-density nucleic acid library
may include more than 100 different sequence variants (e.g., about 10^2 to 10^3; about 10^3 to
10^4; about 10^4 to 10^5; about 10^5 to 10^6; about 10^6 to 10^7; about 10^7 to 10^8; about 10^8 to
10^9; about 10^9 to 10^10; about 10^10 to 10^11; about 10^11 to 10^12; about 10^12 to 10^13; about
10^13 to 10^14; about 10^14 to 10^15; or more different sequences) wherein a high percentage of
the different sequences are specified sequences as opposed to random sequences (e.g., more
than about 50%, more than about 60%, more than about 70%, more than about 75%, more than
about 80%, more than about 85%, more than about 90%, about 91%, about 92%, about 93%, about
94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more of the sequences
are predetermined sequences of interest). In some embodiments, a library may contain only
non-random variants at a plurality of positions. For example, 10 or more positions may
include fewer than all four possible nucleotides (e.g., 3, 2, or 1 nucleotides).
In some embodiments, an assembly strategy involves identifying variable and
constant regions that will be assembled to generate a precise high-density nucleic acid library.
The sequences of the variant nucleic acids that will be used to assemble the variable regions
may be designed as illustrated in FIGS. 1 and 2. An assembly strategy also may include
identifying or selecting constant sequences that will be used to connect variant nucleic acids.

It should be appreciated that variable region boundaries may be assigned differently
depending on the level of resolution that is used to analyze library sequences, as explained in
more detail below for FIG. 2. In some embodiments, library sequences may be subdivided
into different numbers of variable and constant regions depending on the size (e.g., number of
consecutive nucleotides) that is used to define each region. For example, at one level of
analysis, a stretch of 10 nucleotides (positions 1-10) for which two or more variants are
present at each of positions 1-5 and 7-10 may be considered as a single variable region of 10
nucleotides. However, at a higher resolution, this region may be separated into two variable
regions (positions 1-5 and 7-10) separated by a constant region (position 6 that is constant in
the library). An assembly strategy may include determining how to subdivide a library
sequence into variable and constant regions (e.g., how many different regions and where to
delineate the boundaries between different regions).
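
The 10-nucleotide example above can be sketched in code. This is only one possible way
to model "resolution", stated as an assumption: a parameter `min_constant_run` gives the
shortest run of invariant positions that is treated as its own constant region, and
shorter runs are absorbed into the surrounding variable region. The function name and
the two toy sequences are hypothetical.

```python
def variable_regions(sequences, min_constant_run=1):
    """Return 1-based inclusive (start, end) spans of variable regions in a
    set of aligned, equal-length sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned and of equal length")
    # A position is variable if more than one nucleotide is observed there.
    variable = [len({s[i] for s in sequences}) > 1 for i in range(length)]
    regions = []
    for pos, is_var in enumerate(variable, start=1):
        if not is_var:
            continue
        if regions and pos - regions[-1][1] - 1 < min_constant_run:
            regions[-1][1] = pos  # gap too short to count as a constant region
        else:
            regions.append([pos, pos])
    return [tuple(r) for r in regions]


# Positions 1-5 and 7-10 vary; position 6 is constant in both sequences.
seqs = ["AAAAAGAAAA", "CCCCCGCCCC"]
print(variable_regions(seqs, min_constant_run=1))  # higher resolution
print(variable_regions(seqs, min_constant_run=2))  # lower resolution
```

At the higher resolution the sketch reports two variable regions (1-5 and 7-10), and at
the lower resolution a single region spanning positions 1-10, matching the two readings
described in the text.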
`
In some embodiments, all the nucleic acid variants in a pool corresponding to a
predetermined variable region are independently synthesized (e.g., as different
oligonucleotides), and each variant nucleic acid in a pool spans the length of the variable
region to which it corresponds. Two or more pools of independently synthesized nucleic
acids then may be combined and assembled (with or without separate intervening constant
nucleic acids) to generate a larger pool (e.g., a library) of longer predetermined sequence
variants. The number of variants in this larger pool is expected to be the product of the
number of variants in each pool that is used for assembly. This approach allows an
exponential reduction in the number of construction oligonucleotides to be synthesized, as
compared to more conventional approaches, in which each variant is individually
synthesized. Aspects of the invention involve the use of nucleic acid modifying enzymes
such as restriction enzymes (e.g., Type IIS restriction enzymes) and ligase enzymes (e.g., T4
ligase) to prepare and combine pluralities of nucleic acid pools, each pool corresponding to
predetermined variants of a variable region.
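
The reduction in synthesis effort can be estimated with a rough count. The sketch below
is a simplification under stated assumptions: it counts one oligonucleotide per variant
in each pool, ignores any constant-region nucleic acids, and treats "individual
synthesis" as one construct per final variant. The pool sizes and the function name
`synthesis_cost` are hypothetical.

```python
from math import prod


def synthesis_cost(pool_sizes):
    """Return (oligonucleotides needed with pooling, variants needing
    individual synthesis otherwise)."""
    pooled = sum(pool_sizes)       # one oligonucleotide per variant per pool
    individual = prod(pool_sizes)  # one construct per final library member
    return pooled, individual


# Assumed example: six variable regions with ten variants each.
pooled, individual = synthesis_cost([10] * 6)
print(pooled, individual)
```

With these assumed numbers, 60 synthesized variants stand in for 1,000,000 individually
synthesized constructs; the pooled count grows as a sum while the library grows as a
product.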
`
It should be appreciated that the number of sequence variants in each pool, the size of
the sequence variants in each pool, and the combined number of variants after assembly all
may be determined by the selection of sequence boundaries for each variable region stretch
that is going to be represented by a separate pool of variant nucleic acids. Accordingly,
assembly strategies may be optimized to obtain a high density library that is representative of
a large number of different sequence variants by mixing and assembling relatively small
numbers of different nucleic acid variants. In some embodiments, the variant nucleic acid
pools may be assembled in a hierarchical series of assembly reactions with each assembly
reaction involving a few (e.g., 2, 3, 4, or 5) variant pools corresponding to adjacent variable
regions. However, in some embodiments, more variant pools (e.g., 5-10, or more) may be
mixed and assembled in a single reaction. In some embodiments, an entire variant library
may be assembled in a single reaction.
`
`
In some embodiments, an assembly strategy may involve one or more intermediate
sequencing steps to determine and/or confirm the representativeness of the final library. This
strategy can be used to determine/confirm that i) the different variant sequences of interest
are represented and/or ii) non-specified variant sequences are rare (e.g., not represented or
only present at a low frequency, for example, less than about 30%, less than about 25%, less
than about 20%, less than about 15%, less than about 10%, less than about 5%, less than
about 1%, etc.) in the final library.
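
Both checks can be expressed as simple computations over sequencing results. The sketch
below is a minimal illustration, assuming each sequenced clone has already been reduced
to the sequence of its variable region; the function name `library_qc` and the example
data are hypothetical.

```python
def library_qc(specified, reads):
    """Return (fraction of specified variants observed at least once,
    fraction of reads that are not specified variants)."""
    specified = set(specified)
    observed = set(reads)
    coverage = len(specified & observed) / len(specified)
    off_target = sum(1 for r in reads if r not in specified) / len(reads)
    return coverage, off_target


# Hypothetical data: three specified variants, four sequenced clones.
specified = ["AAA", "AAG", "AAC"]
reads = ["AAA", "AAG", "AAA", "AAT"]
coverage, off_target = library_qc(specified, reads)
print(coverage, off_target)
```

Here two of the three specified variants are observed and one read in four is
non-specified, so this toy library would meet a "less than about 30%" threshold for
non-specified sequences but not a "less than about 20%" threshold.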
In some embodiments, an assembly strategy may involve one or more error-removal
steps to exclude variant nucleic acids that were not specified (e.g., one or more
error-containing synthetic oligonucleotides). In some embodiments, the same pool of constant
region nucleic acids may be reused and combined with one or more different pools of variant
nucleic acids to assemble a plurality of library variants. In some embodiments, one or more
nucleic acids representing constant regions may be assembled and/or isolated as perfect
fragments (e.g., isolated with the correct predetermined sequence having no errors, for
example, by sequencing one or more candidates to identify a construct having a correct
sequence). These perfect fragments may be used in one or more assembly reactions in
combination with pools of variant nucleic acids. The pools of variant nucleic acids may be
perfect (e.g., they contain only specified variants), but in some embodiments they may
contain a fraction of non-specified variant nucleic acids (e.g., less than about 30%, less than
about 25%, less than about 20%, less than about 15%, less than about 10%, less than about
5%, less than about 1%, etc.). However, the overall percentage of unspecified variants in the
final library may be kept low by using the perfect constant region sequences.
In some embodiments, libraries (e.g., libraries of silent mutations) can be used to
evaluate, screen, or select different polypeptides of interest. In some embodiments, the
invention relates to expression libraries that can be used to screen or select for polypeptides
having one or more functional and/or structural properties (e.g., one or more predetermined
catalytic, enzymatic, receptor-binding, therapeutic, or other properties). Aspects of the
invention provide expression libraries (e.g., nucleic-acid/polypeptide libraries) that are
enriched for candidate polypeptides lacking one or more unwanted characteristics. For
example, a library that expresses many different polypeptide variants may be designed to
exclude polypeptides that have poor in vivo solubility, high immunogenicity, low stability,
etc., or any combination thereof. Accordingly, aspects of the invention provide methods of
generating filtered expression libraries that are enriched for candidate molecules having
physiologically compatible or desirable characteristics. In some embodiments, a filtered
expression library may be screened and/or exposed to selection conditions to identify one or
more polypeptides having a function or structure of interest.
`
Aspects of the invention relate to therapeutic compositions. In some aspects, a
therapeutic nucleic acid may include one or more silent mutations. In some embodiments, a
therapeutic polypeptide may be expressed from a nucleic acid construct that includes one or
more silent mutations.
`
Aspects of the invention relate to diagnostic methods, compositions, and applications
related to detecting one or more silent mutations in a biological sample. A silent mutation in
a coding sequence is a nucleotide sequence change in a codon that does not alter the identity
of the encoded amino acid due to the degeneracy of the genetic code. For example, an amino
acid may be encoded by one to six different codons (depending on the amino acid). A silent
mutation is a sequence change that changes a codon from a first codon (e.g., a wild type
codon, a naturally occurring polymorphism, a scaffold codon, a consensus codon, or any
other starting codon) that encodes an amino acid to a second different codon that encodes the
same amino acid. In some embodiments, a silent mutation may be a single nucleotide
change. In some embodiments, a silent mutation may involve two or three nucleotide
changes within the codon.
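
The definition above can be made concrete with a small sketch that enumerates the silent
alternatives for a codon under the standard genetic code and counts how many nucleotides
each one changes. The function name `silent_mutations` is hypothetical, and the use of
the standard code (rather than an organism-specific or organellar code) is an assumption.

```python
from collections import Counter

# Standard genetic code, with codons enumerated in TCAG order ('*' marks stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def silent_mutations(codon):
    """Map each synonymous codon to the number of nucleotide changes needed
    to reach it from the starting codon."""
    amino_acid = CODON_TABLE[codon]
    return {
        other: sum(x != y for x, y in zip(codon, other))
        for other in CODONS
        if other != codon and CODON_TABLE[other] == amino_acid
    }


# Number of codons per amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in AMINO_ACIDS if aa != "*")

print(silent_mutations("CTG"))  # leucine
print(silent_mutations("AGC"))  # serine
print(silent_mutations("ATG"))  # methionine has no silent alternatives
```

For the serine codon AGC, the synonymous codon AGT differs at one position while TCT
differs at all three, which illustrates how a silent mutation can involve one, two, or
three nucleotide changes within a codon.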
`
One or more silent mutations may be screened for in a protein-coding portion of a
gene associated with a disease (e.g., cancer, a degenerative disease, a neurodegenerative
disease, an inherited disease, or other disease), a predisposition to a disease (e.g., cancer, a
degenerative disease, a neurodegenerative disease, an inherited disease, an infectious disease,
or other disease), a responsiveness to a drug or a class of drugs, a susceptibility to an adverse
drug reaction, or a locus associated with a beneficial trait (e.g., in a crop or other agricultural
or industrial organism).
`
Aspects of the invention relate to identifying one or more silent mutations that can be
used for subsequent diagnostic screening and/or therapeutic applications. Silent mutations
associated with a trait of interest may be identified by analyzing known silent mutations in
genes associated with the trait and determining whether one or more of the silent mutations
is associated with (e.g., causative of) the trait. An analysis may involve population genetics
and statistical analysis. An analysis may involve preparing one or more nucleic
acids having one or more of the silent mutations and determining if the encoded
polypeptide(s) have different functional and/or structural properties and determining whether
any differences in properties may be associated with the trait of interest (e.g., the disease,
condition, etc.). A library of silent mutations from a population of individuals (e.g.,
identified in a population of individuals having one or more phenotypes of interest, for
example, patients having a disease or a predisposition to a disease) may be assembled and the
encoded polypeptides may be analyzed (e.g., screened or selected) for one or more functional
and/or structural properties of interest. Libraries may be assembled from and/or screened
against pooled samples.
In some embodiments, a library of silent mutations in one or more genes that encode
proteins associated with drug processing (e.g., drug pumps, such as MDR1, MRP, LRP, drug
metabolizing enzymes and other drug processing enzymes) may be assembled. Such a library
may be screened and/or selected to identify silent mutations that increase or decrease drug
processing (e.g., pumping) and that may be associated with increased or decreased
responsiveness to one or more therapeutic compounds (e.g., drug resistance or drug
ineffectiveness, etc.). Similarly, libraries of silent mutations in genes encoding proteins
associated with adverse responses to drugs and/or toxicity may be assembled and screened or
selected to identify variants that may be associated with increased or decreased adverse
response and/or toxicity. Similarly, silent mutations associated with other traits of interest
may be identified by assembling libraries of silent mutations in genes known to be associated
with the trait. As discussed herein, the silent mutation libraries may include one or more
silent mutations in each gene (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more silent mutations may
be present in each gene or about 1%, about 10%, about 25%, about 50%, about 75%, about 80%,
about 90%, about 95%, or about all of the possible silent mutations may be represented in a
library for a predetermined protein-encoding gene).
`
Once identified, silent mutations associated with any condition of interest (e.g.,
disease, drug responsiveness, etc.) may be used for diagnostic and/or therapeutic purposes.
In diagnostic applications, a patient or population of patients may be screened for the
presence of one or more silent mutations associated with a trait of interest. Any suitable
biological sample may be screened or assayed for the presence of one or more silent
mutations. A sample may be analyzed for a silent mutation using any suitable technique. For
example, sequencing, primer extension, hybridization, or any other suitable technique, or any
combination thereof may be used.
Accordingly, aspects of the invention relate to primers that are designed to interrogate
a nucleic acid sample for the presence of one or more silent mutations. For example, a
primer may be designed for a single base extension reaction to detect a silent mutation. Such
a primer may hybridize to a nucleic acid immediately adjacent to a position at which a silent
mutation may be present such that a single base extension product can determine whether a
silent mutation is present. A biological sample may be a patient sample (e.g., a human or
other patient such as a pet, an agricultural animal, a vertebrate, a mammal, etc.). A biological
sample may be a tissue sample (e.g., a tissue biopsy), a fluid sample (e.g., blood, plasma,
saliva, urine, etc.), or other biological sample (e.g., stool, etc.). The nucleic acid in a sample
may be enriched, amplified, or selected (e.g., by binding to an immobilization probe, for
example, on a column, in a microfluidic channel, on a bead, or any other suitable solid
support), etc., or any combination thereof. The presence of one or more silent mutations in a
patient may be indicative of a risk of a disease or condition as described herein.
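
The geometry of a single base extension primer can be sketched as follows. This is an
illustrative simplification, not a primer design procedure: it assumes the primer is
taken from the sense strand immediately upstream of the interrogated position (so that
it anneals to the complementary strand and its 3' end abuts the site), and it ignores
melting temperature, secondary structure, and specificity. The function name
`sbe_primer`, the primer length, and the sequences are hypothetical.

```python
def sbe_primer(sense_strand, site, length=20):
    """Return (primer, base expected to be incorporated) for a 0-based
    position `site` on the sense strand."""
    if site < length:
        raise ValueError("not enough upstream sequence for a primer of this length")
    primer = sense_strand[site - length:site]  # 3' end sits just before the site
    expected_base = sense_strand[site]         # base added by the extension step
    return primer, expected_base


# Hypothetical alleles that differ only by a silent change (CTG -> CTA, both leucine).
reference = "ATGGCTCTGAAAGGT"
variant = "ATGGCTCTAAAAGGT"
print(sbe_primer(reference, site=8, length=8))
print(sbe_primer(variant, site=8, length=8))
```

The same primer is obtained for both alleles, and only the incorporated base differs
(G for the reference, A for the variant), which is how a single base extension product
can report whether the silent mutation is present.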
`
`
A human patient treatment recommendation may be based on a silent mutation in a
patient sample. In therapeutic applications, a nucleic acid encoding a therapeutic protein and
having one or more silent mutations of interest may be introduced into a patient or cell
(and for example, the cell may be introduced into a patient). Alternatively, or in addition, a
polypeptide product expressed from a gene having a silent mutation of interest may be
isolated and administered to a patient (e.g., orally, intravenously, intraperitoneally, or
otherwise injected).
`
Accordingly, aspects of the invention relate to genes having one or more silent
mutations. Aspects of the invention relate to polypeptides (e.g., isolated polypeptides)
expressed from genes having one or more silent mutations. Aspects of the invention relate to
diagnostic tools (e.g., primers, kits, enzymes, etc.) for detecting one or more silent mutations.

Accordingly, aspects of the invention may be used to screen or select libraries (e.g.,
filtered libraries, silent mutation libraries, or other predetermined libraries) for target RNAs
or polypeptides of interest that also have desirable in vivo traits.
It should be appreciated that although selection methods using un-filtered libraries may
yield proteins with required binding or catalytic properties, they generally do not select for
other desirable properties. For example, proteins selected using un-filtered libraries
frequently are found to have unacceptably low stability or solubility when purified and
characterized. In the case of proteins designed for therapeutic applications, such as
antibodies, antibody fragments, non-antibody target-binding proteins, and modified hormones
or receptors, a common problem is that proteins selected from un-filtered libraries often evoke
an immune response when introduced into patients, causing either inactivation of the putative
therapeutic or adverse side effects.
`
In some embodiments, filtering techniques of the invention can be used to identify
nucleic acid sequences to be included in a polypeptide expression library. In some
embodiments, filtering techniques of the invention can be used to identify nucleic acid
sequences to be excluded from a polypeptide expression library. In some embodiments,
methods of the invention are useful for screening nucleic acid sequences that are candidates
for inclusion in an expression library and identifying those sequences that encode
polypeptides with one or more undesirable properties (e.g., poor solubility, high
immunogenicity, low stability, etc.). Accordingly, aspects of the invention may be used to
design and assemble a library of nucleic acids that encode a plurality of polypeptides having
one or more biophysical or biological properties that are known or predicted to be within a
predetermined acceptable or desirable range of values.
`
In some embodiments, libraries can be used to evaluate, screen, and/or select different
nucleic acid sequences that encode the same amino acid sequence. In some embodiments, the
invention relates to expression libraries that can be used to screen or select for different
expression levels of polypeptides that have the same amino acid sequence, but that are
expressed from different nucleic acid sequences. In some embodiments, the invention relates
to expression libraries that can be used to screen or select for one or more functional and/or
structural properties (e.g., one or more predetermined catalytic, enzymatic, receptor-binding,
therapeutic, or other properties) of polypeptides that have the same amino acid sequence, but
that are expressed from different nucleic acid sequences. According to the invention,
different nucleic acid sequences encoding the same polypeptide sequence may be translated
at different rates (e.g., due to the presence of one or more rare codons). Different translation
rates may result in different polypeptide expression levels and/or polypeptides that are folded
into different three-dimensional configurations (and therefore may have different functional
and/or structural properties).
`
In some embodiments, libraries can be used to evaluate, screen, and/or select different
nucleic acid sequences that do not encode polypeptides. In some embodiments, the nucleic
acids in a library may encode putative functional RNAs (e.g., ribozymes, RNA aptamers,
RNAi molecules, antisense RNAs, etc.) and the library may be used to identify one or more
expressed RNAs having function(s) of interest. In some embodiments, the nucleic acids in a
library may be non-coding (e.g., neither RNA nor polypeptide encoding), and the library may
be used to identify one or more nucleic acids with one or more regulatory and/or structural
properties of interest (e.g., one or more promoter, enhancer, response, silencer, binding,
conformational, or other property of interest, or any combination thereof).
Accordingly, aspects of the invention relate to assembling libraries that are
representative of a plurality of predetermined nucleic acid and/or polypeptide sequences of
interest. A library assembly reaction may include a polymerase and/or a ligase mediated
reaction. In some embodiments the assembly reaction involves two or more cycles of
denaturing, annealing, and extension conditions. In some embodiments, assembled library
nucleic acids may be amplified, sequenced or cloned. In some embodiments, a host cell may
be transformed with the assembled library nucleic acids. Library nucleic acids may be
integrated into the genome of the host cell. In some embodiments, the library nucleic acids
may be expressed, for example, under the control of a promoter (e.g., an inducible promoter).
Individual variant clones may be isolated from a library. Nucleic acids and/or polypeptides
of interest may be isolated or purified. A cell preparation transformed with a nucleic acid
library, or an isolated nucleic acid of interest, may be stored, shipped, and/or propagated
(e.g., grown in culture).
`
`
`
In another aspect, the invention provides methods of obtaining nucleic acid libraries
by sending sequence information and delivery information to a remote site. The sequence
information may be analyzed at the remote site. Starting nucleic acids may be designed
and/or produced at the remote site. The starting nucleic acids may be assembled in a process
that generates the desired sequence variation at the remote site. In some embodiments, the
starting nucleic acids, an intermediate product in the assembly reaction, and/or the assembled
nucleic acid library may be shipped to the delivery address that was provided.
Other aspects of the invention provide systems for designing starting nucleic acids
and/or for assembling the starting nucleic acids to make a target library. Other aspects of the
invention relate to methods and devices for automating a multiplex oligonucleotide assembly
reaction (e.g., using a microfluidic device, a robotic liquid handling device, or a combination
thereof) to generate a library of interest. Further aspects of the invention relate to business
methods of marketing one or more strategies, protocols, systems, and/or automated
procedures that are associated with a high-density nucleic acid library assembly. Yet further
aspects of the invention relate to business methods of marketing one or more libraries.

Other features and advantages of the invention will be apparent from the following
detailed description, and from the claims. The claims provided below are hereby
incorporated into this section by reference.
`
`Brief Description of the Figures
FIG. 1 illustrates a non-limiting embodiment of a strategy for designing and
assembling a precise high-density nucleic acid library;

FIG. 2 illustrates a non-limiting embodiment of a method for designing assembly
nucleic acids and an assembly strategy for a precise high-density nucleic acid library;

FIG. 3 illustrates non-limiting embodiments of assembly techniques in panels A-D;

FIG. 4 illustrates a non-limiting embodiment of an assembly technique for producing
a pool of predetermined nucleic acid sequence variants;

FIG. 5 illustrates non-limiting embodiments of hairpin oligonucleotide designs in
panels A-D;

FIG. 6 illustrates non-limiting embodiments of dumbbell oligonucleotide designs in
panels A-B;

FIG. 7 illustrates non-limiting embodiments of hairpin oligonucleotide designs in
panels A-D;

FIG. 8 illustrates non-limiting embodiments of assembly techniques in panels A-B;

FIG. 9 illustrates a non-limiting embodiment of a silent mutation scanning strategy;
and,

FIG. 10 illustrates a non-limiting embodiment of a method for selecting protein
sequences for a library.
`
`Detailed Description of the Invention
Aspects of the invention relate to strategies and methods for constructing non-random
nucleic acid libraries comprising pluralities of substantially predetermined (e.g., pre-selected)
variant nucleic acid sequences. A “non-random” library means that the target species in the
library are substantially predetermined or pre-selected prior to assembly, as opposed to being
substantially degenerate or randomly derived. Generally, predetermined (or non-random)
species are specified or selected from all possible species. Thus, unlike randomly derived
variants or mutations, predetermined species represent a subset of all possible species.
`Nonethe