`
(19) World Intellectual Property Organization, International Bureau

(10) International Publication Number: WO 2008/045380 A2

(43) International Publication Date: 17 April 2008 (17.04.2008)

(51) International Patent Classification: C12N 15/10 (2006.01)

(21) International Application Number: PCT/US2007/021488

(22) International Filing Date: 4 October 2007 (04.10.2007)

(25) Filing Language: English

(26) Publication Language: English

(30) Priority Data:
60/849,558, 4 October 2006 (04.10.2006), US
60/876,641, 21 December 2006 (21.12.2006), US
60/878,331, 31 December 2006 (31.12.2006), US

(71) Applicant (for all designated States except US): CODON DEVICES, INC. [US/US]; One Kendall Square, Building 300, Cambridge, MA 02139 (US).

(72) Inventors; and
(75) Inventors/Applicants (for US only): BAYNES, Brian, M. [US/US]; 19 Orrin Street, Cambridge, MA 02138 (US). DANNER, John, P. [US/US]; 239 Central Avenue, Milton, MA 02186 (US). LIPOVSEK, Dasa [SI/US]; 45 Sunset Road, Cambridge, MA 02138 (US). BASU, Subhayu [IN/US]; 1630 Worcester Road, Apartment 529e, Framingham, MA 01702 (US).

(74) Agent: WALLER, Patrick, R., H.; Wolf, Greenfield & Sacks, P.C., Federal Reserve Plaza, 600 Atlantic Avenue, Boston, MA 02210-2206 (US).

(81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PG, PH, PL, PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, SV, SY, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.

(54) Title: LIBRARIES AND THEIR DESIGN AND ASSEMBLY
`
[Front-page figure: flowchart with reference numerals 200, 210, 220, and 230. Box labels: Predetermined sequence variants; Identify variable regions; Identify constant regions; Identify assembly strategy.]
`
(57) Abstract: Aspects of the invention relate to the design and synthesis of nucleic acid libraries containing non-random mutations or variants. Aspects of the invention provide methods for assembling libraries containing high densities of predetermined variant sequences. Certain embodiments relate to the design and synthesis of nucleic acid libraries that express a predetermined polypeptide from a library of nucleic acids having silent sequence variants. Certain embodiments relate to the design and synthesis of nucleic acid libraries that express predetermined RNA variants that encode the same polypeptide sequence.
`
`
`
`
(84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European (AT, BE, BG, CH, CY, CZ, DE, DK, EE, ES, FI, FR, GB, GR, HU, IE, IS, IT, LT, LU, LV, MC, MT, NL, PL, PT, RO, SE, SI, SK, TR), OAPI (BF, BJ, CF, CG, CI, CM, GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published: without international search report and to be republished upon receipt of that report.
`
`
`
`WO 2008/045380
`
PCT/US2007/021488
`
`LIBRARIES AND THEIR DESIGN AND ASSEMBLY
`
`Related Applications
`
`This application claims the benefit under 35 U.S.C. § 119(e) of United States
`
`provisional patent applications, serial number 60/849,558, filed October 4, 2006, serial
`
`number 60/876,641, filed December 21, 2006 and serial number 60/878,331, filed December 31,
`
`2006, the contents of which are incorporated herein by reference in their entirety.
`
`Field of the Invention
`
`Aspects of the application relate to nucleic acid compositions and assembly methods.
`
`In particular, the invention relates to the design and assembly of nucleic acid libraries.
`
`Background
`
`Nucleic acid libraries containing large numbers of random nucleic acid variants have
`
`been used to study the functional properties of a variety of translated or non-translated
`
`nucleic acid sequences. Smaller nucleic acid libraries that express proteins with variant
`
`amino acid sequences have been used to analyze the structure-function relationships of
`
`certain amino acids at specific positions in target proteins. Variant libraries also have been
`
`used to select or screen for certain nucleic acids or polypeptides that have one or more
`
`desired properties. For example, variant expression libraries have been screened to identify
`
`candidate polypeptides that have one or more therapeutic properties of interest.
`
`Summary of the Invention
`
`Aspects of the invention provide methods for designing and/or assembling nucleic
`
`acid libraries that represent large numbers of non-random specified sequences of interest
`
(e.g., libraries of silent mutations). In some embodiments, high-density nucleic acid libraries
`are provided that exclude non-specified sequences and include only or at least a high-density
`
`of non-random specified sequences (e.g., sequence variants) of interest. In contrast, libraries
`
`assembled from degenerate nucleic acids may include large numbers of random sequences in
`
`addition to sequences of interest.
`
`Assembly strategies of the invention can be used to generate very large libraries
`
`representative of many different nucleic acid sequences of interest (e.g., libraries of silent
`
`mutations). In contrast, current methods for assembling small numbers of variant nucleic
`
`
`
acids cannot be scaled up in a cost-effective manner to generate large numbers of specified
`
`variants.
`
`Aspects of the invention involve combining and assembling two or more (e.g., 2, 3, 4,
`
`5, 6, 7, 8, 9, 10, or more) pools of nucleic acid variants, wherein each pool corresponds to a
`
`different variable region of a target library. Each pool contains nucleic acids having variant
`
`sequences that were selected for the corresponding variable region. By combining the pools,
`
`the number of different variants amongst the assembled nucleic acids is the product of the
`
`number of variants in each pool, provided that variants from the first pool are independently
`
`assembled with variants from the second pool. By choosing appropriate numbers of variable
`
`regions, each represented by a different pool of specified variant nucleic acids, libraries
`
`containing large numbers of predetermined sequences may be assembled.
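
By way of non-limiting illustration, the multiplicative relationship between pool sizes and library size may be sketched as follows. The pool contents, the `pools` variable, and the function names are hypothetical and chosen only for illustration; the sketch assumes that every variant in one pool is assembled independently with every variant in the other pools.

```python
from itertools import product
from math import prod

# Hypothetical pools: each list holds the specified variants for one variable region.
pools = [
    ["GCT", "GCC"],         # variable region 1: 2 variants
    ["AAA", "AAG"],         # variable region 2: 2 variants
    ["CTG", "TTA", "CTC"],  # variable region 3: 3 variants
]

def library_size(pools):
    """Number of assembled variants when pools combine independently."""
    return prod(len(pool) for pool in pools)

def assemble(pools):
    """Enumerate every assembled sequence (one variant from each pool, in order)."""
    return ["".join(parts) for parts in product(*pools)]

library = assemble(pools)
print(library_size(pools), len(set(library)))
```

In this sketch, seven synthesized variants (2 + 2 + 3) yield twelve distinct assembled sequences (2 x 2 x 3).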
`
`Accordingly, aspects of the invention are particularly useful to produce libraries that
`
`contain large numbers of specified sequence variants (e.g., libraries of silent mutations).
`
`Libraries of the invention can be used to selectively screen or analyze large numbers of
`
`different predetermined nucleic acids and/or different peptides encoded by the nucleic acids.
`
`Aspects of the invention relate to the design and assembly of libraries that contain
`
`variant nucleic acids having specific predetermined sequences. Aspects of the invention are
`
`useful to prepare libraries that contain subsets of all possible sequences at particular positions
`
`in a nucleic acid or libraries that contain all possible silent sequence variants at one or more
`
`protein-encoding positions in a gene of interest. In some embodiments, the invention
`provides methods for analyzing specific sequences of interest and designing strategies for
`
`preparing libraries that are representative of these sequences. Aspects of the invention
`
`involve optimizing an assembly strategy to generate a library that only represents
`
`predetermined nucleic acid variants of interest. In some aspects, an optimized assembly
`
`strategy is one that excludes non-specified sequence variants. For example, a library of the
`
`invention may be assembled to include only certain predetermined sequence variants at
`
`positions of interest and to exclude other sequence variants that would have been present if
`
`the library were assembled to include degenerate sequences at the positions of interest. By
`
`focusing on specified variants, a library can be designed and assembled to maximize the
`
`number of sequence variants of interest that are represented. In contrast, if a library is
`
`designed to be degenerate at all positions of interest in a nucleic acid, then the number of
`
`constructs or clones required for the library to be representative will be significantly higher
`
`than the actual number of variants of interest. This number quickly becomes impractical
`
`when variants at a plurality of sites are contemplated.
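
By way of non-limiting illustration, the difference between a specified library and a degenerate library may be sketched using the six leucine codons as the variants of interest at a site. The function name `degenerate_set` is hypothetical, and the sketch assumes that the degenerate design uses, at each nucleotide position, the smallest mixture of bases that covers all specified variants.

```python
from itertools import product

# Variants of interest at one site: the six codons that encode leucine.
specified = {"CTT", "CTC", "CTA", "CTG", "TTA", "TTG"}

def degenerate_set(variants):
    """All sequences produced by mixing, position by position, every base seen in the variants."""
    columns = [sorted(set(column)) for column in zip(*variants)]
    return {"".join(bases) for bases in product(*columns)}

degenerate = degenerate_set(specified)
unwanted = degenerate - specified  # sequences present only because of degeneracy

# Library sizes when the same design is repeated at several independent sites.
for sites in (1, 3, 5):
    print(sites, len(specified) ** sites, len(degenerate) ** sites)
```

Here the degenerate mixture contains eight sequences per site, two of which (TTT and TTC, encoding phenylalanine) were not specified. At five such sites the degenerate design contains 32,768 sequences, of which only 7,776 are of interest.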
`
`Accordingly, one aspect of the invention relates to the design of assembly strategies
`
`for preparing precise high-density nucleic acid libraries. Another aspect of the invention
`
`relates to assembling precise high-density nucleic acid libraries. Aspects of the invention
`
`also provide precise high-density nucleic acid libraries. A high-density nucleic acid library
`
may include more than 100 different sequence variants (e.g., about 10^2 to 10^3; about 10^3 to

10^4; about 10^4 to 10^5; about 10^5 to 10^6; about 10^6 to 10^7; about 10^7 to 10^8; about 10^8 to 10^9;

about 10^9 to 10^10; about 10^10 to 10^11; about 10^11 to 10^12; about 10^12 to 10^13; about 10^13 to 10^14;
`
`about 1014 to 1015; or more different sequences) wherein a high percentage of the different
`
sequences are specified sequences as opposed to random sequences (e.g., more than about
`
`50%, more than about 60%, more than about 70%, more than about 75%, more than about
`
`80%, more than about 85%, more than about 90%, about 91%, about 92%, about 93%, about
`
`94%, about 95%, about 96%, about 97%, about 98%, about 99%, or more of the sequences
`
`are predetermined sequences of interest). In some embodiments, a library may contain only
`
non-random variants at a plurality of positions. For example, 10 or more positions may
`
`include fewer than all four possible nucleotides (e.g., 3, 2, or 1 nucleotides).
`
`In some embodiments, an assembly strategy involves identifying variable and
`
`constant regions that will be assembled to generate a precise high-density nucleic acid library.
`
`The sequences of the variant nucleic acids that will be used to assemble the variable regions
`
`may be designed as illustrated in FIGS. 1 and 2. An assembly strategy also may include
`
`identifying or selecting constant sequences that will be used to connect variant nucleic acids.
`
`It should be appreciated that variable region boundaries may be assigned differently
`depending on the level of resolution that is used to analyze library sequences, as explained in
`
`more detail below for FIG. 2. In some embodiments, library sequences may be subdivided
`
`into different numbers of variable and constant regions depending on the size (e.g., number of
`
`consecutive nucleotides) that is used to define each region. For example, at one level of
`
analysis, a stretch of 10 nucleotides (positions 1-10) for which two or more variants are

present at each of positions 1-5 and 7-10 may be considered as a single variable region of 10
`
`nucleotides. However, at a higher resolution, this region may be separated into two variable
`
`regions (positions 1-5 and 7-10) separated by a constant region (position 6 that is constant in
`
`the library). An assembly strategy may include determining how to subdivide a library
`
`sequence into variable and constant regions (e.g., how many different regions and where to
`
`delineate the boundaries between different regions).
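
By way of non-limiting illustration, the subdivision of aligned library sequences into variable and constant regions at different levels of resolution may be sketched as follows. The function `find_regions` and its `min_constant` parameter are hypothetical: `min_constant` stands in for the level of resolution, so that an internal constant stretch shorter than this value is absorbed into the surrounding variable region. The two example sequences are made up.

```python
def find_regions(variants, min_constant=1):
    """Split aligned, equal-length variants into variable and constant regions.

    Returns (kind, start, end) tuples using 1-based inclusive positions.
    """
    length = len(variants[0])
    # A position is variable if more than one base is observed there.
    is_variable = [len({v[i] for v in variants}) > 1 for i in range(length)]

    # Collapse per-position flags into runs of [flag, start, end].
    runs = []
    for i, flag in enumerate(is_variable):
        if runs and runs[-1][0] == flag:
            runs[-1][2] = i
        else:
            runs.append([flag, i, i])

    # Absorb short internal constant runs into the neighbouring variable regions.
    for idx in range(1, len(runs) - 1):
        flag, start, end = runs[idx]
        if not flag and end - start + 1 < min_constant:
            runs[idx][0] = True

    # Merge adjacent runs that now have the same kind.
    merged = []
    for flag, start, end in runs:
        if merged and merged[-1][0] == flag:
            merged[-1][2] = end
        else:
            merged.append([flag, start, end])
    return [("variable" if f else "constant", s + 1, e + 1) for f, s, e in merged]

# Positions 1-5 and 7-10 vary; position 6 is constant in the library.
variants = ["AAAAAGAAAA", "CCCCCGCCCC"]
print(find_regions(variants, min_constant=1))
print(find_regions(variants, min_constant=2))
```

With `min_constant=1` the sketch reports two variable regions separated by a constant position 6; with `min_constant=2` it reports a single variable region spanning positions 1-10.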
`
`In some embodiments, all the nucleic acid variants in a pool corresponding to a
`
predetermined variable region are independently synthesized (e.g., as different
`
`oligonucleotides), and each variant nucleic acid in a pool spans the length of the variable
`
`region to which it corresponds. Two or more pools of independently synthesized nucleic
`
`acids then may be combined and assembled (with or without separate intervening constant
`
nucleic acids) to generate a larger pool (e.g., a library) of longer predetermined sequence
`
`variants. The number of variants in this larger pool is expected to be the product of the
`
`number of variants in each pool that is used for assembly. This approach allows an
`
`exponential reduction in the number of construction oligonucleotides to be synthesized, as
`
`compared to more conventional approaches, in which each variant is individually
`
`synthesized. Aspects of the invention involve the use of nucleic acid modifying enzymes
`
such as restriction enzymes (e.g., Type IIS restriction enzymes) and ligase enzymes (e.g., T4
`
`ligase) to prepare and combine pluralities of nucleic acid pools, each pool corresponding to
`
`predetermined variants of a variable region.
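
By way of non-limiting illustration, the reduction in synthesis effort may be sketched by comparing the number of oligonucleotides needed when variants are synthesized per variable region and then combined against the number needed when every full-length variant is synthesized individually. The function name `oligos_needed` is hypothetical, and the sketch assumes one oligonucleotide per variant and ignores any constant-region nucleic acids.

```python
from math import prod

def oligos_needed(pool_sizes):
    """Return (pooled, individual) synthesis counts for the given pool sizes.

    pooled:     one oligonucleotide per variant per variable region (a sum)
    individual: one construct per full-length library member (a product)
    """
    return sum(pool_sizes), prod(pool_sizes)

# Ten specified variants at each of 2, 4, or 8 variable regions.
for regions in (2, 4, 8):
    pooled, individual = oligos_needed([10] * regions)
    print(regions, pooled, individual)
```

For four variable regions with ten variants each, 40 synthesized variants correspond to 10,000 assembled library members.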
`
`It should be appreciated that the number of sequence variants in each pool, the size of
`
`the sequence variants in each pool, and the combined number of variants after assembly all
`
`may be determined by the selection of sequence boundaries for each variable region stretch
`
that is going to be represented by a separate pool of variant nucleic acids. Accordingly,
`
`assembly strategies may be optimized to obtain a high density library that is representative of
`a large number of different sequence variants by mixing and assembling relatively small
`
`numbers of different nucleic acid variants. In some embodiments, the variant nucleic acid
`
`pools may be assembled in a hierarchical series of assembly reactions with each assembly
`
reaction involving a few (e.g., 2, 3, 4, or 5) variant pools corresponding to adjacent variable

regions. However, in some embodiments, more variant pools (e.g., 5-10, or more) may be
`mixed and assembled in a single reaction. In some embodiments, an entire variant library
`
`may be assembled in a single reaction.
`
`In some embodiments, an assembly strategy may involve one or more intermediate
`
`sequencing steps to determine and/or confirm the representativeness of the final library. This
`
`strategy can be used to determine/confirm that i) the different variant sequences of interest
`are represented and/or ii) non-specified variant sequences are rare (e.g., not represented or
`
`only present at a low frequency, for example, less than about 30%, less than about 25%, less
`
`than about 20%, less than about 15%, less than about 10%, less than about 5%, less than
`
`about 1%, etc.) in the final library.
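
By way of non-limiting illustration, a representativeness check based on sequencing results may be sketched as follows. The function name `library_qc`, the set of specified variants, and the reads are all made up for illustration; the sketch assumes that each read covers the full variant sequence and is compared by exact match.

```python
from collections import Counter

def library_qc(reads, specified):
    """Summarize sequenced clones against the set of specified variants.

    Returns (coverage, off_target):
      coverage   - fraction of specified variants observed at least once
      off_target - fraction of reads that match no specified variant
    """
    counts = Counter(reads)
    seen = sum(1 for variant in specified if counts[variant] > 0)
    off_target_reads = sum(n for seq, n in counts.items() if seq not in specified)
    return seen / len(specified), off_target_reads / len(reads)

specified = {"GCTAAA", "GCTAAG", "GCCAAA", "GCCAAG"}
reads = ["GCTAAA", "GCTAAG", "GCCAAA", "GCTAAA", "GCTTAA"]  # made-up reads
coverage, off_target = library_qc(reads, specified)
print(coverage, off_target)
```

The second value can be compared against a chosen threshold for non-specified variants (e.g., less than about 10%).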
`
`
In some embodiments, an assembly strategy may involve one or more error-removal

steps to exclude variant nucleic acids that were not specified (e.g., one or more

error-containing synthetic oligonucleotides). In some embodiments, the same pool of constant
`
`region nucleic acids may be reused and combined with one or more different pools of variant
`nucleic acids to assemble a plurality of library variants. In some embodiments, one or more
`
`nucleic acids representing constant regions may be assembled and/or isolated as perfect
`
`fragments (e.g., isolated with the correct predetermined sequence having no errors, for
`
`example, by sequencing one or more candidates to identify a construct having a correct
`sequence). These perfect fragments may be used in one or more assembly reactions in
`
`combination with pools of variant nucleic acids. The pools of variant nucleic acids may be
`
`perfect (e.g., they contain only specified variants), but in some embodiments they may
`
contain a fraction of non-specified variant nucleic acids (e.g., less than about 30%, less than
`
`about 25%, less than about 20%, less than about 15%, less than about 10%, less than about
`
`5%, less than about 1%, etc.). However, the overall percentage of unspecified variants in the
`
`final library may be kept low by using the perfect constant region sequences.
`
`In some embodiments, libraries (e.g., libraries of silent mutations) can be used to
`
`evaluate, screen, or select different polypeptides of interest. In some embodiments, the
`
`invention relates to expression libraries that can be used to screen or select for polypeptides
`
`having one or more functional and/or structural properties (e.g., one or more predetermined
`
`catalytic, enzymatic, receptor-binding, therapeutic, or other properties). Aspects of the
`invention provide expression libraries (e.g., nucleic-acid/polypeptide libraries) that are
`
`enriched for candidate polypeptides lacking one or more unwanted characteristics. For
`
`example, a library that expresses many different polypeptide variants may be designed to
`
`exclude polypeptides that have poor in vivo solubility, high immunogenicity, low stability,
`
`etc., or any combination thereof. Accordingly, aspects of the invention provide methods of
`
`generating filtered expression libraries that are enriched for candidate molecules having
`
`physiologically compatible or desirable characteristics. In some embodiments, a filtered
`
`expression library may be screened and/or exposed to selection conditions to identify one or
`
more polypeptides having a function or structure of interest.
`
`Aspects of the invention relate to therapeutic compositions. In some aspects, a
`
`therapeutic nucleic acid may include one or more silent mutations. In some embodiments, a
`therapeutic polypeptide may be expressed from a nucleic acid construct that includes one or
`more silent mutations.
`
`Aspects of the invention relate to diagnostic methods, compositions, and applications
`
`related to detecting one or more silent mutations in a biological sample. A silent mutation in
`
`a coding sequence is a nucleotide sequence change in a codon that does not alter the identity
`
`of the encoded amino acid due to the degeneracy of the genetic code. For example, an amino
`
acid may be encoded by one to six different codons (depending on the amino acid). A silent
`
`mutation is a sequence change that changes a codon from a first codon (e.g., a wild type
`codon, a naturally occurring polymorphism, a scaffold codon, a consensus codon, or any
`
`other starting codon) that encodes an amino acid to a second different codon that encodes the
`
`same amino acid. In some embodiments, a silent mutation may be a single nucleotide
`
`change. In some embodiments, a silent mutation may involve two or three nucleotide
`
`changes within the codon.
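
By way of non-limiting illustration, the silent alternatives available for a given codon may be enumerated from the standard genetic code as sketched below. The function names are hypothetical. The sketch assumes the standard genetic code with codons written as DNA; other genetic codes would require a different table.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def silent_alternatives(codon):
    """Codons that encode the same amino acid as `codon`, excluding `codon` itself."""
    amino_acid = CODON_TABLE[codon]
    return sorted(c for c, a in CODON_TABLE.items() if a == amino_acid and c != codon)

def nucleotide_changes(first, second):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(first, second))

print(silent_alternatives("CTG"))       # leucine: five silent alternatives
print(silent_alternatives("ATG"))       # methionine: none
print(nucleotide_changes("CTG", "TTA")) # a two-nucleotide silent change
print(nucleotide_changes("TCT", "AGC")) # a three-nucleotide silent change (serine)
```

Leucine and serine illustrate amino acids encoded by six codons, and methionine an amino acid encoded by a single codon.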
`
One or more silent mutations may be screened for in a protein-coding portion of a
`
`gene associated with a disease (e.g., cancer, a degenerative disease, a neurodegenerative
`
disease, an inherited disease, or other disease), a predisposition to a disease (e.g., cancer, a

degenerative disease, a neurodegenerative disease, an inherited disease, an infectious disease,
or other disease), a responsiveness to a drug or a class of drugs, a susceptibility to an adverse

drug reaction, or a locus associated with a beneficial trait (e.g., in a crop or other agricultural or
`
`industrial organism).
`
`Aspects of the invention relate to identifying one or more silent mutations that can be
`
`used for subsequent diagnostic screening and/or therapeutic applications. Silent mutations
`
`associated with a trait of interest may be identified by analyzing known silent mutations in
`
`genes associated with the trait and determining whether one or more of the silent mutations
`
`is associated with (e.g., causative of) the trait. An analysis may involve population genetics
`
`and statistical analysis. An analysis may involve preparing one or more nucleic
`
`acids having one or more of the silent mutations and determining if the encoded
`
`polypeptide(s) have different functional and/or structural properties and determining whether
`
`any differences in properties may be associated with the trait of interest (e.g., the disease,
`
`condition, etc.). A library of silent mutations from a population of individuals (e.g.,
`
`identified in a population of individuals having one or more phenotypes of interest, for
`
`example, patients having a disease or a predisposition to a disease) may be assembled and the
`
`encoded polypeptides may be analyzed (e.g., screened or selected) for one or more functional
`
`and/or structural properties of interest. Libraries may be assembled from and/or screened
`against pooled samples.
`
`In some embodiments, a library of silent mutations in one or more genes that encode
`
proteins associated with drug processing (e.g., drug pumps, such as MDR1, MRP, LRP, drug
`
`metabolizing enzymes and other drug processing enzymes) may be assembled. Such a library
`
`may be screened and/or selected to identify silent mutations that increase or decrease drug
`
processing (e.g., pumping) and that may be associated with increased or decreased responsiveness
`
`to one or more therapeutic compounds (e.g., drug resistance or drug ineffectiveness, etc.).
`
`Similarly, libraries of silent mutations in genes encoding proteins associated with adverse
`
responses to drugs and/or toxicity may be assembled and screened or selected to identify
`
`variants that may be associated with increased or decreased adverse response and/or toxicity.
`
`Similarly, silent mutations associated with other traits of interest may be identified by
`
`assembling libraries of silent mutations in genes known to be associated with the trait. As
`
`discussed herein, the silent mutation libraries may include one or more silent mutations in
`
`each gene (e.g., 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, or more silent mutations may be present in each
`
`gene or about 1%, about 10%, about 25%, about 50%, about 75%, about 80%, about 90%,
`
`about 95%, or about all of the possible silent mutations may be represented in a library for a
`
`predetermined protein-encoding gene).
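
By way of non-limiting illustration, the number of possible silent variants of a coding sequence, and the fraction of them represented in a library, may be estimated as sketched below. The function names and the example coding sequence are hypothetical. The sketch assumes the standard genetic code and counts every nucleic acid sequence that encodes the same polypeptide, including the starting sequence itself.

```python
from collections import Counter
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
DEGENERACY = Counter(CODON_TABLE.values())  # number of codons per amino acid

def count_synonymous_sequences(cds):
    """Number of coding sequences that encode the same polypeptide as `cds`."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return prod(DEGENERACY[CODON_TABLE[codon]] for codon in codons)

def fraction_represented(library_size, cds):
    """Fraction of all synonymous sequences covered by a library of distinct silent variants."""
    return library_size / count_synonymous_sequences(cds)

cds = "ATGCTGAAA"  # hypothetical: Met-Leu-Lys
print(count_synonymous_sequences(cds))  # 1 x 6 x 2
print(fraction_represented(6, cds))
```

Because the count is a product over codons, it grows rapidly with the length of the coding sequence, which is why a library may represent only a chosen percentage of the possible silent mutations.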
`
Once identified, silent mutations associated with any condition of interest (e.g.,
`
`disease, drug responsiveness, etc.) may be used for diagnostic and/or therapeutic purposes.
`
`In diagnostic applications, a patient or population of patients may be screened for the
`
`presence of one or more silent mutations associated with a trait of interest. Any suitable
`
`biological sample may be screened or assayed for the presence of one or more silent
`
`mutations. A sample may be analyzed for a silent mutation using any suitable technique. For
`
`example, sequencing, primer extension, hybridization, or any other suitable technique, or any
`combination thereof may be used.
`
`Accordingly, aspects of the invention relate to primers that are designed to interrogate
`
`a nucleic acid sample for the presence of one or more silent mutations. For example, a
`
`primer may be designed for a single base extension reaction to detect a silent mutation. Such
`
`a primer may hybridize to a nucleic acid immediately adjacent to a position at which a silent
`mutation may be present such that a single base extension product can determine whether a
`
silent mutation is present. A biological sample may be a patient sample (e.g., from a human or

other patient such as a pet, an agricultural animal, a vertebrate, a mammal, etc.). A biological

sample may be a tissue sample (e.g., a tissue biopsy), a fluid sample (e.g., blood, plasma,

saliva, urine, etc.), or other biological sample (e.g., stool, etc.). The nucleic acid in a sample
`
`may be enriched, amplified, or selected (e.g., by binding to an immobilization probe, for
`
`example, on a column, in a microfluidic channel, on a bead, or any other suitable solid
`
`support), etc., or any combination thereof. The presence of one or more silent mutations in a
`
`patient may be indicative of a risk of a disease or condition as described herein.
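
By way of non-limiting illustration, the design of a primer for the single base extension reaction described above may be sketched as follows. The function names, the example sequence, and the primer length are hypothetical. The sketch assumes that the sequence is written 5' to 3' for the strand being copied by extension, that positions are 0-based, and that the primer is simply the stretch immediately upstream of the interrogated position; practical primer design would also consider melting temperature and specificity.

```python
def design_sbe_primer(seq, pos, length=18):
    """Return the primer whose 3' end lies immediately adjacent to position `pos`.

    Extending this primer by a single base reports the nucleotide at `pos`.
    """
    if pos < length:
        raise ValueError("not enough upstream sequence for the requested primer length")
    return seq[pos - length:pos]

def call_variant(extended_base, reference_base, silent_base):
    """Interpret the single base added to the primer."""
    if extended_base == silent_base:
        return "silent mutation"
    if extended_base == reference_base:
        return "reference"
    return "other"

# Hypothetical coding sequence: ATG GCT AAA CTG GAA GGT CAT
seq = "ATGGCTAAACTGGAAGGTCAT"
pos = 11  # third base of the CTG (leucine) codon; CTG -> CTA would be silent
primer = design_sbe_primer(seq, pos, length=8)
print(primer, seq[pos])
print(call_variant("A", reference_base="G", silent_base="A"))
```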
`
`A human patient treatment recommendation may be based on a silent mutation in a
`
`patient sample. In therapeutic applications, a nucleic acid encoding a therapeutic protein and
`
`having one or more silent mutations of interest may be introduced into a patient or cell
`
(and, for example, the cell may be introduced into a patient). Alternatively, or in addition, a
`
`polypeptide product expressed from a gene having a silent mutation of interest may be
`
`isolated and administered to a patient (e.g., orally, intravenously, intraperitoneally, or
`
`otherwise injected).
`
`Accordingly, aspects of the invention relate to genes having one or more silent
`
mutations. Aspects of the invention relate to polypeptides (e.g., isolated polypeptides)
`
`expressed from genes having one or more silent mutations. Aspects of the invention relate to
`
`diagnostic tools (e.g., primers, kits, enzymes, etc.) for detecting one or more silent mutations.
`
Accordingly, aspects of the invention may be used to screen or select libraries (e.g.,
`
`filtered libraries, silent mutation libraries, or other predetermined libraries) for target RNAs
`or polypeptides of interest that also have desirable in vivo traits.
`
It should be appreciated that although selection methods using un-filtered libraries may yield
`
`proteins with required binding or catalytic properties, they generally do not select for other
`
`desirable properties. For example, proteins selected using un-filtered libraries frequently are
`
`found to have unacceptably low stability or solubility when purified and characterized. In the
`
`case of proteins designed for therapeutic applications, such as antibodies, antibody fragments,
`
`non-antibody target-binding proteins, and modified hormones or receptors, a common
`
`problem is that proteins selected from un-filtered libraries often evoke an immune response
`
`when introduced into patients, causing either inactivation of the putative therapeutic or
`
`adverse side effects.
`
`In some embodiments, filtering techniques of the invention can be used to identify
`
`nucleic acid sequences to be included in a polypeptide expression library. In some
`
`embodiments, filtering techniques of the invention can be used to identify nucleic acid
`
`sequences to be excluded from a polypeptide expression library. In some embodiments,
`
`methods of the invention are useful for screening nucleic acid sequences that are candidates
`
`for inclusion in an expression library and identifying those sequences that encode
`
polypeptides with one or more undesirable properties (e.g., poor solubility, high
`
`immunogenicity, low stability, etc.). Accordingly, aspects of the invention may be used to
`
`design and assemble a library of nucleic acids that encode a plurality of polypeptides having
`
`one or more biophysical or biological properties that are known or predicted to be within a
`
`predetermined acceptable or desirable range of values.
`
`In some embodiments, libraries can be used to evaluate, screen, and/or select different
`
`nucleic acid sequences that encode the same amino acid sequence. In some embodiments, the
`
`
`
`invention relates to expression libraries that can be used to screen or select for different
`
`expression levels of polypeptides that have the same amino acid sequence, but that are
`
`expressed from different nucleic acid sequences. In some embodiments, the invention relates
`
to expression libraries that can be used to screen or select for one or more functional and/or
`
`structural properties (e.g., one or more predetermined catalytic, enzymatic, receptor-binding,
`
`therapeutic, or other properties) of polypeptides that have the same amino acid sequence, but
`
`that are expressed from different nucleic acid sequences. According to the invention,
`
`different nucleic acid sequences encoding the same polypeptide sequence may be translated
`
at different rates (e.g., due to the presence of one or more rare codons). Different translation
`
`rates may result in different polypeptide expression levels and/or polypeptides that are folded
`
into different three-dimensional configurations (and therefore may have different functional
`
`and/or structural properties).
`
`In some embodiments, libraries can be used to evaluate, screen, and/or select different
`
`nucleic acid sequences that do not encode polypeptides. In some embodiments, the nucleic
`acids in a library may encode putative functional RNAs (e.g., ribozymes, RNA aptamers,
`
`RNAi molecules, antisense RNAs, etc.) and the library may be used to identify one or more
`
expressed RNAs having function(s) of interest. In some embodiments, the nucleic acids in a
`
`library may be non-coding (e.g., neither RNA nor polypeptide encoding), and the library may
`
`be used to identify one or more nucleic acids with one or more regulatory and/or structural
`
properties of interest (e.g., one or more promoter, enhancer, response, silencer, binding,

conformational, or other property of interest, or any combination thereof).
`
`Accordingly, aspects of the invention relate to assembling libraries that are
`
`representative of a plurality of predetermined nucleic acid and/or polypeptide sequences of
`
`interest. A library assembly reaction may include a polymerase and/or a ligase mediated
`
`reaction. In some embodiments the assembly reaction involves two or more cycles of
`
denaturing, annealing, and extension conditions. In some embodiments, assembled library
`
`nucleic acids may be amplified, sequenced or cloned. In some embodiments, a host cell may
`
`be transformed with the assembled library nucleic acids. Library nucleic acids may be
`
`integrated into the genome of the host cell. In some embodiments, the library nucleic acids
`
`may be expressed, for example, under the control of a promoter (e.g., an inducible promoter).
`
`Individual variant clones may be isolated from a library. Nucleic acids and/or polypeptides
`
`of interest may be isolated or purified. A cell preparation transformed with a nucleic acid
`
`library, or an isolated nucleic acid of interest, may be stored, shipped, and/or propagated
`
`(e.g., grown in culture).
`
`
`
`In another aspect, the invention provides methods of obtaining nucleic acid libraries
`
`by sending sequence information and delivery information to a remote site. The sequence
`
`information may be analyzed at the remote site. Starting nucleic acids may be designed
`
`and/or produced at the remote site. The starting nucleic acids may be assembled in a process
`
`that generates the desired sequence variation at the remote site. In some embodiments, the
`
`starting nucleic acids, an intermediate product in the assembly reaction, and/or the assembled
`
`nucleic acid library may be shipped to the delivery address that was provided.
`
`Other aspects of the invention provide systems for designing starting nucleic acids
`
`and/or for assembling the starting nucleic acids to make a target library. Other aspects of the
`
`invention relate to methods and devices for automating a multiplex oligonucleotide assembly
`
reaction (e.g., using a microfluidic device, a robotic liquid handling device, or a combination
`
`thereof) to generate a library of interest. Further aspects of the invention relate to business
`
`methods of marketing one or more strategies, protocols, systems, and/or automated
`
`procedures that are associated with a high-density nucleic acid library assembly. Yet further
`
`aspects of the invention relate to business methods of marketing one or more libraries.
`
`Other features and advantages of the invention will be apparent from the following
`
`detailed description, and from the claims. The claims provided below are hereby
`
`incorporated into this section by reference.
`
`Brief Description of the Figures
`
FIG. 1 illustrates a non-limiting embodiment of a strategy for designing and

assembling a precise high-density nucleic acid library;
`
`FIG. 2 illustrates a non-limiting embodiment of a method for designing assembly
`
`nucleic acids and an assembly strategy for a precise high-density nucleic acid library;
`
`FIG. 3 illustrates non-limiting embodiments of assembly techniques in panels A-D;
`
FIG. 4 illustrates a non-limiting embodiment of an assembly technique for producing
`
`a pool of predetermined nucleic acid sequence variants;
`
`FIG. 5 illustrates non-limiting embodiments of hairpin oligonucleotide designs in
`
`panels A-D;
`
FIG. 6 illustrates non-limiting embodiments of dumbbell oligonucleotide designs in
`
`panels A-B;
`
`FIG. 7 illustrates non-limiting embodiments of hairpin oligonucleotide designs in
`
panels A-D;

FIG. 8 illustrates non-limiting embodiments of assembly techniques in panels A-B;
`
`
`
`FIG. 9 illustrates a non-limiting embodiment of a silent mutation scanning strategy;
`
`and,
`
`FIG. 10 illustrates a non-limiting embodiment of a method for selecti