`US005573905A

United States Patent [19]
Lerner et al.

[11] Patent Number: 5,573,905
[45] Date of Patent: Nov. 12, 1996

[54] ENCODED COMBINATORIAL CHEMICAL LIBRARIES

[75] Inventors: Richard Lerner, La Jolla; Kim Janda, San Diego, both of Calif.; Sydney Brenner, Cambridge, England

[73] Assignee: The Scripps Research Institute, La Jolla, Calif.

[21] Appl. No.: 350,445

[22] Filed: Mar. 30, 1992

[51] Int. Cl.: C12Q 1/68; G01N 33/53
[52] U.S. Cl.: 435/6; 435/7.94
[58] Field of Search: 435/6, 7.94; 935/77, 935/78; 530/350; 436/501

[56] References Cited

U.S. PATENT DOCUMENTS

4,748,111    5/1988   Dattagupta et al.   435/7
4,965,188   10/1990   Mullis et al.       435/6
5,082,780    1/1992   Warren et al.       435/191
5,141,813    8/1992   Nelson              428/402

OTHER PUBLICATIONS

Devlin, et al., Science, 249: 404-406 (1990).
Nelson, et al., Nucleic Acids Research, 17: 7179-7195 (1989).
Nelson, et al., Nucleic Acids Research, 20: 6253-6259 (1992).
"Sales Literature" from Clontech Laboratories, Inc., p. 3 (1994).
Geysen, et al., "Use of Peptide Synthesis to Probe Viral Antigens for Epitopes to a Resolution of a Single Amino Acid," Proc. Natl. Acad. Sci. U.S.A., 81: 3998-4002 (1984).
Maeji, et al., "Simultaneous Multiple Synthesis of Peptide-Carrier Conjugates," J. Immunol. Met., 146: 83-90 (1992).
Fodor, et al., "Light-Directed, Spatially Addressable Parallel Chemical Synthesis," Science, 251: 767-775 (1991).
Lam, et al., "A New Type of Synthetic Peptide Library for Identifying Ligand-Binding Activity," Nature, 354: 82-84 (1991).
Houghten, et al., "Generation and Use of Synthetic Peptide Combinatorial Libraries for Basic Research and Drug Discovery," Nature, 354: 84-86 (1991).
Cwirla, et al., "Peptides on Phage: A Vast Library of Peptides for Identifying Ligands," Proc. Natl. Acad. Sci. U.S.A., 87: 6378-6382 (1990).
Scott, et al., "Searching for Peptide Ligands with an Epitope Library," Science, 249: 386-390 (1990).
Devlin, et al., "Random Peptide Libraries: A Source of Specific Protein Binding Molecules," Science, 249: 404-406 (1990).

Primary Examiner: W. Gary Jones
Assistant Examiner: Eggerton Campbell
Attorney, Agent, or Firm: Donald G. Lewis

[57] ABSTRACT

The present invention describes an encoded combinatorial chemical library comprised of a plurality of bifunctional molecules having both a chemical polymer and an identifier oligonucleotide sequence that defines the structure of the chemical polymer. Also described are the bifunctional molecules of the library, and methods of using the library to identify chemical structures within the library that bind to biologically active molecules in preselected binding interactions.

5 Claims, 2 Drawing Sheets
`
`
`
U.S. Patent          Nov. 12, 1996          Sheet 1 of 2          5,573,905

            Sty I                            Apa I
5' AGCTACTTCCCAAGG[coding sequence]GGGCCCTATTCTTAG 3'
3' TCGATGAAGGGTTCC[anticoding strand]CCCGGGATAAGAATC 5'

Step 1: Cleavage by Sty I & Apa I

5' AGCTACTTCC        CAAGG[coding sequence]GGGCC        CTATTCTTAG 3'
3' TCGATGAAGGGTTC        C[anticoding strand]C      CCGGGATAAGAATC 5'

Step 2: Biotinylation

5' AGCTACTTCCC       CAAGG[coding sequence]GGGCC        CTATTCTTAG 3'
3' TCGATGAAGGGTTC     BBCC[anticoding strand]C      CCGGGATAAGAATC 5'

FIGURE 1
`
`
`
U.S. Patent          Nov. 12, 1996          Sheet 2 of 2          5,573,905

P1-LINK

Step 1

CACATG-P1-LINK-gly
ACGGTA-P1-LINK-met

Step 2

CACATGCACATG-P1-LINK-gly.gly
CACATGACGGTA-P1-LINK-met.gly
ACGGTACACATG-P1-LINK-gly.met
ACGGTAACGGTA-P1-LINK-met.met

Step 3

P2CACATGCACATGCACATGP1-LINK-gly.gly.gly
P2CACATGCACATGACGGTAP1-LINK-met.gly.gly
P2CACATGACGGTACACATGP1-LINK-gly.met.gly
P2CACATGACGGTAACGGTAP1-LINK-met.met.gly

P2ACGGTACACATGCACATGP1-LINK-gly.gly.met
P2ACGGTACACATGACGGTAP1-LINK-met.gly.met
P2ACGGTAACGGTACACATGP1-LINK-gly.met.met
P2ACGGTAACGGTAACGGTAP1-LINK-met.met.met

P1   GGGCCCTATTCTTAG
P2   AGCTACTTCCCAAGG

FIGURE 2
`
`
`
ENCODED COMBINATORIAL CHEMICAL LIBRARIES

TECHNICAL FIELD

The present invention relates to encoded chemical libraries that contain repertoires of chemical structures defining a diversity of biological structures, and methods for using the libraries.

BACKGROUND

There is an increasing need to find new molecules which can effectively modulate a wide range of biological processes, for applications in medicine and agriculture. A standard way of searching for novel bioactive chemicals is to screen collections of natural materials, such as fermentation broths or plant extracts, or libraries of synthesized molecules, using assays which can range in complexity from simple binding reactions to elaborate physiological preparations. The screens often only provide leads which then require further improvement either by empirical methods or by chemical design. The process is time-consuming and costly but it is unlikely to be totally replaced by rational methods even when they are based on detailed knowledge of the chemical structure of the target molecules. Thus, what we might call "irrational drug design" (the process of selecting the right molecules from large ensembles or repertoires) requires continual improvement both in the generation of repertoires and in the methods of selection.
Recently there have been several developments in using peptides or nucleotides to provide libraries of compounds for lead discovery. The methods were originally developed to speed up the determination of epitopes recognized by monoclonal antibodies. For example, the standard serial process of stepwise search of synthetic peptides now encompasses a variety of highly sophisticated methods in which large arrays of peptides are synthesized in parallel and screened with acceptor molecules labelled with fluorescent or other reporter groups. The sequence of any effective peptide can be decoded from its address in the array. See for example Geysen et al., Proc. Natl. Acad. Sci. USA, 81:3998-4002 (1984); Maeji et al., J. Immunol. Met., 146:83-90 (1992); and Fodor et al., Science, 251:767-775 (1991).
`
In another approach, Lam et al., Nature, 354:82-84 (1991) describes combinatorial libraries of peptides that are synthesized on resin beads such that each resin bead contains about 20 pmoles of the same peptide. The beads are screened with labelled acceptor molecules and those with bound acceptor are searched for by visual inspection, physically removed, and the peptide identified by direct sequence analysis. In principle, this method could be used with other chemical entities but it requires sensitive methods for sequence determination.
A different method of solving the problem of identification in a combinatorial peptide library is used by Houghten et al., Nature, 354:84-86 (1991). For hexapeptides of the 20 natural amino acids, 400 separate libraries are synthesized, each with the first two amino acids fixed and the remaining four positions occupied by all possible combinations. An assay, based on competition for binding or other activity, is then used to find the library with an active peptide. Then twenty new libraries are synthesized and assayed to determine the effective amino acid in the third position, and the process is reiterated in this fashion until the active hexapeptide is defined. This is analogous to the method used in searching a dictionary; the peptide is decoded by construction using a series of sieves or buckets and this makes the search logarithmic.
A very powerful biological method has recently been described in which the library of peptides is presented on the surface of a bacteriophage such that each phage has an individual peptide and contains the DNA sequence specifying it. The library is made by synthesizing a repertoire of random oligonucleotides to generate all combinations, followed by their insertion into a phage vector. Each of the sequences is cloned in one phage and the relevant peptide can be selected by finding those that bind to the particular target. The phages recovered in this way can be amplified and the selection repeated. The sequence of the peptide is decoded by sequencing the DNA. See for example Cwirla et al., Proc. Natl. Acad. Sci. USA, 87:6378-6382 (1990); Scott et al., Science, 249:386-390 (1990); and Devlin et al., Science, 249:404-406 (1990).
Another "genetic" method has been described where the libraries are the synthetic oligonucleotides themselves, wherein active oligonucleotide molecules are selected by binding to an acceptor and are then amplified by the polymerase chain reaction (PCR). PCR allows serial enrichment and the structure of the active molecules is then decoded by DNA sequencing on clones generated from the PCR products. The repertoire is limited to nucleotides and the natural pyrimidine and purine bases or those modifications that preserve specific Watson-Crick pairing and can be copied by polymerases.
The main advantages of the genetic methods reside in the capacity for cloning and amplification of DNA sequences, which allows enrichment by serial selection and provides a facile method for decoding the structure of active molecules. However, the genetic repertoires are restricted to nucleotides and peptides composed of natural amino acids, and a more extensive chemical repertoire is required to populate the entire universe of binding sites. In contrast, chemical methods can provide limitless repertoires but they lack the capacity for serial enrichment and there are difficulties in discovering the structures of selected active molecules.

BRIEF SUMMARY OF THE INVENTION

The present invention provides a way of combining the virtues of both the chemical and genetic methods summarized above through the construction of encoded combinatorial chemical libraries, in which each chemical sequence is labelled by an appended "genetic" tag, itself constructed by chemical synthesis, to provide a "retrogenetic" way of specifying each chemical structure.
In outline, two alternating parallel combinatorial syntheses are performed so that the genetic tag is chemically linked to the chemical structure being synthesized; in each case, the addition of one of the particular chemical units to the structure is followed by the addition of an oligonucleotide sequence, which is defined to "code" for that chemical unit, i.e., to function as an identifier for the structure of the chemical unit. The library is built up by the repetition of this process after pooling and division.
Active molecules are selected from the library so produced by binding to a preselected biological molecule of interest. Thereafter, the identity of the active molecule is determined by reading the genetic tag, i.e., the identifier oligonucleotide sequence. In one embodiment, amplified copies of the retrogenetic tags can be obtained by the polymerase chain reaction.
`
`
`
The strands of the amplified copies with the appropriate polarity can then be used to enrich for a subset of the library by hybridization with the matching tags and the process can then be repeated on this subset. Thus serial enrichment is achieved by a process of purification exploiting linkage to a nucleotide sequence which can be amplified. Finally, the structures of the chemical entities are decoded by cloning and sequencing the products of the PCR reaction.
The present invention therefore provides a novel method for identifying a chemical structure having a preselected binding activity through the use of a library of bifunctional molecules that provides a rich source of chemical diversity. The library is used to identify chemical structures (structural motifs) that interact with preselected biological molecules.
Thus, in one embodiment, the invention contemplates a bifunctional molecule according to the formula A-B-C, where A is a chemical moiety, B is a linker molecule operatively linked to A and C, and C is an identifier oligonucleotide comprising a sequence of nucleotides that identifies the structure of chemical moiety A.
In another embodiment, the invention contemplates a library comprising a plurality of species of bifunctional molecules, thereby forming a repertoire of chemical diversity.
Another embodiment contemplates a method for identifying a chemical structure that participates in a preselected binding interaction with a biologically active molecule, where the chemical structure is present in the library of bifunctional molecules according to this invention. The method comprises the steps of:
a) admixing in solution the library of bifunctional molecules with the biologically active molecule under binding conditions for a time period sufficient to form a binding reaction complex;
b) isolating the complex formed in step (a); and
c) determining the nucleotide sequence of the polymer identifier oligonucleotide in the isolated complex and thereby identifying the chemical structure that participated in the preselected binding interaction.
The invention also contemplates a method for preparing a library according to this invention comprising the steps of:
a) providing a linker molecule B having termini A' and C' according to the formula A'-B-C' that is adapted for reaction with a chemical precursor unit X' at termini A' and with a nucleotide precursor Z' at termini C';
b) conducting syntheses by adding chemical precursor unit X' to termini A' of said linker and adding precursor unit identifier oligonucleotide Z' to termini C' of said linker, to form a composition containing bifunctional molecules having the structure X_n-B-Z_n;
c) repeating step (b) on one or more aliquots of the composition to produce aliquots that contain a product containing a bifunctional molecule;
d) combining the aliquots produced in step (c) to form an admixture of bifunctional molecules, thereby forming said library.
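
The divide, couple, tag, and pool cycle of steps (a) through (d) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of the Examples: the two unit codes (gly as CACATG, met as ACGGTA) are taken from FIG. 2, while the function name `split_and_pool` and the tuple representation of a library member are hypothetical choices made for the example.

```python
# Unit identifier codes as drawn in FIG. 2: gly -> CACATG, met -> ACGGTA.
UNIT_CODES = {"gly": "CACATG", "met": "ACGGTA"}


def split_and_pool(unit_codes, rounds):
    """Simulate `rounds` cycles of divide, couple, tag, and pool.

    Each library member is a (units, tags) pair of tuples listed in order
    of addition, so index 0 is the unit (or tag) added first.
    """
    pool = [((), ())]  # the bare linker, before any unit is attached
    for _ in range(rounds):
        aliquots = []
        for unit, code in unit_codes.items():  # one aliquot per chemical unit
            # Couple the unit, then append its identifier, to every member.
            aliquots.append(
                [(units + (unit,), tags + (code,)) for units, tags in pool]
            )
        # Pool the aliquots before the next round of division.
        pool = [member for aliquot in aliquots for member in aliquot]
    return pool


library = split_and_pool(UNIT_CODES, rounds=3)
for units, tags in library:
    print("".join(tags), ".".join(units))
```

With two chemical units and three rounds, the sketch produces one member per ordered combination of units, each carrying a tag that records its order of assembly, in the spirit of Step 3 of FIG. 2.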
`

BRIEF DESCRIPTION OF THE DRAWINGS

In the drawings, forming a portion of this disclosure:
FIG. 1 illustrates a scheme for the restriction endonuclease cleavage of a PCR amplification product derived from a bifunctional molecule of this invention (Step 1), and the subsequent addition of biotin to the cleaved PCR product (Step 2). The unique coding and non-coding nucleotide base sequences shown in FIG. 1 are listed in the Sequence Listing, SEQ ID NOs 15-22.
FIG. 2 illustrates the process of producing a library of bifunctional molecules according to the method described in Example 9. The nucleotide base sequences shown in FIG. 1 are listed in the Sequence Listing, SEQ ID NOs 15-22.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
A. Encoded Combinatorial Chemical Libraries
An encoded combinatorial chemical library is a composition comprising a plurality of species of bifunctional molecules that each define a different chemical structure and that each contain a unique identifier oligonucleotide whose nucleotide sequence defines the corresponding chemical structure.
1. Bifunctional Molecules
A bifunctional molecule is the basic unit in a library of this invention, and combines the elements of a polymer comprised of a series of chemical building blocks to form a chemical moiety in the library, and a code for identifying the structure of the chemical moiety.
Thus, a bifunctional molecule can be represented by the formula A-B-C, where A is a chemical moiety, B is a linker molecule operatively linked to A and C, and C is an identifier oligonucleotide comprising a sequence of nucleotides that identifies the structure of chemical moiety A.
a. Chemical Polymers
A chemical moiety in a bifunctional molecule of this invention is represented by A in the above formula A-B-C and is a polymer comprising a linear series of chemical units represented by the formula (X_n)_a, wherein X is a single chemical unit in polymer A and n is a position identifier for X in polymer A. n has the value of 1+i where i is an integer from 0 to 10, such that when n is 1, X is located most proximal to the linker (B).
Although the length of the polymer, defined by a, can vary, practical library size limitations arise if there is a large alphabet size, as discussed further herein. Typically, a is an integer from 4 to 50.
A chemical moiety (polymer A) can be any of a variety of polymeric structures, depending on the choice of classes of chemical diversity to be represented in a library of this invention. Polymer A can be built from any monomeric chemical unit that can be coupled and extended in polymeric form. For example, polymer A can be a polypeptide, oligosaccharide, glycolipid, lipid, proteoglycan, glycopeptide, sulfonamide, nucleoprotein, conjugated peptide (i.e., having prosthetic groups), polymer containing enzyme substrates, including transition state analogues, and the like biochemical polymers. Exemplary is the polypeptide-based library described herein.
`
Where the library is comprised of peptide polymers, the chemical unit X can be selected to form a region of a natural protein or can be a non-natural polypeptide, can be comprised of natural L-amino acids, or can be comprised of non-natural amino acids or mixtures of natural and non-natural amino acids. The non-natural combinations provide for the identification of useful and unique structural motifs involved in biological interactions.
Non-natural amino acids include modified amino acids and D-amino acids, the stereoisomers of L-amino acids.
The amino acid residues described herein are preferred to be in the "L" isomeric form. NH2 refers to the free amino group present at the amino terminus of a polypeptide. COOH refers to the free carboxy group present at the carboxy terminus of a polypeptide. In keeping with standard polypeptide nomenclature (J. Biol. Chem., 243:3552-59 (1969) and adopted at 37 C.F.R. §1.822(b)(2)), abbreviations for amino acid residues are shown in the following Table of Correspondence:

TABLE OF CORRESPONDENCE

SYMBOL
1-Letter   3-Letter   AMINO ACID

Y          Tyr        tyrosine
G          Gly        glycine
F          Phe        phenylalanine
M          Met        methionine
A          Ala        alanine
S          Ser        serine
I          Ile        isoleucine
L          Leu        leucine
T          Thr        threonine
V          Val        valine
P          Pro        proline
K          Lys        lysine
H          His        histidine
Q          Gln        glutamine
E          Glu        glutamic acid
W          Trp        tryptophan
R          Arg        arginine
D          Asp        aspartic acid
N          Asn        asparagine
C          Cys        cysteine

The phrase "amino acid residue" is broadly defined to include the amino acids listed in the Table of Correspondence and modified and unusual amino acids, such as those listed in 37 C.F.R. §1.822(b)(4), and incorporated herein by reference.
The polymer defined by chemical moiety A can therefore contain any polymer backbone modifications that provide increased chemical diversity. In building of a polypeptide system as exemplary, a variety of modifications are contemplated, including the following backbone structures: -NHN(R)CO-, -NHB(R)CO-, -NHC(RR')CO-, -NHC(=CHR)CO-, -NHC6H4CO-, -NHCH2CHRCO-, -NHCHRCH2CO-, and lactam structures.
In addition, amide bond modifications are contemplated including -COCH2-, -COS-, -CONR-, -COO-, -CSNH-, -CH2NH-, -CH2CH2-, -CH2S-, -CH2SO-, -CH2SO2-, -CH(CH3)S-, -CH=CH-, -NHCO-, -NHCONH-, -CONHO-, and -C(=CH2)CH2-.
b. Polymer Identifier Oligonucleotide
An identifier oligonucleotide in a bifunctional molecule of this invention is represented by C in the above formula A-B-C and is an oligonucleotide having a sequence represented by the formula (Z_n)_a, wherein Z is a unit identifier nucleotide sequence within oligonucleotide C that identifies the chemical unit X at position n. n has the value of 1+i where i is an integer from 0 to 10, such that when n is 1, Z is located most proximal to the linker (B). a is an integer as described previously to connote the number of chemical unit identifiers in the oligonucleotide.
For example, a bifunctional molecule can be represented by the formula:

X_4 X_3 X_2 X_1-B-Z_1 Z_2 Z_3 Z_4.

In this example, the sequence of oligonucleotides Z_1, Z_2, Z_3 and Z_4 identifies the structure of chemical units X_1, X_2, X_3 and X_4, respectively. Thus, there is a correspondence in the identifier sequence between a chemical unit X at position n and the unit identifier oligonucleotide Z at position n.
The length of a unit identifier oligonucleotide can vary depending on the complexity of the library, the number of different chemical units to be uniquely identified, and other considerations relating to requirements for uniqueness of oligonucleotides such as hybridization and polymerase chain reaction fidelity. A typical length can be from about 2 to about 10 nucleotides, although nothing is to preclude a unit identifier from being longer.
Insofar as adenosine (A), guanosine (G), thymidine (T) and cytosine (C) represent the typical choices of nucleotides for inclusion in a unit identifier oligonucleotide, A, G, T and C form a representative "alphabet" used to "spell" out a unit identifier oligonucleotide's sequence. Other nucleotides or nucleotide analogs can be utilized in addition to or in place of the above four nucleotides, so long as they have the ability to form Watson-Crick pairs and be replicated by DNA polymerases in a PCR reaction. However, the nucleotides A, G, T and C are preferred.
For the design of the code in the identifier oligonucleotide, it is essential to choose a coding representation such that no significant part of the oligonucleotide sequence can occur in another unrelated combination by chance or otherwise during the manipulations of a bifunctional molecule in the library.
For example, consider a library where Z is a trinucleotide whose sequence defines a unique chemical unit X. Because the methods of this invention provide for all combinations and permutations of an alphabet of chemical units, it is possible for two different unit identifier oligonucleotide sequences to have closely related sequences that differ by only a frame shift and therefore are not easily distinguishable by hybridization or sequencing unless the frame is clear.
Other sources of misreading of a unit identifier oligonucleotide can arise. For example, mismatch in DNA hybridization, transcription errors during a primer extension reaction to amplify or sequence the identifier oligonucleotide, and the like errors can occur during a manipulation of a bifunctional molecule.
The invention contemplates a variety of means to reduce the possibility of error in reading the identifier oligonucleotide, such as to use longer nucleotide lengths for a unit identifier nucleotide sequence so as to reduce the similarity between unit identifier nucleotide sequences. Typical lengths depend on the size of the alphabet of chemical units.
A representative system useful for eliminating read errors due to frame shift or mutation is a code developed as a theoretical alternative to the genetic code and is known as the commaless genetic code.
Where the chemical units are amino acids, a convenient unit identifier nucleotide sequence is the well known genetic code using triplet codons. The invention need not be limited by the translation afforded between the triplet codon of the genetic code and the natural amino acids; other systems of correspondence can be assigned.
A typical and exemplary unit identifier nucleotide sequence is based on the commaless code having a length of six nucleotides (hexanucleotide) per chemical unit.
Preferably, an identifier oligonucleotide has at least 15 nucleotides in the tag (coding) region for effective hybridization. In addition, considerations of the complexity of the library, the size of the alphabet of chemical units, and the polymer length of the chemical moiety all contribute to the length of the identifier oligonucleotide as discussed in more detail herein.
In a preferred embodiment, an identifier oligonucleotide C has a nucleotide sequence according to the formula
`
`
P1-(Z_n)_a-P2, where P1 and P2 are nucleotide sequences that provide polymerase chain reaction (PCR) primer binding sites adapted to amplify the polymer identifier oligonucleotide. The requirements for PCR primer binding sites are generally well known in the art, but are designed to allow a PCR amplification product (a PCR-amplified duplex DNA fragment) to be formed that contains the polymer identifier oligonucleotide sequences.
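
The frame-shift concern raised above in connection with the commaless code can be made concrete with a small check. The sketch below is an illustration only: it assumes the usual definition of a comma-free code (no codeword appears in a shifted reading frame across any two concatenated codewords), the function name `is_comma_free` is hypothetical, the hexanucleotide pair is taken from FIG. 2, and the trinucleotide pair is invented to show a failure.

```python
def is_comma_free(codewords):
    """Return True if no codeword can be read in a shifted frame.

    For every ordered pair (u, v), including u == v, the windows of u + v
    that straddle the junction must not themselves be codewords.
    """
    words = set(codewords)
    lengths = {len(word) for word in words}
    if len(lengths) != 1:
        raise ValueError("all codewords must have the same length")
    n = lengths.pop()
    for u in words:
        for v in words:
            joined = u + v
            for shift in range(1, n):  # every out-of-frame window
                if joined[shift:shift + n] in words:
                    return False
    return True


fig2_codes = ["CACATG", "ACGGTA"]  # hexanucleotide codes drawn in FIG. 2
ambiguous = ["ACA", "CAA"]         # hypothetical trinucleotide codes

print(is_comma_free(fig2_codes))
print(is_comma_free(ambiguous))
```

In the hypothetical trinucleotide set, ACA followed by ACA reads as CAA one position out of frame, which is the kind of ambiguity a commaless code is meant to exclude.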
The presence of the two PCR primer binding sites, P1 and P2, flanking the identifier oligonucleotide sequence (Z_n)_a provides a means to produce a PCR-amplified duplex DNA fragment derived from the bifunctional molecule using PCR. This design is useful to allow the amplification of the tag sequence present on a particular bifunctional molecule for cloning and sequencing purposes in the process of reading the identifier code to determine the structure of the chemical moiety in the bifunctional molecule.
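
Reading the identifier code from a sequenced tag amounts to stripping the two primer-binding flanks and translating the remaining unit identifiers one at a time. The following sketch assumes the flank sequences drawn on the top strand of FIG. 1 and the six-nucleotide unit codes of FIG. 2; the function name `decode_identifier`, the left-to-right reading order, and the example tag are assumptions made for illustration.

```python
# Flanking sequences as drawn on the top strand of FIG. 1 (5' to 3').
FLANK_5 = "AGCTACTTCCCAAGG"  # contains the Sty I site
FLANK_3 = "GGGCCCTATTCTTAG"  # contains the Apa I site

# Unit identifier codes as drawn in FIG. 2.
CODE_TO_UNIT = {"CACATG": "gly", "ACGGTA": "met"}


def decode_identifier(tag, flank_5, flank_3, code_to_unit, unit_len=6):
    """Strip the primer-binding flanks and translate each unit identifier."""
    if len(tag) < len(flank_5) + len(flank_3):
        raise ValueError("tag is shorter than the two flanks")
    if not (tag.startswith(flank_5) and tag.endswith(flank_3)):
        raise ValueError("primer-binding flanks not found")
    coding = tag[len(flank_5):len(tag) - len(flank_3)]
    if len(coding) % unit_len:
        raise ValueError("coding region is not a whole number of unit identifiers")
    return [
        code_to_unit[coding[i:i + unit_len]]
        for i in range(0, len(coding), unit_len)
    ]


# Hypothetical three-unit tag assembled from the FIG. 2 codes.
example_tag = FLANK_5 + "CACATG" + "ACGGTA" + "ACGGTA" + FLANK_3
print(decode_identifier(example_tag, FLANK_5, FLANK_3, CODE_TO_UNIT))
```

Whether the first identifier read corresponds to the unit nearest the linker depends on the orientation of the tag relative to the linker, which this sketch does not model.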
More preferred is a bifunctional molecule where one or both of the nucleotide sequences P1 and P2 are designed to contain a means for removing the PCR primer binding sites from the identifier oligonucleotide sequences. Removal of the flanking P1 and P2 sequences is desirable so that their sequences do not contribute to a subsequent hybridization reaction. Preferred means for removing the PCR primer binding sites from a PCR amplification product is in the form of a restriction endonuclease site within the PCR-amplified duplex DNA fragment.
Restriction endonucleases are well known in the art and are enzymes that recognize specific lengths of duplex DNA and cleave the DNA in a sequence-specific manner.
Preferably, the restriction endonuclease sites should be positioned proximal to (Z_n)_a relative to the PCR primer binding sites to maximize the amount of P1 and P2 that is removed upon treating a bifunctional molecule with the specific restriction endonuclease. More preferably, P1 and P2 each are adapted to form a restriction endonuclease site in the resulting PCR-amplified duplex DNA, and the two restriction sites, when cleaved by the restriction endonuclease, form non-overlapping cohesive termini to facilitate subsequent manipulations.
Particularly preferred are restriction sites that when cleaved provide overhanging termini adapted for termini-specific modifications such as incorporation of a biotinylated nucleotide (e.g., biotinyl deoxy-UTP) to facilitate subsequent manipulations.
The above described preferred embodiments in an identifier oligonucleotide are summarized in a specific embodiment shown in FIG. 1.
In FIG. 1, a PCR-amplified duplex DNA is shown that is derived from an identifier oligonucleotide described in the Examples. The (Z_n) sequence is illustrated in the brackets as the coding sequence and its complementary strand of the duplex is indicated in the brackets as the anticoding strand. The P1 and P2 sequences are shown in detail with a Sty I restriction endonuclease site defined by the P1 sequence located 5' to the bracket and an Apa I restriction endonuclease site defined by the P2 sequence located 3' to the bracket.
`
Step 1 illustrates the cleavage of the PCR-amplified duplex DNA by the enzymes Sty I and Apa I to form a modified identifier sequence with cohesive termini. Step 2 illustrates the specific biotinylation of the anticoding strand at the Sty I site, whereby the incorporation of biotinylated UTP is indicated by a B.
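
The cleavage in Step 1 can be illustrated on the top strand alone. The sketch below assumes the commonly cited top-strand cut positions for the two enzymes (C^CWWGG for Sty I, where W is A or T, and GGGCC^C for Apa I); the function name `top_strand_fragments` is hypothetical, and the coding sequence is an invented three-unit tag built from the FIG. 2 codes. Bottom-strand cuts and overhangs are not modeled.

```python
import re

# (recognition pattern, offset of the top-strand cut within the site)
STY_I = (re.compile("CC[AT][AT]GG"), 1)  # C^CWWGG
APA_I = (re.compile("GGGCCC"), 5)        # GGGCC^C


def top_strand_fragments(seq, enzymes):
    """Cut the top strand at every recognition site and return the pieces."""
    cuts = sorted(
        match.start() + offset
        for pattern, offset in enzymes
        for match in pattern.finditer(seq)  # non-overlapping matches only
    )
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


coding = "CACATGACGGTAACGGTA"  # hypothetical coding sequence
duplex_top = "AGCTACTTCCCAAGG" + coding + "GGGCCCTATTCTTAG"

for fragment in top_strand_fragments(duplex_top, [STY_I, APA_I]):
    print(fragment)
```

The middle fragment retains the coding sequence with short remnants of both sites, while most of the two flanking sequences is released as separate fragments.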
The presence of non-overlapping cohesive termini after Step 1 in FIG. 1 allows the specific and directional cloning of the restriction-digested PCR-amplified fragment into an appropriate vector, such as a sequencing vector. In addition, the Sty I site was designed into P1 because the resulting overhang is a substrate for a filling-in reaction with dCTP and biotinyl-dUTP (BTP) using DNA polymerase Klenow fragment. The other restriction site, Apa I, was selected to not provide substrate for the above biotinylation, so that only the anticoding strand can be biotinylated.
Once biotinylated, the duplex fragment can be bound to immobilized avidin and the duplex can be denatured to release the coding sequence containing the identifier nucleotide sequence, thereby providing purified anticoding strand that is useful as a hybridization reagent for selection of related coding strands as described further herein.
c. Linker Molecules
A linker molecule in a bifunctional molecule of this invention is represented by B in the above formula A-B-C and can be any molecule that performs the function of operatively linking the chemical moiety to the identifier oligonucleotide.
Preferably, a linker molecule has a means for attaching to a solid support, thereby facilitating synthesis of the bifunctional molecule in the solid phase. In addition, attachment to a solid support provides certain features in practicing the screening methods with a library of bifunctional molecules as described herein. Particularly preferred are linker molecules in which the means for attaching to a solid support is reversible, namely, that the linker can be separated from the solid support.
A linker molecule can vary in structure and length, and provides at least two features: (1) operative linkage to chemical moiety A, and (2) operative linkage to identifier oligonucleotide C. As the nature of chemical linkages is diverse, any of a variety of chemistries may be utilized to effect the indicated operative linkages to A and to C, as the nature of the linkage is not considered an essential feature of this invention. The size of the linker in terms of the length between A and C can vary widely, but for the purposes of the invention, need not exceed a length sufficient to provide the linkage functions indicated. Thus, a chain length of from at least one to about 20 atoms is preferred.
A preferred linker molecule is described in Example 3 herein that contains the added, preferred, element of a reversible means for attachment to a solid support. That is, the bifunctional molecule is removable from the solid support after synthesis.
Solid supports for chemical synthesis are generally well known. Particularly preferred are the synthetic resins used in oligonucleotide and in polypeptide synthesis that are available from a variety of commercial sources including Glen Research (Herndon, Va.), Bachem Biosciences (Philadelphia, Pa.), and Applied Biosystems (Foster City, Calif.). Most preferred are teflon supports such as that described in Example 2.
2. Libraries
A library of this invention is a repertoire of chemical diversity comprising a plurality of species of bifunctional molecules according to the present invention. The plurality of species in a library defines a family of chemical diversity whose species each have a different chemical moiety. Thus the library can define a family of peptides, lipids, oligosaccharides or any of the other classes of chemical polymers recited previously.
The number of different species in a library represents the complexity of a library and is defined by the polymer length of the chemical moiety, and by the size of the chemical unit alphabet that can be used to build the chemical unit polymer.
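
As a rough numerical illustration of how polymer length and alphabet size set library complexity, the sketch below assumes that complexity is the alphabet size raised to the power of the polymer length (written V^a in this description). The function name `library_complexity` and the example values are chosen for illustration only.

```python
def library_complexity(alphabet_size: int, polymer_length: int) -> int:
    """Number of distinct species: alphabet_size ** polymer_length."""
    if alphabet_size < 1 or polymer_length < 0:
        raise ValueError("alphabet_size must be >= 1 and polymer_length >= 0")
    return alphabet_size ** polymer_length


# (alphabet size V, polymer length a) pairs chosen for illustration.
for v, a in [(2, 3), (20, 6), (4, 10)]:
    print(f"V={v}, a={a}: {library_complexity(v, a):,} species")
```

The rapid growth with polymer length is why practical library size limits arise when the alphabet is large.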
The number of different species referred to by the phrase "plurality of species" in a library can be defined by the formula V^a, i.e., V to the power of a (exponent a). V represents the alphabet size, i.e., the number of different chemical units X available for use in the chemical moiety. "a" is an exponent to V and represents the number of chemical units
`of X formin