Proc. Natl. Acad. Sci. USA
Vol. 89, pp. 5381-5383, June 1992
`Chemistry
`
`Encoded combinatorial chemistry
`(chemical repertoire/encoded libraries/commaless code)
`
`SYDNEY BRENNER AND RICHARD A. LERNER
`
Departments of Chemistry and Molecular Biology, The Scripps Research Institute, 10666 North Torrey Pines Road, La Jolla, CA 92037
`
`Contributed by Sydney Brenner, March 3, 1992
`
ABSTRACT
The diversity of chemical synthesis and the
power of genetics are linked to provide a powerful, versatile
method for drug screening. A process of alternating parallel
combinatorial synthesis is used to encode individual members
of a large library of chemicals with unique nucleotide sequences.
After the chemical entity is bound to a target, the
genetic tag can be amplified by replication and utilized for
enrichment of the bound molecules by serial hybridization to a
subset of the library. The nature of the chemical structure
bound to the receptor is decoded by sequencing the nucleotide
tag.
`
`There is an increasing need to find new molecules that can
effectively modulate a wide range of biological processes, for
`applications in medicine and agriculture. A standard way to
`search for novel chemicals is to screen collections of natural
`materials, such as fermentation broths, plant extracts, or
`libraries of synthesized molecules. Assays can range in
`complexity from simple binding reactions to elaborate phys-
`iological preparations. The screens often only provide leads,
`which then require further improvement either by empirical
`methods or by chemical design. The process is time-
`consuming and costly but is unlikely to be replaced totally by
`rational methods even when they are based on detailed
`knowledge of the three-dimensional structure of the target
`molecules. Thus, what we might call “irrational drug de-
`sign”—the process of selecting the correct molecules from
`large ensembles or repertoires—requires continual improve-
`ment both in the generation of repertoires and in the methods
`of selection.
`Recently there have been several developments in using
`peptides or nucleotides to provide libraries of compounds for
`discovery of leads. The methods were originally developed to
`speed up the determination of epitopes recognized by mono-
`clonal antibodies. For example, the standard serial process of
`stepwise search of synthetic peptides has been replaced by a
`variety of highly sophisticated methods in which large arrays
`of peptides are synthesized in parallel and screened with
`acceptor molecules labeled with fluorescent or other reporter
groups (1, 2). The sequence of any effective peptide can be
`decoded from its address in the array. In another approach,
`combinatorial libraries of peptides are synthesized on resin
`beads such that each resin bead contains about 20 pmol of the
`same peptide (3). The beads are exposed to labeled acceptor
`molecules. Those with bound acceptor are identified by
`visual inspection and physically removed, and the peptide is
`sequenced directly. In principle, this method could be used
`with other chemical entities, provided one has a sensitive
`method for sequence determination.
A different method of solving the problem of identification
in a combinatorial peptide library is used by Houghten et al.
`(4). For hexapeptides of the 20 natural amino acids, separate
`libraries are synthesized, each with the first two amino acids
`
`The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked “advertisement”
`in accordance with 18 U.S.C. §1734 solely to indicate this fact.
`
`fixed and the remaining four positions occupied by all pos-
`sible combinations. An assay, based on competition for
`binding or some other activity, is then used to find the library
`with an active peptide. On the basis of this result, 20 new
`libraries are synthesized and assayed to determine the effec-
`tive amino acid in the third position. The process is reiterated
`in this fashion until the active hexapeptide is defined. This is
`analogous to the method used in searching a dictionary: the
`peptide is decoded by using a series of sieves, and this makes
`the search logarithmic. A powerful biological method has
`recently been described in which the library of peptides is
`presented on the surface of a bacteriophage such that each
`phage displays a particular peptide and contains within its
`genome the corresponding DNA sequence (5, 6). The library
`is prepared by synthesizing a repertoire of random oligonu-
`cleotides to generate all combinations, followed by their
`insertion into a phage vector. Each of the sequences is cloned
`in one phage and the relevant peptide can be selected by
`finding those that bind to the particular target. The phages
`recovered in this way can be amplified and the selection
`repeated. The sequence of the peptide is decoded by se-
`quencing the DNA. Another “genetic” method has been
`applied by Tuerk and Gold (7) and Ellington and Szostak (8),
`using libraries of synthetic oligonucleotides that themselves
`are selected for binding to an acceptor and then amplified by
`the polymerase chain reaction (PCR). In this case, however,
`the repertoire is limited to nucleotides or nucleotide ana-
`logues that preserve specific Watson-Crick pairing and can
`be copied by a polymerase.
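The saving offered by the iterative scheme of Houghten et al. described above can be counted directly. The following is a minimal sketch, assuming that the first round assays one sublibrary for every pair of fixed amino acids and that each later position is resolved in a single round of 20 sublibraries, as the text describes. The function names are illustrative and do not come from the original work.

```python
def iterative_assays(alphabet: int = 20, length: int = 6, fixed_first: int = 2) -> int:
    """Sublibraries assayed when the first `fixed_first` positions are fixed
    together and every remaining position is resolved in its own round."""
    first_round = alphabet ** fixed_first          # e.g. 20 x 20 dipeptide-defined libraries
    later_rounds = (length - fixed_first) * alphabet  # 20 new libraries per remaining position
    return first_round + later_rounds


def exhaustive_assays(alphabet: int = 20, length: int = 6) -> int:
    """Assays needed if every peptide were synthesized and tested one at a time."""
    return alphabet ** length


if __name__ == "__main__":
    print("iterative sublibraries:", iterative_assays())
    print("individual hexapeptides:", exhaustive_assays())
```

Under these assumptions a few hundred sublibrary assays stand in for the tens of millions of individual hexapeptides, which is the sense in which the sieve makes the search logarithmic.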
`The main advantages of the genetic methods reside in the
`capacity for cloning and amplification of DNA sequences,
`which allows enrichment by serial selection and provides a
`facile method for decoding the structure of active molecules.
`However, the genetic repertoires are restricted to nucleotides
`and peptides composed of natural amino acids, whereas a
`more extensive chemical repertoire is required to populate
`the entire universe of binding sites. In contrast, chemical
`methods can provide limitless repertoires, but they lack the
`capacity for serial enrichment and there are difficulties in
`discovering the structures of selected active molecules. We
`have now devised a way of combining the virtues of both
`methods through the construction of encoded combinatorial
chemical libraries, in which each chemical sequence is
labeled by an appended “genetic” tag, itself constructed by
`chemical synthesis. In effect, we implement a “retrogenetic”
`way of specifying each chemical structure.
`In outline, we perform two alternating parallel combina-
`torial syntheses so that the genetic tag is chemically linked to
`the chemical structure being synthesized. In each case,
`addition of a monomeric chemical unit to a polymeric struc-
`ture is followed by addition of an oligonucleotide sequence
`which is defined as “encoding” that chemical unit. The
`library is built up by the repetition of this process after
`pooling and division. Active molecules are selected by bind-
`ing to a receptor, and amplified copies of their retrogenetic
`tags are obtained by the PCR. DNA strands with the appro-
`priate polarity can then be used to enrich for a subset of the
`library by hybridization with the matching tags, and the
`process can then be repeated on this subset. Thus serial
`enrichment is achieved by a process of purification, exploit-
`ing linkage to a nucleotide sequence that can be amplified.
`Finally, the structures of the chemical entities are decoded by
`cloning and sequencing the products of PCR.
`
Design of the Code and the Genetic Tag
`
`It is essential to choose a coding representation in such a way
`that no significant part of the sequence can occur by chance
`in some other unrelated combination. Suppose we allocate a
`triplet to each of the chemical units used. Then, because the
method allows us to cover all combinations and permutations
of an alphabet of chemical units, unless we are careful, we
could find that two different combinations have closely
related sequences which differ only by a frame shift and
`which could not be easily distinguished by hybridization.
`This, potentially the greatest source of errors, can be elim-
`inated by choosing a commaless code (9). The particular
`commaless triplet code that we have chosen allows 20 unique
`representations, as shown in Table 1.
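The commaless condition can be checked mechanically. The sketch below assumes that the 20 sense triplets are the capitalized entries of Table 1 as they appear in this copy of the text, and tests that no sense triplet can be read out of frame across the junction of any two adjacent sense triplets. The function name `is_commaless` is illustrative.

```python
from itertools import product

# Sense triplets transcribed from the capitalized entries of Table 1 (an assumption
# about how the table should be read; lowercase entries are treated as nonsense).
SENSE_TRIPLETS = [
    "TTC", "TTA", "TTG",
    "CTC", "CTA", "CTG",
    "ATC", "ATA", "ATG",
    "ACC", "ACA", "ACG",
    "GTC", "GTA", "GTG",
    "GCC", "GCA", "GCG",
    "GAA", "GAG",
]


def is_commaless(words) -> bool:
    """Return True if no codeword appears in either shifted frame of any
    concatenation of two codewords (including a word followed by itself)."""
    code = set(words)
    for first, second in product(code, repeat=2):
        joined = first + second
        # joined[1:4] and joined[2:5] are the two out-of-frame triplets
        # that span the junction between the two words.
        if joined[1:4] in code or joined[2:5] in code:
            return False
    return True


if __name__ == "__main__":
    print(len(set(SENSE_TRIPLETS)), "sense triplets; commaless:", is_commaless(SENSE_TRIPLETS))
```

Adding a triplet such as TCC to this set breaks the condition, because TCC can be read out of frame across the junction of TTC and CTC.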
`The sequences for the PCR primers must be chosen so that
`they do not occur within any coding segment and so that they
`can be readily removed from the final PCR product because
`we do not want them to dominate the selective hybridization.
`This can be achieved by building in sites for restriction
`enzymes with the appropriate polarity of cutting. One of the
`restriction enzymes should cut at a site that permits the
incorporation of a biotinylated nucleotide, such as biotinyl-dUTP,
into the strand complementary to the coding strand.
`All of the above conditions have been met in the following
`design:
`
            Sty I                                    Apa I
5'-AGCTACTTCCCAAGG [coding sequence]     GGGCCCTATTCTTAG-3'
3'-TCGATGAAGGGTTCC [anticoding strand]   CCCGGGATAAGAATC-5'
`
`After cleavage with both restriction enzymes we have
5'-AGCTACTTCC         CAAGG [coding sequence] GGGCC       CTATTCTTAG-3'
3'-TCGATGAAGGGTTC         C [anticoding strand] C     CCGGGATAAGAATC-5'
`
`The internal fragment can be cloned in an appropriate vector
to sequence the individuals. The terminal overhang of the Sty
`
Table 1. Commaless code used in this study

ttt  tct  tat  tgt
TTC  tcc  tac  tgc
TTA  tca  taa  tga
TTG  tcg  tag  tgg

ctt  cct  cat  cgt
CTC  ccc  cac  cgc
CTA  cca  caa  cga
CTG  ccg  cag  cgg

att  act  aat  agt
ATC  ACC  aac  agc
ATA  ACA  aaa  aga
ATG  ACG  aag  agg

gtt  gct  gat  ggt
GTC  GCC  gac  ggc
GTA  GCA  GAA  gga
GTG  GCG  GAG  ggg

“Sense triplets” are XYZ; nonsense triplets are xyz.
`
`I site can be filled in with dCTP and biotinyl-dUTP (BTP)
`which, because an asymmetric site was chosen, will ap-
`pend the biotinylated nucleotides to only one of the cleavage
`products.
5'-AGCTACTTCCC        CAAGG [coding sequence] GGGCC       CTATTCTTAG-3'
3'-TCGATGAAGGGTTC      BBCC [anticoding strand] C     CCGGGATAAGAATC-5'
`
The biotinylated fragment can be bound to avidin and, after
`denaturation, provides the strand suitable for hybridization
`and selection of the appropriate coding strands:
`
Avidin-BBCC[anticoding strand]C
`
The two PCR primers are the two sequences
5'-AGCTACTTCCCAAGG (Sty I primer) and 5'-CTAAGAATAGGGCCC
`(Apa I primer). Adding a biotin to the 5’ end of the Apa I
`primer would allow the isolation of the whole strand con-
`taining the anticoding sequence.
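The relationship between the flanking sequences, the two primers, and the restriction sites can be verified with a short script. This is a sketch under stated assumptions: the recognition sequences used here (CCWWGG for Sty I, where W is A or T, and GGGCCC for Apa I) are taken from general knowledge of these enzymes rather than from the text, and `coding_is_compatible` is a hypothetical helper that inspects only the coding segment itself, not its junctions with the flanks.

```python
import re

LEFT_FLANK = "AGCTACTTCCCAAGG"    # 5' flank of the coding strand (Sty I primer)
RIGHT_FLANK = "GGGCCCTATTCTTAG"   # 3' flank of the coding strand
STY_I = re.compile(r"CC[AT][AT]GG")  # assumed recognition sequence CCWWGG
APA_I = re.compile(r"GGGCCC")         # assumed recognition sequence GGGCCC
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def pcr_primers() -> tuple:
    """The Sty I primer is the left flank; the Apa I primer is the reverse
    complement of the right flank, so that it anneals to the coding strand."""
    return LEFT_FLANK, reverse_complement(RIGHT_FLANK)


def coding_is_compatible(coding: str) -> bool:
    """True if the coding segment contains neither restriction site and
    neither flank sequence on either strand."""
    if STY_I.search(coding) or APA_I.search(coding):
        return False
    strands = (coding, reverse_complement(coding))
    return not any(flank in strand for flank in (LEFT_FLANK, RIGHT_FLANK) for strand in strands)


if __name__ == "__main__":
    print(pcr_primers())
    print(coding_is_compatible("CACATGACGGTACACATG"))
```

Both recognition sequences read the same on either strand, so scanning the coding strand alone is enough for the restriction-site check.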
`We should have at least 15 nucleotides in the coding region
for effective hybridization. Thus, in a library of degree d ≥
5, that is, composed of five or more successive chemical
units, we could code each unit by a triplet. That would allow
an alphabet (A) of up to 20 different units, each corresponding
to one of the triplets defined above. The complexity of the
combinatorial library is A^d. Libraries with a smaller degree,
`say d = 3, should be coded by sextuplets, which, in the
`simplest case, could be a repeated triplet (this size is chosen
`because any combination of triplets still obeys the commaless
`condition). In the same way, the size of the alphabet can be
`extended by using combinations of triplets to code for the
`chemical units.
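The sizing rules in this paragraph reduce to two small calculations. The sketch below assumes the 15-nucleotide minimum stated above and treats the number of triplets per chemical unit as the smallest whole number that reaches that minimum; the function names are illustrative.

```python
TRIPLET_NT = 3       # nucleotides per triplet
MIN_CODING_NT = 15   # minimum coding-region length for effective hybridization


def triplets_per_unit(degree: int, min_coding_nt: int = MIN_CODING_NT) -> int:
    """Smallest number of triplets per chemical unit such that a library of the
    given degree has a coding region of at least `min_coding_nt` nucleotides."""
    nt_per_single_triplet_code = TRIPLET_NT * degree
    # Integer ceiling division avoids floating-point rounding.
    return -(-min_coding_nt // nt_per_single_triplet_code)


def library_complexity(alphabet: int, degree: int) -> int:
    """Number of distinct sequences, A^d, for alphabet size A and degree d."""
    return alphabet ** degree


if __name__ == "__main__":
    for d in (3, 5):
        print(f"degree {d}: {triplets_per_unit(d)} triplet(s) per unit")
    print("A = 20, d = 5:", library_complexity(20, 5))
```

For d = 3 the calculation returns two triplets per unit, which is the sextuplet coding suggested in the text.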
`
A Formal Example
`
`As an illustration we discuss how a library of degree d = 3 is
`made with an alphabet of two amino acids, glycine and
`methionine. In this case, we use sextuplets to give us a
`reasonable length of coding sequence. To make the se-
quences as different as possible we code each amino acid by
`a combination of two different triplets as follows:
`
`Gly = CACATG, Met = ACGGTA.
`
Step 1. We begin with some appropriate linker, LINK,
attached to some solid-state surface and synthesize the first
PCR oligonucleotide sequence on one end, in the usual
3'-to-5' direction, to give

GGGCCCTATTCTTAG-LINK
`
`Step 2. This product is divided into two aliquots for parallel
`synthesis. In each synthesis, one amino acid is added to
`LINK and the oligonucleotide sequence is extended by the
`corresponding code to give the following products:
`
CACATGGGGCCCTATTCTTAG-LINK-Gly
ACGGTAGGGCCCTATTCTTAG-LINK-Met
`
`Step 3. The elongated products are pooled and again split
`into two parts for parallel synthesis, yielding
`
CACATGCACATGGGGCCCTATTCTTAG-LINK-Gly-Gly
CACATGACGGTAGGGCCCTATTCTTAG-LINK-Met-Gly
ACGGTACACATGGGGCCCTATTCTTAG-LINK-Gly-Met
ACGGTAACGGTAGGGCCCTATTCTTAG-LINK-Met-Met
`
`Steps 4 and 5. Once more the products are pooled and
`divided into two aliquots for parallel synthesis. This results
`in an ensemble of eight tripeptide sequences, each encoded
`by a unique sequence of 18 nucleotides. The second PCR
`oligonucleotide is added to the ensemble of products to give
`
`IGCTACTTCCCLLGGCACATGCACATGCACATGGGGCCCTITICTTAG-LINK-Gly—Gly—G1y
`
`AGCTACTTCCCAAGGCACATGCACATGACGGTAGGGCCCTATTCTTAG-LINK-Met-Gly—G1y
`
`IGCTACTTCCCAAGGCACATGACGGTACACATGGGGCCCTITTCTTlG—LINK—G1y-Met—Gly
`
`IGCTACTTCCCAAGGCACATGACGGTAACGGTAGGGCCCTATTCTTAG—LINK-Met-Met-G1y
`
`IGCTICTTCCCAAGGACGGTACACATGCACATGGGGCCCTATTCTTAG-LINK—G1y—Gly-Met
`
`IGCTACTTCCCAAGGACGGTACACATGACGGTAGGGCCCTATTCTTAG-LINK-Met-Gly—Het
`
`AGCTICTTCCCAAGGACGGTAACGGTACACATGGGGCCCTATTCTTAG—LINK—Gly—Met-Met
`AGCTACTTCCCALGGACGGTAACGGTAACGGTAGGGCCCTATTCTTAG-LINK—Met—Met—Het
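The pooling and division steps of this example can be simulated to regenerate the eight tagged products and to show how a tag is decoded. This is a minimal sketch, not the authors' procedure: it models each product as a pair (tag, peptide), prepends the sextuplet at the growing 5' end of the tag, and appends the amino acid at the free end of the peptide. The function names are illustrative.

```python
LEFT_PRIMER_SITE = "AGCTACTTCCCAAGG"   # second PCR sequence, added last at the 5' end
RIGHT_PRIMER_SITE = "GGGCCCTATTCTTAG"  # first PCR sequence, synthesized on LINK
CODES = {"Gly": "CACATG", "Met": "ACGGTA"}


def split_and_pool(rounds: int, codes: dict = CODES) -> list:
    """Simulate `rounds` cycles of division, parallel coupling, and pooling."""
    pool = [(RIGHT_PRIMER_SITE, [])]  # (tag so far, residues listed from LINK outward)
    for _ in range(rounds):
        next_pool = []
        for unit, code in codes.items():          # one aliquot per chemical unit
            for tag, peptide in pool:
                next_pool.append((code + tag, peptide + [unit]))
        pool = next_pool                           # pooling before the next division
    return [(LEFT_PRIMER_SITE + tag, peptide) for tag, peptide in pool]


def format_product(tag: str, peptide: list) -> str:
    """Write a product in the notation used in the text."""
    return f"{tag}-LINK-" + "-".join(peptide)


def decode(tag: str, codes: dict = CODES) -> list:
    """Recover the peptide (from LINK outward) from a full-length tag."""
    width = len(next(iter(codes.values())))
    coding = tag[len(LEFT_PRIMER_SITE):len(tag) - len(RIGHT_PRIMER_SITE)]
    lookup = {code: unit for unit, code in codes.items()}
    units = [lookup[coding[i:i + width]] for i in range(0, len(coding), width)]
    return units[::-1]  # the unit coupled first is encoded nearest the 3' primer site


if __name__ == "__main__":
    for tag, peptide in split_and_pool(3):
        print(format_product(tag, peptide))
```

Three rounds with the two-letter alphabet yield eight products, and decoding each tag returns the peptide to which it is attached.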
`
`Implementation
`
`Although natural amino acids are used in the example dis-
`cussed above, the system is not limited to these, nor, for that
matter, to peptides. The chemistry required for making
`encoded libraries is constrained only by the compatibility of
`the two alternating syntheses. Partly this involves the choice
`of the protecting groups, and the methods used to deprotect
`one chain while the other remains blocked. And, of course,
`each product needs to survive through the synthesis of the
`other. One can imagine many different ways of joining the
chemical entities together, and one could even use mixed
`syntheses, provided that the rules of mutual compatibility are
`obeyed.
`We have recently, in principle, solved the synthetic pro-
`cedures for peptides (K. Janda, S. Ramcharitar, S.B., and
`R.A.L., unpublished results). Even within this field there is
`a choice of alphabets that extends well beyond the 20 natural
α-amino acids. The only requirement is that we be able to
`make an amide bond. Thus, the amino and carboxylic groups
`can be located on a wide variety of compounds so that we can
`make libraries with many different backbone structures. We
`can also combine different backbones, if we define alphabets
`where, for example, both the number of carbon atoms and
`their configurations in the backbone are varied. New amino
`acids can be easily invented with unusual heterocyclic rings,
`such as thiazole-alanine or purine-alanine. These rings are
`components of natural effector molecules and often provide
`core chemical functions for important drugs. Libraries made
`with such alphabets will allow us to explore the combinatorial
`association of known effector chemical functions.
`It is also useful to consider how large the combinatorial
`library should be. The PCR provides a very sensitive detec-
`tion method, allowing even a few molecules to be seen.
`However, we need to have some reasonable concentration of
`each of the species present to cross the binding threshold of
`the acceptor molecule being assayed. If, for example, we set
this as 1 µM and want 1 ml of the library, then we need to
make at least 1 nmol of each of the species. Libraries with
complexities of up to 10^4, giving us a total amount of 10 µmol
of product, would seem reasonable. Because of this reciprocal
relationship, more complex libraries could be made if
`the binding threshold is lowered.
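The arithmetic behind this estimate is easy to reproduce. The sketch below reads the binding threshold as 1 µM in a volume of 1 ml and the library complexity as 10^4, which is the reading under which the quoted amounts are mutually consistent; the function names and the choice of units are illustrative.

```python
def amount_per_species_nmol(threshold_uM: float, volume_ml: float) -> float:
    """Amount of each species needed to reach the threshold concentration.
    Micromolar times millilitres gives nanomoles."""
    return threshold_uM * volume_ml


def total_product_umol(threshold_uM: float, volume_ml: float, complexity: int) -> float:
    """Total amount of library product, in micromoles."""
    return amount_per_species_nmol(threshold_uM, volume_ml) * complexity / 1000.0


def max_complexity(total_umol: float, threshold_uM: float, volume_ml: float) -> float:
    """Largest complexity affordable for a fixed total amount of product;
    it varies inversely with the binding threshold."""
    return total_umol * 1000.0 / amount_per_species_nmol(threshold_uM, volume_ml)


if __name__ == "__main__":
    print(amount_per_species_nmol(1, 1), "nmol per species")
    print(total_product_umol(1, 1, 10**4), "umol in total")
    print(max_complexity(10, 0.01, 1), "species at a 100-fold lower threshold")
```

Lowering the threshold by a factor of 100 while keeping the same total amount of product raises the affordable complexity by the same factor, which is the reciprocal relationship noted above.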
`
`Discussion
`
`Traditional chemical synthesis proceeds by careful design,
`sequentially linking atoms or groups of atoms to a growing
`core structure. The process has the advantage that the
`product of each step can be analyzed, thereby allowing
continuous evaluation of the effectiveness of a given strategy.
`Indeed, the analyzed results of these individual steps ulti-
`mately become the corpus of synthetic organic chemistry. A
`major technical revolution occurred with the advent of solid-
`state methods for the synthesis of polymeric molecules (10).
`Here, since a limited number of suitably protected oligomeric
units are added via a common covalent bond, the results of
the individual transformations can be predicted, and, to first
approximation, it is necessary to analyze only the final
`product. In addition, the relationship of the monomeric units
`to each other and the extent of conformational space that is
`occupied can be estimated. Our method permits the study of
`the efficacy of combinatorial associations of diverse chemical
`units without the necessity of either synthesizing them one at
`a time or knowing their interactions in advance. It also allows
`easy identification of the most effective molecules through a
`common method of nucleic acid sequencing. Once the chem-
`ical polymers are decoded, more precise questions about
critical interactions and conformations can be asked by
`reversion to classical chemical methods. Further, we expect
`that many receptors will interact with sets of related but not
`identical chemical entities such that major clues as to critical
`interactions can be deduced from the shared features of the
`sets.
`
`Our method also provides a method of amplification, again
`by exploiting a common procedure of nucleic acid hybrid-
`ization. In any screening procedure where large libraries of
compounds or effector molecules are being studied, the
absolute number of different nonspecific interactions may be
large, but the specific ligand or effector is represented many
`more times than any individual background molecule. In such
`a situation the signal-to-noise ratio rapidly increases after
`repeated cycles of amplification and selection, and the spe-
`cific molecule becomes highly enriched after only a few
`iterations. For both identification and selection, our method
`exploits the power of genetic systems. By coupling genetics
`and the versatility of organic chemical synthesis we have
`extended the range of analysis to chemicals that are not
`themselves part of biological systems.
`
`We thank Kim Janda, Bernie Gilula, and Jerry Joyce for helpful
`comments on the manuscript.
`
1. Fodor, S. P. A., Read, J. L., Pirrung, M. C., Stryer, L., Tsai Lu, A. & Solas, D. (1991) Science 251, 767-773.
2. Geysen, H. M., Meloen, R. H. & Barteling, S. J. (1984) Proc. Natl. Acad. Sci. USA 81, 3998-4002.
3. Lam, K. S., Salmon, S. E., Hersh, E. M., Hruby, V. J., Kazmierski, W. M. & Knapp, R. J. (1991) Nature (London) 354, 82-84.
4. Houghten, R. A., Pinilla, C., Blondelle, S. E., Appel, J. R., Dooley, C. T. & Cuervo, J. H. (1991) Nature (London) 354, 84-86.
5. Scott, J. K. & Smith, G. P. (1990) Science 249, 386-390.
6. Cwirla, S. E., Peters, E. A., Barrett, R. W. & Dower, W. J. (1990) Proc. Natl. Acad. Sci. USA 87, 6378-6382.
7. Tuerk, C. & Gold, L. (1990) Science 249, 505-510.
8. Ellington, A. D. & Szostak, J. W. (1990) Nature (London) 346, 818-822.
9. Crick, F. H. C., Griffith, J. S. & Orgel, L. E. (1957) Proc. Natl. Acad. Sci. USA 43, 416-421.
10. Merrifield, B. (1984) Les Prix Nobel (Almqvist & Wiksell International, Stockholm), pp. 127-153.