`
`Computer Design of Bioactive Molecules: A Method for
`Receptor-Based de Novo Ligand Design
`Joseph B. Moon and W. Jeffrey Howe
`Computational Chemistry, Upjohn Laboratories, Kalamazoo, Michigan 49001
`
`ABSTRACT
The design of molecules to bind specifically to protein receptors has long been a goal of
computer-assisted molecular design. Given detailed structural knowledge of the target receptor,
it should be possible to construct a model of a potential ligand, by algorithmic connection of
small molecular fragments, that will exhibit the desired structural and electrostatic
complementarity with the receptor. However, progress in this area of receptor-based, de novo
ligand design has been hampered by the complexity of the construction process, in which
potentially huge numbers of structures must be considered. By limiting the scope of the
structure-space examined to one particular class of ligands (namely, peptides and peptide-like
compounds), the problem complexity has been reduced to the point that successful, de novo design
is now possible. The methodology presented employs a large template set of amino acid
conformations which are iteratively pieced together in a model of the target receptor. Each
stage of ligand growth is evaluated according to a molecular mechanics-based energy function,
which considers van der Waals and coulombic interactions, internal strain energy of the
lengthening ligand, and desolvation of both ligand and receptor. The search space is managed by
use of a data tree which is kept under control by pruning according to the energy evaluation.
Ligands grown by this procedure are subjected to follow-up evaluation in which an approximate
binding enthalpy is determined. This methodology has proven useful as a precise model-builder
and has also shown the ability to design bioactive ligands.
`
`INTRODUCTION
The ability of a molecule, such as a drug, to exert a desired biological effect is often related
to its affinity for one or more endogenous receptor molecules. For a ligand to interact
optimally with a receptor, it must be able to attain a shape which is at least partly
complementary to that of a binding location on the receptor. Additionally, other factors such as
electrostatic interactions, hydrogen bonding, hydrophobic interactions, desolvation effects, and
cooperative motions of ligand and receptor all influence the binding event and should be taken
into account in attempts to design bioactive ligands. Processes such as distribution and
metabolism, while they play a critical role in the delivery of the putative ligand to the
receptor location, do not reflect a compound's "intrinsic activity" and lie outside the scope of
the current discussion.

© 1991 WILEY-LISS, INC.

In principle, it should be possible to design molecules that will bind to a preselected site on
a receptor. This is not a simple undertaking, since in most design situations little or no
structural information exists to characterize the receptor. One can, however, use "indirect"
methods 1 to exploit what is known about molecules that elicit the desired biological response
(assuming that they interact with the same receptor) to generate a structural and electronic
hypothesis of what the receptor recognizes or will accept. Various computer-based methods have
been developed to assist in this kind of study.1-8 Once the hypothesis has been generated it can
be used to suggest molecular modifications to improve the activity of known ligands or to
identify entirely new structural classes (lead compounds) for study as potential ligands. The
latter can be accomplished via searches over large databases of 3D molecular structures to
identify molecules which match the hypothesized requirements for activity.9-16
The increasing availability of biomacromolecule structures that have been solved
crystallographically has prompted the development of "direct" computational methods for
molecular design, in which the steric and electronic properties of receptor binding sites are
used to guide the design of potential ligands.1,11,12,17-19 Direct methods generally fall into
two categories: (1) design by analogy, in which 3D structures of known molecules (such as from a
crystallographic database) are placed in the receptor structure and scored for goodness-of-fit;
and (2) de novo design, in which the ligand model is constructed piecewise in the receptor. The
latter approach, in particular, offers considerable promise for the development of novel
molecules, uniquely designed to bind to the target.
`
`Received August 23, 1990; revision accepted March 15, 1991.
Address reprint requests to either author, The Upjohn Company, Computational Chemistry,
301 Henrietta St., Kalamazoo, MI 49001.
`
`
`
`
While examples of successful, computer-assisted, de novo design can be found, 20 there are no
examples of automated, or computer-driven, de novo construction in the literature (although Wise
et al. 21 have reported using the structure-building program GENOA22 to generate molecules to
match a requirements hypothesis). The term "automated de novo design" is used here to refer to
the algorithmic construction of a putative ligand from small fragments, guided by steric and
electronic constraints imposed by the receptor, plus appropriate consideration of solvation
effects and internal strain energy of the ligand.
In a recent series of papers,23-26 Dean and co-workers describe a four-step strategy for
automated, de novo drug design. Although their goal has not yet been achieved, there has been
considerable progress in algorithm development. Furthermore, their studies make clear the
complexity of the de novo construction problem as well as the importance of developing
noncombinatorial approaches. In our work, we have chosen to focus on one particular region of
the large structure-space that is ultimately the design territory of such methods. By confining
the search space to consider only amino acids and related fragments as the molecular building
blocks, the construction problem has become quite tractable, and we are able to report the first
examples of bioactive ligands designed by automated de novo methods. The putative ligands that
result from this construction method are peptides and peptide-like compounds rather than the
small organic molecules that are typically the goal of drug design research. The appeal of the
peptide building approach is not that peptides are preferable to organics as potential
pharmaceutical agents, but rather that: (1) they can be generated relatively rapidly de novo;
(2) their energetics can be studied by well-parameterized force field methods; (3) they are much
easier to synthesize than are most organics; and (4) they can be used in a variety of ways, for
peptidomimetic inhibitor design, protein-protein binding studies, and even as shape templates in
the more commonly used 3D organic database search approach described above. We also show that
the method need not be restricted to just the 20 natural amino acids; it can easily be extended
to include other related fragments of interest to the medicinal chemist.
`
`METHODS
`Description of the GROW Method
`
`Overview
The de novo peptide design method has been incorporated in a software package called GROW. In a
typical design session, standard interactive graphical modeling methods (using the Mosaic
software system,27 which is based on MacroModel28) are employed to define the structural
environment in which GROW is to operate. The environment could be the active site cleft of an
enzyme, or it could be a set of features on a protein surface to which the user wishes to bind a
peptide-like molecule. The GROW program then operates independently of the user to generate a
set of potential ligand molecules. Interactive modeling methods then come into play again, for
examination of the resulting molecules, and for selection of one or more of them for further
refinement.
The method is designed to construct peptide models from a user-selected starting position by
iteratively piecing together amino acids in conformations which will interact most favorably
with the atoms in the receptor site. For input, GROW operates on an atomic coordinate file
generated by the user in the interactive modeling session, plus a small fragment (an acetyl
group) positioned in the receptor to provide a starting point for peptide growth. These are
referred to as "site" atoms and "seed" atoms, respectively. A second file provided by the user
contains a number of control parameters to guide the peptide growth.
The operation of the GROW algorithm is conceptually fairly simple, and is summarized in
Figure 1. GROW proceeds in an iterative fashion, to systematically attach to the seed fragment
each amino acid template in a large preconstructed library of amino acid conformations. When a
template has been attached, it is scored for goodness-of-fit to the receptor site, and then the
next template in the library is attached to the seed. After all the templates have been tested,
only the highest scoring ones are retained for the next level of growth. This procedure is
repeated for the second growth level; each library template is attached in turn to each of the
bonded seed/amino acid molecules that were retained from the first step, and is then scored.
Again, only the best of the bonded seed/dipeptide molecules that result are retained for the
third level of growth. The growth of peptides can proceed in the N-to-C direction only, the
reverse direction only, or in alternating directions, depending on the initial control
specifications supplied by the user. Successive growth levels therefore generate peptides that
are lengthened by one residue. The procedure terminates when the user-defined peptide length has
been reached, at which point the user can select from the constructed peptides those to be
studied further. The resulting data provided by the GROW procedure include not only residue
sequences and scores, but also atomic coordinates of the peptides, related directly to the
coordinate system of the receptor site atoms. In the following sections we examine in more
detail the individual components that comprise the basic procedure just described.
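
The growth cycle outlined in this overview amounts to a breadth-limited tree search. The following Python fragment is a minimal sketch of that idea and is not the GROW source: the function name `grow`, the signature of the `score` callback, and the choice to accumulate template scores along a branch are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# A construct is the ordered tuple of template identifiers attached to the seed.
Construct = Tuple[str, ...]


def grow(templates: Sequence[str],
         score: Callable[[Construct, str], float],
         length: int,
         keep: int = 10) -> List[Tuple[float, Construct]]:
    """Grow constructs one template at a time, keeping the `keep` best per level."""
    level: List[Tuple[float, Construct]] = [(0.0, ())]  # the bare seed
    for _ in range(length):
        candidates: List[Tuple[float, Construct]] = []
        for parent_score, parent in level:
            for t in templates:
                # Assumption: a construct's score is its parent's score plus the
                # score of the newly attached template in its environment.
                candidates.append((parent_score + score(parent, t), parent + (t,)))
        # Prune: retain only the highest-scoring constructs across the whole level.
        candidates.sort(key=lambda c: c[0], reverse=True)
        level = candidates[:keep]
    return level
```

In this sketch a call such as `grow(library, score_fn, length=6, keep=10)` would return up to ten hexapeptide constructs; the real program also carries atomic coordinates for each construct, which are omitted here.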
`
`Library construction
`Because most amino acids are quite flexible, a
`large number of template structures must be tested
`during the growth procedure to ensure adequate
`
`
`
`
[Figure 1: flow diagram and search tree; the text of the diagram is reproduced below.]

SETUP:
(a) Interactive modeling: select site atoms
(b) Select seed position
(c) Specify control parameters

GROW (drawing on the template library; tree levels are monopeptides, dipeptides, ..., n-peptides):
A: attach each template to seed; score
B: keep 10 best constructs
C: attach each template to each construct kept; score
D: keep 10 best
E: iterate over C and D
F: stop at requested peptide length, keep 10 best

EVALUATE: Interactive modeling, batch energy minimization:
(a) Minimize ligand/site together and separately
(b) Determine approximate binding energy

Fig. 1. Schematic overview of the operation of the GROW algorithm. The site-and-seed coordinate
file and the command file (described later) are provided to the GROW procedure by the user.
Growth can be visualized as a tree process in which each library template is attached to the
seed (A) and then evaluated by the scoring function. Of the resulting 6000+ constructs, the 10
best are kept for the next level (B). 10 is the default retention; a command file keyword can be
used to broaden the search at any stage. To each retained monopeptide/seed construct are
attached all library templates, which are again scored (C). After pruning (D), the process is
repeated (E) until the specified peptide length (specified in the command file, see Fig. 5) is
reached (F). In this tree diagram, circles represent those nodes selected (based on highest
scores across the entire level) for further growth. Uncircled nodes are pruned. Horizontal dots
denote continuation across all template additions, and vertical dots represent the iterative
process of tree growth.
`
coverage of the conformational space accessible to each residue. The template library was
generated with the Mosaic modeling program in conjunction with the MacroModel/BatchMin28
(version 2.5) implementation of the AMBER29 forcefield. The same forcefield implementation was
used for all energy-related work described herein. Starting models of the 20 standard amino
acids were constructed as N-acetyl-N'-methylamides (Fig. 2A), followed by energy minimization.*
The models were then subjected to a search procedure in which conformers were generated by
varying all flexible torsion angles in the amino acids by random increments. Any conformer which
contained two nonbonded heavy atoms at a separation of <2.0 Å was discarded. After 3,000 to
5,000 viable conformations were produced for each amino acid, the structures were subjected to a
partial energy minimization (15 iterations of block diagonal Newton-Raphson minimization) to
relieve significant internal strain energies. At this point, each conformation was compared to
every other conformation so that duplicate structures would be discarded. Two conformations were
considered to be identical if no atomic positions differed by more than 0.3 Å when the
structures were aligned by superpositioning of their N-terminal amide atoms. The remaining
conformations were sorted in ascending energy order and were stored in the template library
along with their energies. Templates of nonstandard amino acids, pseudodipeptides, and organic
terminal groups were constructed in the same manner, employing the extended parameter set (in
addition to the original29 AMBER parameters) provided by the MacroModel/BatchMin implementation.

*Unless otherwise indicated, the convergence criterion used for all energy minimizations
discussed in this paper was an rms gradient of <0.1 kcal/Å, with the BatchMin/MacroModel PRCG
minimizer.

[Figure 2: structure drawings for panels A, B, and C.]

Fig. 2. (A) Template generation method using phenylalanine as an example: bonds marked with
arrows are rotated by random increments to generate additional conformations. This is followed
by contact filtering, partial minimization, and duplicate elimination. (B) Template connection
method: amide end groups are superimposed to connect two templates together in the proper
geometries to form peptides. (C) Template alignment method: the two alignments of a template
with the seed group are shown. The alignment used depends on the direction in which the peptide
is to be grown.
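
The contact filter and duplicate elimination used in library construction can be illustrated with a short Python sketch. It is not the authors' code: the function names, the `excluded_pairs` argument (atom pairs that count as bonded), and the decision to keep the lower-energy member of a duplicate pair are assumptions. The sketch also assumes that conformers have already been superimposed on their N-terminal amide atoms.

```python
import math
from typing import List, Sequence, Set, Tuple

Coords = List[Tuple[float, float, float]]  # heavy-atom positions in Å


def has_close_contact(coords: Coords,
                      excluded_pairs: Set[Tuple[int, int]],
                      min_sep: float = 2.0) -> bool:
    """True if any pair of atoms not in excluded_pairs is closer than min_sep."""
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if (i, j) in excluded_pairs:
                continue  # bonded (or otherwise exempt) pair
            if math.dist(coords[i], coords[j]) < min_sep:
                return True
    return False


def is_duplicate(a: Coords, b: Coords, tol: float = 0.3) -> bool:
    """True if no corresponding atom positions differ by more than tol."""
    return all(math.dist(p, q) <= tol for p, q in zip(a, b))


def unique_conformers(conformers: Sequence[Tuple[float, Coords]],
                      tol: float = 0.3) -> List[Tuple[float, Coords]]:
    """Sort (energy, coords) pairs by ascending energy and drop duplicates.

    Assumption: of two duplicate conformers, the lower-energy one is kept.
    """
    kept: List[Tuple[float, Coords]] = []
    for energy, coords in sorted(conformers, key=lambda c: c[0]):
        if not any(is_duplicate(coords, k[1], tol) for k in kept):
            kept.append((energy, coords))
    return kept
```

The all-pairs comparison is quadratic in the number of conformers, which is consistent with the description that each conformation was compared to every other conformation.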
Figure 3 lists the contents of the template library and the number of unique conformations
stored for each residue. During a GROW run, from 300 to 1,000 lowest energy conformations are
typically utilized for each amino acid; the default is 300. For comparison, values in
parentheses indicate the number of initial conformations generated for the residues during
library construction. Of the 2,000 trial conformations of alanine, for example, partial energy
minimization and duplicate elimination reduced the set to 171 unique conformations. As might be
expected, this type of reduction in the number of conformations was not seen with the
pseudodipeptides and certain of the other residues, due to their extreme flexibility. The
implications of template flexibility will be discussed in a later section.
Application of a partial energy minimization during library construction produces structures
that lie near, but not generally at, energetic minima. Since energetic minima of a bound ligand
will not necessarily correspond to minima of an unbound ligand, restriction of templates to
unbound minimum-energy conformations represents an unwarranted
`
`
`
[Figure 3: structure drawings not reproduced. Each entry gives the fragment identifier, the
number of conformations retained, and (in parentheses) the number of initial conformations
generated.]

Standard Amino Acids
ALA   171 (2000)    LEU  1108 (5000)
ARG  4987 (5000)    LYS  4743 (5000)
ASN  2706 (5000)    MET  4661 (5000)
ASP  1505 (5000)    PHE  3485 (5000)
CYS  2123 (3000)    PRO    53 (2000)
GLN  3734 (5000)    SER  1598 (5000)
GLU  3213 (5000)    THR  1702 (5000)
GLY   271 (1000)    TRP  4537 (5000)
HIS  4026 (5000)    TYR  4732 (5000)
ILE  1478 (5000)    VAL   346 (5000)

Non-standard Amino Acids
BMH  2318 (5000)
IMG  1132 (5000)
CHA

Terminal Groups
ACE
BOC
TBA    88 (1000)
AMP   540 (2000)

Pseudodipeptides
Two fragments, 5000 (5000) each

Counts of 1 (1), 170 (1000), and 3392 (5000) also appear in the figure; their fragment
assignments cannot be recovered from the extracted text.

Fig. 3. Contents of the template library. At present, the template library contains standard
L- and D-amino acids, several nonstandard residues, organic terminators, and pseudodipeptides,
some of which are shown here. The table indicates, for each fragment, its 3-character
identifier, which can be specified in the control file for running GROW in restricted mode, a
parenthesized value which indicates the number of initial conformations generated for that
fragment during library construction, and an unparenthesized value which indicates the number of
conformations that survived the partial minimization and duplicate elimination steps during
library construction. Data shown for standard amino acids apply equally for L- and D-forms.
`
constraint. The collection of amino acid templates that resulted from the procedure just
outlined represents a broad sampling over low-energy conformational space. The assumption made
is that such fragments can be connected together to form peptides with low internal
conformational energy; adverse interactions between residues are dealt with at a later stage.
The acetyl and amide end groups placed on the amino acid models serve two purposes. First, they
produce some of the conformational restriction experienced by individual amino acids when they
are connected in a polypeptide chain. They also provide a convenient way to connect the
templates during peptide construction; two templates can be joined together simply by
superimposing the N-terminal amide of one template onto the C-terminal amide of another
(Fig. 2B).
`
`Seed fragment positioning
The placement of the seed fragment, while separate from the GROW method itself, has a great
influence on the outcome of a GROW procedure. A poorly positioned seed can prevent designed
peptides from reaching important interaction sites in the receptor. Because of this sensitivity,
we have examined a number of techniques for choosing reasonable seed positions. In the few cases
in which an X-ray crystallographic structure of a bound ligand is available, atoms within the
ligand can be used to form a
`
`
`
seed position. If the ligand is a peptide, choosing a seed group from the ligand backbone will
greatly improve the chances of producing meaningful results. Without an X-ray structure of a
receptor-ligand complex, however, other methods must be used to generate seed positions. It is
possible with most modeling systems to manually dock an acetyl-containing seed fragment into a
receptor site. However, identifying the optimal placement of seed fragments is not a trivial
problem. Work is in progress to develop additional methods for this purpose.
The technique that we have found most promising in this situation is to employ an automatic
fragment docking algorithm which positions numerous copies of a small amide-containing fragment
in the specified site, at locations calculated to provide the strongest interactions between the
fragment and site atoms. Originally based on the shape-matching DOCK methodology developed by
Kuntz and co-workers,11,12 the algorithm has been extended to score each potential docking
orientation based on van der Waals and electrostatic interactions between a docked fragment and
the site atoms. Orientations which score below a specified threshold are discarded. A large
number of overlapping copies of the amide-containing fragment generally results from this
procedure, tracing out pathways in the site where ligand interactions with the receptor would be
strongest. From this map one or more seed positions can be obtained from the location of the
amide bonds within the docked fragments. Each seed selected defines a separate GROW run.
`
`Scoring
Before describing the method by which peptides are iteratively constructed during a GROW
operation, we first examine the scoring function that is applied as each new library template is
attached to a growing peptide. The scoring function is based on potentials from the AMBER force
field (as implemented in MacroModel/BatchMin v2.5), with the addition of a solvation treatment
developed by Scheraga. 30 When a template has been properly oriented to connect to a growing
peptide, a score is calculated which is the sum of five terms:
`
SCORE = -[Evdw + Ees + Econf + Esolv(templ) + Esolv(rec)].
`
The more negative the individual energy terms, the greater the estimated binding affinity of the
template, and the higher the score.
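
As a worked illustration of the sign convention, the score can be written as a small Python function. The class and field names below are hypothetical, and the numerical values in the usage note are invented purely to show the arithmetic; they are not results from GROW.

```python
from dataclasses import dataclass


@dataclass
class TemplateEnergies:
    """The five energy terms for one positioned template (hypothetical container)."""
    e_vdw: float         # van der Waals energy
    e_es: float          # coulombic electrostatic energy
    e_conf: float        # conformational strain of the template
    e_solv_templ: float  # change in template solvation energy on binding
    e_solv_rec: float    # change in receptor solvation energy on binding


def score(e: TemplateEnergies) -> float:
    """SCORE = -[Evdw + Ees + Econf + Esolv(templ) + Esolv(rec)]."""
    return -(e.e_vdw + e.e_es + e.e_conf + e.e_solv_templ + e.e_solv_rec)
```

For example, invented terms of -5.0, -2.0, 1.5, 0.5, and 1.0 sum to -4.0, which gives a score of 4.0; making any single term more negative raises the score.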
Evdw is the van der Waals energy calculated between the positioned template and the receptor
atoms, and between the template and peptide atoms already grown in the receptor site. This is
calculated using a modified Lennard-Jones 6-12 potential with BatchMin's AMBER parameters, for
atoms within 6 Å of the template. A small amount of "free play" (default amount 0.1 Å) has been
incorporated into the repulsive term of this potential, so that van der Waals penetrations of
less than that amount are not penalized. This softening of the repulsion is necessary to
compensate for the use of template sets which are not continuous through conformational space.
The degree of softening is controllable by the user, via a parameter passed in the control file.
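
The exact functional form of the softened repulsion is not given here. The Python sketch below shows one possible way to realize "free play" in a 6-12 potential, offered only as an assumption: inside the energy minimum, the distance is shifted outward by the free-play amount (capped at the minimum-energy distance), so that penetrations smaller than that amount carry no penalty. The function and parameter names are hypothetical.

```python
def lj_6_12(r: float, r_min: float, eps: float) -> float:
    """Standard 6-12 potential with minimum energy -eps at distance r_min."""
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)


def softened_lj(r: float, r_min: float, eps: float,
                free_play: float = 0.1, cutoff: float = 6.0) -> float:
    """6-12 potential with a tolerance on the repulsive side (assumed form).

    For r < r_min the distance is shifted outward by free_play, capped at
    r_min, so penetrations of up to free_play cost nothing relative to the
    minimum. Pairs beyond the cutoff contribute zero.
    """
    if r > cutoff:
        return 0.0
    if r < r_min:
        r = min(r + free_play, r_min)
    return lj_6_12(r, r_min, eps)
```

With this form the attractive branch is unchanged, and the repulsive wall is simply displaced inward by the free-play distance.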
Ees is the coulombic electrostatic interaction between the template atoms and those in both the
receptor site and the portion of the ligand already grown. Again, a 6 Å cutoff is employed for
the sake of speed. Econf is the conformational strain energy of the template. This is a
precalculated value which represents the energy difference between the template's conformation
and the lowest energy conformation found, among the 3,000-5,000 conformations examined, for that
particular amino acid during library creation. The value is retrieved from the library during
ligand growth. Esolv(templ) and Esolv(rec) are solvation terms calculated with Scheraga's
method30: they depend on the changes in solvent-accessible surface area caused by moving the
template and receptor from their fully hydrated unbound state to their partially hydrated bound
state. The solvation terms favor the hydration of polar atoms, and penalize hydration of
hydrophobic atoms, thereby simulating a hydrophobic binding effect. Solvent-accessible surface
areas are determined by the approximate method of Still,31 because of its tremendous speed
advantage over analytical calculations. To further enhance the speed of the scoring procedure,
the unbound solvation energies of the templates and of the receptor are calculated only once,
the former during library construction and the latter at the beginning of the GROW operation.
Only the changes in these energies are calculated during scoring.
`
`Ligand growth
Having described the nature of the template library and the method by which templates are
evaluated during ligand construction, we can now examine the ligand growth procedure, per se. To
begin, GROW aligns all of the library templates in turn with the seed fragment. This
superimposition can be accomplished in either of two ways, as indicated in Figure 2C. For the
sake of this discussion, if one considers just the 20 standard amino acids, with on average 300
library conformations for each, a total of 6,000 templates must be evaluated. These are retained
on a "data tree" which, by this point, contains one node for the seed and 6,000 branches, each
one representing a template (see Fig. 1). Each node is scored according to the method described
in the preceding section. If GROW were to continue in this manner, an exhaustive search for the
best dipeptide would involve 3.6 × 10^7 evaluations, and 1.3 × 10^15 evaluations would be needed
to find the best tetrapeptide. Instead, the program retains and grows from only a certain number
of the highest scoring branches, usually from 10 to 100 (the default is 10), after each
tree-level of growth. All other branches are pruned from the tree. This cycle of attaching
templates, scoring templates, and pruning structures is repeated until peptides of the desired
length are built. Because of the pruning step, the conformational search performed by GROW
covers a very small fraction of the total number of possible conformations, but is heavily
biased toward peptides which have low conformational energy and which will interact most
favorably with the receptor site. An attractive feature of this tree structure is that
increasing the number of templates used by the program, or the length of the peptides grown,
results in only a linear increase in the processor time required for the calculations.
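
The evaluation counts quoted above, and the linear scaling of the pruned search, can be checked with a few lines of Python. The function names are illustrative; the pruned count assumes that the first level attaches every template to the seed and that each later level attaches every template to each retained construct.

```python
def exhaustive_evaluations(n_templates: int, length: int) -> int:
    """Number of full-length constructs in an unpruned search."""
    return n_templates ** length


def pruned_evaluations(n_templates: int, length: int, keep: int = 10) -> int:
    """Template evaluations when only `keep` constructs survive each level."""
    if length < 1:
        return 0
    # Level 1 scores every template on the seed; each later level scores
    # every template on each of the retained constructs.
    return n_templates + (length - 1) * keep * n_templates
```

With 6,000 templates, `exhaustive_evaluations` gives 3.6 × 10^7 for a dipeptide and about 1.3 × 10^15 for a tetrapeptide, whereas `pruned_evaluations(6000, 4)` is 186,000. Doubling the template count doubles the pruned total, and each extra residue adds a fixed 60,000 evaluations at the default retention of 10.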
The ligands grown by this procedure may have the original seed location at either the C- or
N-terminal end, or somewhere in the middle of the peptides, depending on which directional
control option was specified by the user. In follow-up visual examinations to determine how well
the peptides appear to fit the target site, it has been our experience that the match is
extremely good, often rivalling or surpassing the visual fit in X-ray structures of inhibitors
and enzymes. It has also been our experience, however, that visual fit can be highly misleading.
For that reason we routinely subject the results of a GROW analysis to more detailed evaluation.
`
`Evaluation of Results
Although the scoring method used by GROW is very effective for guiding the pruning process
during peptide growth, it is still only a rough estimator of binding affinity. Before the
synthesis and testing of any structures designed by GROW are warranted, it is necessary to
further prune the results by more detailed energy-based methods. For this we use a two-part
process. First, the peptide/receptor complex, the unbound receptor, and the unbound peptide are
subjected to energy-based optimizations with BatchMin's AMBER implementation and Scheraga's
solvation model. From these energies, an estimation of the energy of binding can be made:
`
`E(binding) = E(complex) -
`E(unbound receptor) - E(unbound peptide).
`
If this estimated binding energy is unfavorable, the structure can be rejected and the next one
tested, or a new seed position can be selected for another GROW procedure. Otherwise, a second
step is carried out, in which conformational analysis of the unbound peptide is performed to
attempt to find its lowest energy conformation. This can be accomplished relatively quickly
using simulated annealing methods. 32 If conformations are found which are so low in energy that
the estimated binding energy is no longer favorable, the structure is rejected. In other words,
if too great an energy penalty must be paid to take the ligand from its low-energy solution
conformation(s) to a solution conformation which is close to that of its bound state, then this
energy cost can more than offset any energy gained due to binding.
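
The two-part acceptance test can be summarized in a short Python sketch. The function names are hypothetical, and treating a negative estimated binding energy as "favorable" is an assumption about the sign convention rather than a stated rule.

```python
from typing import Optional


def binding_energy(e_complex: float, e_receptor: float, e_peptide: float) -> float:
    """E(binding) = E(complex) - E(unbound receptor) - E(unbound peptide)."""
    return e_complex - e_receptor - e_peptide


def accept(e_complex: float, e_receptor: float, e_peptide_minimized: float,
           e_peptide_lowest: Optional[float] = None) -> bool:
    """Two-part check on a grown ligand (assumes favorable means negative).

    Part 1 uses the minimized unbound peptide. If a conformational search
    later finds a lower-energy unbound conformation, that energy replaces it
    as the reference, which makes the estimate less favorable.
    """
    reference = e_peptide_minimized
    if e_peptide_lowest is not None:
        reference = min(reference, e_peptide_lowest)
    return binding_energy(e_complex, e_receptor, reference) < 0.0
```

As an invented example, energies of -120, -80, and -25 give an estimated binding energy of -15, which passes the first test; if the conformational search then finds an unbound conformation at -45, the estimate becomes +5 and the structure is rejected.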
This follow-up energy evaluation is not actually part of the GROW algorithm and could be
replaced by more lengthy calculations of binding free energy. Regardless of the method used,
however, it is essential that it take into account the effects of solvation on peptide/receptor
interaction. These effects are quite large, and in most cases represent a major driving force
for binding. It is for the same reason that the solvation terms have been included in GROW's
scoring function.
`
`Growth Options and Control
If the GROW algorithm is allowed to consider on an equal basis each member of the template
library during template selection, then it is effectively sampling very broad regions of the
conformational space accessible to each residue. This method of operation is referred to as
"unrestricted growth." There are also instances in which it is desirable to restrict the ligand
design process, to guide the outcome to satisfy certain constraints. For example, if the
structure of an enzyme-peptide complex is known, it has proven useful to have GROW design new
ligands which have the same general conformation as the known structure, but with different
amino acid sequences. This is referred to as "restricted growth." The appropriate restriction
parameters (in this case, the backbone φ and ψ angles of the known structure) are specified
through the use of keywords in the command file passed to the program along with the
site-and-seed coordinate file mentioned earlier. During peptide growth, only those templates
that satisfy the specified constraints are selected for scoring and attachment to the evolving
ligand. If an active peptide ligand's sequence is known, but its bound conformation is not, it
can be useful to specify that sequence and let GROW generate feasible binding conformations for
the peptide. In this case, GROW functions more as a ligand model-building tool than as a ligand
design program. In both of the preceding examples of restricted growth, the procedure usually
takes less than 5 min (on a VAX 8800 computer) to generate peptides of length 6-8 residues.
Unrestricted growth, such as would be appropriate in the common situation where the structures
of bound ligands are not known, generally takes 40-50 min for similar-length peptides.
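
Restricted growth can be pictured as a filter applied to the template library before scoring. The Python sketch below is illustrative only: the dictionary keys, the function names, and the 30° default tolerance are assumptions, not GROW parameters.

```python
from typing import Dict, List, Optional


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_templates(templates: List[Dict], phi: float, psi: float,
                     tol: float = 30.0,
                     residue: Optional[str] = None) -> List[Dict]:
    """Keep templates whose backbone angles lie within tol of the targets.

    If `residue` is given, only templates of that residue type are kept,
    which corresponds to growing a specified sequence.
    """
    selected = []
    for t in templates:
        if residue is not None and t["name"] != residue:
            continue
        if angle_diff(t["phi"], phi) <= tol and angle_diff(t["psi"], psi) <= tol:
            selected.append(t)
    return selected
```

Because only the surviving templates are scored, a filter of this kind shrinks the work at every growth level, which is consistent with restricted runs finishing much faster than unrestricted ones.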
`The user may also control (through command file
`keywords) the shape of the search tree that defines
`the number of peptides to be retained at each stage
`
`
`
`
Command

DIEL d

NRES n

SUB s1,s2,...,sn

CONF c1,c2,...,cn

DIR x

NINB

PSIS ψ1,ψ2,...,ψn

PSTL δψ1,δψ2,...,δψn

NROT res,nconf

NBFF f

ANNL n,t1,t2,...,tm
`
`Parameter Description
`
`Dielectric constant to be used in the electrostatic energy calculations.
`
`Length of the peptide (number of residues) to be grown.
`
`A numerical sequence specifying the order in which the residues are to
`be connected.
`
`r; is the name of a residue to be
`Residue sequence to be grown.
`c