`
`Computer Design of Bioactive Molecules: A Method for
`Receptor-Based de Novo Ligand Design
Joseph B. Moon and W. Jeffrey Howe
Computational Chemistry, Upjohn Laboratories, Kalamazoo, Michigan 49001
`
`ABSTRACT
The design of molecules to bind specifically to protein receptors has long been a goal of computer-assisted molecular design. Given detailed structural knowledge of the target receptor, it should be possible to construct a model of a potential ligand, by algorithmic connection of small molecular fragments, that will exhibit the desired structural and electrostatic complementarity with the receptor. However, progress in this area of receptor-based, de novo ligand design has been hampered by the complexity of the construction process, in which potentially huge numbers of structures must be considered. By limiting the scope of the structure-space examined to one particular class of ligands, namely peptides and peptide-like compounds, the problem complexity has been reduced to the point that successful de novo design is now possible. The methodology presented employs a large template set of amino acid conformations which are iteratively pieced together in a model of the target receptor. Each stage of ligand growth is evaluated according to a molecular mechanics-based energy function, which considers van der Waals and coulombic interactions, internal strain energy of the lengthening ligand, and desolvation of both ligand and receptor. The search space is managed by use of a data tree which is kept under control by pruning according to the energy evaluation. Ligands grown by this procedure are subjected to follow-up evaluation in which an approximate binding enthalpy is determined. This methodology has proven useful as a precise model-builder and has also shown the ability to design bioactive ligands.
`
`INTRODUCTION
The ability of a molecule, such as a drug, to exert a desired biological effect is often related to its affinity for one or more endogenous receptor molecules. For a ligand to interact optimally with a receptor, it must be able to attain a shape which is at least partly complementary to that of a binding location on the receptor. Additionally, other factors such as electrostatic interactions, hydrogen bonding, hydrophobic interactions, desolvation effects, and cooperative motions of ligand and receptor all influence the binding event and should be taken into account in attempts to design bioactive ligands. Processes such as distribution and metabolism, while they play a critical role in the delivery of the putative ligand to the receptor location, do not reflect a compound's "intrinsic activity" and lie outside the scope of the current discussion.

© 1991 WILEY-LISS, INC.

In principle, it should be possible to design molecules that will bind to a preselected site on a receptor. This is not a simple undertaking, since in most design situations little or no structural information exists to characterize the receptor. One can, however, use "indirect" methods¹ to exploit what is known about molecules that elicit the desired biological response (assuming that they interact with the same receptor) to generate a structural and electronic hypothesis of what the receptor recognizes or will accept. Various computer-based methods have been developed to assist in this kind of study.¹⁻⁸ Once the hypothesis has been generated it can be used to suggest molecular modifications to improve the activity of known ligands or to identify entirely new structural classes (lead compounds) for study as potential ligands. The latter can be accomplished via searches over large databases of 3D molecular structures to identify molecules which match the hypothesized requirements for activity.⁹⁻¹⁸
The increasing availability of biomacromolecule structures that have been solved crystallographically has prompted the development of "direct" computational methods for molecular design, in which the steric and electronic properties of receptor binding sites are used to guide the design of potential ligands.¹¹,¹²,¹⁷⁻¹⁹ Direct methods generally fall into two categories: (1) design by analogy, in which 3D structures of known molecules (such as from a crystallographic database) are placed in the receptor structure and scored for goodness-of-fit; and (2) de novo design, in which the ligand model is constructed piecewise in the receptor. The latter approach, in particular, offers considerable promise for the development of novel molecules, uniquely designed to bind to the target.
`
Received August 23, 1990; revision accepted March 15, 1991.
Address reprint requests to either author, The Upjohn Company, Computational Chemistry, 301 Henrietta St., Kalamazoo, MI 49001.
`
`Breckenridge Exhibit 1013
`Breckenridge v. Novartis AG
`
`
`
`COMPUTER DESIGN OF BIOACTIVE MOLECULES
`
`
While examples of successful, computer-assisted, de novo design can be found,²⁰ there are no examples of automated, or computer-driven, de novo construction in the literature (although Wise et al.²¹ have reported using the structure-building program GENOA²² to generate molecules to match a requirements hypothesis). The term "automated de novo design" is used here to refer to the algorithmic construction of a putative ligand from small fragments, guided by steric and electronic constraints imposed by the receptor, plus appropriate consideration of solvation effects and internal strain energy of the ligand.
In a recent series of papers,²³⁻²⁶ Dean and co-workers describe a four-step strategy for automated, de novo drug design. Although their goal has not yet been achieved, there has been considerable progress in algorithm development. Furthermore, their studies make clear the complexity of the de novo construction problem as well as the importance of developing noncombinatorial approaches. In our work, we have chosen to focus on one particular region of the large structure-space that is ultimately the design territory of such methods. By confining the search space to consider only amino acids and related fragments as the molecular building blocks, the construction problem has become quite tractable, and we are able to report the first examples of bioactive ligands designed by automated de novo methods. The putative ligands that result from this construction method are peptides and peptide-like compounds rather than the small organic molecules that are typically the goal of drug design research. The appeal of the peptide building approach is not that peptides are preferable to organics as potential pharmaceutical agents, but rather that: (1) they can be generated relatively rapidly de novo; (2) their energetics can be studied by well-parameterized force field methods; (3) they are much easier to synthesize than are most organics; and (4) they can be used in a variety of ways, for peptidomimetic inhibitor design, protein-protein binding studies, and even as shape templates in the more commonly used 3D organic database search approach described above. We also show that the method need not be restricted to just the 20 natural amino acids; it can easily be extended to include other related fragments of interest to the medicinal chemist.
`
`METHODS
Description of the GROW Method
`
`Overview
The de novo peptide design method has been incorporated in a software package called GROW. In a typical design session, standard interactive graphical modeling methods (using the Mosaic software system,²⁷ which is based on MacroModel²⁸) are employed to define the structural environment in which GROW is to operate. The environment could be the active site cleft of an enzyme, or it could be a set of features on a protein surface to which the user wishes to bind a peptide-like molecule. The GROW program then operates independently of the user to generate a set of potential ligand molecules. Interactive modeling methods then come into play again, for examination of the resulting molecules, and for selection of one or more of them for further refinement.
The method is designed to construct peptide models from a user-selected starting position by iteratively piecing together amino acids in conformations which will interact most favorably with the atoms in the receptor site. For input, GROW operates on an atomic coordinate file generated by the user in the interactive modeling session, plus a small fragment (an acetyl group) positioned in the receptor to provide a starting point for peptide growth. These are referred to as "site" atoms and "seed" atoms, respectively. A second file provided by the user contains a number of control parameters to guide the peptide growth.
The operation of the GROW algorithm is conceptually fairly simple, and is summarized in Figure 1. GROW proceeds in an iterative fashion, to systematically attach to the seed fragment each amino acid template in a large preconstructed library of amino acid conformations. When a template has been attached, it is scored for goodness-of-fit to the receptor site, and then the next template in the library is attached to the seed. After all the templates have been tested, only the highest scoring ones are retained for the next level of growth. This procedure is repeated for the second growth level; each library template is attached in turn to each of the bonded seed/amino acid molecules that were retained from the first step, and is then scored. Again, only the best of the bonded seed/dipeptide molecules that result are retained for the third level of growth. The growth of peptides can proceed in the N-to-C direction only, the reverse direction only, or in alternating directions, depending on the initial control specifications supplied by the user. Successive growth levels therefore generate peptides that are lengthened by one residue. The procedure terminates when the user-defined peptide length has been reached, at which point the user can select from the constructed peptides those to be studied further. The resulting data provided by the GROW procedure include not only residue sequences and scores, but also atomic coordinates of the peptides, related directly to the coordinate system of the receptor site atoms. In the following sections we examine in more detail the individual components that comprise the basic procedure just described.
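The level-by-level growth with pruning outlined above can be illustrated with a minimal sketch. This is not the GROW implementation: the names `grow`, `toy_score`, and `keep` are hypothetical, templates are reduced to bare identifiers, and the scoring function is a toy stand-in for the energy-based score.

```python
from typing import Callable, List, Sequence, Tuple

Construct = Tuple[str, ...]  # template identifiers attached to the seed so far


def grow(
    templates: Sequence[str],
    score: Callable[[Construct], float],
    length: int,
    keep: int = 10,
) -> List[Tuple[float, Construct]]:
    """Grow constructs one template at a time, pruning to the `keep` best per level."""
    retained: List[Tuple[float, Construct]] = [(0.0, ())]  # level 0: the bare seed
    for _level in range(length):
        candidates: List[Tuple[float, Construct]] = []
        for _old_score, construct in retained:
            for template in templates:
                extended = construct + (template,)
                candidates.append((score(extended), extended))
        # Higher score = better estimated fit; everything else is pruned.
        candidates.sort(key=lambda item: item[0], reverse=True)
        retained = candidates[:keep]
    return retained


def toy_score(construct: Construct) -> float:
    """Hypothetical stand-in for the real scoring function: rewards alternation."""
    alternations = sum(1 for a, b in zip(construct, construct[1:]) if a != b)
    return alternations + 0.1 * construct.count("ALA")


best = grow(["ALA", "GLY"], toy_score, length=3, keep=2)
```

Here `best` holds the two highest-scoring three-residue constructs. In a realistic setting each template would carry coordinates and the score would depend on the receptor site.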
`
Library construction
Because most amino acids are quite flexible, a large number of template structures must be tested during the growth procedure to ensure adequate
`
`
`
`
J.B. MOON AND W.J. HOWE
`
[Figure 1: flow diagram. SETUP: (a) interactive modeling to select site atoms; (b) select seed position; (c) specify control parameters. GROW, drawing on the template library, builds monopeptides, dipeptides, ..., n-peptides: (A) attach each template to seed, score; (B) keep 10 best constructs; (C) attach each template to each construct kept, score; (D) keep 10 best; (E) iterate over C and D; (F) stop at requested peptide length, keep 10 best. EVALUATE: interactive modeling and batch energy minimization: (a) minimize ligand/site together and separately; (b) determine approximate binding energy.]

Fig. 1. Schematic overview of the operation of the GROW algorithm. The site-and-seed coordinate file and the command file (described later) are provided to the GROW procedure by the user. Growth can be visualized as a tree process in which each library template is attached to the seed (A) and then evaluated by the scoring function. Of the resulting 6000+ constructs, the 10 best are kept for the next level (B). 10 is the default retention; a command file keyword can be used to broaden the search at any stage. To each retained monopeptide/seed construct are attached all library templates, which are again scored (C). After pruning (D), the process is repeated (E) until the specified peptide length (specified in the command file, see Fig. 5) is reached (F). In this tree diagram, circles represent those nodes selected (based on highest scores across the entire level) for further growth. Uncircled nodes are pruned. Horizontal dots denote continuation across all template additions, and vertical dots represent the iterative process of tree growth.
`
coverage of the conformational space accessible to each residue. The template library was generated with the Mosaic modeling program in conjunction with the MacroModel/BatchMin²⁸ (version 2.5) implementation of the AMBER²⁹ force field. The same force field implementation was used for all energy-related work described herein. Starting models of the 20 standard amino acids were constructed as N-acetyl-N'-methylamides (Fig. 2A), followed by energy minimization.* The models were then subjected to a search procedure in which conformers were generated by varying all flexible torsion angles in the amino acids by random increments. Any conformer which contained two nonbonded heavy atoms at a separation of <2.0 Å was discarded. After 3,000 to 5,000 viable conformations were produced for each amino acid, the structures were subjected to a partial energy minimization (15 iterations of block diagonal Newton-Raphson minimization) to relieve significant internal strain energies. At this point, each conformation was compared to every other conformation so that duplicate structures would be discarded. Two conformations were considered to be identical if no atomic positions differed by more than 0.3 Å when the structures were aligned by superpositioning of their N-terminal amide atoms. The remaining conformations were sorted in ascending energy order and were stored in the template library along with their energies. Templates of nonstandard amino acids, pseudodipeptides, and organic terminal groups were constructed in the same manner, employing the extended parameter set (in addition to the original²⁹ AMBER parameters) provided by the MacroModel/BatchMin implementation.

*Unless otherwise indicated, the convergence criterion used for all energy minimizations discussed in this paper was an rms gradient of <0.1 kcal/Å, with the BatchMin/MacroModel PRCG minimizer.

[Figure 2: structure diagrams for panels A, B, and C; not recoverable from the extracted text.]

Fig. 2. (A) Template generation method using phenylalanine as an example: bonds marked with arrows are rotated by random increments to generate additional conformations. This is followed by contact filtering, partial minimization, and duplicate elimination. (B) Template connection method: amide end groups are superimposed to connect two templates together in the proper geometries to form peptides. (C) Template alignment method: the two alignments of a template with the seed group are shown. The alignment used depends on the direction in which the peptide is to be grown.
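The contact filter and duplicate elimination used during library construction can be sketched as follows. This is only an illustration under stated assumptions: conformers are plain lists of heavy-atom coordinates in Å, the `bonded` set of index pairs is a hypothetical input, and the conformers passed to `is_duplicate` are assumed to have already been superimposed on their N-terminal amide atoms.

```python
import math
from typing import List, Sequence, Set, Tuple

Coords = List[Tuple[float, float, float]]


def has_close_contact(coords: Coords, bonded: Set[Tuple[int, int]], cutoff: float = 2.0) -> bool:
    """True if any nonbonded pair of heavy atoms is closer than `cutoff` angstroms."""
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if (i, j) not in bonded and math.dist(coords[i], coords[j]) < cutoff:
                return True
    return False


def is_duplicate(a: Coords, b: Coords, tol: float = 0.3) -> bool:
    """True if no atomic position differs by more than `tol` (pre-aligned input)."""
    return all(math.dist(p, q) <= tol for p, q in zip(a, b))


def unique_conformers(conformers: Sequence[Coords]) -> List[Coords]:
    """Keep the first representative of each group of near-identical conformers."""
    kept: List[Coords] = []
    for conf in conformers:
        if not any(is_duplicate(conf, other) for other in kept):
            kept.append(conf)
    return kept


# Three-atom toy chain: atoms 0-1 and 1-2 are bonded, 0-2 is a nonbonded pair.
bonds = {(0, 1), (1, 2)}
extended = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
folded = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.0, 1.0, 0.0)]
nudged = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.1, 0.0)]
survivors = unique_conformers([c for c in (extended, folded, nudged)
                               if not has_close_contact(c, bonds)])
```

The all-pairs comparison is quadratic in the number of conformers, which is acceptable for a one-time library build of a few thousand structures per residue.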
Figure 3 lists the contents of the template library and the number of unique conformations stored for each residue. During a GROW run, from 300 to 1,000 lowest energy conformations are typically utilized for each amino acid; the default is 300. For comparison, values in parentheses indicate the number of initial conformations generated for the residues during library construction. Of the 2,000 trial conformations of alanine, for example, partial energy minimization and duplicate elimination reduced the set to 171 unique conformations. As might be expected, this type of reduction in the number of conformations was not seen with the pseudodipeptides and certain of the other residues, due to their extreme flexibility. The implications of template flexibility will be discussed in a later section.
Application of a partial energy minimization during library construction produces structures that lie near, but not generally at, energetic minima. Since energetic minima of a bound ligand will not necessarily correspond to minima of an unbound ligand, restriction of templates to unbound minimum-energy conformations represents an unwarranted
`
`
`
`
[Figure 3: structure diagrams not recoverable from the extracted text; the tabulated values follow.]

Standard Amino Acids
ALA   171 (2000)      LEU  1108 (5000)
ARG  4987 (5000)      LYS  4743 (5000)
ASN  2706 (5000)      MET  4661 (5000)
ASP  1505 (5000)      PHE  3485 (5000)
CYS  2123 (3000)      PRO    53 (2000)
GLN  3734 (5000)      SER  1598 (5000)
GLU  3213 (5000)      THR  1702 (5000)
GLY   271 (1000)      TRP  4537 (5000)
HIS  4026 (5000)      TYR  4732 (5000)
ILE  1478 (5000)      VAL   346 (5000)

Terminal Groups
ACE     1 (1)
BOC   170 (1000)
TBA    88 (1000)
AMP   540 (2000)

Non-standard Amino Acids
BMH  2318 (5000)
IMG  1132 (5000)
CHA  3392 (5000)

Pseudodipeptides
FRF  5000 (5000)
NL2  5000 (5000)

Fig. 3. Contents of the template library. At present, the template library contains standard L- and D-amino acids, several nonstandard residues, organic terminators, and pseudodipeptides, some of which are shown here. The table indicates, for each fragment, its 3-character identifier, which can be specified in the control file for running GROW in restricted mode, a parenthesized value which indicates the number of initial conformations generated for that fragment during library construction, and an unparenthesized value which indicates the number of conformations that survived the partial minimization and duplicate elimination steps during library construction. Data shown for standard amino acids apply equally for L- and D-forms.
`
constraint. The collection of amino acid templates that resulted from the procedure just outlined represents a broad sampling over low-energy conformational space. The assumption made is that such fragments can be connected together to form peptides with low internal conformational energy; adverse interactions between residues are dealt with at a later stage.
The acetyl and amide end groups placed on the amino acid models serve two purposes. First, they produce some of the conformational restriction experienced by individual amino acids when they are connected in a polypeptide chain. They also provide a convenient way to connect the templates during peptide construction; two templates can be joined together simply by superimposing the N-terminal amide of one template onto the C-terminal amide of another (Fig. 2B).
`
Seed fragment positioning
The placement of the seed fragment, while separate from the GROW method itself, has a great influence on the outcome of a GROW procedure. A poorly positioned seed can prevent designed peptides from reaching important interaction sites in the receptor. Because of this sensitivity, we have examined a number of techniques for choosing reasonable seed positions. In the few cases in which an X-ray crystallographic structure of a bound ligand is available, atoms within the ligand can be used to form a seed position. If the ligand is a peptide, choosing a seed group from the ligand backbone will greatly improve the chances of producing meaningful results. Without an X-ray structure of a receptor-ligand complex, however, other methods must be used to generate seed positions. It is possible with most modeling systems to manually dock an acetyl-containing seed fragment into a receptor site. However, identifying the optimal placement of seed fragments is not a trivial problem. Work is in progress to develop additional methods for this purpose.
The technique that we have found most promising in this situation is to employ an automatic fragment docking algorithm which positions numerous copies of a small amide-containing fragment in the specified site, at locations calculated to provide the strongest interactions between the fragment and site atoms. Originally based on the shape-matching DOCK methodology developed by Kuntz and co-workers,¹¹,¹² the algorithm has been extended to score each potential docking orientation based on van der Waals and electrostatic interactions between a docked fragment and the site atoms. Orientations which score below a specified threshold are discarded. A large number of overlapping copies of the amide-containing fragment generally results from this procedure, tracing out pathways in the site where ligand interactions with the receptor would be strongest. From this map one or more seed positions can be obtained from the location of the amide bonds within the docked fragments. Each seed selected defines a separate GROW run.
`
`Scoring
Before describing the method by which peptides are iteratively constructed during a GROW operation, we first examine the scoring function that is applied as each new library template is attached to a growing peptide. The scoring function is based on potentials from the AMBER force field (as implemented in MacroModel/BatchMin v2.5), with the addition of a solvation treatment developed by Scheraga.³⁰ When a template has been properly oriented to connect to a growing peptide, a score is calculated which is the sum of five terms:
`
$$\mathrm{SCORE} = -\left[E_{\mathrm{vdw}} + E_{\mathrm{es}} + E_{\mathrm{conf}} + E_{\mathrm{solv}}(\mathrm{templ}) + E_{\mathrm{solv}}(\mathrm{rec})\right]$$
`
The more negative the individual energy terms, the greater the estimated binding affinity of the template, and the higher the score.
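As a small illustration of the sign convention, the score can be written as the negated sum of the five energy terms. The class and function names below are hypothetical, and the numerical values are invented purely to show that a more negative total energy gives a higher score; kcal/mol is assumed as the unit.

```python
from dataclasses import dataclass


@dataclass
class EnergyTerms:
    """The five contributions to the score, assumed here to be in kcal/mol."""
    vdw: float         # van der Waals energy
    es: float          # coulombic electrostatic energy
    conf: float        # conformational strain energy of the template
    solv_templ: float  # change in template solvation energy on binding
    solv_rec: float    # change in receptor solvation energy on binding


def grow_score(t: EnergyTerms) -> float:
    """Negated sum of the energy terms: favorable (negative) energies score high."""
    return -(t.vdw + t.es + t.conf + t.solv_templ + t.solv_rec)


# Invented example values, for illustration only.
good_fit = EnergyTerms(vdw=-4.0, es=-1.5, conf=0.5, solv_templ=-0.25, solv_rec=0.25)
poor_fit = EnergyTerms(vdw=-1.0, es=0.5, conf=2.0, solv_templ=0.5, solv_rec=0.0)
ranked = sorted([poor_fit, good_fit], key=grow_score, reverse=True)
```

Sorting by `grow_score` in descending order places the better-fitting template first, which is the ordering a pruning step would use.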
E_vdw is the van der Waals energy calculated between the positioned template and the receptor atoms, and between the template and peptide atoms already grown in the receptor site. This is calculated using a modified Lennard-Jones 6-12 potential with BatchMin's AMBER parameters, for atoms within 6 Å of the template. A small amount of "free play" (default amount = 0.1 Å) has been incorporated into the repulsive term of this potential, so that van der Waals penetrations of less than that amount are not penalized. This softening of the repulsion is necessary to compensate for the use of template sets which are not continuous through conformational space. The degree of softening is controllable by the user, via a parameter passed in the control file.
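The text does not give the functional form of the modified potential, so the following is only one possible way to realize the "free play" idea, offered as an assumption rather than as the form used in GROW. It takes the AMBER-style 6-12 form with well depth `eps` at separation `r_min`, leaves the attractive side untouched, holds the energy at the well depth for penetrations smaller than `free_play`, and shifts the repulsive wall inward by the same amount.

```python
def lj_6_12(r: float, r_min: float, eps: float) -> float:
    """Standard 6-12 potential with minimum -eps at r = r_min."""
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)


def softened_lj(r: float, r_min: float, eps: float,
                free_play: float = 0.1, cutoff: float = 6.0) -> float:
    """6-12 potential whose repulsive wall is shifted inward by `free_play` angstroms.

    Assumed form (not taken from the paper): penetrations of less than
    `free_play` past r_min cost nothing beyond the well depth.
    """
    if r > cutoff:
        return 0.0                      # pairs beyond the cutoff are ignored
    if r >= r_min:
        return lj_6_12(r, r_min, eps)   # attractive side is unchanged
    if r >= r_min - free_play:
        return -eps                     # small penetrations are not penalized
    return lj_6_12(r + free_play, r_min, eps)  # repulsive wall moved inward


# Example pair with r_min = 4.0 angstroms and eps = 0.1 kcal/mol (invented values).
at_minimum = softened_lj(4.0, 4.0, 0.1)
slight_overlap = softened_lj(3.95, 4.0, 0.1)
deep_overlap = softened_lj(3.5, 4.0, 0.1)
```

The piecewise form is continuous at both joins, since the shifted wall evaluates to `-eps` exactly where the flat region ends.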
E_es is the coulombic electrostatic interaction between the template atoms and those in both the receptor site and the portion of the ligand already grown. Again, a 6 Å cutoff is employed for the sake of speed. E_conf is the conformational strain energy of the template. This is a precalculated value which represents the energy difference between the template's conformation and the lowest energy conformation found, among the 3,000-5,000 conformations examined, for that particular amino acid during library creation. The value is retrieved from the library during ligand growth. E_solv(templ) and E_solv(rec) are solvation terms calculated with Scheraga's method³⁰: they depend on the changes in solvent-accessible surface area caused by moving the template and receptor from their fully hydrated unbound state to their partially hydrated bound state. The solvation terms favor the hydration of polar atoms, and penalize hydration of hydrophobic atoms, thereby simulating a hydrophobic binding effect. Solvent-accessible surface areas are determined by the approximate method of Still,³¹ because of its tremendous speed advantage over analytical calculations. To further enhance the speed of the scoring procedure, the unbound solvation energies of the templates and of the receptor are calculated only once, the former during library construction and the latter at the beginning of the GROW operation. Only the changes in these energies are calculated during scoring.
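A surface-area-based solvation term of this kind can be sketched as a sum over atoms of a per-atom solvation parameter times the change in solvent-accessible area on binding. The sketch below assumes that linear form; the parameter values and areas are invented for illustration and are not taken from the cited parameterization.

```python
from typing import Sequence


def desolvation_energy(
    sigma: Sequence[float],         # per-atom solvation parameters, kcal/(mol * A^2)
    area_unbound: Sequence[float],  # accessible area of each atom when fully hydrated
    area_bound: Sequence[float],    # accessible area of each atom in the complex
) -> float:
    """Change in solvation energy on going from the unbound to the bound state."""
    return sum(s * (b - u) for s, u, b in zip(sigma, area_unbound, area_bound))


# Invented values: a polar atom (negative sigma, hydration favorable) and a
# hydrophobic atom (positive sigma, hydration unfavorable), both partly buried.
sigma = [-0.10, 0.02]
unbound = [20.0, 30.0]
bound = [5.0, 10.0]
delta = desolvation_energy(sigma, unbound, bound)
polar_only = desolvation_energy(sigma[:1], unbound[:1], bound[:1])
hydrophobic_only = desolvation_energy(sigma[1:], unbound[1:], bound[1:])
```

Burying the polar atom gives a positive (unfavorable) contribution and burying the hydrophobic atom gives a negative (favorable) one, which mirrors the hydrophobic effect described in the text. Because only area changes enter the sum, the unbound areas can be computed once and reused.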
`
`Ligand growth
Having described the nature of the template library and the method by which templates are evaluated during ligand construction, we can now examine the ligand growth procedure, per se. To begin, GROW aligns all of the library templates in turn with the seed fragment. This superimposition can be accomplished in either of two ways, as indicated in Figure 2C. For the sake of this discussion, if one considers just the 20 standard amino acids, with on average 300 library conformations for each, a total of 6,000 templates must be evaluated. These are retained on a "data tree" which, by this point, contains one node for the seed and 6,000 branches, each one representing a template (see Fig. 1). Each node is scored according to the method described in the preceding section. If GROW were to continue in this manner, an exhaustive search for the best dipeptide would involve 3.6 × 10⁷ evaluations (6,000²), and 1.3 × 10¹⁵ evaluations (6,000⁴) would be needed to find the best tetrapeptide. Instead, the program retains and grows from only a certain number of the highest scoring branches, usually from 10 to 100 (the default is 10), after each tree-level of growth. All other branches are pruned from the tree. This cycle of attaching templates, scoring templates, and pruning structures is repeated until peptides of the desired length are built. Because of the pruning step, the conformational search performed by GROW covers a very small fraction of the total number of possible conformations, but is heavily biased toward peptides which have low conformational energy and which will interact most favorably with the receptor site. An attractive feature of this tree structure is that increasing the number of templates used by the program, or the length of the peptides grown, results in only a linear increase in the processor time required for the calculations.
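The evaluation counts can be checked with a few lines of arithmetic. The sketch assumes 20 residues with 300 conformations each and counts one evaluation per template attachment; the function names are hypothetical. With pruning, the first level costs one evaluation per template and every later level costs `keep` times that, so the total grows linearly with both the library size and the peptide length.

```python
def exhaustive_evaluations(n_templates: int, length: int) -> int:
    """Constructs of the given length that an unpruned search would have to score."""
    return n_templates ** length


def pruned_evaluations(n_templates: int, length: int, keep: int = 10) -> int:
    """Evaluations performed when only `keep` constructs survive each level."""
    if length < 1:
        return 0
    return n_templates + (length - 1) * keep * n_templates


n_templates = 20 * 300  # 6,000 templates
dipeptides = exhaustive_evaluations(n_templates, 2)       # 3.6e7
tetrapeptides = exhaustive_evaluations(n_templates, 4)    # about 1.3e15
pruned_tetra = pruned_evaluations(n_templates, 4, keep=10)
```

For a tetrapeptide with the default retention of 10, the pruned search scores 186,000 constructs, roughly ten orders of magnitude fewer than the exhaustive count.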
The ligands grown by this procedure may have the original seed location at either the C- or N-terminal end, or somewhere in the middle of the peptides, depending on which directional control option was specified by the user. In follow-up visual examinations to determine how well the peptides appear to fit the target site, it has been our experience that the match is extremely good, often rivalling or surpassing the visual fit in X-ray structures of inhibitors and enzymes. It has also been our experience, however, that visual fit can be highly misleading. For that reason we routinely subject the results of a GROW analysis to more detailed evaluation.
`
`Evaluation of Results
Although the scoring method used by GROW is very effective for guiding the pruning process during peptide growth, it is still only a rough estimator of binding affinity. Before the synthesis and testing of any structures designed by GROW are warranted, it is necessary to further prune the results by more detailed energy-based methods. For this we use a two-part process. First, the peptide/receptor complex, the unbound receptor, and the unbound peptide are subjected to energy-based optimizations with BatchMin's AMBER implementation and Scheraga's solvation model. From these energies, an estimation of the energy of binding can be made:
`
$$E(\mathrm{binding}) = E(\mathrm{complex}) - E(\mathrm{unbound\ receptor}) - E(\mathrm{unbound\ peptide})$$
`
If this estimated binding energy is unfavorable, the structure can be rejected and the next one tested, or a new seed position can be selected for another GROW procedure. Otherwise, a second step is carried out, in which conformational analysis of the unbound peptide is performed to attempt to find its lowest energy conformation. This can be accomplished relatively quickly using simulated annealing methods.³² If conformations are found which are so low in energy that the estimated binding energy is no longer favorable, the structure is rejected. In other words, if too great an energy penalty must be paid to take the ligand from its low-energy solution conformation(s) to a solution conformation which is close to that of its bound state, then this energy cost can more than offset any energy gained due to binding.
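The two-part evaluation can be expressed as a short decision function. This is a sketch with hypothetical names and invented energies: the binding estimate is first computed with the minimized unbound peptide, and then recomputed with the lowest-energy unbound conformation found by the conformational search, if one is supplied.

```python
from typing import Optional


def binding_energy(e_complex: float, e_receptor: float, e_peptide: float) -> float:
    """E(binding) = E(complex) - E(unbound receptor) - E(unbound peptide)."""
    return e_complex - e_receptor - e_peptide


def accept_ligand(
    e_complex: float,
    e_receptor: float,
    e_peptide_minimized: float,
    e_peptide_lowest: Optional[float] = None,
) -> bool:
    """Apply both evaluation steps; a negative binding energy counts as favorable."""
    if binding_energy(e_complex, e_receptor, e_peptide_minimized) >= 0.0:
        return False  # step 1: unfavorable even before the conformational search
    if e_peptide_lowest is not None:
        # Step 2: a much lower unbound conformation can cancel the binding gain.
        return binding_energy(e_complex, e_receptor, e_peptide_lowest) < 0.0
    return True


# Invented energies in kcal/mol.
first_estimate = binding_energy(-150.0, -100.0, -30.0)
kept = accept_ligand(-150.0, -100.0, -30.0, e_peptide_lowest=-40.0)
rejected = accept_ligand(-150.0, -100.0, -30.0, e_peptide_lowest=-55.0)
```

In the rejected case the unbound peptide can relax to a conformation 25 kcal/mol below the minimized bound-like one, which more than offsets the 20 kcal/mol first estimate.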
This follow-up energy evaluation is not actually part of the GROW algorithm and could be replaced by more lengthy calculations of binding free energy. Regardless of the method used, however, it is essential that it take into account the effects of solvation on peptide/receptor interaction. These effects are quite large, and in most cases represent a major driving force for binding. It is for the same reason that the solvation terms have been included in GROW's scoring function.
`
`Growth Options and Control
If the GROW algorithm is allowed to consider on an equal basis each member of the template library during template selection then it is effectively sampling very broad regions of the conformational space accessible to each residue. This method of operation is referred to as "unrestricted growth." There are also instances in which it is desirable to restrict the ligand design process, to guide the outcome to satisfy certain constraints. For example, if the structure of an enzyme-peptide complex is known, it has proven useful to have GROW design new ligands which have the same general conformation as the known structure, but with different amino acid sequences. This is referred to as "restricted growth." The appropriate restriction parameters (in this case, the backbone φ and ψ angles of the known structure) are specified through the use of keywords in the command file passed to the program along with the site-and-seed coordinate file mentioned earlier. During peptide growth, only those templates that satisfy the specified constraints are selected for scoring and attachment to the evolving ligand. If an active peptide ligand's sequence is known, but its bound conformation is not, it can be useful to specify that sequence and let GROW generate feasible binding conformations for the peptide. In this case, GROW functions more as a ligand model-building tool than as a ligand design program. In both of the preceding examples of restricted growth, the procedure usually takes less than 5 min (on a VAX 8800 computer) to generate peptides of length 6-8 residues. Unrestricted growth, such as would be appropriate in the common situation where the structures of bound ligands are not known, generally takes 40-50 min for similar-length peptides.
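Both kinds of restricted growth amount to filtering the template library before scoring. The sketch below is illustrative only: the `Template` record, the angular tolerance `tol`, and the `allowed` residue set are hypothetical and do not correspond to actual GROW keywords.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Set


@dataclass
class Template:
    residue: str  # 3-character identifier, e.g. "PHE"
    phi: float    # backbone phi angle in degrees
    psi: float    # backbone psi angle in degrees


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def select_templates(
    library: Sequence[Template],
    phi: Optional[float] = None,
    psi: Optional[float] = None,
    tol: float = 30.0,
    allowed: Optional[Set[str]] = None,
) -> List[Template]:
    """Keep templates matching the backbone angles and/or the permitted residues."""
    selected = []
    for t in library:
        if allowed is not None and t.residue not in allowed:
            continue  # sequence restriction
        if phi is not None and angle_diff(t.phi, phi) > tol:
            continue  # backbone phi restriction
        if psi is not None and angle_diff(t.psi, psi) > tol:
            continue  # backbone psi restriction
        selected.append(t)
    return selected


library = [
    Template("PHE", -120.0, 130.0),
    Template("PHE", 60.0, 45.0),
    Template("GLY", -115.0, 140.0),
]
same_backbone = select_templates(library, phi=-119.0, psi=135.0)
phe_only = select_templates(library, allowed={"PHE"})
```

Calling `select_templates` with neither angles nor a residue set returns the whole library, which corresponds to unrestricted growth. A smaller candidate list per level is consistent with the shorter run times reported for restricted growth.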
The user may also control (through command file keywords) the shape of the search tree that defines the number of peptides to be retained at each stage
`
`
`
`
`Command
`
`DIELd
`
`11AES n
`
`CONF c1,c,i •.•• c.,
`
`DIR x
`
`NINB
`
`Parameter Description
`
Dielectric constant to be used in the electrostatic energy calculations.

Length of the peptid