Tetrahedron Vol. 47, No. 43, pp. 8985-8990, 1991
Printed in Great Britain
0040-4020/91 $3.00+.00
© 1991 Pergamon Press plc
Automatic Creation of Drug Candidate Structures Based
on Receptor Structure.
Starting Point for Artificial Lead Generation.
Yoshihiko Nishibata and Akiko Itai*
Faculty of Pharmaceutical Sciences, University of Tokyo
7-3-1, Bunkyo-ku, Tokyo 113, Japan
(Received in Japan 12 September 1991)
Key Words: Lead Generation; Computational Chemistry; Drug Design
We have developed a new method for automatic generation of drug candidate structures based on a known receptor structure. In our method, various structures which fit well to the receptor cavity are generated by adding atoms one by one using a force field and random numbers. The usefulness of the program was exemplified by application to the E. coli dihydrofolate reductase system. From dozens of generated structures, we could obtain several promising new structures with considerable internal stability and favorable interactions with the receptor cavity. It is expected that this method will become an essential starting point for artificial lead generation, which has been impossible so far.
For the purpose of developing excellent drugs efficiently, it is necessary to establish rational approaches to drug design. Developing methods for finding lead compounds artificially and rationally is especially important, since such compounds have mostly been found by chance so far. The drug-receptor interaction seems to be the most useful basis for that purpose. The real nature of the drug-receptor interaction has been clarified in a number of cases by determination of the three-dimensional structures of protein-ligand complexes at the atomic level by X-ray crystal analysis.
There are many drug molecules which are known to bind to the same site of a protein in spite of large differences in chemical structure. This is because it is neither the chemical structure nor the molecular skeleton, but rather the complementarity in molecular shape and in submolecular physical and chemical properties, that is important for specific binding to the same receptor site. This strongly suggests that if we can design and synthesize a molecule whose molecular shape and submolecular properties complement those of the receptor cavity, it should be able to bind to the receptor specifically. However, it is not easy to design such molecules with new skeletal structures manually, because manual design lacks objectivity.
In recent years, techniques for solving protein crystal structures have made remarkable progress. Moreover, there have been great advances in biochemical techniques such as isolation, purification and protein engineering. A number of biologically important protein structures have been elucidated or are being elucidated. The three-dimensional structures of proteins have been used, through docking simulation, to interpret biological activities and to elucidate the biochemical mechanisms involved. Docking simulation has also been one of the rational approaches to the modification of ligand structures (mainly by replacing or adding substituent groups), although it has not so far been used directly for designing new structures.

In order to generate lead compounds with new skeletal structures ab initio, we require new strategies. If computers can provide us with possible ligand structures that can bind strongly to the protein, we will be able to generate new lead compounds artificially, based on the individual features of the receptor structure.
We have developed a new method and a computer program for this purpose,¹ i.e., for generating drug candidate structures which fit well to the receptor cavity, based on the receptor structure. We named the program "LEGEND". From the large number of possible structures it generates, a smaller set is selected by another program, named "LORE", based on energetic and structural considerations. Here, we describe the method and the results of its application to the E. coli dihydrofolate reductase system.
For the de novo generation of a molecular structure, we must specify the positions and types of the atoms and the types of the bonds in the molecule. The program provides sixteen atom types, which distinguish combinations of atomic element and hybridization state, e.g., sp3 carbon, aromatic carbon, carbonyl oxygen, amino nitrogen and so on. It provides five bond types: single, double, triple, aromatic and amide.
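A minimal sketch of this atom-type and bond-type vocabulary as two Python enumerations, assuming that each atom type is simply an (element, hybridization) pair. The text names only four of the sixteen atom types, so the `AtomType` member names below are hypothetical and the list is deliberately incomplete; the hybridization labels for the carbonyl oxygen (sp2) and the amino nitrogen (sp3) are our assumption. The five `BondType` members follow the text.

```python
from enum import Enum


class AtomType(Enum):
    """Element/hybridization combinations.

    Hypothetical names: only the four types named in the text are listed,
    whereas the program itself provides sixteen.
    """
    C_SP3 = ("C", "sp3")
    C_AROMATIC = ("C", "aromatic")
    O_CARBONYL = ("O", "sp2")   # assumed hybridization label
    N_AMINO = ("N", "sp3")      # assumed hybridization label

    @property
    def element(self) -> str:
        return self.value[0]

    @property
    def hybridization(self) -> str:
        return self.value[1]


class BondType(Enum):
    """The five bond types provided by the program."""
    SINGLE = "single"
    DOUBLE = "double"
    TRIPLE = "triple"
    AROMATIC = "aromatic"
    AMIDE = "amide"
```

Keeping element and hybridization together in one value mirrors the text's point that an atom type encodes both at once.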
The size of the structure to be generated is specified as a number of atoms in the input data. The relative frequency with which each atom type appears in a structure is set in the program. Internal atomic positions can be defined by geometrical parameters (bond lengths and angles) and conformational parameters (dihedral angles). The former can be assumed to take the standard values used in a conventional force field, but the latter must be determined by random numbers, as must the atom type and the bond type. In this program, we use random numbers to determine all the unsettled quantities and to choose items from those prepared in the program. The random numbers used are those output sequentially by the computer.
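The random choice of unsettled quantities can be sketched with a seeded `random.Random` stream standing in for the sequential random numbers. The labels and relative ratios in `ATOM_TYPE_RATIOS` are invented placeholders (the paper does not publish them), and drawing a dihedral uniformly over the full circle is an assumption. A real generator would also restrict bond types by the valences of the atoms involved, which this sketch ignores.

```python
import random

# Hypothetical labels and ratios: the real program sets its own values internally.
ATOM_TYPE_RATIOS = {"C.sp3": 4.0, "C.ar": 3.0, "N.amino": 1.0, "O.carbonyl": 1.0}
BOND_TYPES = ("single", "double", "triple", "aromatic", "amide")


def choose_atom_type(rng: random.Random) -> str:
    """Draw an atom type with probability proportional to its relative ratio."""
    types = list(ATOM_TYPE_RATIOS)
    return rng.choices(types, weights=[ATOM_TYPE_RATIOS[t] for t in types], k=1)[0]


def choose_bond_type(rng: random.Random) -> str:
    """Draw a bond type uniformly (valence checks deliberately omitted)."""
    return rng.choice(BOND_TYPES)


def choose_dihedral(rng: random.Random) -> float:
    """Draw a dihedral angle in degrees uniformly between -180 and 180 (assumed)."""
    return rng.uniform(-180.0, 180.0)
```

Seeding the generator, e.g. `random.Random(42)`, makes a whole generation run reproducible, which is convenient when comparing input settings.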
The atomic coordinates of a protein molecule are read in PDB file format.² The preparations before starting LEGEND are as follows. For high-speed calculation of the intermolecular interaction energies using tabulated data, a three-dimensional grid is generated inside the ligand-binding site of the protein.³ Then, at each grid point, van der Waals interaction energies are calculated between the protein atoms and a probe atom (carbon, nitrogen, oxygen or hydrogen) located on the grid point. The program uses the MM2⁴ force field and parameters. The electrostatic potential at each grid point is also computed, using atomic charges on the protein atoms taken from those for the individual amino acid residues in the AMBER program.⁵ The tabulated data are used for energy estimation at every step of new atom generation and also for structure optimization of the generated raw molecule.
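A sketch of the grid precomputation and table lookup, under stated assumptions. The paper uses MM2 van der Waals parameters; here a generic Lennard-Jones 12-6 function with made-up `eps` and `r_min` stands in for it, and trilinear interpolation between grid points is our choice, since the text does not say how values between grid points are obtained. An electrostatic-potential grid would be tabulated the same way, with a Coulomb sum over the protein atomic charges in place of `probe_energy`.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vec = Tuple[float, float, float]
Grid = List[List[List[float]]]


def lennard_jones(r: float, eps: float = 0.1, r_min: float = 3.5) -> float:
    """Stand-in 12-6 pair energy in kcal/mol (assumed form and parameters, NOT MM2's)."""
    r = max(r, 0.5)  # guard against a probe sitting on top of an atom
    x = (r_min / r) ** 6
    return eps * (x * x - 2.0 * x)


def probe_energy(protein: Sequence[Vec]) -> Callable[[Vec], float]:
    """Energy of one probe atom at point p, summed over all protein atoms."""
    def energy_at(p: Vec) -> float:
        return sum(lennard_jones(math.dist(p, a)) for a in protein)
    return energy_at


def build_grid(origin: Vec, spacing: float, shape: Tuple[int, int, int],
               energy_at: Callable[[Vec], float]) -> Grid:
    """Tabulate energy_at on an nx x ny x nz grid of points."""
    nx, ny, nz = shape
    ox, oy, oz = origin
    return [[[energy_at((ox + i * spacing, oy + j * spacing, oz + k * spacing))
              for k in range(nz)] for j in range(ny)] for i in range(nx)]


def interpolate(grid: Grid, origin: Vec, spacing: float, p: Vec) -> float:
    """Trilinear interpolation of tabulated values at p (needs >= 2 points per axis)."""
    sizes = (len(grid), len(grid[0]), len(grid[0][0]))
    idx, frac = [], []
    for axis in range(3):
        u = (p[axis] - origin[axis]) / spacing
        i = min(max(math.floor(u), 0), sizes[axis] - 2)  # cell containing p
        idx.append(i)
        frac.append(u - i)  # outside the box this extrapolates linearly
    (i, j, k), (fx, fy, fz) = idx, frac
    total = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1.0 - fx) * (fy if dj else 1.0 - fy)
                          * (fz if dk else 1.0 - fz))
                total += weight * grid[i + di][j + dj][k + dk]
    return total
```

The expensive sum over protein atoms is paid once per grid point; afterwards each energy estimate for a trial atom costs only eight table reads.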
The fundamental process of structure generation by the LEGEND program consists of three steps, as follows. The process is shown in detail in the flow chart in Fig. 1.
Step 1: Generation of the first atom. An anchor atom is selected by a random number from among several hydrogen-bonding heteroatoms in the protein, specified beforehand. The position and the atom type of the first atom are determined so as to make a hydrogen bond to the anchor atom.
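Step 1 can be sketched as follows, assuming a heavy-atom hydrogen-bond distance of 2.9 Å (the paper does not give the value) and a uniformly random direction around the anchor. The function names and the anchor labels used with them are hypothetical. The sketch omits what the program must also do: choose a donor or acceptor atom type compatible with the anchor, and reject positions that clash with the protein.

```python
import math
import random
from typing import Dict, Tuple

Vec = Tuple[float, float, float]


def random_unit_vector(rng: random.Random) -> Vec:
    """Uniformly distributed direction on the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)


def place_first_atom(anchors: Dict[str, Vec], rng: random.Random,
                     hbond_distance: float = 2.9) -> Tuple[str, Vec]:
    """Pick an anchor at random and put the first atom at H-bond distance from it.

    hbond_distance (angstroms) is an assumed typical value, not from the paper.
    """
    name = rng.choice(sorted(anchors))
    ax, ay, az = anchors[name]
    dx, dy, dz = random_unit_vector(rng)
    return name, (ax + hbond_distance * dx,
                  ay + hbond_distance * dy,
                  az + hbond_distance * dz)
```

For the dihydrofolate reductase run, `anchors` would hold the three candidate atoms listed later (ASP 27 carboxyl oxygen, ILE 5 and ILE 94 carbonyl oxygens) with their coordinates from the PDB file.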
Step 2: Subsequent generation of atoms. The second and subsequent atoms are generated one by one by the following procedure, up to the specified number of atoms for a molecule. For every new atom, a root atom is chosen by a random number from all the previously generated atoms. The atom type of the new atom and the type of the bond between the root atom and the new atom are also chosen by random numbers. Then, the position of the new atom is determined by a random number, by choosing a point on the circle defined by the bond length and the bond angle from the root atom. The bond length and bond angle are assigned according to the combination of atom types involved, with values taken from the MM2 program. If the position of the atom is not acceptable, because it violates the van der Waals radii of previously generated atoms or gives an unstable intermolecular van der Waals interaction energy, the program reassigns the root atom and attempts to find an acceptable new atom. If the attempts fail after a given number of repeats, the program backtracks to the preceding step, i.e., it withdraws the last of the previously generated atoms and regenerates a new atom.
Step 3: Completion of the molecular structure. The program completes the structure by adding missing carbon atoms to fragmentary aromatic rings, and supplies hydrogen atoms for all remaining valences of all non-hydrogen atoms. The atomic charges in the molecule are calculated by Del Re's method.⁶ Finally, the structure is optimized by the simplex method.
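The paper does not specify which variant of the simplex method is used. As a hedged illustration, the following is the textbook Nelder-Mead downhill simplex with the usual coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5). It minimizes an arbitrary function of a coordinate vector and is demonstrated on a toy quadratic rather than on a molecular energy; in LEGEND the objective would be the energy evaluated with the tabulated grid data.

```python
from typing import Callable, List, Sequence, Tuple


def nelder_mead(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                step: float = 0.5, tol: float = 1e-12,
                max_iter: int = 5000) -> Tuple[List[float], float]:
    """Minimize f with the Nelder-Mead downhill simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex displaced along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Sort vertices from best (lowest f) to worst.
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if values[0] <= f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        elif f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            if f_e < f_r:
                simplex[-1], values[-1] = expanded, f_e
            else:
                simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                for k in range(1, n + 1):
                    simplex[k] = [b + 0.5 * (x - b) for b, x in zip(best, simplex[k])]
                    values[k] = f(simplex[k])

    best_index = min(range(n + 1), key=values.__getitem__)
    return simplex[best_index], values[best_index]
```

For example, minimizing (x - 1)**2 + (y + 2)**2 from [0, 0] converges close to [1, -2]. The method needs only function values, no gradients, which suits an energy read from lookup tables.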
Fig. 1. Flow chart of the LEGEND program. [Chart not reproduced. Legible labels: 'ANCHOR' atom; ROOT atom; Stage 2: Growth of the Structure, with a 'Next Step (N+1 atoms)' branch and a 'Too many' branch leading to 'Return to Previous Step (N-1 atoms)'; Stage 3: Completion of the Structure.]
Thus, the LEGEND program goes on generating structures one after another, up to the maximum number of structures specified in the input data. From among the generated structures, a rather small number are selected by the program LORE. Selections can be made on the basis of various energetic values, as well as some indices related to structural features.
An Application to E. coli dihydrofolate reductase
In order to verify the usefulness of our method, we applied the program system to E. coli dihydrofolate reductase, whose three-dimensional structure has been elucidated by X-ray crystallographic analysis as a ternary complex with the coenzyme NADPH and folic acid.⁷ The atomic coordinates are available from the Protein Data Bank. We used the protein structure bound with NADPH, removing the folic acid molecule. Three hydrogen-bonding atoms, the carboxyl oxygen of ASP 27, the carbonyl oxygen of ILE 5 and the carbonyl oxygen of ILE 94, were chosen as candidates for the anchor atom. Fully automatic structure generation by the LEGEND program was performed under the following conditions: number of atoms in a molecule, 30; number of molecules to be generated, 300; minimum and maximum threshold energies, 6.0 kcal/mol and 12.0 kcal/mol, respectively; maximum number of iterations in atom generation, 20; number of iterations in backtracking, 3; minimum number of rings in a molecule, 2.
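The control flow of Step 2 with the limits used in this run can be sketched as a loop with retries and backtracking. We read "maximum number of iterations in atom generation" as attempts per new atom (`max_attempts=20`) and "number of iterations in backtracking" as a per-molecule budget of withdrawn atoms (`max_backtracks=3`); both readings are assumptions, as are the callback `try_place` and its contract (return a new atom, or `None` if the candidate is rejected). The threshold energies and the minimum ring count are not modelled.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def grow_molecule(first_atom: T,
                  try_place: Callable[[List[T], random.Random], Optional[T]],
                  rng: random.Random,
                  n_atoms: int = 30,
                  max_attempts: int = 20,
                  max_backtracks: int = 3) -> Optional[List[T]]:
    """Grow a molecule atom by atom with retries and limited backtracking."""
    atoms = [first_atom]
    backtracks = 0
    while len(atoms) < n_atoms:
        new_atom = None
        for _ in range(max_attempts):
            # Hypothetical callback: chooses a root atom, atom/bond types and a
            # position, and returns None if the candidate is not acceptable.
            new_atom = try_place(atoms, rng)
            if new_atom is not None:
                break
        if new_atom is not None:
            atoms.append(new_atom)
            continue
        # Every attempt failed: withdraw the last atom and redo the previous step.
        if backtracks >= max_backtracks or len(atoms) == 1:
            return None  # assumed policy: abandon this molecule
        atoms.pop()
        backtracks += 1
    return atoms
```

The loop always terminates: each pass either adds an atom or spends one unit of the backtracking budget.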
A total of 300 structures were generated by the LEGEND program. Nine structures were selected with the LORE program using the following criteria: minimum number of hydrogen bonds, 2; maximum intermolecular van der Waals and electrostatic energies, 50.0 kcal/mol; and maximum total (inter- and intramolecular) van der Waals energy, 50.0 kcal/mol.
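The LORE selection step amounts to a filter over per-structure summary values. In this sketch the field names are hypothetical, the comparisons (at least 2 hydrogen bonds, energies at or below 50.0 kcal/mol) are our reading of "minimum" and "maximum", and the intermolecular cap is applied to the van der Waals and electrostatic terms separately, which is only one possible interpretation of the text.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Candidate:
    """Summary values for one generated structure (energies in kcal/mol)."""
    name: str
    n_hbonds: int       # intermolecular hydrogen bonds
    inter_vdw: float    # intermolecular van der Waals energy
    inter_elec: float   # intermolecular electrostatic energy
    total_vdw: float    # inter- plus intramolecular van der Waals energy


def select(candidates: Iterable[Candidate], min_hbonds: int = 2,
           max_inter: float = 50.0, max_total_vdw: float = 50.0) -> List[Candidate]:
    """Keep structures that satisfy all selection criteria (assumed reading)."""
    return [c for c in candidates
            if c.n_hbonds >= min_hbonds
            and c.inter_vdw <= max_inter       # cap applied to each term separately
            and c.inter_elec <= max_inter
            and c.total_vdw <= max_total_vdw]
```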
Fig. 2. Output from the LORE program. [Listing not reproduced. It is headed "Legend Output Retrieval Engine ..... LORE" and gives energy values and structural features for the nine selected structures (upper part) and, for each structure, intermolecular distance information with the interaction type, such as hydrogen bonds (HBOND) to protein atoms (lower part).]
In addition to writing the three-dimensional atomic coordinates of the selected structures to a file, the LORE program outputs a summary of the selected structures, as shown in Fig. 2. The various energy values and some structural features of the nine structures (upper part) and the intermolecular distance information (and interaction type) for each structure (lower part) are listed. Some of the chemical structures output by LORE are shown in Fig. 3.
Fig. 3. Chemical structures chosen with LORE from 300 generated structures. [Structure drawings not reproduced.]

In order to examine the conformational and geometrical stability of the generated structures, we optimized one of them by the PM3 method using the MOPAC program.⁸ (In this example, structure #8 was selected.) The optimized structure was compared with the non-optimized one by least-squares superposition. A stereoview of the superposed structures is shown in Fig. 4; the solid and dotted lines show the non-optimized and optimized structures, respectively. The close similarity of the two structures strongly suggests that the original, non-optimized structure is sufficiently stable.
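The paper does not state which least-squares superposition algorithm was used. As a hedged, dependency-free sketch, the function below computes the minimal RMSD after optimal rigid superposition with Horn's quaternion formulation: the largest eigenvalue of a 4 x 4 symmetric matrix built from the cross-covariance of the centred coordinates gives the best achievable overlap. The eigenvalue is found with cyclic Jacobi rotations to avoid external libraries. Atom correspondence (same atom order in both conformations) is assumed.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def _centered(coords: Sequence[Vec]) -> List[Vec]:
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in coords]


def _symmetric_eigenvalues(a: List[List[float]], sweeps: int = 100) -> List[float]:
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def superposed_rmsd(xs: Sequence[Vec], ys: Sequence[Vec]) -> float:
    """Minimal RMSD between two conformations after optimal rigid superposition."""
    a, b = _centered(xs), _centered(ys)
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    k = [[sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
         [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
         [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
         [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz]]
    lam = max(_symmetric_eigenvalues(k))
    g = sum(x * x for p in a for x in p) + sum(x * x for p in b for x in p)
    return math.sqrt(max(g - 2.0 * lam, 0.0) / len(a))
```

A small RMSD between the raw and PM3-optimized coordinates is the quantitative counterpart of the visual overlap in the stereoview.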
Fig. 4. Comparison of the PM3-optimized and non-optimized structures
Fig. 5 shows the intermolecular interactions of structure #8 with the target protein. The hydrogen bonds reported in the LEGEND output are represented by dotted lines. These hydrogen bonds were identified from atom types, heteroatom-heteroatom distances and hydrogen-bond angles. In this case, the anchor atom is the carboxyl oxygen of ASP 27. Besides the hydrogen bond between the first atom and the anchor atom, the structure forms additional hydrogen bonds to the carbonyl oxygen of ALA 6 and to the indole nitrogen of TRP 30.
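The hydrogen-bond search described above (atom type, heteroatom distance, bond angle) can be sketched as a geometric test. The cut-offs are assumed, commonly used values (donor-acceptor distance at most 3.5 Å, donor-H...acceptor angle at least 120°); the paper does not state the values used, and treating only N and O as polar is likewise a simplification.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]

# Assumed cut-offs: the paper does not give the values used by the programs.
MAX_DA_DISTANCE = 3.5      # donor heavy atom to acceptor heavy atom, angstroms
MIN_DHA_ANGLE = 120.0      # angle donor-H...acceptor, degrees
POLAR_ELEMENTS = {"N", "O"}


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))


def is_hbond(donor_elem: str, donor: Vec, hydrogen: Vec,
             acceptor_elem: str, acceptor: Vec) -> bool:
    """Geometric hydrogen-bond test by atom type, distance and angle."""
    if donor_elem not in POLAR_ELEMENTS or acceptor_elem not in POLAR_ELEMENTS:
        return False
    if math.dist(donor, acceptor) > MAX_DA_DISTANCE:
        return False
    return angle_deg(donor, hydrogen, acceptor) >= MIN_DHA_ANGLE
```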
Fig. 5. Aspects of intermolecular hydrogen bonding with the target protein. [Drawing not reproduced.]
Results and Discussion
The purpose of this study was to develop a method to obtain diverse receptor-binding structures with suitable molecular shapes and with suitable functional groups at the proper positions and orientations in the molecule, covering all possible structures without prejudice. Lewis has proposed a method for the same purpose;⁹ this is the only paper on this problem published so far. However, because his method places the atoms of a molecule on the lattice points of a diamond lattice whose edge length equals the carbon-carbon covalent bond length, it produces only a limited range of structures: the structures cannot contain sp2-hybridized atoms, geometries deviating from exact tetrahedral angles, or conformations other than exact trans or gauche torsion angles. As our method is based on a new algorithm using random numbers and a force field, the structures generated are not only unrestricted in these respects but also free of internal instability.

In the application of our method to the dihydrofolate reductase system, we have established that our program can generate a wide variety of structures without any chemical inconsistency. All of them were shown to have shapes that fit the receptor cavity well. They were also considered to have stable geometries and conformations, based on a comparison of one of the structures with the same structure optimized by PM3.
The minimum requirement for a drug to bind specifically to its receptor is a good complementary fit of the molecule to the ligand-binding cavity of the receptor. For a flexible molecule, the requirement is the ability to adopt a conformation that gives such a molecular shape with reasonable stability. In addition, it is desirable that the structure have as many functional groups as possible that can interact with those of the receptor molecule through hydrogen bonding, electrostatic interactions, hydrophobic interactions and others.
The program starts generating a molecule from a hydrogen-bonding heteroatom with an anchor group on the receptor. Other intermolecular interactions are not explicitly taken into account during the atom generation process, although some heteroatoms may be placed at favorable positions for polar interactions by chance. Several approaches to explicitly incorporating polar interactions with the receptor into the structure-generation process of LEGEND are under investigation.
Out of the large number of structures generated by the LEGEND program, a rather small number of promising structures should be chosen by the LORE program, because graphical selection from a large number of structures is difficult at present. In the example, we used energetic values (intramolecular and intermolecular van der Waals and electrostatic energies, as well as the total energy) and some structural indices (number of intermolecular hydrogen bonds, number of rings in the structure) as selection criteria. Further improvement of the criteria in the LORE program is desirable in order to choose promising structures efficiently.
Starting from the structures selected by the LORE program, structural modification and selection should be carried out from the viewpoint of synthetic chemistry. It is also very important to modify the structures by replacing atoms on an interactive graphics display, so as to form favorable interactions with the receptor.
Computer simulations of stabilities, physical properties and molecular interactions would be useful for further modification and selection before synthesizing the most promising compounds. If a synthesized compound were shown to be active in a receptor-binding assay, even with low potency, it might become a lead compound. After that, an elaborate optimization process for biological activity would be necessary, just as in conventional drug development.
We have developed a new method for artificial lead generation based on the three-dimensional structure of the relevant receptor. The method should become an essential tool for rational drug design.
