`Printed in Great Britain
`
0040-4020/91 $3.00+.00
© 1991 Pergamon Press plc
`
`Automatic Creation of Drug Candidate Structures Based
`on Receptor Structure.
`Starting Point for Artificial Lead Generation.
`
Yoshihiko Nishibata and Akiko Itai*
`
`Faculty of Pharmaceutical Sciences, University of Tokyo
7-3-1, Bunkyo-ku, Tokyo 113, Japan
`
`(Received in Japan 12 September 1991)
`
Key Words: Lead Generation; Computational Chemistry; Drug Design
`
Abstract: We have developed a new method for the automatic generation of drug candidate structures
based on a known receptor structure. In our method, various structures that fit the receptor cavity
well are generated by adding atoms one by one, using a force field and random numbers. The usefulness
of the program is exemplified by its application to the E. coli dihydrofolate reductase system. From
dozens of generated structures, we obtained several promising new structures with considerable internal
stability and favorable interactions with the receptor cavity. We expect that this method will become
an essential starting point for artificial lead generation, which has been impossible so far.
`
`Introduction
To develop excellent drugs efficiently, it is necessary to establish rational approaches to drug
design. Methods for finding lead compounds artificially and rationally are especially important,
since such compounds have so far mostly been found by chance. The drug-receptor interaction seems to
be the most useful basis for that purpose. The real nature of the drug-receptor interaction has been
clarified in a number of cases by determining the three-dimensional structures of protein-ligand
complexes at the atomic level by X-ray crystal analysis. Many drug molecules are known to bind to the
same site of a protein in spite of large differences in chemical structure. This is because it is
neither the chemical structure nor the molecular skeleton, but rather the complementarity in molecular
shape and in submolecular physical and chemical properties, that is important for specific binding to
the same receptor site. This strongly suggests that if we can design and synthesize a molecule whose
molecular shape and submolecular properties complement those of the receptor cavity, it should be
able to bind to the receptor specifically. However, it is not easy to design such molecules with new
skeletal structures manually, because manual design lacks objectivity.
`In recent years, techniques for solving protein crystal structures have made remarkable
`progress. Moreover, there have been great advances in biochemical techniques such as isolation,
`purification and protein engineering. A number of biologically important protein structures have
`been elucidated or are being elucidated. The three-dimensional structures of proteins have been used
`for interpreting the biological activities and elucidating the biochemical mechanisms involved by
`docking simulation. One of the rational approaches to the modification of ligand structures (mainly
`by replacing or adding substituent groups) has been the docking simulation, although it has not been
`utilized directly for designing new structures so far.
`
In order to generate lead compounds with new skeletal structures ab initio, we require new
strategies. If computers can provide us with possible ligand structures that can bind strongly to
the protein, we will be able to generate new lead compounds artificially, based on the individual
features of the receptor structure.
We have developed a new method and a computer program for this purpose,1 i.e., for generating
drug candidate structures which fit well to the receptor cavity, based on the receptor structure.
We named the program "LEGEND". From the large number of structures generated, promising ones are
selected by another program, named "LORE", based on energetic and structural considerations. Here, we
describe the method and the results of its application to the E. coli dihydrofolate reductase system.
`
`Methods
For the de novo generation of a molecular structure, we must specify the positions and types of
the atoms and the types of the bonds in the molecule. The program provides sixteen atom types, which
distinguish combinations of element and hybridization state, e.g., sp3 carbon, aromatic
carbon, carbonyl oxygen and amino nitrogen. It provides five bond types: single,
double, triple, aromatic and amide.
The size of the structure to be generated is specified as a number of atoms in the input data.
The relative frequency with which each atom type appears in a structure is set in the program. Internal
atomic positions can be defined by geometrical parameters (bond lengths and angles) and conformational
parameters (dihedral angles). The former can be assumed to take the standard values used in
conventional force fields, but the latter must be determined by using random numbers, as must
the atom type and the bond type. In this program, we use random numbers to determine all
the unsettled quantities and to choose items from those prepared in the program. The random
numbers used are those output sequentially by the computer.
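As an illustration of how random numbers can settle these choices, the following minimal Python sketch draws an atom type in proportion to a relative frequency, a bond type, and a dihedral angle. The four atom-type names and their weights are hypothetical (the sixteen types and their ratios are not listed here), and the uniform choice of bond type is an assumption; a real generator would also have to respect valence rules.

```python
import math
import random

# Hypothetical relative frequencies; the actual sixteen atom types and
# their ratios are not given in the text.
ATOM_TYPE_WEIGHTS = {
    "C_sp3": 4.0,
    "C_aromatic": 4.0,
    "O_carbonyl": 1.0,
    "N_amino": 1.0,
}

# The five bond types named in the text.
BOND_TYPES = ["single", "double", "triple", "aromatic", "amide"]


def draw_atom_type(rng):
    """Pick an atom type with probability proportional to its weight."""
    names = list(ATOM_TYPE_WEIGHTS)
    weights = [ATOM_TYPE_WEIGHTS[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def draw_bond_type(rng):
    """Pick a bond type uniformly (an assumption made for this sketch)."""
    return rng.choice(BOND_TYPES)


def draw_dihedral(rng):
    """Pick a dihedral angle in radians, uniformly between -pi and pi."""
    return rng.uniform(-math.pi, math.pi)


# A seeded generator makes the sequence of choices reproducible.
rng = random.Random(1991)
sample = [(draw_atom_type(rng), draw_bond_type(rng), draw_dihedral(rng))
          for _ in range(5)]
```

Seeding the generator is a design choice of the sketch: it allows a given run of structures to be regenerated exactly.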
The atomic coordinates of a protein molecule are read in PDB file format.2 The preparations
before starting LEGEND are as follows. For high-speed calculation of the intermolecular interaction
energies from tabulated data, a three-dimensional grid is generated inside the ligand binding site
of the protein.3 Then, at each grid point, van der Waals interaction energies are calculated between
the protein atoms and a probe atom (carbon, nitrogen, oxygen or hydrogen) located on the grid point. The
program uses the MM2 force field and parameters.4 The electrostatic potential at each grid point is also
computed using the atomic charges on the protein atoms, which are taken from those for the individual
amino acid residues in the AMBER program.5 The tabulated data are used for energy estimation at
every step of new atom generation and also for structure optimization of the generated raw molecule.
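The idea of tabulating energies on a grid can be sketched as below. This is not the LEGEND implementation: a simple 12-6 Lennard-Jones function is swapped in for the MM2 van der Waals function, only a single probe is tabulated, the lookup uses the nearest grid point, and all names and parameter values (`epsilon`, `r_min`, the grid spacing) are assumptions chosen for illustration.

```python
import math

COULOMB = 332.06  # converts e^2/Angstrom to kcal/mol


def lj_energy(r, epsilon, r_min):
    """12-6 Lennard-Jones energy with a minimum of -epsilon at r = r_min."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)


def build_grid(protein_atoms, origin, spacing, shape, epsilon=0.1, r_min=3.8):
    """Tabulate probe vdW energy and electrostatic potential on a grid.

    protein_atoms is a list of (x, y, z, charge) tuples.
    Returns two dicts keyed by the integer grid index (i, j, k).
    """
    vdw, esp = {}, {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            for k in range(shape[2]):
                point = (origin[0] + i * spacing,
                         origin[1] + j * spacing,
                         origin[2] + k * spacing)
                e_vdw, phi = 0.0, 0.0
                for x, y, z, charge in protein_atoms:
                    # Clamp very short distances to avoid division by zero.
                    r = max(math.dist(point, (x, y, z)), 0.5)
                    e_vdw += lj_energy(r, epsilon, r_min)
                    phi += COULOMB * charge / r
                vdw[(i, j, k)] = e_vdw
                esp[(i, j, k)] = phi
    return vdw, esp


def lookup(table, point, origin, spacing, shape):
    """Return the tabulated value at the nearest grid point, or None outside."""
    index = tuple(round((point[a] - origin[a]) / spacing) for a in range(3))
    if any(i < 0 or i >= n for i, n in zip(index, shape)):
        return None
    return table[index]


# Toy system: one charged protein atom and a 5 x 1 x 1 grid along x.
atoms = [(0.0, 0.0, 0.0, -0.5)]
grid_origin, grid_spacing, grid_shape = (2.0, 0.0, 0.0), 0.5, (5, 1, 1)
vdw_table, esp_table = build_grid(atoms, grid_origin, grid_spacing, grid_shape)
```

Once the tables exist, the cost of scoring a trial atom no longer depends on the number of protein atoms, which is the reason for the precalculation.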
The fundamental process of structure generation by the LEGEND program consists of three
steps, as follows. The process is shown in detail in the flow chart in Fig. 1.
Step 1: Generation of the first atom. An anchor atom is selected by use of a random number from
among several hydrogen-bonding heteroatoms in the protein, specified beforehand. The position and
the atom type of the first atom are determined so as to make a hydrogen bond to the anchor atom.
Step 2: Subsequent generation of atoms. The second and subsequent atoms are generated one
by one by the following procedure, up to the specified number of atoms for a molecule. For every
new atom, a root atom is chosen from all the previously generated atoms by using a random number.
The atom type of the new atom and the type of the bond between the root atom and the new
atom are also given by random numbers. Then, the position of the atom is determined by a random
number, by choosing a point on the circle which is defined by the bond length and the bond angle
at the root atom. The values of the bond length and angle are assigned according to the
atom types involved, taken from the MM2 program. If the position of the atom is not acceptable because
it violates the van der Waals radii of the previously generated atoms or gives an unfavorable
intermolecular van der Waals interaction energy, the program reassigns the root atom and attempts to
find an acceptable new atom. If the attempts fail after a given number of repeats, the program tracks back to the
preceding step, i.e., it withdraws the last of the previously generated atoms and generates a new
atom.
Step 3: Completion of the molecular structure. The program completes the structure by adding missing
carbon atoms to fragmentary aromatic rings and supplying hydrogen atoms for all remaining valencies of all
nonhydrogen atoms. The atomic charges in the molecule are calculated by Del Re's method.6 Finally,
the structure is optimized by the Simplex method.
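The placement in Step 2 can be sketched geometrically. Given the root atom, one atom already bonded to it, a bond length and a bond angle, the allowed positions of the new atom form a circle, and a single random angle selects a point on it. The function and variable names below are hypothetical, and the 1.53 and 109.5 degree values are merely typical sp3 carbon values used for illustration, not parameters quoted from the program.

```python
import math
import random


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(a, s):
    return tuple(x * s for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def unit(a):
    return scale(a, 1.0 / math.sqrt(dot(a, a)))


def place_on_circle(root, neighbor, bond_length, bond_angle, phi):
    """Return a point at bond_length from root such that the angle
    neighbor-root-point equals bond_angle (radians); phi (radians)
    selects the position on the resulting circle."""
    axis = unit(sub(neighbor, root))
    # Any vector not parallel to the axis serves to build a perpendicular frame.
    helper = (1.0, 0.0, 0.0) if abs(axis[0]) < 0.9 else (0.0, 1.0, 0.0)
    u = unit(cross(axis, helper))
    v = cross(axis, u)
    radial = add(scale(u, math.cos(phi)), scale(v, math.sin(phi)))
    direction = add(scale(axis, math.cos(bond_angle)),
                    scale(radial, math.sin(bond_angle)))
    return add(root, scale(direction, bond_length))


rng = random.Random(0)
root_xyz, neighbor_xyz = (0.0, 0.0, 0.0), (1.53, 0.0, 0.0)
new_xyz = place_on_circle(root_xyz, neighbor_xyz, 1.53,
                          math.radians(109.5), rng.uniform(0.0, 2.0 * math.pi))
```

The sketch assumes the root atom already has a bonded neighbor; for the second atom of a molecule, where no such neighbor exists, the locus is a sphere rather than a circle and would need separate handling.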
`
[Flow chart. Stage 1: Generation of 1st atom. Stage 2: Growth of the structure, going on to the next step (Mol of N+1 atoms) or returning to the previous step (Mol of N-1 atoms). Stage 3: Completion of the structure.]

Fig. 1. Flow chart of the LEGEND program
`
Thus, the LEGEND program goes on generating structures one after another up to the maximum
number of structures specified in the input data. From among the generated structures, a rather
small number of structures are selected by the program LORE. Selections can be made on the basis
of various energy values, as well as some indices related to structural features.
`
`An Application to E. coli dihydrofolate reductase
In order to verify the usefulness of our method, we have applied the program system to E. coli
dihydrofolate reductase, whose three-dimensional structure has been elucidated by X-ray crystallographic
analysis as a ternary complex with the coenzyme NADPH and folic acid.7 The atomic coordinates
are available from the Protein Data Bank. We used the protein structure with bound
NADPH, removing the folic acid molecule. Three hydrogen-bonding atoms, the carboxyl oxygen of
ASP 27, the carbonyl oxygen of ILE 5 and the carbonyl oxygen of ILE 94, were chosen as candidates for the
anchor atom. Fully automatic structure generation by the LEGEND program was performed under
the following conditions: the number of atoms in a molecule, 30; the number of molecules to be generated,
300; the minimum and maximum threshold energies, 6.0 kcal/mol and 12.0 kcal/mol, respectively;
the maximum number of iterations in atom generation, 20; the number of iterations in backtracking,
3; the minimum number of rings in a molecule, 2.
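One possible reading of the iteration and backtracking limits is sketched below in Python: up to 20 attempts are made to add each atom, and when all attempts fail the last atom is withdrawn, at most 3 times per molecule. How LEGEND actually counts backtracking steps is not stated, so this bookkeeping is an assumption, and `propose` and `accept` are hypothetical callbacks standing in for atom generation and the energy and contact checks.

```python
import random


def grow(first_atom, n_atoms, propose, accept, rng,
         max_tries=20, max_backtracks=3):
    """Grow a molecule atom by atom with retries and backtracking.

    propose(root, rng) returns a candidate atom attached to root.
    accept(candidate, atoms) returns True if the candidate is acceptable.
    Returns the list of atoms, or None if the growth had to be abandoned.
    """
    atoms = [first_atom]
    backtracks = 0
    while len(atoms) < n_atoms:
        for _ in range(max_tries):
            root = rng.choice(atoms)          # root atom chosen at random
            candidate = propose(root, rng)
            if accept(candidate, atoms):
                atoms.append(candidate)
                break
        else:
            # Every attempt failed: withdraw the last atom, if still allowed.
            if backtracks >= max_backtracks or len(atoms) == 1:
                return None
            atoms.pop()
            backtracks += 1
    return atoms


# Toy run with the limits quoted in the text; atoms are plain integers here
# and every candidate is accepted, so the run simply reaches 30 atoms.
toy = grow(0, 30, lambda root, rng: root + 1, lambda c, atoms: True,
           random.Random(0))
```

Returning `None` when the limits are exhausted lets a caller discard the partial molecule and start a fresh one from a new anchor atom.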
A total of 300 structures were generated by the LEGEND program. Nine structures were selected
with the LORE program by using the following criteria: minimum number of hydrogen bonds
`2; the maximum inter-molecular van der Waals and electrostatic energies 50.0 kcal/mol and the
`maximum total (inter- and intra-molecular) van der Waals energies 50.0 kcal/mol.
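A selection of this kind amounts to filtering records against thresholds, as in the following Python sketch. The record fields and the four example candidates are hypothetical, and the criteria are implemented under one possible reading of the sentence above: separate upper limits on the intermolecular van der Waals and electrostatic energies, and an upper limit on the sum of inter- and intramolecular van der Waals energies.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    ident: int        # structure ID
    hbonds: int       # number of intermolecular hydrogen bonds
    vdw_inter: float  # kcal/mol
    vdw_intra: float  # kcal/mol
    es_inter: float   # kcal/mol


def select(candidates, min_hbonds=2, max_vdw_inter=50.0,
           max_es_inter=50.0, max_vdw_total=50.0):
    """Keep the candidates that satisfy every threshold."""
    return [c for c in candidates
            if c.hbonds >= min_hbonds
            and c.vdw_inter <= max_vdw_inter
            and c.es_inter <= max_es_inter
            and c.vdw_inter + c.vdw_intra <= max_vdw_total]


# Hypothetical records, not taken from the LEGEND run.
candidates = [
    Candidate(1, 2, -5.0, 12.0, -8.0),  # satisfies all criteria
    Candidate(2, 1, -5.0, 12.0, -8.0),  # too few hydrogen bonds
    Candidate(3, 3, 60.0, 5.0, -2.0),   # intermolecular vdW too high
    Candidate(4, 2, 30.0, 25.0, 1.0),   # total vdW of 55.0 exceeds 50.0
]
selected_ids = [c.ident for c in select(candidates)]
```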
`
***** LORE : Legend Output Retrieval Engine *****

LOAD : Select 9 Mols from 300 Mols

                      VDW ENERGY          ELECTROSTATIC
NO   ID  HB RING    Intra     Inter     Inter      Intra      TOTAL
 1  198     2      -6.580    12.993    -8.631    -97.114    -99.332
 2  165     2      -2.519    38.937     0.034   -113.576    -77.124
 3  130     2      -0.036    20.224    -2.122   -134.858   -116.793
 4  279     2      -0.794    13.442     2.267   -210.191   -195.276
 5   73     2       8.250    28.537    -5.850   -178.076   -147.140
 6  120     3      18.722    11.284   -13.345   -131.287   -114.626
 7   48            23.510     9.330    -1.531   -128.866    -97.556
 8  275     2      20.086     7.782     4.122   -113.913    -81.924
 9  269     2      50.606     9.926    -7.485    -85.902    -32.855

INFORMATION for MOL# 2
VDW:   N( 1)  C of ILE5
HBOND: N( 1)  O of ILE5
HBOND: N( 1)  OH of TYR100
VDW:   H( 2)  C of ILE5
VDW:   H( 2)  O of ILE5
VDW:   H( 2)  N of ALA6
VDW:   H( 2)  CA of ALA6
VDW:   H( 2)  OH of TYR100
VDW:   H( 2)  H(CG2) of ILE5
VDW:   H( 2)  H(CA) of ALA6
r = 3.617, 2.588, 2.029, 3.120, 3.047, 2.805, 2.592, 2.124

Fig. 2. Output from the LORE program
`
In addition to writing the three-dimensional atomic coordinates of the selected structures to a file,
the LORE program outputs a summary of the selected structures, as shown in Fig. 2. The summary lists
the various energy values and some structural features for the nine structures (upper part), and the
intermolecular distances (with the interaction type) for each structure (lower part). Some
of the chemical structures output from LORE are shown in Fig. 3.
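The energy columns of the summary in Fig. 2 can be cross-checked with a few lines of Python: for each of the nine structures, the four van der Waals and electrostatic components should add up to the TOTAL column within the rounding of the printed values. The numbers below are those listed in Fig. 2; the variable and function names are arbitrary.

```python
# (ID, (four energy components), TOTAL) for the nine selected structures,
# in kcal/mol, as listed in the LORE summary of Fig. 2.
ROWS = [
    (198, (-6.580, 12.993, -8.631, -97.114), -99.332),
    (165, (-2.519, 38.937, 0.034, -113.576), -77.124),
    (130, (-0.036, 20.224, -2.122, -134.858), -116.793),
    (279, (-0.794, 13.442, 2.267, -210.191), -195.276),
    (73, (8.250, 28.537, -5.850, -178.076), -147.140),
    (120, (18.722, 11.284, -13.345, -131.287), -114.626),
    (48, (23.510, 9.330, -1.531, -128.866), -97.556),
    (275, (20.086, 7.782, 4.122, -113.913), -81.924),
    (269, (50.606, 9.926, -7.485, -85.902), -32.855),
]


def total_mismatch(components, total):
    """Absolute difference between the summed components and the total."""
    return abs(sum(components) - total)


# Largest discrepancy over all rows; values are printed to three decimals,
# so differences of about 0.001 are expected from rounding alone.
worst = max(total_mismatch(components, total) for _, components, total in ROWS)
```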
`
[Chemical structure drawings of structures #2, #4, #5, #6, #8 and #9.]

Fig. 3. Chemical structures chosen with LORE from 300 generated structures
`
In order to examine the conformational and geometrical stability of the generated structures, we
optimized one of the structures by the PM3 method using the MOPAC program.8 (In this example,
structure #8 was selected.) The optimized structure was compared with the non-optimized
one by least-squares superposition. A stereoview of the superposed structures is shown in
Fig. 4. The solid and dotted lines show the non-optimized and optimized structures, respectively.
The high similarity of the two structures strongly suggests that the original, non-optimized
structure is sufficiently stable.
`
`Fig. 4. Comparison of the PM3-optimized and non-optimized structures
`
In Fig. 5, aspects of the intermolecular interactions with the target protein are shown for structure
#8. The hydrogen bonds reported in the LEGEND output are represented by dotted lines. These
hydrogen bonds were identified from the atom types, the distance between the heteroatoms and the
hydrogen-bond angle. In this case, the anchor atom is the carboxyl oxygen of ASP 27. Besides the hydrogen bond
between the first atom and the anchor atom, the structure forms additional hydrogen bonds to the carbonyl
oxygen of ALA 6 and the nitrogen of the indole group of TRP 30.
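A geometric hydrogen-bond test based on these three criteria might look like the following Python sketch. The cutoff values (3.3 for the heteroatom distance, 120 degrees for the donor-hydrogen-acceptor angle) and the restriction to nitrogen and oxygen are assumptions made for illustration; the thresholds actually used by the programs are not given.

```python
import math

HBOND_ELEMENTS = {"N", "O"}  # assumed set of hydrogen-bonding heteroatoms


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, measured at b."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    cosang = sum(x * y for x, y in zip(v1, v2)) / (math.dist(a, b) * math.dist(c, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosang))))


def is_hbond(donor, hydrogen, acceptor, max_distance=3.3, min_angle=120.0):
    """Each argument is (element, (x, y, z)).

    A hydrogen bond is reported when donor and acceptor are heteroatoms,
    their separation is at most max_distance, and the angle
    donor-hydrogen-acceptor is at least min_angle degrees.
    """
    donor_element, donor_xyz = donor
    _, hydrogen_xyz = hydrogen
    acceptor_element, acceptor_xyz = acceptor
    if donor_element not in HBOND_ELEMENTS or acceptor_element not in HBOND_ELEMENTS:
        return False
    if math.dist(donor_xyz, acceptor_xyz) > max_distance:
        return False
    return angle_deg(donor_xyz, hydrogen_xyz, acceptor_xyz) >= min_angle


# Linear N-H...O arrangement with a heteroatom separation of 2.9.
linear = is_hbond(("N", (0.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0)),
                  ("O", (2.9, 0.0, 0.0)))
```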
`
[Drawing of structure #8 in the binding site, with hydrogen bonds to ASP 27, ALA 6 and TRP 30 shown as dotted lines.]

Fig. 5. Aspects of intermolecular hydrogen bonding with target protein
`
`Results and Discussions
The purpose of this study was to develop a method to obtain diverse receptor-binding structures
with suitable molecular shapes and with suitable functional groups at the proper positions and
orientations in the molecule, covering all possible structures without prejudice. Lewis has proposed
a method for the same purpose.9 This is the only paper related to the present problem published
so far. However, because his method places the atoms of a molecule on the points of a diamond
lattice whose edge length is the carbon-carbon covalent bond length, it produces only a limited range of
structures: the structures cannot contain sp2 hybridized atoms, geometries other than exact
tetrahedral angles, or conformations other than exact trans or gauche torsion angles. As our method is
based on a new algorithm using random numbers and a force field, the structures generated are not
subject to such limitations and are also internally stable.
In the application of our method to the dihydrofolate reductase system, we have established that
our program can generate a wide variety of structures without any chemical inconsistency. All of
them were shown to have shapes that fit the receptor cavity well. They were also considered to have
stable geometries and conformations, based on a comparison of one of the structures with the structure
optimized by the PM3 method.
`The minimum requirement for a drug to bind specifically to its receptor is a good
`complementary fit of the molecule to the ligand-binding cavity of the receptor. In the case of a
`flexible molecule, the requirement is to be able to adopt a conformation which results in such a
`molecular shape with reasonable stability. In addition, it is desirable that the structure has as many
`functional groups as possible which can interact with those in the receptor molecule, by hydrogen
`bonding, electrostatic interactions, hydrophobic interactions and others.
The program starts generating a molecule from a heteroatom that forms a hydrogen bond with an anchor
group on the receptor. Other intermolecular interactions are not explicitly taken into account during
the atom generation process. However, some heteroatoms may be placed at positions favorable for
polar interactions by chance. Several ways of explicitly incorporating polar interactions with the
receptor into the structure-generation process of LEGEND are under investigation.
Out of the large number of structures generated by the LEGEND program, a rather small number
of promising structures should be chosen by the LORE program, because graphical selection from a
large number of structures is difficult at the present stage. In the example, we used energy
values (intramolecular and intermolecular van der Waals and electrostatic energies, as well as the total
energy) and some indices of structural features (number of intermolecular hydrogen bonds, number of
rings in the structure) as criteria for selection. Further improvement of the criteria in the LORE
program is desirable in order to choose promising structures efficiently.
Starting with the structures selected by the LORE program, structural modification and
selection should be made from the viewpoint of synthetic chemistry. It is also very important to
modify the structures by replacing atoms so as to form favorable interactions with the receptor, on an
interactive graphic display.
Computer simulations of stabilities, physical properties and molecular interactions would be
useful for further modification and selection, before synthesizing the most promising compounds. If
a synthesized compound were proved to be active by a receptor-binding assay, even if its potency
were low, it might become a lead compound. After that, an elaborate optimization process for biological
activity would be necessary, just as in conventional drug development.
`
`Conclusion
We have developed a new method for artificial lead generation based on the three-dimensional
structure of the relevant receptor. The method should become an essential tool for rational
drug design.
`
`REFERENCES AND NOTES
1. These programs are written in the C language to run on several types of workstations.
2. Bernstein, F.C. et al. J. Mol. Biol. 1977, 112, 535-542
3. Tomioka, N.; Itai, A.; Iitaka, Y. J. Comput.-Aided Mol. Design 1987, 1, 197-210
4. Allinger, N.L. J. Am. Chem. Soc. 1977, 99, 3279
5. Weiner, S.J.; Kollman, P.A.; Nguyen, D.T.; Case, D.A. J. Comp. Chem. 1986, 7, 230-252
6. Del Re, G. J. Chem. Soc. 1958, 4031-4040
7. Bystroff, C.; Oatley, S.J.; Kraut, J. Biochemistry 1990, 29, 3267-3277
8. Stewart, J.J.P. J. Comp. Chem. 1989, 10, 209-220
9. Lewis, R.A. J. Comput.-Aided Mol. Design 1990, 4, 205-210