`
`ADMET IN SILICO MODELLING:
`TOWARDS PREDICTION PARADISE?
`
`Han van de Waterbeemd* and Eric Gifford‡
`
`Following studies in the late 1990s that indicated that poor pharmacokinetics and toxicity were
`important causes of costly late-stage failures in drug development, it has become widely
`appreciated that these areas should be considered as early as possible in the drug discovery
`process. However, in recent years, combinatorial chemistry and high-throughput screening have
`significantly increased the number of compounds for which early data on absorption, distribution,
metabolism, excretion (ADME) and toxicity (T) are needed, which has in turn driven the
development of a variety of medium and high-throughput in vitro ADMET screens. Here, we describe how
in silico approaches will further increase our ability to predict and model the most relevant
pharmacokinetic, metabolic and toxicity endpoints, thereby accelerating the drug discovery process.
`
`Why is in silico ADMET needed?
`Traditionally, drugs were discovered by testing com-
`pounds synthesized in time-consuming multi-step
`processes against a battery of in vivo biological screens.
`Promising compounds were then further studied in
`development, where their pharmacokinetic properties,
`metabolism and potential toxicity were investigated.
`Adverse findings were often made at this stage1 (FIG. 1),
`with the result that the project would be halted or re-
`started to find another clinical candidate — an unac-
`ceptable burden on the research and development bud-
`get of any pharmaceutical company.
Today, this paradigm has been re-worked in several
ways. The testing of drug metabolism, pharmacokinetics
and toxicity is now done much earlier; that is,
before a decision is taken to evaluate a compound in the
clinic. In addition, the rate at which biological screening
data are obtained has dramatically increased, and
(ultra)high-throughput screening (HTS) facilities are
now common at large pharmaceutical companies and
at specialized biotechs2. In response to these developments,
a new approach to chemistry, combinatorial
chemistry, has been adopted to feed these highly
efficient hit-finding machines. Combinatorial chemistry
makes it possible to synthesize large series of
closely related libraries of chemicals using the same
chemical reaction and appropriate reagents. Such
libraries are then run through the HTS to find hits
around which further, more focused, series are designed
and synthesized in a next round.
As the capacities for biological screening and chemical
synthesis have dramatically increased, so have the
demands for large quantities of early data on
absorption, distribution, metabolism, excretion
(ADME) and toxicity (together called ADMET
data). Various medium and high-throughput in vitro
ADMET screens are therefore now in use. In addition,
there is an increasing need for good tools for predicting
these properties to serve two key aims: first, at the
design stage of new compounds and compound
libraries, to reduce the risk of late-stage attrition;
and second, to optimize screening and testing by
looking at only the most promising compounds.
`
Drug-like properties. Which properties make drugs
different from other chemicals? A number of studies have
been performed with the aim of answering this question
(for examples, see REFS 3–6). A particularly influential
example, the analysis of the World Drug Index
(WDI)5 that led to Lipinski’s ‘rule-of-five’, identifies
several critical properties that should be considered
for compounds with oral delivery in mind. These
properties, which are usually viewed as guidelines rather
than absolute cutoffs, are molecular mass <500 daltons
`
`*Pfizer Global Research
`& Development, PDM,
`Sandwich, Kent CT13 9NJ,
`UK.
`‡Pfizer Global Research &
`Development, Discovery
`Research Informatics,
`2,800 Plymouth Road,
`Ann Arbor, Michigan
`48105, USA.
`Correspondence to H.v.d.W.
e-mail: han_waterbeemd@sandwich.pfizer.com
`doi:10.1038/nrd1032
`
`192 | MARCH 2003 | VOLUME 2
`
`www.nature.com/reviews/drugdisc
`
`© 2003 Nature Publishing Group
`
`1 of 15
`
`PENN EX. 2061
`CFAD V. UPENN
`IPR2015-01836
`
`
`
`R E V I E W S
`
When are ADMET data needed? The need for ADMET
information starts with the design of new compounds.
This information can influence the decision to proceed
with synthesis either via traditional medicinal chemistry
or combinatorial chemistry strategies. Obviously, at this
stage, computational approaches are the only option for
getting this information, but it is also acceptable that the
predictions are not perfect at this point. Once a series of
molecules is focused around a lead and is further
optimized towards a clinical candidate, more robust
mechanistic models will be required.
`
`What ADME properties do we want to predict? A deeper
`understanding of the relationships between important
`ADME parameters and molecular structure and proper-
`ties has been used to develop in silico models that allow
`the early estimation of several ADME properties10–17.
`Among other important issues, we want to predict
`properties that provide information about dose size and
`dose frequency (BOX 1), such as oral absorption, bioavail-
`ability, brain penetration, clearance (for exposure) and
`volume of distribution (for frequency).
`As a result of the availability of experimental data in
`the literature, considerable effort has gone into the
`development of models to predict physicochemical
`properties relevant to ADME, such as lipophilicity.
`However, despite its importance, the prediction of phar-
`macokinetic properties such as clearance, volume of dis-
`tribution and half-life directly from molecular structure
`is making slower progress owing to a lack of published
`data. Similarly, the prediction of various aspects of
`metabolism and toxicity is also underdeveloped.
`
`What computational tools are used? Here, there are two
`aspects to consider: data modelling and molecular
`modelling, which have different toolboxes. Molecular
`modelling includes approaches such as protein
`modelling18, which uses quantum mechanical methods
`to assess the potential for interaction between the small
`molecules under consideration and proteins known to
`be involved in ADME processes, such as cytochrome
`P450s. This requires three-dimensional structural
`information on the protein, which can be built by
`homology modelling of related structures if the human
`protein structure is not available. If no structural infor-
`mation on the protein is available, an alternative way of
`assessing the potential of a small molecule to interact
`with a particular protein is to use PHARMACOPHORE models,
`which are built from a superposition of known sub-
`strates of the protein.
`For data modelling, quantitative structure–activity
`relationship (QSAR) approaches19 are typically applied.
`QSAR and quantitative structure–property relationship
`(QSPR) studies have been performed since the 1960s
`with a variety of biological and physicochemical data.
`These studies use statistical tools to search for correla-
`tions between a given property and a set of molecular
`and structural descriptors of the molecules in question.
`Once such a QSAR model has been ‘TRAINED’ using a set
`of molecules for which experimental data on the prop-
`erty in question are available, it can be used to make
`
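[Figure 1 (pie chart): reasons for attrition in drug development. Segments:
pharmacokinetics (39%), animal toxicity (11%), lack of efficacy, adverse effects
in man, commercial reasons and miscellaneous; the four remaining segment values
are 30%, 10%, 5% and 5%.]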
`
`Figure 1 | An analysis of the main reasons for attrition in
`drug development1. In this analysis, published five years ago,
`half of all failures were attributed to poor pharmacokinetics
`(39%) and animal toxicity (11%). Such analyses clearly
`indicated that these two areas should be focused on as early
`as possible in the drug-discovery process (although it should
`be noted that the interpretation of such data is often hampered
`by the fact that compounds may have more than one flaw and,
`as the project was halted, these might not always have been
`identified). An even better approach would be to use predictive
`tools in the design phase of the synthesis of compounds and
`compound libraries.
`
`(Da), calculated octanol/water partition coefficient
`(CLOGP) <5, number of hydrogen-bond donors <5 and
`number of hydrogen-bond acceptors <10. In general,
`such studies, and others not cited here, point to the most
`important physicochemical and structural properties
`characteristic of a good drug in the context of our current
`knowledge. These properties are then typically used to
`construct predictive ADME models and form the basis
`for what has been called property-based design7. To a
`certain extent, similar molecules can be expected to have
`similar ADME properties8. This concept is the basis of
`software called SLIPPER-2001, in which physicochemical
`DESCRIPTORS and molecular similarity are used for the
`prediction of properties such as lipophilicity, solubility
`and fraction absorbed in humans9.
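
The rule-of-five cutoffs quoted earlier in this subsection can be expressed as a simple filter. The sketch below is illustrative only: it assumes that the four descriptors have already been computed by some other tool, it applies the cutoffs exactly as quoted in the text (strict inequalities), and the `Descriptors` class, the function name and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one compound (supplied by the caller)."""
    molecular_mass: float      # daltons
    clogp: float               # calculated octanol/water log P
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: Descriptors) -> list:
    """Return the names of the guideline cutoffs that the compound fails."""
    checks = [
        ("molecular mass", d.molecular_mass < 500),
        ("CLOGP", d.clogp < 5),
        ("hydrogen-bond donors", d.h_bond_donors < 5),
        ("hydrogen-bond acceptors", d.h_bond_acceptors < 10),
    ]
    return [name for name, passed in checks if not passed]


# Hypothetical compound; the values are chosen for illustration only.
example = Descriptors(molecular_mass=620.0, clogp=3.2,
                      h_bond_donors=2, h_bond_acceptors=11)
print(rule_of_five_violations(example))
```

Because the text treats these properties as guidelines rather than absolute cutoffs, the function reports which cutoffs are failed instead of returning a single pass/fail verdict.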
`
How are ADMET data obtained? The quest for early,
fast and relevant ADMET data is tackled in three ways.
First, a variety of in vitro assays have been further
automated through the use of robotics and miniaturization.
Second, in silico models are being used to assist in the
selection of appropriate assays and of subsets of
compounds to go through these
screens. Third, predictive models have been developed
that might ultimately become sophisticated enough to
replace in vitro assays and/or in vivo experiments.
`
`DESCRIPTOR
`A structural or physicochemical
`property of a molecule or part
`of a molecule. Examples include
`log P, molecular mass and polar
`surface area.
`
`PHARMACOPHORE
`A pharmacophore is the
`ensemble of steric and electronic
`features that are necessary to
`ensure the optimal
`supramolecular interactions
`with a specific biological target
`structure and to trigger (or to
`block) its biological response.
`
`TRAINING
`The building of a model using
`part of the data (that is, the
`training set), followed by
`validation of the model using
`the rest of the data (that is, the
`validation set). Finally, the
`model is tested using
`compounds (the test set) not
`used for training and validation.
`
`NATURE REVIEWS | DRUG DISCOVERY
`
`
`Box 1 | Pharmacokinetics
`
`Pharmacokinetics is the study of the time course of a drug within the body and
`incorporates the processes of absorption, distribution, metabolism and excretion
`(ADME)76. Pharmacokinetic parameters are derived from the measurement of drug
`concentrations in blood or plasma. The key pharmacokinetic parameters and their
`importance for the dose regimen and dose size are shown in the figure80.
Most drugs are given orally for reasons of convenience and compliance. Typically, a drug
dissolves in the gastro-intestinal tract, is absorbed through the gut wall and then passes through the
liver to enter the blood circulation. The percentage of the dose reaching the circulation
`is called the bioavailability. From there, the drug will be distributed to various tissues and
`organs in the body. The extent of distribution will depend on the structural and
`physicochemical properties of the compound. Some drugs can enter the brain and central
`nervous system by crossing the blood–brain barrier. Finally, the drug will bind to its
`molecular target, for example, a receptor or ion channel, and exert its desired action.
`• Volume of distribution (Vd) is a theoretical concept that connects the administered
`dose with the actual initial concentration (C0) present in the circulation:
`Vd = Dose/C0
`Most drugs will bind to various tissues and in particular to proteins in the blood, such
`as albumin. As only the free (unbound) drug will bind to the molecular target, the
`concept of unbound volume of distribution is used:
`Vdu = Vd/fu, where fu is the fraction unbound.
`• Clearance (Cl) of the drug from the body mainly takes place via the liver (hepatic
`clearance or metabolism, and biliary excretion) and the kidney (renal excretion).
When the plasma concentration is plotted against time, the area under the curve (AUC)
relates to dose, bioavailability (F) and clearance:
AUC = F × Dose/Cl
`• Half-life (t1/2) — the time taken for a drug concentration in the plasma to reduce by
`50% — is a function of the clearance and volume of distribution, and determines how
`often a drug needs to be administered.
`t1/2 = 0.693 Vd/Cl
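
The relationships in this box can be checked with a few lines of arithmetic. The sketch below simply transcribes the four equations; the function names, the units and the example values (dose 100 mg, C0 2 mg/l, fu 0.1, F 0.5, Cl 5 l/h) are hypothetical and chosen only to make the numbers easy to follow. The constant 0.693 in the half-life equation is the natural logarithm of 2.

```python
import math


def volume_of_distribution(dose: float, c0: float) -> float:
    """Vd = Dose / C0."""
    return dose / c0


def unbound_volume_of_distribution(vd: float, fu: float) -> float:
    """Vdu = Vd / fu, where fu is the fraction unbound."""
    return vd / fu


def auc(f: float, dose: float, cl: float) -> float:
    """AUC = F * Dose / Cl, with F expressed as a fraction."""
    return f * dose / cl


def half_life(vd: float, cl: float) -> float:
    """t1/2 = ln(2) * Vd / Cl (ln 2 is approximately 0.693)."""
    return math.log(2) * vd / cl


# Hypothetical example values.
vd = volume_of_distribution(dose=100.0, c0=2.0)      # litres
vdu = unbound_volume_of_distribution(vd, fu=0.1)     # litres
exposure = auc(f=0.5, dose=100.0, cl=5.0)            # mg.h/l
t_half = half_life(vd, cl=5.0)                       # hours
print(f"Vd={vd:.1f} l, Vdu={vdu:.1f} l, AUC={exposure:.1f} mg.h/l, t1/2={t_half:.2f} h")
```

With these inputs, doubling the clearance halves both the AUC and the half-life, which is why clearance bears on dose size as well as on dosing frequency.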
`
[Box 1 figure: volume of distribution and clearance determine the half-life, which
governs how often a drug is dosed; clearance and absorption determine the oral
bioavailability, which governs how much is dosed.]
`
`predictions on molecules not in the training set,
`although, in general, reliable predictions are only possible
`for molecules similar to those in the training set.
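
The idea of training a model and then restricting predictions to molecules that resemble the training set can be sketched with a deliberately small example. This is not a method from the cited literature: it assumes a single descriptor, an ordinary least-squares line, and a crude applicability check (the descriptor must fall within the range seen in training) as a stand-in for molecular similarity. All names and data values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x, train_xs):
    """Return (prediction, in_domain).

    in_domain is False when x lies outside the descriptor range covered
    by the training set, where the prediction should not be trusted.
    """
    slope, intercept = model
    in_domain = min(train_xs) <= x <= max(train_xs)
    return slope * x + intercept, in_domain


# Hypothetical training set: one descriptor value per molecule and the
# measured property for that molecule.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]
model = fit_line(train_x, train_y)

for query in (2.5, 9.0):
    value, ok = predict(model, query, train_x)
    print(f"descriptor={query}: predicted={value:.2f}, within training range={ok}")
```

A real QSAR model would use many descriptors and a proper similarity measure, but the pattern of returning a reliability flag alongside each prediction carries over.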
`A wide variety of descriptors for use in QSAR
`studies have been developed over the last 40 years20
`(for example, those available in the program Dragon).
`A subset of these descriptors is potentially useful for
`predicting ADME properties. Indeed, with the
`increased interest in the prediction of ADME proper-
`ties, specifically tailored descriptors have already been
`reported, for example, those in the VolSurf program21.
`Some of the descriptors used are close to the chemist’s
`
`MULTIVARIATE ANALYSIS
A subset of statistical techniques
that can handle larger sets of
molecular descriptors and is
aimed at finding relationships or
patterns in data sets. Examples
include multiple linear
regression (MLR) and partial
least squares (PLS).
`
`intuition, such as molecular size and hydrogen bonding.
`Other descriptors are merely topological or quantum
`chemical concepts, but can produce highly predictive
`models, although these might be ‘black boxes’ for
`most people.
`Using appropriate descriptors, QSAR approaches —
`ranging from simple multiple linear regression to
`modern MULTIVARIATE ANALYSIS techniques, such as partial
`least squares (PLS) — are now being applied to the
`analysis of ADME data22. Data-mining and machine-
`learning methods originally developed and used in other
`fields are now also successfully being used for this pur-
`pose. Examples of such methods include NEURAL NETWORKS
`(NN), self-organizing maps (SOM; also called Kohonen
`networks), RECURSIVE PARTITIONING (RP) and support
`vector machines (SVM).
Good predictive models for ADMET parameters
depend crucially on selecting the right mathematical
approach, the right molecular descriptors for the
particular ADMET endpoint, and a sufficiently large set of
experimental data relating to this endpoint for the
validation of the model (BOX 2). Insight is growing as to which
of the available descriptors and QSAR tools are most
appropriate, although there often seem to be different
options with similar predictive power. In particular, more
needs to be learnt about how the size of the training set
influences the choice of the most capable model.
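
One way to see how validation depends on the amount of data is leave-one-out cross-validation, in which each compound is predicted by a model trained on all the others. The text does not prescribe this scheme; it is used here only as a common option, with a one-descriptor linear model as a placeholder. The function names and data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_q2(xs, ys):
    """Cross-validated q2 = 1 - PRESS / total sum of squares."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without compound i, then predict compound i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


# Hypothetical data: descriptor values and a noisy measured property.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [2.1, 3.9, 6.2, 7.8, 10.1]
print(f"q2 = {leave_one_out_q2(descriptor, measured):.3f}")
```

A q2 close to 1 indicates that held-out compounds are predicted well; with very small data sets the statistic becomes unstable, which is one reason why larger, consistent data sets matter.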
`
`Prediction of physicochemical properties
The physicochemical properties of a drug have an
important impact on its pharmacokinetic (BOX 1) and
metabolic (BOX 3) fate in the body, and so a good
understanding of these properties, coupled with their
measurement and prediction, is crucial for a successful
drug discovery programme.
`
`Lipophilicity. Poor biopharmaceutical properties — in
`particular, poor aqueous solubility and slow dissolu-
`tion rate — can lead to poor oral absorption and
`hence low oral bioavailability. In general, poor solubility
`is related to high lipophilicity, whereas hydrophilic
`compounds generally show poor permeability and
`hence low absorption. Therefore, the measurement of
`solubility and lipophilicity, as well as ionization con-
`stants affecting these two properties, has been auto-
`mated and integrated in the high-throughput drug
`discovery paradigm.
`The relationship between lipophilicity and phar-
`macokinetic properties has been discussed by various
`workers in the field23–25. Lipophilicity is the key
`physicochemical parameter linking membrane per-
`meability — and hence drug absorption and distribu-
`tion — with the route of clearance (metabolic or
`renal). Measuring the lipophilicity of a compound is
`readily amenable to automation. The gold standard
`for expressing lipophilicity is the partition coefficient
`P (or log P to have a more convenient scale) in an
`octanol/water system; alternatives include applica-
`tions of immobilized artificial membranes (IAM),
`immobilized liposome chromatography (ILC) and
`liposome/water partitioning.
`
`
`Box 2 | The need for good data
`
`Clearly, larger databases of marketed drugs are required to establish more robust models
`to predict various ADME properties, including drug–drug interactions. Several
`published ADME data sets are available for data modelling13,36,47,105–107, but the quality of
`the data and the number of available training examples remain important issues. In the
`future, service providers such as Cerep, Novascreen and Cyprotex will be able to offer
`larger data sets with which to build more robust models.
`In a recent symposium, the question of whether the Internet can help as a resource
`to collect relevant ADME data was addressed108. The current opinion is that there are
`some good and well-maintained websites available, but unfortunately also many of
`questionable use in research owing to a lack of control of data quality or reference to
`the original data.
`
`There is continued interest in developing and
`improving log P calculation programs, and there are
`many such programs available. Most calculation
`approaches rely on fragment values, although simple
`methods based on molecular size and hydrogen-bonding
`indicators for functional groups to calculate log P values
`have also been shown to be extremely versatile22.
`However, log P values can only be a first estimate of
`the lipophilicity of a compound in a biological environ-
`ment. For partition processes in the body, the distribu-
`tion coefficient D (log D) — for which an aqueous
`buffer at pH 7.4 (blood pH) or 6.5 (intestinal pH) is
`used in the experimental determination — often pro-
`vides a more meaningful description of lipophilicity,
`especially for ionizable compounds. However, in our
`experience, programs that can reliably predict log D are
`scarce at present.
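
The difference between log P and log D can be illustrated with the textbook relationship for a monoprotic compound. This is a simplified sketch, not the algorithm of any program mentioned here: it assumes that only the neutral species partitions into octanol, ignores ion pairing and multiple ionization sites, and uses hypothetical values for log P and pKa.

```python
import math


def log_d(log_p: float, pka: float, ph: float, acid: bool) -> float:
    """Estimate log D at a given pH for a monoprotic acid or base.

    Assumes only the neutral form enters the octanol phase:
      acid: log D = log P - log10(1 + 10**(pH - pKa))
      base: log D = log P - log10(1 + 10**(pKa - pH))
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical monoprotic acid with log P = 3.0 and pKa = 4.4,
# evaluated at the two buffer pH values mentioned in the text.
for ph in (7.4, 6.5):
    print(f"pH {ph}: log D = {log_d(3.0, 4.4, ph, acid=True):.2f}")
```

For this hypothetical acid, log D at pH 7.4 is roughly three log units below log P, which shows why log P alone can be a poor guide to the lipophilicity of ionizable compounds.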
`
`Solubility. The first step in the drug absorption process
`is the disintegration of the tablet or capsule, followed by
`the dissolution of the active drug. Obviously, low solu-
`bility is detrimental to good and complete oral absorp-
`tion, and so the early measurement of this property is of
`great importance in drug discovery. Reflecting this need,
`rapid, robust methods reliant on turbidimetry and
`nephelometry have been developed to efficiently measure
`the solubility of large numbers of compounds6,26.
`Ideally, only soluble compounds would be synthe-
`sized in a drug-discovery programme. Predictive solu-
`bility methods — for example, neural networks —
`might assist in this effort. However, at present, no
`approaches are robust enough to accurately predict low
`solubility. Many current predictive solubility programs27
`use training data from different laboratories with varying
`quality and different experimental conditions. Hopefully,
`by measuring many compounds under standardized
`conditions, current predictive models can be improved28.
`
`pKa. As ionization can also affect the solubility,
`lipophilicity (log D), permeability and absorption of a
`compound, approaches have been developed for the
`rapid measurement of pKa values of sparingly soluble
`drug compounds. Using experimental data reported in
`the literature, several approaches have been used to
`develop pKa calculators. Programs include ACD/pKa
`(ACD), Pallas/pKa (Compudrug) and SPARC29.
`
`NEURAL NETWORKS
`Neural networks are
`computational models that are
`based on the principles of the
`functioning of the brain. They
`can be used to model nonlinear
`relationships between dependent
`(biological endpoint to be
`predicted) and independent
`(molecular and structural
`descriptors) variables. Examples
`include back-propagation and
`self-organising maps (SOM;
`also called Kohonen neural
`networks).
`
`RECURSIVE PARTITIONING
`OR DECISION TREES
`A supervised learning method
`producing a tree-structured
`series of rules to predict a
`particular property using a set of
`molecular descriptors as input.
`
`
`Hydrogen bonding. The hydrogen-bonding capacity of a
`drug solute is now recognized as an important determi-
`nant of permeability. In order to cross a membrane, a
`drug molecule needs to break hydrogen bonds with its
`aqueous environment. The more potential hydrogen
`bonds a molecule can make, the more energy this bond-
`breaking costs, and so high hydrogen-bonding potential
`is an unfavourable property that is often related to low
`permeability and absorption.
Initially, ∆logP (the difference between octanol/water
and alkane/water partitioning) was used as a
measure for solute hydrogen-bonding, but this technique
is limited by the poor solubility of many compounds
in the alkane phase. A variety of computational
approaches have addressed the problem of estimating
hydrogen-bonding capacity, ranging from simple
heteroatom (O and N) counts and the consideration of molecules
in terms of the number of hydrogen-bond acceptors
and donors to more sophisticated measures that take
into account such parameters as free-energy factors30
and (dynamic) polar surface area (PSA)31. PSA values
are easily calculated, and it is now believed that a
single minimum-energy conformation is sufficient to
compute the PSA, instead of the more computationally
demanding and time-consuming dynamic
polar-surface-area calculation31. A fast fragment-based
algorithm for PSA has been reported32, which allows
PSA calculations to be implemented in virtual
screening approaches.
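
The two simplest hydrogen-bonding measures named above, ∆logP and the count of O and N atoms, need very little computation. The sketch below assumes that the two partition coefficients have been measured or calculated elsewhere and that the molecule is given as a flat molecular formula without parentheses; both function names are hypothetical. It does not attempt the PSA calculation, which requires fragment contributions not given in the text.

```python
import re


def delta_log_p(log_p_octanol: float, log_p_alkane: float) -> float:
    """Delta log P: octanol/water log P minus alkane/water log P."""
    return log_p_octanol - log_p_alkane


def n_plus_o_count(formula: str) -> int:
    """Count N and O atoms in a flat molecular formula such as 'C8H10N4O2'.

    Element symbols are matched as one capital letter plus an optional
    lowercase letter, so 'Na' or 'Cl' are not mistaken for N or C.
    """
    total = 0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if element in ("N", "O"):
            total += int(count) if count else 1
    return total


print(n_plus_o_count("C8H10N4O2"))   # caffeine: 4 N + 2 O
print(delta_log_p(2.0, -0.5))
```

The heteroatom count is only a rough proxy, since it ignores whether each atom can actually donate or accept a hydrogen bond.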
`
Permeability. Efforts have been undertaken to predict
the permeability of compounds through Caco-2 cells,
which serve as a model for human intestinal absorption,
in an approach called membrane-interaction quantitative
structure–activity relationships (MI-QSAR)33. But
one could ask the question, “Why model the model of
human absorption?”. A more direct approach is to
model processes that would address ‘pure’ measures of
permeability. These include octanol/water partitioning,
liposome partitioning, retention on immobilized
artificial membranes (IAM), the parallel artificial
membrane-permeability assay (PAMPA) and binding to
liposomes measured by surface-plasmon-resonance
(SPR) biosensors.
`
`Prediction of ADME and related properties
`Absorption. For a compound crossing a membrane by
`purely passive diffusion, a reasonable permeability
`estimate can be made using single molecular properties,
`such as log D or hydrogen-bonding capacity. However,
`besides the purely physicochemical component con-
`tributing to membrane transport, many compounds are
`affected by biological events, including the influence of
`transporters and metabolism (further discussed in later
`sections). Many drugs seem to be substrates for trans-
`porter proteins, which can either promote or hinder
`permeability. In particular, the combined role of
`cytochrome P450 3A4 (CYP3A4) and P-glycoprotein
`(P-gp) in the gut as a barrier to drug absorption has
`been well studied34. Currently, no theoretical SAR basis
`exists to account for these effects.
`
`
`Box 3 | Metabolism
`
`The body will eventually try to eliminate xenobiotics, including drugs. For many drugs,
`this first requires metabolism or biotransformation, which takes place partly in the gut
`wall during uptake, but primarily in the liver. The figure shows where metabolism occurs
`during the absorption process. The fraction of the initial dose appearing in the portal
`vein is the fraction absorbed, and the fraction reaching the blood circulation after the
`first-pass through the liver defines the bioavailability of the drug.
`Traditionally, a distinction is made between phase I and phase II metabolism, although
`these do not necessarily occur sequentially. In phase I metabolism, a molecule is
`functionalized, for example, through oxidation, reduction or hydrolysis. The most
`important enzymes involved are the cytochrome P450s. In particular, CYP3A4, CYP2D6,
`CYP2C9 and CYP2C19 are important for the metabolism of drugs in humans. In phase II
`metabolism, the functionalized drug molecule is further transformed in so-called
`conjugation reactions. These include for example, glucuronidation and sulfation, as well as
`conjugation with glutathione. It should be noted that the metabolism in animals might be
`different from that in humans, and therefore the prediction of human pharmacokinetics
`and metabolism from animal data might not be straightforward.
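
The definitions in this box imply a simple product: the fraction of the dose reaching the portal vein, multiplied by the fraction that survives the first pass through the liver. The sketch below writes this as F = fa × (1 − Eh). The hepatic extraction ratio Eh is a symbol introduced here for illustration, gut-wall metabolism is assumed to be already reflected in the fraction reaching the portal vein, and the example values are hypothetical.

```python
def bioavailability(fraction_absorbed: float, hepatic_extraction: float) -> float:
    """F = fa * (1 - Eh), with both inputs expressed as fractions (0 to 1)."""
    for name, value in (("fraction_absorbed", fraction_absorbed),
                        ("hepatic_extraction", hepatic_extraction)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1, got {value}")
    return fraction_absorbed * (1.0 - hepatic_extraction)


# Hypothetical drug: 80% of the dose reaches the portal vein and half of
# that is removed on the first pass through the liver.
print(f"F = {bioavailability(0.8, 0.5):.2f}")
```

Even a well-absorbed compound can therefore have low bioavailability when hepatic extraction is high.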
`
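[Box 3 figure: fate of an oral dose during absorption. The dose passes through the
gut wall (metabolism) into the portal vein and then through the liver (metabolism);
the fraction that emerges defines the bioavailability. Unabsorbed drug passes to faeces.]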
`
`In vitro methods, such as Caco-2 or Madin-Darby
`canine kidney (MDCK) monolayers, are widely used to
`make oral absorption estimates. These cells also express
`transporter proteins, but only express very low levels of
`metabolizing enzymes. Similarly, there is a continued
`interest in finding a relevant in vitro screen for estimating
`the permeability of drugs for diseases of the central
`nervous system (CNS). The bovine microvessel endo-
`thelial cell (BMEC) model has been explored as a possible
`in vitro model of the blood–brain barrier35.
`Considerable effort has also gone into the develop-
`ment of in silico models for the prediction of oral
`absorption36–42. The simplest models are based on a single
`descriptor, such as log P or log D, or polar surface area,
`which is a descriptor of hydrogen-bonding potential31.
`Different multivariate approaches, such as multiple linear
`regression, partial least squares and artificial neural
`networks41, have been used to develop quantitative
`structure–human-intestinal-absorption relationships. In
`all approaches, hydrogen bonding is considered to be a
`property with an important effect on oral absorption.
`Absorption-simulation programs, such as Gastro-
`Plus43 and Idea44, might eventually become a valuable
`tool in lead optimization and compound selection.
`These programs, which have recently been compared45,
`are computer simulation models developed and vali-
`dated to predict ADME outcomes, such as rate of
`absorption and extent of absorption, using a limited
`
`number of in vitro data inputs. They are based on
`advanced compartmental absorption and transit
`(ACAT) models, in which physicochemical concepts,
`such as solubility and lipophilicity, are more readily
`incorporated than physiological aspects involving trans-
`porters and metabolism. In more recent versions,
`attempts are being made to model the influence of
`transporters, in addition to gut-wall metabolism, on
gastrointestinal uptake. For example, the oral bioavailability
of ganciclovir in dogs and humans was simulated
using a physiologically based model that utilized many
biopharmaceutically relevant parameters, such as the
concentration of ganciclovir in the duodenum,
jejunum, ileum and colon at a variety of dose levels and
solubility values. The simulation results indicated
that the bioavailability of ganciclovir is limited by
compound solubility rather than by permeability due to
partitioning, as had previously been speculated44.
`
`Bioavailability. Recently, the first attempts to predict
`bioavailability directly from molecular structure have
`been published. However, this is not an easy task, as
`bioavailability depends on a superposition of two
`processes: absorption and liver first-pass metabolism.
`Absorption in turn depends on the solubility and per-
`meability of the compound, as well as interactions with
`transporters and metabolizing enzymes in the gut wall.
`Important properties for determining permeability
`seem to be the size of the molecule, as well as its capacity
`to make hydrogen bonds, its overall lipophilicity and
`possibly its shape and flexibility. Molecular flexibility,
`for example, as evaluated by counting the number of
`rotatable bonds, has been identified as a factor influenc-
`ing bioavailability in rats46.
Yoshida and Topliss47 trained a QSAR model with log
D at pH 7.4 and 6.5 as inputs for the physicochemical
properties and the presence/absence of typical functional
groups most likely to be involved in metabolic reactions
as the structural input. This approach used ‘fuzzy
adaptive least squares’, and drugs could be classified into one of
four predefined bioavailability ranges. Using this
approach, a new drug can be assigned to the correct class
with an accuracy of 60%. An unpublished effort based on
classification using the SIMCA approach, which
seems to achieve similar success, has also been reported12.
`In another approach, regression and recursive parti-
`tioning have been used48. In this study, 591 compounds
`were included and a set of 85 structural descriptors was
`generated. The authors noted that the mean error in the
`experimental data used to generate the model is ~12%,
`with an increase in error for well-absorbed drugs.
`Therefore, the models should not be expected to gener-
`ate predictions that are more accurate than the variabil-
`ity inherent in the biological measurements.
`Genetic programming, which is a specific form of
`evolutionary programming, has recently been used for
`predicting bioavailability49. The results show a slight
`improvement compared with the Yoshida-Topliss
`approach, although a direct comparison is difficult
`owing to a different selection of the bioavailability
`ranges of the four classes.
`
`
A method for predicting bioavailability using
adaptive fuzzy partitioning (AFP) has recently been
presented at conferences50. The best molecular descriptors
were selected with a genetic algorithm, and in the next
step SOMs were used for the classification, which
assigned molecules to the correct bioavailability
class in 64% of cases.
`The methods described above demonstrate that at
`least qualitative (binned) predictions of oral bioavail-
`ability seem tractable directly from molecular structure.
`Approaches using in vitro data are also under continual
`development. For example, a graphical approach for
`bioavailability prediction based on the combined mea-
`surement of Caco-2 flux and microsomal stability was
`recently presented51 that uses a reference plot to make a
`prediction of bioavailability for a new compound.
`Typically, the prediction will classify a compound as
`0–20%, 20–50% or 50–100% bioavailable. Extending
`this approach to include solubility, for example, might
`increase its predictive power.
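
The binned predictions discussed in this section, and the accuracy figures quoted for them, can be made concrete with a short sketch. It uses the three ranges mentioned above (0–20%, 20–50% and 50–100%); how values that fall exactly on a boundary are assigned is an assumption made here, and the function names and data are hypothetical.

```python
def bioavailability_class(percent: float) -> str:
    """Assign a bioavailability value (in %) to one of three bins.

    Boundary values are placed in the higher bin (an assumption).
    """
    if not 0.0 <= percent <= 100.0:
        raise ValueError(f"bioavailability must be 0-100%, got {percent}")
    if percent < 20.0:
        return "0-20%"
    if percent < 50.0:
        return "20-50%"
    return "50-100%"


def classification_accuracy(predicted, observed) -> float:
    """Fraction of compounds whose predicted and observed bins agree."""
    hits = sum(
        1 for p, o in zip(predicted, observed)
        if bioavailability_class(p) == bioavailability_class(o)
    )
    return hits / len(observed)


# Hypothetical predicted and measured bioavailabilities (in %).
predicted = [10.0, 30.0, 80.0, 45.0, 60.0]
observed = [15.0, 55.0, 90.0, 40.0, 20.0]
print(f"correctly classified: {classification_accuracy(predicted, observed):.0%}")
```

Accuracy figures from different studies are only comparable when the same bin boundaries are used, which is the difficulty noted above for comparing the genetic-programming and Yoshida-Topliss results.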
`
`Blood–brain barrier penetration. Drugs that act in the
`CNS need to cross the blood–brain barrier (BBB) to
`reach their molecular target. By contrast, for drugs
`with a peripheral target, little or no BBB penetration
`might be required in order to avoid CNS side effects. A
`key issue in the development of models to predict BBB
`penetration is the use of appropriate data to describe
`brain uptake of compounds. There is an ongoing dis-
`cussion about the use of total-brain data versus extra-
`cellular fluid (ECF) or cerebro-spinal fluid (CSF) data
`or data generated by microdialysis52. Another point of
`debate relates to the time point of measurement, which
`is clearly crucial. Overall, data in the literature are
`rather limited in number, and are also generated from
`different experimental protocols. All of these factors
`limit the development of highly predictive models of
`BBB penetration.
`Nevertheless, a variety of models for the prediction
`of uptake into the brain have been developed53–59.
`‘Rule-of-five’-like recommendations regarding the
`molecular parameters that contribute to the ability of
`molecules to cross the BBB have been made to aid
BBB-penetration predictions53; for example, molecules
with a molecular mass of <450 Da or with PSA <100 Å²
are more likely to penetrate the BBB. Most of the early
`predictive models are based on a multiple linear
`regression approach and many use physicochemical
`properties60. One example of such a model is based on
`the combination of only three descriptors, namely the
`calculated octanol/water partition coefficient, the
`number of hydrogen-bond acceptors in an aqueous
`medium and the polar surface area55. More recently,
`other multivariate techniques have been tried using new
ADME-tailored properties, such as the VolSurf approach,
`in which