BMC Structural Biology
BioMed Central

Open Access
Research article

Antibody-protein interactions: benchmark datasets and prediction tools evaluation
Julia V Ponomarenko*1,2 and Philip E Bourne1,2

Address: 1San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA and 2Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA

Email: Julia V Ponomarenko* - jpon@sdsc.edu; Philip E Bourne - bourne@sdsc.edu
* Corresponding author

Published: 2 October 2007

BMC Structural Biology 2007, 7:64

doi:10.1186/1472-6807-7-64

This article is available from: http://www.biomedcentral.com/1472-6807/7/64

Received: 9 April 2007
Accepted: 2 October 2007

© 2007 Ponomarenko and Bourne; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Background: The ability to predict antibody binding sites (aka antigenic determinants or B-cell epitopes) for a given protein is a precursor to new vaccine design and diagnostics. Among the various methods of B-cell epitope identification, X-ray crystallography is one of the most reliable. Computational methods for B-cell epitope prediction have been developed using these experimental data. As the number of structures of antibody-protein complexes grows, further interest in prediction methods using 3D structure is anticipated. This work aims to establish a benchmark for 3D structure-based epitope prediction methods.
Results: Two B-cell epitope benchmark datasets inferred from the 3D structures of antibody-protein complexes were defined. The first is a dataset of 62 representative 3D structures of protein antigens with inferred structural epitopes. The second is a dataset of 82 structures of antibody-protein complexes containing different structural epitopes. Using these datasets, eight web-servers developed for antibody and protein binding site prediction have been evaluated. No method exceeded 40% precision and 46% recall. The values of the area under the receiver operating characteristic curve for the evaluated methods were about 0.6 for ConSurf, DiscoTope, and PPI-PRED, and above 0.65 but not exceeding 0.70 for protein-protein docking methods when the best of the top ten models for the bound docking were considered; the remaining methods performed close to random. The benchmark datasets are included as a supplement to this paper.
Conclusion: It may be possible to improve epitope prediction methods through training on datasets that include only immune epitopes and through utilizing more features characterizing epitopes, for example, the evolutionary conservation score. Notwithstanding, the overall poor performance may reflect the generality of antigenicity and hence the inability to decipher B-cell epitopes as an intrinsic feature of the protein. It remains an open question whether discriminatory features can ultimately be found.

Background
A B-cell epitope is defined as a part of a protein antigen recognized by either a particular antibody molecule or a particular B-cell receptor of the immune system [1]. The main objective of B-cell epitope prediction is to facilitate the design of a short peptide or other molecule that can be synthesized and used instead of the antigen, which, in the case of a pathogenic virus or bacterium, may be harmful to a researcher or experimental animal [2]. A B-cell epitope may be continuous, that is, a short contiguous stretch of amino acid residues, or discontinuous, comprising atoms from residues that are distant in sequence but close in three-dimensional space and on the surface of the protein.

Synthetic peptides mimicking epitopes, as well as anti-peptide antibodies, have many applications in the diagnosis of various human diseases [3-7]. Attempts have also been made to develop peptide-based synthetic prophylactic vaccines for various infections, as well as therapeutic vaccines for chronic infections and noninfectious diseases, including autoimmune diseases, neurological disorders, allergies, and cancers [8-10]. The immunoinformatics software and databases developed to facilitate vaccine design have been reviewed previously [11,12].

During the last 25 years, B-cell epitope prediction methods have focused primarily on continuous epitopes. Most of them were sequence-based methods relying on various amino acid properties, such as hydrophilicity [13], solvent accessibility [14], secondary structure [15-18], and others. Recently, several machine learning methods have been introduced that apply hidden Markov models (HMM) [19], artificial neural networks (ANN) [20], support vector machines (SVM) [21], and other techniques [22,23]. Recent assessments of continuous epitope prediction methods demonstrate that "single-scale amino acid propensity profiles cannot be used to predict epitope location reliably" [24] and that "the combination of scales and experimentation with several machine learning algorithms showed little improvement over single scale-based methods" [25].

As crystallographic studies of antibody-protein complexes have shown, most B-cell epitopes are discontinuous. In 1984, the first attempts at epitope prediction based on 3D protein structure were made for a few proteins for which continuous epitopes were known [26-28]. Subsequently, Thornton and colleagues [29] proposed a method to locate potential discontinuous epitopes based on the protrusion of protein regions from the protein's globular surface. However, until the first X-ray structure of an antibody-protein complex was solved in 1986 [30], protein structural data were mostly used for the prediction of continuous rather than discontinuous epitopes.

In cases where the three-dimensional structure of the protein or its homologue is known, a discontinuous epitope can be derived from functional assays by mapping the residues involved in antibody recognition onto the protein structure [31]. However, an epitope identified using an immunoassay may be an artefact of measuring cross-reactivity of antibodies due to the presence of denatured or degraded proteins [32,33], or due to conformational changes in the protein caused by residue substitutions that may even lead to protein misfolding [34]. Therefore, structural methods, particularly X-ray crystallography of antibody-antigen complexes, generally identify B-cell epitopes more reliably than functional assays [35].

B-cell epitopes can be thought of in a structural and a functional sense. Structural epitopes (also called antigenic determinants) are defined by the set of residues or atoms in the protein antigen that contact antibody residues or atoms [33,36]. In contrast, a functional epitope consists of the antigen residues that contribute significantly to antibody binding [36,37]. Functional epitopes are determined through functional assays (e.g., alanine scanning mutagenesis) or calculated theoretically using known structures of antibody-protein complexes [38,39]. Thus, functional and structural epitopes are not necessarily the same. Functional epitopes in proteins are usually smaller than structural epitopes; only three to five residues of the structural epitope contribute significantly to the antibody-antigen binding energy [40]. This work focuses on structural epitopes inferred from known 3D structures of antibody-protein complexes available in the Protein Data Bank (PDB) [41].

Antibody-protein complexes can be categorized as intermediate transient non-obligate protein-protein complexes [40,42]. Non-obligate complexes, whose individual components can also exist on their own in vivo, are classified as either permanent or transient depending on their stability under particular physiological and environmental conditions [43]. For example, many enzyme-inhibitor complexes are permanent non-obligate complexes. Transient non-obligate complexes range from weak (e.g., electron transport complexes) through intermediate (e.g., signal transduction complexes) to strong (e.g., bovine G protein forming a stable trimer upon GDP binding) [44]. Most antibodies demonstrate intermediate affinity for their specific antigens [45]. Based on this classification, general methods for the prediction of intermediate transient non-obligate protein-protein interactions have been applied to the prediction of structural epitopes [40,42]. For example, Jones and Thornton, using their method for predicting protein-protein binding sites [46], successfully predicted B-cell epitopes on the surface of the β-subunit of human chorionic gonadotropin (βhCG) [47].

Since the number of available structures of antibody-protein complexes remains limited, only a few methods for B-cell epitope prediction from a protein of known three-dimensional structure have been developed thus far: CEP (Conformational Epitope Prediction) [48] and DiscoTope [49]. In the near future, with growth in the number of available structures of antibody-protein complexes, extensive development in this area is expected. Existing and new methods for epitope prediction demand a benchmark that will set the standard for the future comparison of methods. To facilitate the development of such a standard, we have developed B-cell epitope benchmark datasets inferred from existing 3D structures of antibody-protein complexes. Further, using the benchmark datasets, we evaluated CEP, DiscoTope, and six recently developed, publicly available web-servers for generalized protein-protein binding site prediction that use various approaches: protein-protein docking (ClusPro [50], DOT [51] and PatchDock [52]); structure-based methods applying different principles and trained on different datasets (PPI-PRED [53], PIER [54] and ProMate [55]); and residue conservation (ConSurf [56]).

Results and discussion
Structural epitope definition
Three definitions of an epitope inferred from the X-ray structures of antibody-protein complexes were considered:
(1) The epitope consists of protein antigen residues in which any atom of the residue loses more than 1 Å² of accessible surface area (ASA) upon antibody binding. ASA was calculated using the program NACCESS [57].
(2) The epitope consists of protein antigen residues in which any atom of the epitope residue is separated from any antibody atom by a distance ≤ 4 Å.
(3) The epitope consists of protein antigen residues in which any atom of the epitope residue is separated from any antibody atom by a distance ≤ 5 Å.
These three definitions were used for two reasons: first, each of the methods evaluated in this work uses one of them; second, we wished to study how the epitope definition influenced the results.

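The two distance-based definitions, (2) and (3), reduce to a residue-level contact search. The sketch below assumes atom coordinates have already been parsed from a complex structure into simple records; `Atom` and `epitope_residues` are hypothetical names introduced for illustration, not part of any tool used in this work. Definition (1) is not covered because it depends on ASA values computed by NACCESS.

```python
from dataclasses import dataclass
from math import dist


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record parsed from a PDB file."""
    chain: str
    resnum: int
    xyz: tuple  # (x, y, z) in Å


def epitope_residues(antigen, antibody, cutoff=4.0):
    """Return (chain, resnum) keys of antigen residues having any atom
    within `cutoff` Å of any antibody atom.
    cutoff=4.0 gives definition (2); cutoff=5.0 gives definition (3)."""
    residues = set()
    for atom in antigen:
        key = (atom.chain, atom.resnum)
        if key in residues:
            continue  # residue already qualifies; skip its remaining atoms
        if any(dist(atom.xyz, ab.xyz) <= cutoff for ab in antibody):
            residues.add(key)
    return residues


# Toy coordinates (not from any real structure): residue A12 lies 4.5 Å
# from the antibody atom, so only the 5 Å definition includes it.
antigen = [Atom("A", 10, (0.0, 0.0, 0.0)),
           Atom("A", 11, (10.0, 0.0, 0.0)),
           Atom("A", 12, (0.0, 7.5, 0.0))]
antibody = [Atom("H", 100, (0.0, 3.0, 0.0))]

print(sorted(epitope_residues(antigen, antibody)))              # [('A', 10)]
print(sorted(epitope_residues(antigen, antibody, cutoff=5.0)))  # [('A', 10), ('A', 12)]
```

The brute-force double loop is quadratic in the number of atoms; for whole complexes a spatial index would normally be used, but the set of residues returned is the same.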
Results (not shown) indicated that the structural epitope definition did not influence the outcome. Hence, unless otherwise specified, results are based on the second epitope definition.

Construction of the benchmark datasets
Two benchmark datasets were derived from the 3D structures of antibody-protein complexes available from the PDB [41]:

• Dataset #1 – Representative 3D structures of protein antigens with structural epitopes inferred from 3D structures of antibody-protein complexes. This dataset is intended for the study of the antigenic properties of proteins, as well as for the development and evaluation of methods based on protein structure alone or of protein-protein unbound docking methods (if the structure of the antibody is known or can be modeled). Here this dataset was used for the evaluation of the scale-based methods (DiscoTope, PIER, ProMate and ConSurf). The dataset contains 62 antigens, 52 of which are one-chain antigen proteins.

• Dataset #2 – Representative 3D structures of antibody-protein complexes presenting different epitopes. This dataset is useful for the study of the properties of individual epitopes, as well as for the development and evaluation of protein-protein bound docking methods. Since the current work compares methods of different types, including protein-protein docking methods, this dataset was used to compare the performance of all methods with one another. The dataset contains 70 structures of proteins in complexes with two-chain antibodies and 12 structures of proteins in complexes with one-chain antibodies.

The flowchart describing the construction of the benchmark datasets is shown in Figure 1. Steps 1–4 relate to dataset #1; steps 1–6 relate to dataset #2.

Step 1 – Crystal structures of protein antigens of length ≥ 30 amino acids, solved at a resolution ≤ 4 Å in complex with antibody fragments containing variable regions (Fab, VHH, Fv, or scFv fragments), were collected from the Protein Data Bank (PDB) [41]. Structures in which the antibody binds the antigen without involving any CDR residues were excluded from the analysis; there were four such structures [PDB: 1MHH, 1HEZ, 1DEE, 1IGC]. If a structure contained several complexes in one asymmetric unit and no structural difference was observed between these complexes, only one complex was selected. In this way, 166 structures containing 187 antibody-protein complexes were selected: 24 complexes were formed by one-chain antibody fragments and 163 complexes by two-chain antibody fragments.

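The Step 1 selection criteria amount to a per-complex filter. In this sketch the `Candidate` record and its field names are hypothetical, assuming that antigen length, resolution, fragment type and the number of antibody CDR residues contacting the antigen have already been extracted from each PDB entry; the removal of identical complexes within one asymmetric unit is not modeled.

```python
from dataclasses import dataclass

# Antibody fragments containing variable regions accepted in Step 1.
ACCEPTED_FRAGMENTS = {"Fab", "VHH", "Fv", "scFv"}


@dataclass
class Candidate:
    """Hypothetical per-complex summary extracted from a PDB entry."""
    entry_id: str
    antigen_length: int   # number of amino acids in the antigen
    resolution: float     # Å
    fragment_type: str    # e.g. "Fab", "VHH", "Fv", "scFv"
    cdr_contacts: int     # antibody CDR residues contacting the antigen


def passes_step1(c: Candidate) -> bool:
    """True if the complex satisfies all Step 1 criteria."""
    return (c.antigen_length >= 30
            and c.resolution <= 4.0
            and c.fragment_type in ACCEPTED_FRAGMENTS
            and c.cdr_contacts > 0)  # binding with no CDR residue is excluded


candidates = [
    Candidate("demo1", 129, 2.1, "Fab", 12),
    Candidate("demo2", 129, 2.1, "Fab", 0),    # no CDR contacts
    Candidate("demo3", 25, 1.9, "VHH", 8),     # antigen too short
    Candidate("demo4", 210, 4.3, "scFv", 10),  # resolution worse than 4 Å
]
print([c.entry_id for c in candidates if passes_step1(c)])  # ['demo1']
```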
Step 2 – All antigen protein chains were structurally aligned to one another using the CE algorithm [58]. Two protein chains were considered similar if all of the following conditions applied: (i) rmsd ≤ 3 Å, (ii) z-score ≥ 4.0, (iii) number of residue-residue matches relative to the length of the longest chain ≥ 80%, (iv) sequence identity in the structural alignment (not considering gaps) ≥ 80%. The z-score takes into account overall structural similarity and the number of gapped positions. Two protein molecules were considered similar if each chain in one protein had a similar chain in the other protein. Figure 2 demonstrates how the last two parameters, number of matches and sequence identity in the structural alignment, are defined.

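The four Step 2 conditions and the protein-level rule can be written as two predicates. This is a sketch assuming the CE results (rmsd, z-score, match fraction and sequence identity) are already available for each chain pair; `CEResult`, `chains_similar` and `proteins_similar` are hypothetical names, the numeric values below are invented, and reading "each chain in one protein had a similar chain in the other" as a requirement in both directions is our assumption.

```python
from dataclasses import dataclass


@dataclass
class CEResult:
    """Hypothetical container for one CE chain-chain alignment."""
    rmsd: float            # Å
    z_score: float
    match_fraction: float  # residue-residue matches / length of the longest chain
    seq_identity: float    # identical pairs / matches (gaps ignored)


def chains_similar(r: CEResult) -> bool:
    """All four Step 2 conditions must hold."""
    return (r.rmsd <= 3.0 and r.z_score >= 4.0
            and r.match_fraction >= 0.80 and r.seq_identity >= 0.80)


def proteins_similar(chains_p, chains_q, results) -> bool:
    """`results` maps (chain of P, chain of Q) to a CEResult. Every chain of P
    needs a similar chain in Q, and every chain of Q a similar chain in P."""
    def similar(p, q):
        r = results.get((p, q))
        return r is not None and chains_similar(r)

    return (all(any(similar(p, q) for q in chains_q) for p in chains_p)
            and all(any(similar(p, q) for p in chains_p) for q in chains_q))


good = CEResult(rmsd=1.2, z_score=6.5, match_fraction=0.95, seq_identity=0.98)
low_identity = CEResult(rmsd=1.0, z_score=6.0, match_fraction=0.90, seq_identity=0.61)
results = {("A", "X"): good, ("B", "Y"): good, ("A", "Y"): low_identity}

print(chains_similar(low_identity))                       # False
print(proteins_similar(["A", "B"], ["X", "Y"], results))  # True
print(proteins_similar(["A", "B"], ["X"], results))       # False
```

The `low_identity` pair fails only on condition (iv), which is the kind of case (61% identity, as for human and bird lysozymes) that the thresholds are meant to separate.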
The structural alignment rather than sequence alignment was used because protein structure is more conserved than sequence, and proteins can be expected to contain regions of low sequence similarity that cannot be aligned by sequence alone. The structural alignment also avoids considering two proteins as similar if they have similar sequences but different structures (which is possible over short regions). The threshold values were chosen empirically, based on previous experience with the CE algorithm. These thresholds separate human and bird lysozymes (61% sequence identity), as well as the neuraminidases of influenza virus strains H3N2 and H11N9 (47% sequence identity), into different proteins.

Figure 1: Flowchart for building benchmark datasets.

Step 3 – Thirty-five proteins were orphans represented by only one 3D structure. For each of the remaining 27 proteins represented by more than one 3D structure, the structure with the best resolution was selected as the representative structure. The final representative dataset contained 62 antigens [see Additional file 1], 52 of which were one-chain antigen proteins.

Figure 2: Hypothetical example of the structural alignment of proteins (A) (sequence AVCQYWC) and (B) (sequence ACYARTYC). Number of residue-residue matches = 5; number of residue-residue matches relative to the length of the longest chain = 63% (5/8); sequence identity = 80% (4/5).

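The two alignment-derived quantities in Figure 2 can be computed directly from a pair of gapped alignment rows. The gapped rows below are one alignment consistent with the counts in the caption (the exact alignment drawn in the figure is an assumption), and `alignment_stats` is a hypothetical helper, not part of the CE program.

```python
def alignment_stats(row_a: str, row_b: str):
    """For two equal-length alignment rows ('-' marks a gap), return
    (matches, matches / length of the longest chain, identities / matches)."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    # A residue-residue match is a column with a residue in both rows.
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    matches = len(pairs)
    identities = sum(1 for x, y in pairs if x == y)
    longest = max(len(row_a.replace("-", "")), len(row_b.replace("-", "")))
    identity = identities / matches if matches else 0.0
    return matches, matches / longest, identity


# Protein A = AVCQYWC (7 residues), protein B = ACYARTYC (8 residues).
matches, frac, identity = alignment_stats("AVCQY---WC", "A-C-YARTYC")
print(matches, frac, identity)  # 5 0.625 0.8
```

The 0.625 match fraction is the 63% quoted in the caption after rounding; here four of the five aligned pairs are identical (A/A, C/C, Y/Y, C/C) and one is a W/Y mismatch.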
Step 4 – For each protein, epitopes inferred from the 3D structures of antibody-protein complexes were mapped onto the representative structure of the protein. First, epitope residues were calculated for each complex structure using one of the aforementioned epitope definitions. Second, the epitope residues defined for each member structure were mapped onto the representative structure based on the structural alignments. For example, the hemagglutinin HA1 chain of influenza A virus was represented by six 3D structures of the protein in complexes with Fab fragments of antibodies HC45 [PDB:1QFU], BH151 [PDB:1EO8], HC63 [PDB:1KEN], and HC19 [PDB:2VIR, 2VIS, 2VIT]. Figure 3 illustrates a representative structure [PDB:1EO8] of hemagglutinin HA1 onto which the epitopes inferred from the six complex structures are mapped. In this way, epitopes inferred from 187 antibody-protein complexes were mapped onto the 62 representative protein structures. The resulting dataset is denoted dataset #1. Data on mapped epitope residues are available upon request.

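Step 4 transfers epitope residues from each member structure to the representative structure through the structural alignment. A minimal sketch, assuming residues are numbered sequentially from 1 along each chain (real PDB numbering with insertion codes would need an extra lookup) and that epitope residues falling into alignment gaps are simply dropped, which the text does not specify; `residue_map` and `map_epitope` are hypothetical helpers, and the toy alignment reuses the Figure 2 sequences.

```python
def residue_map(row_member: str, row_rep: str) -> dict:
    """Map sequential residue indices (from 1) of the member chain to those of
    the representative chain for every aligned, non-gap column."""
    mapping, i, j = {}, 1, 1
    for x, y in zip(row_member, row_rep):
        if x != "-" and y != "-":
            mapping[i] = j
        if x != "-":
            i += 1
        if y != "-":
            j += 1
    return mapping


def map_epitope(epitope: set, mapping: dict) -> set:
    """Transfer epitope residues; residues aligned to gaps are dropped."""
    return {mapping[r] for r in epitope if r in mapping}


m = residue_map("AVCQY---WC", "A-C-YARTYC")
print(m)                          # {1: 1, 3: 2, 5: 3, 6: 7, 7: 8}
print(map_epitope({2, 5, 6}, m))  # {3, 7}
```

Residue 2 of the member chain (V) is aligned to a gap, so it has no counterpart on the representative structure and is lost in the transfer.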
Step 5 – To study the properties of individual epitopes and their prediction, a dataset of representative epitopes, dataset #2, was constructed from 3D structures of antibody-protein complexes defining different epitopes. An important question is how to define individual epitopes while avoiding bias from over-representation of particular epitopes. For example (Fig. 3), the HC45 (blue) and BH151 (magenta) epitopes overlap, whereas neither the HC63 (green) nor the HC19 (red) epitope overlaps with them; these lie in a separate region of the protein surface. The HC45 and BH151 epitopes share residues (orange in Fig. 3), as do the HC63 and HC19 epitopes (yellow in Fig. 3). Are the HC45 and BH151 epitopes then similar or different? This question is answered by considering the degree of overlap.

Figure 3: Two orthogonal views of a representative structure, influenza A virus hemagglutinin HA1 chain [PDB:1EO8]. Chain A is shown in light gray, onto which are mapped epitope residues inferred from six protein structures in complexes with antibody fragments: HC45 Fab [PDB:1QFU] (blue), BH151 Fab [PDB:1EO8] (magenta), HC63 Fab [PDB:1KEN] (green), HC19 Fab [PDB:2VIR, 2VIS, 2VIT] (red). The hemagglutinin HA2 chain is shown in cyan. Residues common to the HC45 and BH151 epitopes are shown in orange; residues common to the HC63 and HC19 epitopes are shown in yellow; residue Tyr98, which is part of the HC19 epitope inferred from structure 2VIR but not from the 2VIS and 2VIT structures, is shown in black. The HC19 epitope residue Thr131, which is mutated to Ile in the 2VIS structure, is shown in dark red. The HC19 epitope residue Thr155, which is mutated to Ile in the 2VIT structure, is shown in violet.

Two epitopes, each defined according to the aforementioned criteria, are deemed similar if they belong to similar protein chains and the residues they share make up more than 75% of each epitope. The 75% cut-off for epitope similarity was chosen empirically. For example, the HC45 and BH151 epitopes on influenza A virus hemagglutinin HA1 (Fig. 3) share 14 residues, which make up 74% and 93% of the HC45 and BH151 epitopes, respectively. A cut-off on epitope overlap of less than 75% would define the HC45 and BH151 epitopes as similar even though they are known to be different: HC45 and BH151 are antibodies from different germ-lines whose variable domains share only 56% sequence similarity, their H3 CDR regions adopt distinct conformations, and these antibodies are tolerant to different mutations in hemagglutinin [59]. In another example, the X5 and 17B epitopes of gp120 share 75% of their residues, yet the X5 and 17B antibodies derive from different genes [60]; a cut-off for epitope similarity equal to or less than 75% would erroneously define the X5 and 17B epitopes as similar. Conversely, a cut-off of 80% would make epitopes inferred from different structures of the same antibody-protein complex dissimilar. For example, the H57 epitopes of T cell receptor N15 inferred from two complexes in a single crystal asymmetric unit ([PDB:1NFD], complexes (D)-(HG) and (B)-(FE), where the letters denote protein chain identifiers) would be dissimilar.

Given the 75% empirical cut-off for epitope similarity, epitopes inferred from structures of complexes with two-chain antibody fragments were divided into 44 singletons and 26 groups; epitopes inferred from structures of complexes with one-chain antibody fragments were divided into ten singletons and two groups.

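The 75% similarity rule and the division into singletons and groups can be sketched as follows. The criterion is read as strict (shared residues must exceed 75% of each epitope), which matches the X5/17B example being treated as different at exactly 75%. How epitopes were merged into groups is not stated; treating groups as connected components of the pairwise-similarity graph (single linkage) is our assumption. The residue sets and function names are hypothetical; the first pair is chosen only to reproduce the 74%/93% overlap reported for HC45 and BH151.

```python
def epitopes_similar(e1: set, e2: set, cutoff: float = 0.75) -> bool:
    """Shared residues must exceed `cutoff` of each epitope. Assumes both
    epitopes are on similar chains and use a common residue numbering."""
    if not e1 or not e2:
        return False
    shared = len(e1 & e2)
    return shared / len(e1) > cutoff and shared / len(e2) > cutoff


def group_epitopes(epitopes: dict, cutoff: float = 0.75) -> list:
    """Connected components of the similarity graph, via union-find."""
    parent = {name: name for name in epitopes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    names = list(epitopes)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if epitopes_similar(epitopes[a], epitopes[b], cutoff):
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


ep = {
    "x": set(range(1, 20)),     # 19 residues
    "y": set(range(6, 21)),     # 15 residues; shares 14 with x (74% / 93%)
    "z": set(range(1, 21)),     # 20 residues; contains all of x
    "w": set(range(100, 110)),  # distinct surface region
}
print(epitopes_similar(ep["x"], ep["y"]))             # False
print(sorted(sorted(g) for g in group_epitopes(ep)))  # [['w'], ['x', 'z'], ['y']]
```

The pair `y`/`z` shares exactly 75% of `z`, so under the strict reading it is not merged, and `y` and `w` end up as singletons.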
Step 6 – For each group of similar epitopes, the representative 3D structure of the antibody-protein complex was selected according to the following preferences. First, the structure with no or a minimal number of heteroatoms (excluding water) and other protein chains in the interface (i.e., within 4 Å of atoms of both the antigen and the antibody) was preferred. Second, preference was given to the structure with the largest epitope, i.e., the maximum number of epitope residues. Third, the structure with the best resolution (≤ 2.5 Å) was preferred. Dataset #2 of representative structures of antibody-protein complexes (representative epitopes) consisted of 70 structures of proteins in complexes with two-chain antibody fragments and 12 structures of proteins in complexes with one-chain antibody fragments.

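Because the three Step 6 preferences are applied in order, they can be expressed as a single lexicographic sort key. In this sketch the `Member` record, its fields and the example values are hypothetical, and treating the 2.5 Å bound as part of "best resolution" (so that a numerically lower resolution always wins the final tie-break) is our reading of the text.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """Hypothetical summary of one complex structure in a group of similar epitopes."""
    entry_id: str
    interface_extras: int  # heteroatoms (excluding water) and other chains within 4 Å of both partners
    epitope_size: int      # number of epitope residues
    resolution: float      # Å


def pick_representative(group):
    """Fewest interface extras first, then the largest epitope, then the best
    (numerically lowest) resolution."""
    return min(group, key=lambda s: (s.interface_extras, -s.epitope_size, s.resolution))


group = [
    Member("s1", 1, 20, 1.8),  # loses on the first criterion despite its large epitope
    Member("s2", 0, 18, 2.9),
    Member("s3", 0, 18, 2.2),
    Member("s4", 0, 15, 1.5),  # best resolution but smaller epitope
]
print(pick_representative(group).entry_id)  # s3
```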
Web-servers performance evaluation
Using the benchmark datasets introduced above, we evaluated eight recently developed, publicly available web-servers. The servers use different methods, yet all have the goal of predicting either B-cell epitopes or, more generally, protein-protein binding sites. The servers are listed in Table 1. Any reference in the text to a method actually means the server that implements that method; e.g., the DOT method running on the ClusPro server is called ClusPro(DOT).

Table 1: Servers evaluated in this work

| Server name | Method type | Training dataset | Reference |
|---|---|---|---|
| CEP (Conformational Epitope Prediction) | Discontinuous epitope prediction based on residue solvent accessibility and spatial distribution. | No training set. | [48] |
| DiscoTope | Discontinuous epitope prediction based on amino acid statistics, residue solvent accessibility and spatial distribution. | 75 structures of antibody-antigen complexes. | [49] |
| ProMate | Protein-protein binding interface prediction based on significant structural and sequence interface properties. | Manually curated; 57 proteins involved in heterodimeric transient interactions (excluding antigen-antibody complexes). | [55] |
| PIER (Protein IntErface Recognition) | Protein-protein binding interface prediction based on local statistical properties of the protein surface derived at the level of atomic groups. | 490 homodimeric, 62 heterodimeric and 196 transient interfaces (excluding antigen-antibody complexes). | [54] |
| PPI-PRED (Protein-Protein Interface Prediction) | Protein-protein binding interface prediction based on significant structural and sequence interface properties. | Manually curated; 180 proteins from 149 complexes, both obligate (114) and transient (66). | [53] |
| ConSurf | Mapping of phylogenetic information (sequence conservation grades) onto the surface of proteins with known 3D structure. | No training set. | [56] |
| ClusPro (DOT program) | Rigid-body protein-protein docking based on the Fast Fourier Transform correlation approach. | No training set. | [50] [51] |
| PatchDock | Rigid-body protein-protein docking based on local shape feature matching. | No training set. | [52] |

The methods fall into two categories:

• Scale-based methods – each protein residue is assigned a value reflecting the probability of that residue being part of the protein interface or epitope. DiscoTope, PIER, ProMate and ConSurf fall into this category.

• Patch prediction and protein-protein docking methods – each protein residue is predicted to be part of a surface patch of residues defining the protein interface or epitope. DiscoTope, ProMate, CEP, PPI-PRED, ClusPro(DOT), and PatchDock fall into this category.

Two methods, DiscoTope and ProMate, fall into both categories since they predict patches and assign score values to each protein residue.

The evaluation of the methods was performed as follows. First, the scale-based methods were analyzed on how well their residue score values discriminate epitope from non-epitope residues, using dataset #1. Second, the performance of all methods was evaluated on their ability to recognize the representative epitopes from dataset #2. The first step is not essential to the comparison; it was performed as an example of how dataset #1 can be used for future method development and for revealing properties of epitope residues beyond the fact that epitopes are sites on the protein surface.

Scale-based methods: score value distributions
DiscoTope, PIER, ProMate and ConSurf assign to each protein residue a score reflecting the probability of that residue being part of the protein interface or epitope. Details are provided in the Methods section. For the analysis of epitope versus non-epitope residues we used dataset #1, that is, representative antigen structures with epitopes mapped onto them. Here an epitope residue is an antigen residue known to be part of an epitope in any complex of this antigen with any antibody. Conversely, a non-epitope residue is an antigen residue that is not known to be part of a structural epitope. To simplify the calculation, proteins with epitopes located on more than one protein chain were discarded from the analyses (there were 10 such proteins). As a result, 52 protein antigens were analyzed [see Additional file 1].

The score distributions for epitope, non-epitope and all protein residues were calculated for each method and are shown in Figures 4, 5, 6, 7. Distributions taking into account only surface residues were similar for all methods (results not shown). The definition of a surface residue is given in the Methods section.

DiscoTope, ProMate and ConSurf scores discriminate epitope residues from non-epitope residues and from all protein residues, while PIER scores and ConSurf confidence scores do not. Thus, as one can see in Figure 4, DiscoTope discriminates epitope residues (x̄ = -10.2, s = 5.4, number of residues N = 1,364) from non-epitope residues (x̄ = -13.3, s = 6.3, N = 9,713) (p < 0.001) and from all antigen residues (x̄ = -13.0, s = 6.3, N = 11,077) (p < 0.001). These distributions are significantly different (p < 0.001) regardless of the epitope definition used. The ConSurf conservation score also discriminates epitope residues (x̄ = 0.273, s = 1.050, N = 1,119) from non-epitope residues (x̄ = -0.049, s = 0.987, p < 0.001) and from all antigen residues (x̄ = -0.007, s = 1.00, N = 8,684, p < 0.001) (Fig. 5). The same was true for epitope residues vs. all surface residues. Further, the confidence level did not change when the definition of surface residues and/or epitope residues was changed (data not shown). However, when the ConSurf confidence score values were considered, no significant difference between epitope and other protein residues was observed (epitope residues: x̄ = 0.197, s = 0.539; non-epitope residues: x̄ = 0.194, s = 0.556, p > 0.05). For ProMate, mean scores for epitope residues (x̄ = 52.8, s = 25.4, N = 1,363) were significantly higher than for all antigen residues (x̄ = 46.5, s = 28.1, N = 11,074), non-epitope residues, or all surface residues (p < 0.001) (Fig. 6). The PIER score does not discriminate epitope residues from other antigen residues (epitope residues: x̄ = 11.9, s = 11.4, N = 1,363; non-epitope residues: x̄ = 12.6, s = 13.7, N = 8,221, p > 0.05) (Fig. 7).

Figure 4: Distributions of DiscoTope scores for epitope, non-epitope and all protein residues.
Figure 5: Distribution of ConSurf scores for epitope and all protein residues. For the definition of confidence score see the Methods section.
Figure 6: Distribution of ProMate scores for epitope, non-epitope and all protein residues.
Figure 7: Distribution of PIER scores for epitope, non-epitope and all protein residues.

These results suggest that epitope residues are less conserved, according to the ConSurf evolutionary conservation scores, than protein surface residues in general, at a 99.9% confidence level (p < 0.001). PIER, which is trained on 3D structures of all protein-protein complexes available in the PDB, could not distinguish epitopes from the rest of the protein surface. One possible explan[…]

[…] the sensitivity and positive predictive values and measure the method performance from the relative number of successful predictions in the test dataset [53].

Approaching the task of evaluating and comparing different methods, we encountered a number of questions. How can we compare scale-based methods with patch prediction and docking methods? DiscoTope and ProMate predict one patch per protein, while other methods predict several patches; how can these be compared? When a score value assigned to a residue by ProMate, DiscoTope, or ConSurf is used, all epitopes in the protein are taken into account, so can we say that the method predicts one epitope per protein? Is not the direct comparison of protein docking methods (ClusPro(DOT), PatchDock) with patch-based prediction methods (DiscoTope, ProMate, CEP, PPI-PRED) questionable, since the former are based on optimization of an interaction energy function while the latter depend on training? Finally, docking methods require knowledge of the structures of both interacting proteins, antigen and antibody, while binding site prediction methods are based on the structure of the protein antigen alone and do not require knowledge of the antibody structure. Is this a fair comparison? Being aware of these questions and limitations, we applied various evaluation criteria in an attempt to provide a thorough and fair comparison of the methods.

The evaluation was performed on the dataset of representative epitopes, treating any antigen residue that is not part of the considered epitope as a non-epitope residue. We did not discard non-epitope residues known to belong to some other epitope in the protein, because we assumed that a prediction program predicts an epitope in an antigen for which it has no information other than its sequence and structure; this is how all the evaluated methods were constructed. The analysis was performed using the representative epitopes from dataset #2 that were inferred from structures of one-chain (monomer) antigens in complexes with two-chain antibody fragments. There were 59 such epitopes in 48 antigens (Table 2).

The following parameters were used to evaluate the methods:

Sensitivity (recall, or true positive rate (TPR)) = TP/(TP + FN): the proportion of correctly predicted epitope residues (TP) with respect to the total number of epitope residues (TP + FN).

Specificity (or 1 - false positive rate (FPR)) = 1 - FP/(TN + FP): the proportion of correctly predicted non-epitope residues (TN) with respect to the total number of non-epitope residues (TN + FP).
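The residue-level parameters defined above can be computed from sets of residue identifiers. This sketch follows the evaluation convention described above, counting every antigen residue outside the considered epitope as non-epitope, even if it belongs to another epitope. The positive predictive value (precision) mentioned above and a rank-based area under the ROC curve, the quantity summarized in the Abstract, are added for completeness; the Mann-Whitney formulation of the AUC used here is a standard choice and not necessarily the exact procedure used in the paper. All function names and the toy data are hypothetical.

```python
def confusion(predicted: set, epitope: set, antigen: set):
    """Residue-level TP, FP, TN, FN. Every antigen residue outside `epitope`
    counts as non-epitope, even if it belongs to another epitope."""
    tp = len(predicted & epitope)
    fp = len(predicted - epitope)
    fn = len(epitope - predicted)
    tn = len(antigen - predicted - epitope)
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    return tp / (tp + fn)       # recall, TPR


def specificity(fp, tn):
    return 1 - fp / (tn + fp)   # 1 - FPR


def precision(tp, fp):
    return tp / (tp + fp)       # positive predictive value


def roc_auc(scores: dict, epitope: set) -> float:
    """Probability that a random epitope residue outscores a random non-epitope
    residue (ties count 1/2), which equals the area under the ROC curve."""
    pos = [s for r, s in scores.items() if r in epitope]
    neg = [s for r, s in scores.items() if r not in epitope]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


antigen = set(range(1, 11))  # toy antigen with residues 1..10
epitope = {1, 2, 3, 4}
predicted = {2, 3, 4, 5, 6}
tp, fp, tn, fn = confusion(predicted, epitope, antigen)
print(tp, fp, tn, fn)                          # 3 2 4 1
print(sensitivity(tp, fn), precision(tp, fp))  # 0.75 0.6

scores = {1: 0.9, 2: 0.8, 3: 0.4, 4: 0.7, 5: 0.6,
          6: 0.5, 7: 0.3, 8: 0.2, 9: 0.1, 10: 0.0}
print(round(roc_auc(scores, epitope), 3))      # 0.917
```

In the toy AUC example, 22 of the 24 epitope/non-epitope pairs are ranked correctly (residue 3 is outscored by residues 5 and 6), giving 22/24.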
