2 The Kabat Database and a Bioinformatics Example
`
`George Johnson and Tai Te Wu
`
`1. Introduction
`In 1969, Elvin A. Kabat of Columbia University College of Physicians and
`Surgeons and Tai Te Wu of Cornell University Medical College began to col-
`lect and align amino acid sequences of human and mouse Bence Jones proteins
`and immunoglobulin (Ig) light chains. This was the beginning of the Kabat
`Database. They used a simple mathematical formula to calculate the various
`amino acid substitutions at each position and predict the precise locations of
`segments of the light-chain variable region that would form the antibody-com-
`bining site from a variability plot (1). The Kabat Database is one of the oldest
`biological sequence databases, and for many years was the only sequence data-
`base with alignment information.
`The Kabat Database was available in book form free to the scientific com-
`munity starting in 1976 (2), with an updated second edition released in 1979
`(3), third edition in 1983 (4), fourth edition in 1987 (5), and fifth printed edi-
`tion in 1991 (6). Because of the inclusion of amino acid as well as nucleotide
`sequences of antibodies, T-cell receptors for antigens (TCR), major histocom-
`patibility complex (MHC) class I and II molecules, and other related proteins
`of immunological interest, it became impossible to provide printed versions
`after 1991. In that same year, George Johnson of Northwestern University cre-
`ated a website to electronically distribute the database located temporarily at:
`
`http://kabatdatabase.com
`
During the following decade, the Kabat Database grew more than fivefold.
Thanks to generous financial support from the National Institutes of
Health, access to this website has been free for both academic and commercial use.
With the completion of the human genome project as well as several other
genome projects, scientific emphasis has gradually shifted from determining
more sequences to analyzing the information content of the existing sequence
data. With regard to the Kabat Database, from its very start the collection and
alignment of amino acid and nucleotide sequences of proteins of immunological
interest have progressed side by side with the ability to determine structure
and function information from these sequences.

From: Methods in Molecular Biology, Vol. 248: Antibody Engineering: Methods and Protocols
Edited by: B. K. C. Lo © Humana Press Inc., Totowa, NJ

BI Exhibit 1057
`
`1.1. Historical Analysis and Use
`After the pioneering work of Hilschmann and Craig (7) on the sequencing of
`three human Bence Jones proteins, many research groups joined the effort of
`determining Ig light chain amino acid sequences. By 1970, there were 77 pub-
`lished complete or partial Ig light chain sequences: 24 human κ-I, 4 human κ-
`II, 17 human κ-III, 10 human λ-I, 2 human λ-II, 6 human λ-III, 5 human λ-IV,
`2 human λ-V, 2 mouse κ-I, and 5 mouse κ-II proteins (1). The invariant Cys
`residues were aligned at positions 23 and 88, the invariant Trp residue posi-
`tioned at 35, and the two invariant Gly residues at positions 99 and 101. To
`align the variable region of kappa and lambda light chains, single-residue gaps
`were placed at positions 10 and 106A. Longer gaps were introduced between
`positions 27 and 28 (27A, 27B, 27C, 27D, 27E, and 27F) and between 97 and
`98 (97A and 97B), which was later changed to between 95 and 96 (95A, 95B,
`95C, 95D, 95E and 95F). A similar alignment technique with a different num-
`bering system was introduced for the Ig heavy-chain variable regions (8). The
`invariant Cys residues were located at positions 22 and 92, the Trp residue at
`position 36, and the two invariant Gly residues at positions 104 and 106.
`The most important discovery to come from alignment of the Ig heavy- and
`light-chain sequences was the location of segments forming the antibody-com-
`bining site, known as the complementarity (initially called hypervariable)-
`determining regions (CDRs). Since different antibodies bind different antigens,
`numerous amino acid substitutions occur in these segments, leading to large,
`calculated variability values. The first variability plot of the 77 complete and
`partial amino acid sequences of human and mouse light chains showed three
`distinct peaks of variability, located between positions 24 to 34, 50 to 56, and
`89 to 97 (1). Three similar peaks were discovered in heavy chains at positions
`31 to 35, 50 to 65, and 95 to 102. These six short segments were hypothesized
`to form the antigen-binding site and were designated as CDRL1, CDRL2,
`CDRL3 for light chains, and CDRH1, CDRH2, and CDRH3 for heavy chains,
`respectively.
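
The variability calculation behind such plots can be sketched in a few lines. The sketch below assumes the definition given in reference (1): the number of different amino acids observed at a position divided by the frequency of the most common amino acid at that position. The function name, the gap symbol, and the toy sequences are illustrative assumptions and are not taken from the Kabat Database software.

```python
from collections import Counter

def variability(aligned_seqs, gap="-"):
    """Variability for each column of pre-aligned sequences.

    variability = (number of different residues at the position)
                  / (frequency of the most common residue at the position)
    Gap characters are ignored; a column with no residues yields 0.0.
    """
    length = max(len(s) for s in aligned_seqs)
    values = []
    for i in range(length):
        column = [s[i] for s in aligned_seqs if i < len(s) and s[i] != gap]
        if not column:
            values.append(0.0)
            continue
        counts = Counter(column)
        most_common_freq = counts.most_common(1)[0][1] / len(column)
        values.append(len(counts) / most_common_freq)
    return values

# Toy alignment (hypothetical data): column 0 is invariant, column 2 varies fully.
toy = ["CAS", "CAT", "CGY", "CAW"]
print(variability(toy))
```

Under this definition an invariant position scores 1, and a position at which all 20 amino acids occur equally often scores 400, so peaks in the plot mark candidate hypervariable segments.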
`Initial Ig three-dimensional (3D) X-ray diffraction experiments suggested
`that the six binding-site segments were indeed physically located on one side of
`the Ig macromolecule. Final verification of this theoretical prediction came
`after the development of hybridoma technology (9). An anti-lysozyme mono-
`clonal antibody Fab fragment was co-crystallized with lysozyme (10), and the
`combined 3D structure was determined by X-ray diffraction analysis. Several
`amino acid residues in each of the six CDRs of the antibody were found to be
`in direct contact with the antigen. As theoretically predicted, antibody speci-
`ficity thus resided exclusively in the CDRs. During the past decade, designer
`antibodies have been constructed genetically by selecting these CDRs for their
`affinity for the target antigen.
By comparing the amino acid sequences of the CDRs as well as the stretches of
`sequence that connect them, known as framework regions (FR), Kabat and Wu
`hypothesized that the Ig variable regions were assembled from short genetic
`segments (11,12). This hypothesis was verified experimentally by Bernard et
`al. (13) with the discovery of the J-minigenes, reminiscent of the switch pep-
`tide proposed by Milstein (14). The D-minigenes were soon identified as
`another component of the heavy-chain variable region (15,16). In addition, the
`idea of gene conversion (17) was proposed as a possible mechanism of anti-
`body diversification, and appears to play a central role in chickens (18), and to
`a varying extent in humans, rabbits, and sheep.
`For precisely aligned amino acid sequences of Ig heavy-chain variable
`regions, CDRH3 is defined as the segment from position 95 to position 102,
`with possible insertions between positions 100 and 101. The CDRH3-binding
`loop is the result of the joining of the V-genes, D-minigenes, and J-minigenes.
`This intriguing process has been studied extensively (19,20), and suggests the
`CDRH3 plays a unique role in conferring fine specificity to antibodies (21,22).
`Indeed, a particular amino acid sequence of CDRH3 is almost always associ-
`ated with one unique antibody specificity. The CDRH3 sequences within the
`Kabat Database have further been analyzed by their length distributions (23),
`for which the length distributions of 2,500 complete and distinct CDRH3s of
`human, mouse, and other species were found to be more-or-less in agreement
`with the Poisson distribution. Interestingly, the longest mouse CDRH3 had a
`length of 19 amino acid residues, and that of human had 32 residues, and only
`one of them was shared by both species (24), suggesting that CDRH3 may be
`species-specific.
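
A comparison of observed CDRH3 lengths with a Poisson distribution might be set up as in the following sketch. It assumes the Poisson mean is taken as the sample mean of the lengths; the analysis in reference (23) may have been parameterized differently. The function names and the example lengths are hypothetical.

```python
import math
from collections import Counter

def poisson_pmf(k, mean):
    """Probability of observing k under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def compare_with_poisson(lengths):
    """Return (length, observed count, expected Poisson count) rows.

    The Poisson mean is set to the sample mean of the lengths (an assumption).
    """
    n = len(lengths)
    mean = sum(lengths) / n
    observed = Counter(lengths)
    rows = []
    for k in range(min(lengths), max(lengths) + 1):
        rows.append((k, observed.get(k, 0), n * poisson_pmf(k, mean)))
    return rows

# Hypothetical CDRH3 lengths (in residues), for illustration only.
example_lengths = [7, 8, 8, 9, 9, 9, 10, 10, 11, 12, 14]
for k, obs, exp in compare_with_poisson(example_lengths):
    print(f"{k:2d}  observed={obs:2d}  expected={exp:5.2f}")
```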
`Because of the subtle differences between the variable regions of the Ig light
`and heavy chains, their alignment position numberings are independent. For
`example, in light chains, the first invariant Cys is located at position 23 and
CDRL1 is from position 24 to 34, i.e., immediately after the Cys residue.
However, in heavy chains, the invariant Cys is located at position 22 and
CDRH1 is from position 31 to 35, i.e., eight amino acid residues after that Cys.
`Because of this important difference, the Kabat numbering systems are sepa-
`rate for Ig light and heavy chains. Attempts to combine these two numbering
`systems into one in other databases have resulted in the presence of many gaps
and confusions. Similarly, variable regions of TCR alpha, beta, gamma, and
delta chains are aligned using different numbering systems. The alignments are
summarized in Table 1, with the locations of CDRs indicated.

Table 1
FRs and CDRs of Antibody and TCR Variable Regions

FR or CDR   VL       VH        Vα        Vβ         Vγ         Vδ
FR1         1–23     1–22      1–22      1–21       1–23       1–22
CDR1        24–34    31–35B    23–33     22–34      24–33      23–34A
FR2         35–49    36–49     34–47     35–49      34–49      35–49
CDR2        50–56    50–65     48–56     50–59      50–56      50–57
FR3         57–88    66–91     57–92     60–95      57–94      58–89
CDR3        89–97    95–102    93–105    96–107     95–107     90–105
FR4         98–107   103–113   106–116   108–116A   108–116C   106–116
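
Because the light- and heavy-chain numbering systems are separate, any lookup from position to region has to be keyed by chain type. The sketch below encodes only the VL and VH columns of Table 1 and maps a position label to its FR or CDR. The function name is hypothetical, and treating an insertion letter (27A, 35B, and so on) as part of the numbered position it follows is an assumption made for the example.

```python
import re

# Boundaries copied from the VL and VH columns of Table 1.
REGIONS = {
    "VL": [("FR1", 1, 23), ("CDR1", 24, 34), ("FR2", 35, 49), ("CDR2", 50, 56),
           ("FR3", 57, 88), ("CDR3", 89, 97), ("FR4", 98, 107)],
    "VH": [("FR1", 1, 22), ("CDR1", 31, 35), ("FR2", 36, 49), ("CDR2", 50, 65),
           ("FR3", 66, 91), ("CDR3", 95, 102), ("FR4", 103, 113)],
}

def region_of(chain, position):
    """Map a position label such as '27A' or 95 to its FR/CDR name.

    Returns None when the position falls outside every range listed
    for that chain in Table 1.
    """
    match = re.fullmatch(r"(\d+)([A-Z]?)", str(position))
    if match is None:
        raise ValueError(f"not a position label: {position!r}")
    number = int(match.group(1))
    for name, start, end in REGIONS[chain]:
        if start <= number <= end:
            return name
    return None

print(region_of("VL", "27A"))   # CDR1
print(region_of("VH", "100"))   # CDR3
print(region_of("VH", 25))      # None: not covered by the VH ranges as printed
```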
`
`1.2. Current Analysis and Use
`There are approx 25,000 unique yearly logins to the website of the Kabat
`Database by immunologists and other researchers around the world. The web-
`site is designed to be simple to use by those who are familiar with computers
`and those who are not. A description of the tools currently available is shown in
`Table 2. We encourage researchers who use the database to share their sugges-
`tions for improving the access and searching tools.
`A common but extremely important question asked by researchers is
`whether a new sequence of protein of immunological interest has been deter-
`mined before and stored in the database. Without asking this simple question,
`one may encounter the following situation: a heavy-chain V-gene from goldfish
`was sequenced (25) and found to be nearly identical to some of the human V-
`genes. Subsequently, the authors suggested that it might be of human origin,
`possibly because of the extremely sensitive amplification method used in the
`study and minute contamination of the sample by human tissue.
`Another common use of the database is to confirm the reading frame of an
`immunologically related nucleotide sequence. Comparing short segments of
`sequence with stored database sequences can easily identify inadvertent omis-
`sion of a nucleotide in the sequencing gel. Of course, if the missing nucleotide is
`real, this can suggest the presence of a pseudogene. Researchers also use the
`website to calculate variability for groupings of similar sequences of interest. For
`example, the variability plots of the variable regions of the Ig heavy and light
`chains of human anti-DNA antibodies are shown in Figs. 1 and 2. These two
`plots seem to indicate that CDRH3 may contribute most to the binding of DNA.
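
The reading-frame check described above can be illustrated with a small sketch: translate the nucleotide sequence in each of the three forward frames and keep the frame whose translation shares the most short peptides with a known protein sequence. The standard genetic code is used; the function names, the choice of pentapeptides, and the example sequences are assumptions made for illustration and do not describe the website's own tools.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(dna, frame=0):
    """Translate DNA starting at offset `frame`; unknown codons become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def best_frame(dna, reference_protein, k=5):
    """Return (frame, scores): the frame sharing the most k-peptides with the reference."""
    ref = {reference_protein[i:i + k] for i in range(len(reference_protein) - k + 1)}
    scores = []
    for frame in range(3):
        protein = translate(dna, frame)
        scores.append(sum(1 for i in range(len(protein) - k + 1)
                          if protein[i:i + k] in ref))
    return max(range(3), key=scores.__getitem__), scores

# Hypothetical read with one extra leading base, so the coding frame is 1.
read = "A" + "CAGGTGCAGCTGGTGCAGTCTGGG"
print(best_frame(read, "QVQLVQSGAE"))  # (1, [0, 4, 0])
```

If the best-matching frame changes partway along a read, that is a hint of an inserted or omitted nucleotide at that point.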

Table 2
Listing of Tools Available on the Kabat Database Website

Tool
    Description

Seqhunt II
    The SeqhuntII tool is a collection of searching programs for retrieving
    sequence entries and performing pattern matches, with allowable
    mismatches, on the nucleotide and amino acid sequence data. The majority
    of fields in the database are searchable (for example, a sequence’s
    journal citation). Matching entries may be viewed as HTML files or
    downloaded and printed. Pattern matching results show the matching
    database sequence aligned with the target pattern, with differences
    highlighted.

Align-A-Sequence
    The Align-A-Sequence tool attempts to programmatically align different
    types of user-entered sequences. Currently kappa and lambda Ig
    light-chain variable regions may be aligned using the program.

Subgrouping
    The Subgrouping tool takes a user-entered sequence of either Ig heavy,
    kappa, or lambda light-chain variable region and attempts to assign it a
    subgroup designation based on those described in the 1991 edition of the
    database. In many cases the assignment is ambiguous because of a
    sequence’s similarity to more than one subgroup.

Find Your Families
    The Find Your Family tool attempts to assign a “family” designation to a
    user-entered sequence. The user-entered target sequence is compared to
    previously assembled groupings of sequences, based on sequence homology.
    Please note that the assigned family number is arbitrary, since the
    groupings usually change as new data is added to the database.

Current Counts
    Current amino acid, nucleotide, and entry counts may be made for various
    groupings of sequences.

Variability
    Variability calculations may be made over a user-specified collection of
    sequences. The distributions used to calculate the variability are also
    available for viewing and printing. Variability plots can be customized
    for scale, axis labels, and title, or downloaded for printing.

In many instances, investigators would like to identify the germline gene
that is closest to their gene of interest, as well as the classification of that
particular gene to a specific family or subgroup. SEQHUNT (26) can pinpoint the
sequence available in the database with the least number of amino acid or
nucleotide differences.
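
The idea of pinpointing the stored sequence with the fewest differences can be sketched as a position-by-position comparison of pre-aligned sequences. This is only an illustration of the principle and not the SEQHUNT implementation; the function names, entry names, and sequence fragments are hypothetical.

```python
def count_differences(a, b):
    """Number of positions at which two equal-length aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def closest_entries(query, database):
    """Return (minimum differences, sorted names of the entries achieving it)."""
    scored = {name: count_differences(query, seq) for name, seq in database.items()}
    best = min(scored.values())
    return best, sorted(name for name, d in scored.items() if d == best)

# Hypothetical aligned fragments; real entries would come from the database.
germline = {
    "geneA": "QVQLVQSGAE",
    "geneB": "EVQLVESGGG",
    "geneC": "QVQLQESGPG",
}
print(closest_entries("QVQLVQSGGE", germline))  # (1, ['geneA'])
```

Returning every entry that ties for the minimum, rather than a single winner, mirrors the point made in Table 2 that family and subgroup assignments are often ambiguous.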
The previous examples represent most of the current uses of the Kabat Database
by immunologists and other scientists. However, many more detailed
analyses are possible from the data stored in the Kabat Database, as shown in
Table 3.
In the following section, a current bioinformatics example is illustrated,
using the uniquely aligned data contained in the Kabat Database.

Fig. 1. Variability plot for human anti-DNA heavy-chain variable region.
`
`2. Kabat Database Bioinformatics Example: HIV gp120 V3-loop
`and Human CDRH3 Amino Acid Sequences
The human immunodeficiency virus (HIV) has intrigued the scientific community
for several decades. It is a retrovirus with two copies of RNA as its
genetic material. Upon infecting humans, HIV uses its reverse-transcriptase
molecules to convert its RNA into DNA, which is in turn transported into the
nucleus and incorporated into the host chromosomes of CD4+ T cells.
Although the infected individual produces antibodies against the initial viral
strain, not all viruses can be eliminated because of the integration of the
viral genetic material into the host cells. Gradually, the viral-coat proteins
change in sequence, rendering the host’s antibodies less effective. Eventually,
acquired immunodeficiency syndrome (AIDS) develops with a latent period of
approx 10 ± 3 yr. Because of this, HIV is classified as a lentivirus or slow virus.

Fig. 2. Variability plot for human anti-DNA kappa light-chain variable region.

`Several specific drugs have been synthesized during recent years to treat
`HIV infection and AIDS. They include reverse-transcriptase inhibitors, pro-
`tease inhibitors, and fusion inhibitors. However, these drugs have serious side
`effects, and most are very expensive, making the cost of treatment prohibitive
`in countries with a large percentage of HIV-positive patients. For years, the
`ideal solution has been to develop an inexpensive vaccine. Unfortunately,
`because of the rapid changes of its envelope coat proteins, especially gp120,
`HIV strains cannot be singled out as candidates for vaccine. Many research lab-
`oratories around the world have undertaken the task of sequencing gp120, and
`these sequences have been stored on two websites:
`
`http://ncbi.nlm.nih.gov and http://www.lanl.gov
Figure 3 shows a variability plot for the 302 nearly complete sequences of
HIV-1 stored at the latter site. For comparison, a variability plot of 138
aligned human influenza virus A hemagglutinin amino acid sequences is
shown in Fig. 4.

Table 3
Partial Listing of Bioinformatics Studies Performed Using the Kabat Database

Subject
    Summary

Binding Site Prediction
    The CDRs of Ig heavy and light chains were predicted from variability
    calculations made over the sequence alignments (1,8).

Antibody Humanization
    It is possible to identify the most similar framework regions between
    the mouse antibody and all existing human antibodies stored in the
    database (30).

Gene Count Estimation
    From the existing sequences, it is possible to estimate the total number
    of human and mouse V-genes for antibody light and heavy chains, as well
    as TCR alpha and beta chains (31,32).

MHC Class I gene assortment
    The known sequences of human MHC class I molecules suggest that their α1
    and α2 regions can be assorted (33).

TCR CDR3 length distribution
    The lengths of CDR3s in antibodies and TCRs have distinct features
    (34,35). In the case of TCR alpha and beta chains, their CDR3 lengths
    follow a narrow and random distribution. That may be a result of the
    relatively fixed size and shape of the processed peptide in the groove
    of MHC class I or II molecules. On the other hand, although the TCR
    gamma chain CDR3 lengths are similarly distributed, those of TCR delta
    chains exhibit a bimodal distribution (35). TCR delta chains with
    shorter CDR3s may be MHC-restricted, whereas those with longer CDR3s may
    be MHC-unrestricted.

Antibody and TCR evolution
    Possible mechanisms of antibody and TCR evolution can also be
    investigated by comparing aligned sequences from different species
    (36,37).

Designer Antibodies
    More specific/potent antibodies may be designed using the preferred CDR
    lengths calculated from database sequences against the same antigen (34).

Autoimmunity
    Similarities between non-self antigens such as influenza virus and Ig
    autoantibodies have been found. Certain antigens may help initially
    trigger autoimmunity, and certain antibody clones may help to stimulate
    the autoimmune response (36).

Based on various studies, the V3-loop has been singled out for vaccine
development. Although the V3-loop has the least amount of variation among
the five V-loops, there are still many different sequences from various strains of
HIV. How these different sequences are related to the pathogenesis and
progression of HIV infection is unclear. Longitudinal analysis of sequences of the
V3-loop as the disease progresses is of vital importance in understanding the
changes that occur during infection, so that an effective vaccine can be
developed. Unfortunately, there is only one published report for a 10-yr sequence
analysis, and in that case, the authors were unable to describe how the V3-loop
amino acid sequences are related to disease progression (27).

Fig. 3. Variability plot for HIV-1 gp120.

Fig. 4. Variability plot for influenza virus A hemagglutinin.

`When HIV infects a person, its gp120 is a foreign protein and the patient
`produces antibodies toward this foreign antigen. However, once the HIV gene
`is integrated into the host chromosome, as in various human endogenous retro-
`viruses, the gp120 becomes a self-protein. This transition from foreign to self
`usually cannot occur instantaneously, but as it occurs the host will have
`increasing difficulty producing effective antibodies. Indeed, initial antibodies
`from patients who are infected with HIV are usually ineffective in binding HIV
`at later stages of the disease.
`The V3-loop has been described as being located on the surface of gp120.
`One way for the gp120 to become less antigenic would be for the virus to
`replace portions of the exposed V3-loop with segments of the host chromo-
`some. Although any human protein could serve this purpose, we investigate the
possibility that human CDRH3 regions are being used. CDRH3 regions are
particularly attractive because they can assume many possible configurations
and they are on the surface of normal human proteins.
`To locate matches between the V3-loop and CDRH3, the Kabat Database is
`uniquely useful. BLAST (http://www.ncbi.nlm.nih.gov) has recently allowed
`matches of short amino acid sequences, and eMOTIF (http://emotif.stanford.
`edu/emotif/) can be used to search for various length sequences. However, both
`programs use sequence databases containing large numbers of HIV-1 sequences
`and relatively few antibody heavy-chain variable region sequences. A search for
`short V3-loop sequences at these two websites usually results in a listing of other
V3-loop sequences, and few, if any, CDRH3 sequences. By using the
`SEQHUNTII program, we picked the human heavy-chain variable regions and
`searched for all penta-peptides in the sequences of V3-loops determined in the
`10-yr longitudinal study. The result of matching is listed in Table 4.
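
The penta-peptide search described above amounts to sliding a five-residue window along each V3-loop sequence and looking for each window in the CDRH3 sequences. The following is a minimal sketch of that idea, not the SEQHUNTII program; the function names and the example sequences are hypothetical and are not taken from Table 4.

```python
def kmers(sequence, k=5):
    """All overlapping peptides of length k in a sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def shared_peptides(v3_loop, cdrh3_sequences, k=5):
    """Map each k-peptide of the V3-loop to the CDRH3 entries that contain it."""
    hits = {}
    for peptide in kmers(v3_loop, k):
        names = [name for name, seq in cdrh3_sequences.items() if peptide in seq]
        if names:
            hits[peptide] = names
    return hits

# Hypothetical sequences for illustration only.
v3 = "CTRPNNNTRKSIHIGPGRAFYTTGEIIGDIRQAHC"
cdrh3 = {
    "entry1": "ARGPGRAFDY",
    "entry2": "ARDYYGSSYFDY",
}
print(shared_peptides(v3, cdrh3))  # {'GPGRA': ['entry1'], 'PGRAF': ['entry1']}
```

Counting the keys of the returned dictionary for each time point would give a per-sample match count of the kind tabulated in the longitudinal study.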
`The initial number of matches is gradually reduced over the years, until the
`CD4+ T-cell count drops below 200. At that time, the number of matches
`increases dramatically. The match number appears to closely correlate with the
`number of HIV RNA molecules in the patient’s blood. For example, after treat-
`ment, the number of matches drops to zero, along with a reduction in the
`plasma HIV RNA number. Subsequently, after 10 yr of HIV infection, the
`number of matches begins to creep up again.
`A possible explanation for this finding is that the presence of CDRH3 penta-
`peptides in the V3-loop reduces its antigenicity. Such mutant HIV would bind
`existing anti-HIV antibodies in the patient less effectively, becoming more
`pathogenic. Based on this observation, the use of amino acid or nucleotide
`sequences of V3-loop as a vaccine would not be very efficient.

Table 4
Longitudinal Study of HIV gp120 V3-Loop Sequence Variations

Sample      Months after   Sequences of V3-loop   Matches in     HIV RNA per
            infection      determined             human CDRH3    mL of plasma
A1          0              10                     6              230
A2          12             10                     3              230
A2b         27             7                      0              2,300
A3          42             5                      0              230
A4          70             3                      0              230
A5          94             12                     21             23,000
treatment   97
A6          110            12                     0              2,300
A7          118            12                     1              2,300

CD4+ T-cells (six values listed; their assignment to sample rows is unclear):
427, 277, 186, 156, 248, 212

`
`An effective vaccine would most likely be made from an area of the exposed
`surface that does not contain high variability, as indicated in Fig. 3. There are
`several segments of seven or more nearly invariant amino acid residues in HIV
`gp120, in contrast to influenza virus hemagglutinin. Nearly invariant residues
`are defined as those that occur more than about 95% of the time at a particular
`position (1). They are located at the following positions (numbering including
`the precursor region) in the C1, C2, or C5 region of gp120:
`
Segment #   Position #    Sequence
I           4 to 14       WVTVYYGVPVW
II          23 to 30      LFCASDA
III         44 to 50      ACVPTDP
IV          225 to 231    PIPIHYC
V           261 to 267    VQCTHGL
VI          269 to 282    PVVSTQLLL-NGSL
VII         538 to 545    ELYKYKVV
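
Locating such segments can be sketched in two steps: flag every alignment column whose most common residue occurs more than about 95% of the time, then report runs of at least seven flagged columns. The code below is an illustrative sketch under those assumptions; the function names and the toy alignment are hypothetical, gap characters are counted like any other residue, and all sequences are assumed to be aligned to the same length.

```python
from collections import Counter

def nearly_invariant(aligned_seqs, threshold=0.95):
    """Flag each alignment column whose most common residue exceeds the threshold."""
    n = len(aligned_seqs)
    flags = []
    for column in zip(*aligned_seqs):
        top_count = Counter(column).most_common(1)[0][1]
        flags.append(top_count / n > threshold)
    return flags

def invariant_segments(flags, min_length=7):
    """Return 1-based (start, end) runs of flagged columns at least min_length long."""
    segments, start = [], None
    for i, flagged in enumerate(flags + [False]):  # sentinel closes a final run
        if flagged and start is None:
            start = i
        elif not flagged and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    return segments

# Toy alignment (hypothetical): columns 2 to 8 are invariant.
toy = ["AWVTVYYGK", "CWVTVYYGR", "DWVTVYYGS"]
print(invariant_segments(nearly_invariant(toy)))  # [(2, 8)]
```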
`
Some of the adjacent residues occur more than 90% of the time. Furthermore,
segments II and III and segments IV and V form disulfide bonds. Segment VI
is only one residue away from segment V, and that residue is either K
or R most of the time. Segment I is near the N-terminus and segment VII near
the C-terminus, and they are physically located near each other in the folded
structure of gp120 (28). If these segments are indeed located on the surface of
gp120, we may then suggest three possible peptide vaccine candidates: segment I
linked to segment VII by linkers consisting of repeats of GGGGS; segment II
disulfide-bonded to segment III; and segment IV disulfide-bonded to segment V,
which is joined to segment VI by an intervening residue of K or R. Additional
residues that occur more than 90% of the time may also be
included in these segments, suggesting the following three possible peptides:
`
`In contrast, for influenza virus hemagglutinin amino acid sequences, no such
`segments of seven or more residues are found.
`
`3. Future Directions
`As previously discussed, during the past few years a substantial decline in
`the number of published sequences of proteins of immunological interest has
`occurred. With the shift in focus from brute-force data collection to in-depth
`analysis and “data mining” by various researchers, well-characterized data sets
`have become extremely important. Each entry in the database inherently con-
`tains a large amount of bioinformatic analysis such as alignment information,
`the relationship between gene sequence and protein sequence, and coding
`region designation. These relationships prove most valuable in allowing
`researchers to ask more intuitive, abstract questions than would be possible
`with most unaligned, raw sequence databases. We continue to locate, annotate,
`and align sequences found in the published literature. Periodically, the database
`and website are updated to reflect inclusion of the new data. Corrections of
`errors found in the sequence data by us and by database users are constantly
`made, ensuring the collection’s accuracy. We continue to explore new ways of
`relating the database entries, such as incorporating links to journal abstracts,
`links to 3D structural information, and germline gene assignment.
`We continue to create and develop software programs for performing various
`analyses of the data. We are in the process of converting many tools we have
`used into Java and adding graphical interfaces. Two major groupings of tools are
`currently being created: the first to update and extend the current entry retrieval
`tools (such as SeqhuntII), and the second to perform distribution analyses on
`entire groups of sequences (such as variability). Java tools for locating sequences
`based on pattern matching, length distribution of a specified region, positional
`
`examination of a codon or residue, and sequence length have been developed
`and are undergoing testing. Many of the studies we have performed on the data-
`base require tools for grouping and analyzing collections of sequences rather
`than each one individually. We are developing a Java interface for creating distri-
`butions based on position (used most frequently for calculating variability),
`region length (used in length distribution analyses), and sequence pattern (used
`in gene count estimations and various homology studies). Together, these power-
`ful interfaces will allow researchers to quickly perform many complex bioinfor-
`matics studies on the aligned sequence data and combine their results.
`
`4. Conclusion
`The fundamental reason for creating and maintaining most sequence data-
`bases is to study and correlate a protein’s primary sequence structure with its 3D
`structure. Although there are many proteins with known 3D structures, there are
`probably two orders of magnitude more proteins with known amino acid or
nucleotide sequences. Anfinsen proposed in the 1950s, and summarized in his
1973 paper (29), that the primary sequence of a protein should determine its 3D
folding. Unfortunately, we still do not know how to decipher this information.
`In the long run, the Kabat Database must be self-sustained. However, the
`transition from a free NIH-supported database to a self-sustaining format will
`take time and continued investigator interest. For example, it is hoped that the
`rapid development of therapeutic antibody techniques, using chimeric or
`humanized approaches, will eventually lead to the de novo synthesis of
`designer antibodies. Thus, immunotherapy for cancers and viral infections may
`rely heavily on the Kabat Database collections.
`We will also rely on users to suggest to us what basic immunological ideas,
what computer programs, and which kinds of structure and function
`information will be of importance for future studies in this central problem in
`biomedicine. This feedback from users is of primary importance to the exis-
`tence of the Kabat Database.
`
`References
`1. Wu, T. T. and Kabat, E. A. (1970) An analysis of the sequences of the variable
`regions of Bence Jones proteins and myeloma light chains and their implications
`for antibody complementarity. J. Exp. Med. 132, 211–250.
`2. Kabat, E. A., Wu, T. T., and Bilofsky, H. (1976) Variable Regions of Immunoglobu-
`lin Chains. Bolt Beranek and Newman Inc., Cambridge, MA.
`3. Kabat, E. A., Wu, T. T., and Bilofsky, H. (1979) Sequences of Immunoglobulin
`Chains. NIH Publication No. 80–2008, Bethesda, MD.
`4. Kabat, E. A., Wu, T. T., Bilofsky, H., Reid-Miller, M., and Perry, H. (1983)
`Sequences of Proteins of Immunological Interest. NIH Publication No. 369–847,
`Bethesda, MD.
`
`5. Kabat, E. A., Wu, T. T., Reid-Miller, M., Perry, H., and Gottesman, K. (1987)
`Sequences of Proteins of Immunological Interest, 4th ed., U. S. Govt. Printing Off.
`No. 165–492, Bethesda, MD.
`6. Kabat, E. A., Wu, T. T., Perry, H., Gottesman, K., and Foeller, C. (1991) Sequences
`of Proteins of Immunological Interest, 5th ed., NIH Publication No. 91–3242,
`Bethesda, MD.
`7. Hilschmann, N., and Craig, L. C. (1965) Amino acid sequence studies with Bence
`Jones proteins. Proc. Natl. Acad. Sci. USA 53, 1403–1409.
`8. Kabat, E. A. and Wu, T. T. (1971) Attempts to locate complementarity-determining
`residues in the variable portions of light and heavy chains. Ann. NY Acad. Sci. 190,
`382–393.
`9. Kohler, G. and Milstein, C. (1975) Continuous cultures of fused cells secreting
`antibody of predefined specificity. Nature 256, 495–497.
10. Amit, A. G., Mariuzza, R. A., Phillips, S. E., and Poljak, R. J. (1986)
Three-dimensional structure of antigen-antibody complex at 2.8 Å resolution.
Science 233, 747–753.
11. Wu, T. T., Kabat, E. A., and Bilofsky, H. (1975) Similarities among hypervariable
segments of immunoglobulin chains. Proc. Natl. Acad. Sci. USA 72, 5107–5110.
`12. Kabat, E. A., Wu, T. T., and Bilofsky, H. (1978) Variable region genes for
`immunoglobulin framework are assembled from small fragments of DNA—a
`hypothesis. Proc. Natl. Acad. Sci. USA 75, 2429–2433.
`13. Bernard, O., Hozumi, N., and Tonegawa, S. (1978) Sequences of mouse light chain
`genes before and after somatic changes. Cell 15, 1133–1144.
`14. Milstein, C. (1967) Linked groups of residues in immunoglobulin chains. Nature
`216, 330–332.
`15. Early, P., Huang, H., Davis, M., Calame, K., and Hood, L. (1980) An Immunoglob-
`ulin heavy chain variable gene is generated from three segments of DNA: VH, DH,
`and JH. Cell 19, 981–992.
`16. Sakano, H., Maki, R., Kurosawa, Y., Roeder, W., and Tonegawa, S. (1980) Two
`types of somatic recombinations are necessary for the generation of complete
`heavy chain genes. Nature 286, 676–683.
`17. Baltimore, D. (1981) Gene conversion: some implications for immunoglobulin
`genes. Cell 24, 592–594.
18. Reynaud, C., Anquez, V., Dahan, A., and Weill, J. (1985) A single rearrangement
event generates most of the chicken immunoglobulin light chain diversity. Cell 40,
283–291.
`19. Desiderio, S. V., Yancopoulos, G. D., Paskind, M., Thomas, E., Boss, M. A., Lan-
`dau, N., et al. (1984) Insertion of N regions into heavy-chain genes is correlated
`with expression of terminal deoxytransferase in B cells. Nature 311, 752–755.
`20. Sleckman, B. P., Gorman, J. R., and Alt, F. W. (1996) Accessibility control of anti-
`gen-receptor variable-region gene assembly: role of cis-acting elements. Annu. Rev.
`Immunol. 14, 459–481.
21. Kabat, E. A. and Wu, T. T. (1991) Identical V-region amino acid sequences and
segments of sequences in antibodies of different specificities: relative contributions
of VH and VL genes, minigenes and CDRs to binding of antibody combining sites.
J. Immunol. 147, 1709–1719.
`22. Wu, T. T. (1994) From esoteric theory to therapeutic antibodies. Appl. Biochem.
`Biotechnol. 47, 107–118.
`23. Wu, T. T., Johnson, G., and Kabat, E. A. (1993) Length distribution of CDRH3 in
`antibodies. Proteins 16, 1–7.
`24. Wu, T. T. (2001) Analytical Molecular Biology. Kluwer Academic Publishers, Nor-
`well, MA.
`25. Wilson, M. R., Middleton, D., and Warr, G. W. (1988) Immunoglobulin heavy
`chain variable region gene evolution: structure and family relations of two genes
`and a pseudogene in a teleost fish. Proc. Natl. Acad. Sci. USA 85, 1566–1570; and
`(1989) Erratum. Proc. Natl. Acad. Sci. USA 86, 3276.
`26. Johnson, G., Wu, T. T., and Kabat, E. A. (1995) SEQHUNT, a program to search
`aligned nucleotide and amino acid sequences, in Antibody Engineering Protocols
`(Paul, S., ed.), Humana Press, Totowa, NJ, pp. 1–15.
27. Janssens, W., Nkengasong, J., Heyndrickx, L., van der Auwera, G., Vereecken, K.,
Coppens, S., et al. (1999) Intrapatient variability of HIV type I group O ANT70
during a 10-year follow-up. AIDS Res. Hum. Retrovir. 15, 1325–1332.
28. Wyatt, R., Kwong, P. D., Desjardins, E., Sweet, R. W., Robinson, J., Hendrickson,
W. A., et al. (1998) The antigenic structure of the HIV gp120 envelope glycoprotein.
Nature 393, 705–711.
`29. Anfinsen, C. B. (1973) Principles that govern the folding of protein chains. Science
`181, 223–230.
`30. Wu, T. T. and Kabat, E. A. (1992) Possible use of similar framework region amino
`acid sequences between human and mouse immunoglobulins for humanizing
`mouse antibodies. Mol. Immunol. 29, 1141–1146.
`31. Johnson, G. and Wu, T. T. (1997a) A method of estimating the numbers of human
`and mouse immunoglobulin V-
