throbber
214–218
`
`Nucleic Acids Research, 2000, Vol. 28, No. 1
`
`© 2000 Oxford University Press
`
`Kabat Database and its applications: 30 years after the
`first variability plot
`George Johnson and Tai Te Wu*
`
`Departments of Biochemistry, Molecular Biology and Cell Biology, and of Biomedical Engineering,
`Northwestern University, Evanston, IL 60208, USA
`
`Received July 23, 1999; Revised and Accepted October 13, 1999
`
`ABSTRACT
`The Kabat Database was initially started in 1970 to
`determine the combining site of antibodies based on
`the available amino acid sequences at that time.
`Bence Jones proteins, mostly from human, were
`aligned, using the now-known Kabat numbering
`system, and a quantitative measure, variability, was
`calculated for every position. Three peaks, at positions
`24–34, 50–56 and 89–97, were identified and proposed
`to form the complementarity determining regions
`(CDR) of light chains. Subsequently, antibody heavy
`chain amino acid sequences were also aligned using
`a different numbering system, since the locations of
`their CDRs (31–35B, 50–65 and 95–102) are different
`from those of the light chains. CDRL1 starts right
`after the first invariant Cys 23 of light chains, while
`CDRH1 is eight amino acid residues away from the
`first invariant Cys 22 of heavy chains. During the past
`30 years, the Kabat database has grown to include
`nucleotide sequences, sequences of T cell receptors
`for antigens (TCR), major histocompatibility complex
`(MHC) class I and II molecules and other proteins of
`immunological interest. It has been used extensively
`by immunologists to derive useful structural and
`functional information from the primary sequences
`of these proteins. An overall view of the Kabat Database
`and its various applications are summarized here.
`The Kabat Database is freely available at http://immuno.
`bme.nwu.edu
`
`INTRODUCTION
`The purpose of maintaining the Kabat Database of aligned
`sequences of proteins of
`immunological
`interest,
`in our
`opinion, is to provide useful correlations between structure and
`function for this special group of proteins from their nucleotide
`and amino acid sequences to their tertiary structures (1). These
`sequences are thus aligned with the ultimate aim of under-
`standing how these proteins are folded and how they can
`perform their biological functions. We include only coding
`region sequences that have been published. In some cases, only the
`amino acid sequences were published, while the corresponding
`nucleotide sequences were deposited in GenBank. All stored
`
`sequences were then printed out and checked visually against
`available published sequences. We routinely survey for
`possible new sequences in journals in our libraries, Medline
`entries, cross-references from other papers, and author notification;
`however, we may still miss some sequences. GenBank, on the
`other hand, contains a substantial number of unpublished
`sequences. If there are doubts about these sequences or their
`annotations, please refer to the original papers. The Kabat
`numbering systems (see the Introduction of 2) for antibody
`light and heavy chains, for TCR alpha and beta chains, etc., go
`hand-in-hand with variability calculations. The locations of the
`CDRs are the theoretically derived positions which can be
`verified experimentally. Indeed, from the first antigen–antibody
`Fab complex (3) to the complexes of TCR, processed peptide
`and MHC class I molecule (4,5), it has been realized that alignment
`of amino acid sequences and variability calculations can be of
`utmost
`importance in understanding how these important
`macromolecules function biologically. Due to the rapid devel-
`opment of genetic and protein engineering methods, mouse
`and rat antibodies have been humanized to treat human
`cancers, viral infections, etc (6). CDRs of selected rodent anti-
`bodies are cut out and glued onto human antibody frameworks
`to minimize rejection by human patients.
`Our predicted CDRs are slightly different from Chothia’s. A
`careful comparison can be found from a hyperlink on our
`website to ‘Andrew’s Antibody Page’ (http://www.biochem.ucl.
`ac.uk/~martin/abs/index.html ).
`Massive amounts of sequence data are being continuously
`published in the scientific literature. It is imperative to collect
`and properly align the sequences so that they can be used by as
`many researchers in this field as possible. We have previously
`published five editions of these sequences (see the Introduction
`of 2). In 1991, the fifth edition (2) consisted of three volumes.
`Currently, the database is more than five times as large. As of
`September 29, 1999, the Kabat database contained 1 599 375
`and 2 517 756 nt for antibody light and heavy chain variable
`regions, respectively, as compared to 272 244 and 418 962 nt
`in 1991. Total numbers of entries, amino acids and bases of
`other categories of sequences can be obtained by using the
`‘Current Counts’ hyperlink on our website. The collection is
`available on our website (http://www.immuno.bme.nwu.edu )
`which is free due to the generous support by various research
`grants from NIH since 1970.
`Finally, numerous scientific papers have cited our database,
`quoting our fourth edition (7), fifth edition (2), or one of our
`more recent papers (8). On our part, we have been analyzing
`
`*To whom correspondence should be addressed. Tel: +1 847 491 7849; Fax: +1 847 491 4928; Email: t-wu@nwu.edu
`
`Ex. 1010 - Page 1 of 5
`
`AMGEN INC.
`Exhibit 1010
`
`

`

`the Kabat Database during the past few years with reference to
`the total numbers of antibody and TCR V-genes, possible
`evolutionary selection processes,
`importance of antibody
`CDRH3s as related to their fine specificities, etc.
`
`KABAT DATABASE
`The Kabat Database may be accessed for searching, sequence
`retrieval and analysis by a few different methods: electronic
`mail, WWW and ftp. The electronic mail interface has been
`available since 1993, the WWW interface since 1995 and
`various formats of the database in electronic format for nearly
`a decade (8). Our data formats, searching tools, output formats
`and database structures have gradually been adopted by other
`immunological databases and interfaces.
`Electronic mail interface
`An electronic mail interface (seqhunt2@immuno.bme.nwu.edu )
`provides a non-interactive method for searching and sequence
`retrieval (9). Sending mail to the server address with the single
`word ‘help’ (no quotes) in the message body returns instructions
`for using the server.
`All sequences classes are searchable and returnable. The
`query format allows making AND/OR/NOT constructed
`restrictions on the database and amino acid and nucleotide
`sequence pattern matching with allowable differences.
`Requests are processed as they are received and depending on
`the network traffic, take ~1–2 min to be searched and returned
`to the sender. The returned format is a fixed-line length record
`of 80 or fewer characters per line for ease in visual inspection
`and processing by user-written scripts or programs. The characters
`are plain text.
`The query format for the sent request consists of two parts.
`The first part contains directives for the server to follow while
`the second part contains specifications of the search. Specification
`of the extent of data returned, the number of documents to
`return, starting document and whether plain ASCII text or
`PostScript should be used in the return format may be entered.
`Further, one can direct the server to return a distribution, the
`variability or unaligned raw data for the search specified.
`The second part of the query contains the search restrictions
`on the database. Words separated by AND and OR may be
`used, as well as searching functions, like nucleotide/amino
`acid pattern matching and positional restriction matching.
`There are basically three steps in translating and performing
`a search on the Kabat Database: generate the question or query,
`translate it into a format the server can recognize and decide on
`the output options desired of the returned matches. For
`example, if matches of mouse kappa light chains of anti-
`phosphorylcholine antibodies are desired,
`the query and
`restriction on the database would be:
`Begin
`@mouse and kappa and phosphorylcholine
`The ‘@’ before mouse tells the server that matches of the
`species mouse are desired, rather than searching through the
`entire database record for instances of the word ‘mouse’. More
`complicated restrictions can be generated using parentheses for
`grouping and the minus sign ‘–’ for NOT. Finding all rat and
`rabbit sequences which are not kappa light chains, and
`returning them as amino acid sequences in PostScript format
`would be constructed as:
`
`Nucleic Acids Research, 2000, Vol. 28, No. 1
`
`215
`
`PSAA
`Begin
`(rat and rabbit) and –kappa
`Pattern matching is interpreted as the second part of an AND
`statement, such that finding all rat and rabbit sequences which
`are not kappa and contain the nucleotide pattern cagtacgtcag
`with three allowable mismatches, would be sent as:
`Begin
`(rat and rabbit) and –kappa [ implicit AND ]
`#NM 3
`cagtacgtcag
`More examples of searching and output options may be found
`in the ‘help’ file returned from the server.
`WWW interface
`The WWW interface (8) to the Kabat Database: http://immuno.
`bme.nwu.edu contains searching and analysis tools as well as
`links to database download sites and other interesting databases.
`Most of the features found in the electronic mail interface are
`found in the WWW interface, as well as other tools. The
`WWW interface is more interactive than the Email and returns
`results faster, depending on the network traffic.
`Searching and analysis tools
`SeqhuntII. This grouping of programs allows searches through
`the annotations and sequence pattern matching of the amino
`acid and nucleotide sequence data with allowable mismatches.
`Like the Email server, restrictions on the database may be
`formulated as AND/OR/NOT constructs. Output extent, output
`format, maximum documents and starting document may be
`specified. Browsing of the output results in HTML format
`allows the user to view the database entries in an easy-to-read
`format. ASCII text may be selected as output for use in user-
`generated scripts and programs. PostScript generation allows
`for printing on a PostScript supporting printer. Sequence
`matching is returned aligned with the target sequence with
`nucleotide or amino acid differences from the database
`sequence displayed in a case change. Since the database
`contains only coding regions of genes and proteins, the query
`sequence should be a portion of the coding region being sought.
`
`Variability. Variability and amino acid distributions of
`sequence groups may be generated for restrictions on the data-
`base. The variability plots are in PostScript format and may
`either be viewed on the screen with an appropriate PostScript
`viewer (e.g. GNU ghostscript or ghostview) or printed to a
`postscript-supporting printer. Plots for human and mouse TCR
`gamma and delta chain variable regions are shown in Figure 1.
`Scaling of the variability plots may be done allowing comparison
`of variability plots for different groupings of sequences.
`Distributions of the amino acids per position may be returned
`also, including the calculated variability for each position.
`
`Sequence alignment. Alignment of user-entered coding regions
`of
`immunoglobulin light chains according to the Kabat
`numbering system can be performed. Because of the relatively
`few alignment options available for
`light chains, most
`sequences can be aligned. One can start with around 10 amino
`acid residues or 30 nt. There is no lower limit on the length of
`sequence to be matched. In some cases though, visual inspection
`and alignment is necessary, as is for heavy chain alignment,
`
`Ex. 1010 - Page 2 of 5
`
`

`

`216
`
`Nucleic Acids Research, 2000, Vol. 28, No. 1
`
`Figure 1. Variability plots for human and mouse TCR gamma and delta chain variable regions, using 377 human gamma, 1260 human delta, 297 mouse gamma
`and 461 mouse delta partial and complete sequences.
`
`especially within the CDRH3 region, if additional codons or
`residues are inserted and denoted by ‘#’. If a suitable alignment
`counterpart from the database is not found for the target
`sequence, the user can contact us.
`
`FTP. Various formats of the database are available for down-
`load from NCBI’s repository under the directory ‘kabat’.
`Currently active formats include a FASTA-like raw sequence
`format and the database’s fixed length format of 80 or fewer
`
`characters per line and vertical alignment. Four main variations
`on the fixed length format exist to properly visually display
`single translations, pseudogene translations, J-minigenes and
`D-minigenes. Other immunological databases have adopted
`similar formats as exemplified by the three letter code amino
`acid translation followed by single letter code. A ‘dump’
`version of the database is periodically updated which contains
`unlimited line length records more suitable for mass
`processing on unix-based systems.
`
`Ex. 1010 - Page 3 of 5
`
`

`

`Nucleic Acids Research, 2000, Vol. 28, No. 1
`
`217
`
`Figure 2. Length distributions of CDR3s of human and mouse TCR gamma and delta chains, based on 135 human gamma, 546 human delta, 37 mouse gamma and
`66 mouse delta complete CDR3 sequences.
`
`OTHER APPLICATIONS
`As mentioned before,
`the Kabat Database was initially
`constructed for
`the purpose of
`identifying the antibody
`combining site (1). Starting from aligned amino acid sequences
`and using variability calculations, we have identified CDRs of
`antibody light and heavy chains, as well as those of TCRs.
`Such calculations can also provide useful predictions for MHC
`class I and II sequences (8), and to other aligned proteins
`sequences, e.g. HIV gp120, gp41, etc.
`The importance of CDRH3 to confer fine specificity to anti-
`bodies was realized a few years ago (10). Furthermore, the
`unique CDRH3 nucleotide sequences have recently been used
`as a sensitive diagnostic test to detect residue B cell malignancies
`in cancer patients. Thus, many of these sequences have been
`determined. But most of them have been excluded from
`GenBank due to their relative short lengths. We have been
`meticulously collecting them, and realized the importance of
`their length distributions in antibodies of various specificities
`(11), and possible differences between CDRH3s of human and
`mouse (12). In the case of rabbit, the CDRH3s have less length
`variation than human and mouse. This may be compensated by
`the length variations of the CDRL3s (13).
`
`The length variations of TCR alpha and beta chain CDR3s
`are very restricted (14). On the other hand, TCR gamma and
`delta chain CDR3s have more length variation, close to those
`of antibody heavy chains (Fig. 2). Whether they bind antigens
`directly is unclear.
`During recent years, various research groups have decided to
`sequence the entire coding region of different antibody and
`TCR V-genes, in order to have an idea of their total numbers. On
`the other hand, we have discovered that pair-wise comparisons of
`V-gene nucleotide sequences in the Kabat Database provide
`very accurate estimations of their total numbers (15,16). In
`addition, such comparisons seem to suggest that antibody and
`TCR V-genes have evolved under different selective pressures
`(17). In the case of MHC class I sequences, comparison of their
`aligned sequences has elucidated a new mechanism of generating
`novel MHC class I molecules by random assortment of their a1
`and a2 gene segments (18).
`
`DISCUSSION
`The Kabat Database has been around for 30 years. It has
`provided the immunology community a useful service, since it
`
`Ex. 1010 - Page 4 of 5
`
`

`

`218
`
`Nucleic Acids Research, 2000, Vol. 28, No. 1
`
`not only is a sequence database but also incorporates vital
`aspects of the biology of the immune system. Various analytical
`methods have been developed to study the structure and function
`relations of proteins of immunological interest.
`Electronic addresses:
`http://immuno.bme.nwu.edu
`seqhunt2@immuno.bme.nwu.edu
`Citing the Kabat Database:
`Authors using this database may cite this paper together with
`the electronic addresses.
`
`ACKNOWLEDGEMENT
`Supported in part by NIH Grant 5 R24 AI25616-10.
`
`REFERENCES
`1. Wu,T.T. and Kabat,E.A. (1970) J. Exp. Med., 132, 211–250.
`2. Kabat,E.A., Wu,T.T., Perry,H., Gottesman,K. and Foeller,C. (1991)
`Sequences of Proteins of Immunological Interest, Fifth Edition. NIH
`Publication No. 91-3242.
`3. Amit,A.G., Mariuzza,R.A., Phillips,S.E.V. and Poljak,R.J. (1986)
`Science, 233, 747–753.
`
`4. Garcia,K.C., Degano,M., Stanfield,R.L., Brunmark,A., Jackson,M.R.,
`Peterson,P.A., Teyton,L. and Wilson,I.A. (1996) Science, 274, 209–219.
`5. Garboczi,D.H., Ghosh,P., Utz,U., Fan,Q.R., Biddison,W.E. and
`Wiley,D.C. (1996) Nature, 384, 134–141.
`6. Jones,P.T., Dear,P.H., Foote,J., Neuberger,M.S. and Winter,G. (1986)
`Nature, 321, 522–525.
`7. Kabat,E.A., Wu,T.T., Reid-Miller,M., Perry,H. and Gottesman,K. (1987)
`Sequences of Proteins of Immunological Interest, Fourth Edition. US
`Govt. Printing Off. No. 165-492.
`8. Johnson,G., Kabat,E.A. and Wu,T.T. (1996) In Herzenberg,L.A.,
`Weir,W.M., Herzenberg,L.A. and Blackwell,C. (eds), Weir’s Handbook
`of Experimental Immunology I. Immunochemistry and Molecular
`Immunology, Fifth Edition. Blackwell Science Inc., Cambridge, MA,
`pp. 6.1–6.21.
`9. Johnson,G., Wu,T.T. and Kabat,E.A. (1995) In Paul,S. (ed.),
`Antibody Engineering Protocols. Humana Press, pp.1–15.
`10. Kabat,E.A. and Wu,T.T. (1991) J. Immunol., 147, 1709–1719.
`11. Johnson,G. and Wu,T.T. (1998) Int. Immunol., 10, 1801–1805.
`12. Wu,T.T., Johnson,G. and Kabat,E.A. (1993) Proteins, 16, 1–7.
`13. Sehgal,D., Johnson,G., Wu,T.T. and Mage,R.G. (1999) Immunogenetics,
`50, 31–42.
`14. Johnson,G. and Wu,T.T. (1999) Immunol. Cell Biol., 77, 391–394.
`15. Johnson,G. and Wu,T.T. (1997) Genetics, 145, 777–786.
`16. Johnson,G. and Wu,T.T. (1997) Immunol. Cell Biol.,75, 580–583.
`17. Johnson,G. and Wu,T.T. (1997) J. Mol. Evol., 44, 253–257.
`18. Johnson,G. and Wu,T.T. (1998) Genetics, 149, 1063–1067.
`
`Ex. 1010 - Page 5 of 5
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket