`
SEQUENCES OF PROTEINS OF IMMUNOLOGICAL INTEREST
`
`FOURTH EDITION
`
Tabulation and Analysis of
Amino Acid and Nucleic Acid Sequences of
Precursors, V-Regions, C-Regions, J-Chain,
T-Cell Receptor for Antigen, T-Cell Surface Antigens,
β2-Microglobulins, Major Histocompatibility Antigens,
Thy-1, Complement, C-Reactive Protein, Thymopoietin,
Post-gamma Globulin, and α2-Macroglobulin
`
`1987
`
Elvin A. Kabat*, Tai Te Wu†, Margaret Reid-Miller*,
Harold M. Perry‡, and Kay S. Gottesman‡

* Depts. of Microbiology, Genetics and Development, and Neurology, Cancer Center/
Institute of Cancer Research, College of Physicians and Surgeons, Columbia University, New
York, NY 10032 and the National Institute of Allergy and Infectious Diseases, Bethesda, MD
20892.

† Depts. of Biochemistry, Molecular Biology, and Cell Biology, and Engineering Sciences and
Applied Mathematics and Biomedical Engineering, Northwestern University, Evanston, IL 60201
and the Cancer Center, Northwestern University Medical School, Chicago, IL 60611

‡ Bolt Beranek and Newman Inc., Cambridge, MA 02238
`
The collection and maintenance of this data base is sponsored through Contract N01-RR-8-2158
by the following components of the National Institutes of Health, Bethesda, MD 20892:
Division of Research Resources
National Cancer Institute
National Institute of Allergy and Infectious Diseases
National Institute of Arthritis, Diabetes, Digestive and Kidney Diseases
National Institute of General Medical Sciences
`
`
U.S. DEPARTMENT OF HEALTH
AND HUMAN SERVICES

Public Health Service
National Institutes of Health
(1987)
`
`
`Page 1
`
`Mylan v. Genentech
`IPR2016-01693
`Genentech Exhibit 2025
`
`
`
`
`
`
`
`
Our listing of sequences will be kept up to date. Investigators are invited to send additional
sequence data when accepted for publication. Send two copies of the manuscript together with a
letter of acceptance from a journal to:
Dr. E.A. Kabat
National Institutes of Health
Building 8, Room 126
9000 Rockville Pike
Bethesda, Maryland 20892
`
`It would be extremely helpful if you can send us your sequence data on magnetic tapes or floppy
`diskettes or a clean copy of the sequences. The file formats should be such that they can be read
`by a generic word processor.
`
`When published, three reprints should be provided.
`
`If any published sequences have been overlooked or if any errors are found, please bring them to
`our attention.
`
`
`INTRODUCTION
`
Our earlier “Variable Regions of Immunoglobulin Chains” (1), the second edition “Sequences
of Immunoglobulin Chains” (2) and the third edition “Sequences of Proteins of Immunological Interest”
(3) have been further expanded herein to include amino acid and nucleotide sequences of precursors,
variable regions, constant regions, J-Chain, β2-microglobulins, antigens of the major histocompatibility
complex (HLA, H-2, Ia, DR) as well as of Thy-1, complement, T-lymphocyte receptors for antigens, other
T-cell antigens of the immunoglobulin superfamily, interleukins and various other proteins related to
immune functions. The identification and sequencing of clones obtained using recombinant DNA
techniques has yielded nucleotide sequences of signal, variable, and constant regions of immunoglobulins
(4,5), and these nucleotide sequences have been translated into amino acid sequences. Instances
of the latter have been included in the tables of amino acid sequences and are indicated by an apostrophe
followed by CL after the name of the clone. We have continued to use the PROPHET Computer System
of the Division of Research Resources, National Institutes of Health (6,7) to tabulate the sequences.
`
In compiling the data we have tried to be as up-to-date as possible and have included only
sequences which have been published or which have been accepted for publication. Residues which
have not been definitely determined have been excluded. It should be remembered that sequences
are often published in review articles without detailed documentary evidence. These have often been
revised. We have listed such revisions in the notes in many instances; others can readily be found by
comparison with sequences in previous editions.
`
Since the preparation of camera-ready copy for printing the pages is carried out in sequence
from page 1 in batches, the amino acid sequences were set several months before the nucleotide
sequences. We have continued to include new nucleotide sequences up to the point at which
camera-ready copy for them had to be set, but the translated amino acid sequences could not be included.
Thus many nucleotide sequences appear without translation. When antibody activities were known,
they have been listed at the end of the nucleotide sequences and are included in the index.
`
When doubts arise as to the validity of any residue in a sequence, the original reference
should be examined to ascertain whether definitive evidence for the sequence has been provided. We
have sent the amino acid and nucleotide sequences as stored in the computer to the original authors
for verification. If so verified, this is denoted by “(checked by author)” at the end of each reference.
Except for the earliest sequences, the date on which the checked sequence was returned to us is given.
Whenever possible, nucleotide sequences from GenBank (8) have been used. Programs for converting
a GenBank sequence to the codon format of our tables have been developed. The correctness of the
table sequence has been verified by converting back into the linear form and comparing with GenBank.
When this has been done the sequence is listed as “(from GenBank)”. If the sequences were entered
by us from the literature and then checked with GenBank, this is indicated by “(checked with GenBank)”.
We have entered many nucleotide sequences which were not available from GenBank. In general, we
have not included stretches of sequence such as enhancers, switch regions and introns for which no
codification of the nucleotide sequences is as yet agreed upon with respect to function, etc. Much
information about such stretches may be found in references 9, 10.
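
The round-trip check described above (linear sequence to codon format and back, then comparison with the original) can be illustrated with a minimal Python sketch. This is not the program used for the tables: the function names, the representation of the codon format as a plain list of triplets, and the example fragment are all assumptions made for illustration.

```python
def to_codons(linear, frame=0):
    """Split a linear nucleotide string into triplets, starting at `frame`.

    Assumption: the codon format is modelled as a list of 3-letter strings;
    a trailing partial codon is kept as a shorter final element.
    """
    seq = "".join(linear.split()).upper()  # drop blanks/newlines, normalise case
    return [seq[i:i + 3] for i in range(frame, len(seq), 3)]


def to_linear(codons):
    """Join a list of triplets back into one linear string."""
    return "".join(codons)


def round_trip_matches(linear, frame=0):
    """True if codon conversion followed by re-linearisation loses nothing."""
    seq = "".join(linear.split()).upper()
    return to_linear(to_codons(seq, frame)) == seq[frame:]


# Made-up fragment written in blocks of ten, purely for illustration.
example = "atggatttcc aggtgcagat t"
codons = to_codons(example)
assert round_trip_matches(example)
```

A sequence that passes this comparison corresponds to the “(from GenBank)” case; the sketch says nothing about how discrepancies were resolved in practice.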
`
It is also possible, by examining the numbers of sequences at the end of each table and the
summary tables, to evaluate the probability that a given amino acid at a given position may not be
correct. This is most readily done for the framework residues of the V-region and for the C-region; in the
complementarity-determining regions this is more difficult because of the high variability.
`
`Amino Acid Sequences
`
The first column in each table gives the residue number. Except for complement, T-cell surface
antigens and miscellaneous proteins, the second column is a tabulation of invariant residues. Since
exceptions to invariance are found, the frequency, if less than 1.0 and greater than or equal to 0.95,
is indicated alongside the residue listed as invariant; when only a single sequence is available, this
is not given. Each sequence is tabulated in each subsequent column. Dashes (---) indicate that no
amino acid is present at that position and that the sequence continues. In all instances residues
considered uncertain by the authors have not been included in the table. In some instances the symbol
# is used to indicate that several amino acid residues were found in one position, and these residues
are listed in the notes. The four columns at the end of each table give:
`
1. the number of residues sequenced at that position,
2. the number of different amino acids found at that position,
3. the number of times the most common amino acid occurred and that amino acid in parentheses, and
4. the variability.
`
Variability is calculated (11) as:

$$\text{Variability} = \frac{\text{Number of different amino acids occurring at a given position}}{\text{Frequency of the most common amino acid at that position}}$$
`
An invariant position would have a variability of one; if 20 amino acids occurred with equal
frequency, the variability would be 20/0.05 = 400. If, for example, four different amino
acids Ser, Asp, Pro, and Thr occurred at a given position, and of 100 sequences available at that position,
Ser occurred 80 times, the variability would be 4/0.8 = 5. When any of the amino acid residues
sequenced were not identified completely and are listed as Glx (or Asx), two values, separated by a
colon, are given in the last three columns. The first value in each of these columns is calculated assuming
that only one of the two possibilities, e.g., Glu or Gln (or Asp or Asn) occurred, while the second considers
that both were present and maximizes variability. In the variability plots, the horizontal bars indicate
the two values.
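
As an illustration of the calculation, the following Python sketch computes the last four columns for one position. It is a minimal sketch under stated assumptions, not the PROPHET procedure: the function name and the dictionary keys are hypothetical, residues are passed as three-letter strings, ties for the most common residue and the two-valued Glx/Asx case are not handled, and the split of the 20 non-Ser residues in the example (Asp 10, Pro 5, Thr 5) is invented, since the text gives only the Ser count.

```python
from collections import Counter


def position_summary(residues):
    """Summarise one table position from the residues observed there.

    Empty strings or None stand for sequences with no data at the position.
    Returns None if nothing was sequenced; variability stays None when only
    a single sequence is known, as in the tables.
    """
    observed = [r for r in residues if r]
    if not observed:
        return None
    counts = Counter(observed)
    most_common, times = counts.most_common(1)[0]
    summary = {
        "sequenced": len(observed),   # column 1
        "different": len(counts),     # column 2
        "most_common": most_common,   # column 3 (residue)
        "times": times,               # column 3 (count)
        "variability": None,          # column 4
    }
    if len(observed) > 1:
        # different / (times / sequenced), rearranged to avoid rounding error
        summary["variability"] = len(counts) * len(observed) / times
    return summary


# Worked example from the text: 100 sequences, Ser 80 times, 4 residue types.
column = ["Ser"] * 80 + ["Asp"] * 10 + ["Pro"] * 5 + ["Thr"] * 5
result = position_summary(column)
```

Under these assumptions `result["variability"]` is 4/0.8 = 5, matching the example in the text.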
`
When two or more amino acids are most common and occur with equal frequency, they are
tabulated as a note, and the symbol + is used in the next to last column. If no sequence data have been
reported for any position, there are no entries in the last four columns. Variability is not calculated for
insertions or if only a single sequence is known. When the translated sequence of a clone corresponds
to a previously listed sequence of a plasmacytoma from which it was prepared, only one sequence
is listed so that the variability computations are not affected, and a note is included.
`
`If a given sequence is associated with any antibody activity, this is indicated by an asterisk alongside
`the protein heading, and the antibody specificities are given in a separate list with binding constants
`if available. The notes list the a-allotypes for the rabbit heavy chain V-region and the b-allotypes for
`the constant domain of the rabbit kappa light chain. A key reference to the sequence is given; generally
`the most recent reference since it is usually the most nearly complete, but often several references
`are included, especially when revisions of a sequence have been made. Notes are now of two types;
`general notes about a table indicated by the symbol #, and specific notes indicated by the sequence
`number.
`
`Signal Sequences
`
The signal (precursor) amino acid sequences of immunoglobulin chains are listed in three tables:
one for kappa light chains, one for lambda light chains, and one for heavy chains. They were obtained
either by direct sequencing of signal proteins (12-14) or by translating nucleotide sequences from DNA
clones. Signal segments range from 17 to 29 amino acid residues in length and are thus numbered from
-29 to -1. Genomic DNA clones contain introns of varying length that interrupt the coding sequence
of the precursor within the codon for position -4, and in rare cases for position -6. Thus, the L-gene
encodes the leader peptide to position -4 and the 5' end of the V-gene codes for positions -4 to -1.
`
The signal amino acid sequences of the T-cell receptors for antigens, β2-microglobulins,
major histocompatibility complex proteins, and complement components are listed in separate tables.
`
By conformational energy calculations, the core Vκ hydrophobic sequence Leu-Leu-Leu-Trp-Val-Leu-Leu-Leu
(MOPC321, MOPC63) exists in an alpha helical conformation, terminated by chain reversal
conformations in the four C-terminal residues Trp-Val-Pro-Gly; the four amino terminal residues are compatible
with the alpha helix (15).
`
`Variable Region Sequences
`
The variable regions (16) of immunoglobulins have been shown to contain hypervariable segments
in their light (11,17-23) and heavy (22,24-27) chains, of which certain residues have been affinity labeled
(28-41). Three hypervariable segments of light chain were delineated from a statistical examination
of sequences of human Vκ, human Vλ, and mouse Vκ light chains aligned for maximum homology
(11,22). These and the three corresponding segments of the heavy chains (22,26,27) were hypothesized
(11,22) to be the complementarity-determining regions or segments (CDR) containing the residues
which make contact with various antigenic determinants, and this has been verified by X-ray diffraction
studies at high resolution (42-67). The rest of the V-region constitutes the framework (11,22,66-68).
It is convenient to identify the framework segments (FR1, FR2, FR3, and FR4) and the
complementarity-determining segments (CDR1, CDR2, and CDR3) with the three CDRs separating the four FRs. The
residue numbers for these segments are as follows:
`
Segment   Light Chain                                Heavy Chain

FR1       1-23 (with an occasional residue at 0,     1-30 (with an occasional residue at 0)
          and a deletion at 10 in Vλ chains)

CDR1      24-34 (with possible insertions            31-35 (with possible insertions
          numbered as 27A,B,C,D,E,F)                 numbered as 35A,B)

FR2       35-49                                      36-49

CDR2      50-56                                      50-65 (with possible insertions
                                                     numbered as 52A,B,C)ᵃ

FR3       57-88                                      66-94 (with possible insertions
                                                     numbered as 82A,B,C)

CDR3      89-97 (with possible insertions            95-102 (with possible insertions numbered
          numbered as 95A,B,C,D,E,F)                 as 100A,B,C,D,E,F,G,H,I,J,K)

FR4       98-107 (with a possible insertion          103-113
          numbered as 106A)

ᵃ In the rabbit, Mage et al. (69) consider position 65 in VH to be in FR3, since it is allotype related.
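
The segment boundaries listed above lend themselves to a simple lookup. The Python sketch below is illustrative only: the names `SEGMENTS` and `segment_of` are hypothetical, insertion letters (for example 27A or 100K) are accepted but not checked against the insertions permitted for each segment, and the rabbit exception for position 65 of VH is not modelled.

```python
import re

# Residue ranges for light chains (FR1 1-23 ... FR4 98-107) and
# heavy chains (FR1 1-30 ... FR4 103-113), as listed in the text.
SEGMENTS = {
    "light": [("FR1", 1, 23), ("CDR1", 24, 34), ("FR2", 35, 49),
              ("CDR2", 50, 56), ("FR3", 57, 88), ("CDR3", 89, 97),
              ("FR4", 98, 107)],
    "heavy": [("FR1", 1, 30), ("CDR1", 31, 35), ("FR2", 36, 49),
              ("CDR2", 50, 65), ("FR3", 66, 94), ("CDR3", 95, 102),
              ("FR4", 103, 113)],
}


def segment_of(chain, position):
    """Return the segment name for a residue position such as 35 or '27A'."""
    match = re.fullmatch(r"(\d+)([A-K]?)", str(position).strip().upper())
    if match is None:
        raise ValueError(f"unrecognised position: {position!r}")
    number = int(match.group(1))
    if number == 0:
        return "FR1"  # the occasional residue at 0 precedes residue 1
    for name, first, last in SEGMENTS[chain]:
        if first <= number <= last:
            return name
    raise ValueError(f"position {position!r} lies outside the {chain} chain V-region")
```

For example, `segment_of("light", "27A")` falls in CDR1 and `segment_of("heavy", 94)` in FR3.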
`
In the tables of V-regions, the FR and CDR are separated by horizontal lines for convenience
in reading. One mouse kappa light chain, MPC11, has an extra segment of 12 amino acid residues
between position 1 and the signal sequence (70). Several chains have internal deletions.
`
In the tables, the V-genes for the light chains code to amino acid position 95, and the J-minigenes
from position 97 to 107 for lambda and 108 for kappa light chains. Position 96 is usually the site of V-J
joining by recombination and may be coded partly by the V-gene and partly by the J-minigene. Because
the site of V-J recombination could occur at different positions within a codon, different amino acid
residues may result at this position. We have changed the location of the inserted residues from 97A-F
(2) to 95A-F, since it makes for better alignment by confining chains of different lengths to the V-gene
region. In Vκ chains, J1 and J2 were used 5 to 10 times more frequently than J4 and J5 (71).
`
The V-genes for the heavy chains code up to amino acid position 94 and are followed by the D-
and J-minigenes. Because of the extensive variation in the lengths of D-minigenes, the exact boundary
between D and J is not always located at the same amino acid position. In addition, the lengths of the
J encoded amino acid sequences vary by a few amino acid residues. Moreover, the process of D-J
joining appears to involve insertions of extra nucleotides between V and D and between D and J, termed
the N region (72-76), and correlates with the appearance of terminal deoxytransferase in B cells (75).
The original numbering system for the heavy chains has therefore been retained. Wysocki et al. (76)
have provided some evidence suggesting a non-random origin for the VH-DH junction, perhaps a
minigene, rather than random addition of the N nucleotides.
`
It has become evident that a critical understanding of the architecture of antibody combining sites
and the genetics of the generation of diversity and of antibody complementarity will depend to a great
extent on the evaluation of a large number of sequences of the variable regions and especially of the
complementarity-determining segments of light and heavy chains of immunoglobulins of different
species. Ability to locate residues in the site making contact with antigenic determinants (77) and to predict
(67,78-82) the structures of antibody combining sites will depend heavily upon such sequences.
`
Figures 1 and 2 are stereoviews of the α-carbon skeletons of the four Fv regions for which high
resolution X-ray structures have been determined, NEWM (44), KOL (62), MCPC603 (47, 48, 63), and
J539 (64). The residues in the CDRs are shown as solid circles. In Fig. 1 the combining site is at the
`
`
`
`
`NEWM
`
`KOL
`
`MCPC603
`
FIG. 1. Stereodrawings of the α-carbon skeletons of four Fv regions studied crystallographically. Top to bottom:
NEWM (43,44,49,59), KOL (62), MCPC603 (47, 48, 53, 55, 63), J539 (64). Coordinates for NEWM, KOL from the
Protein Data Bank (Bernstein et al. 1977, J. Mol. Biol. 112:532-544); for MCPC603 and J539 courtesy of David
R. Davies. The stereodrawings of Figures 1, 2, 3 and 4 were prepared by Dr. Eduardo Padlan.
VL is on the left, VH on the right. The first and last residues of each chain as well as several other residues are
shown as open circles for reference; residues of the CDRs are shown as solid circles. The Fv's are aligned by
least-squares superposition using the program ALIGN (G.H. Cohen, NIH); the stereoplots were prepared using
the program PLUTO (S. Motherwell, Cambridge, England). The view shown is with the combining site at the top.
With a stereo viewer it is possible to see two adjacent models at the same time, so that a comparison may be made
in three dimensions.
`
`
`REI
`
`LOC
`
`RHE-1
`
`RHE-2
`
FIG. 3. Stereodrawings of the α-carbon skeletons of four Bence Jones VL dimers studied crystallographically.
Top to bottom: MCG (42,46,50), REI (45, 51, 57), LOC (61), RHE (60). The bottom view is of RHE with its twofold
axis toward the top of the page. The left VL is oriented like the VL of the Fv's in Fig. 1. The view shown is like that
of Fig. 1. Coordinates for MCG, REI, RHE from the Protein Data Bank; for LOC, courtesy of Dr. Marianne Schiffer.
`
`
`MCG
`
`REI
`
`LOC
`
`RHE-1
`
`RHE-2
`
FIG. 4. Same as Fig. 3, but looking into the combining site. Models perpendicular to those of Fig. 3. The bottom
view of RHE is looking into the twofold axis.
`
`
top; the view of Fig. 2 is perpendicular to Fig. 1 and is looking into the combining site. The different
orientations of the loops containing the complementarity-determining regions provide some insight
into how specificity of various sites might differ (67,82). If the amino acid side chains were included,
the differences would become much more detailed.
`
Figures 3 and 4 are stereoviews of the four VL dimers, Bence Jones proteins, for which high
resolution X-ray structures are available, MCG (42,46,50), REI (45, 51, 57), LOC (61) and RHE (60). The
VL chains each contribute four β-strands to the VL-VL or VH-VL interaction whereas the VH chains each
provide five (65). Thus, although in a Bence Jones VL dimer one VL assumes the position of VH (50),
nevertheless the absence of one β-strand in each VL may make the sites of VL dimers less specific
than those of the Fab fragments. This is supported by the finding that the VL dimer of MCG binds a
wide variety of ligands whereas no ligand which binds has yet been found for the MCG protein (83).
`
A recent high resolution X-ray crystallographic study (84) of a crystalline complex of lysozyme
with a monoclonal anti-lysozyme shows that contact between lysozyme and antibody occurs on a rather
flat surface with the interactions largely due to protuberances and depressions formed by the amino
acid side chains producing a tightly packed region of interaction. The lysozyme determinant involves
two noncontiguous stretches, residues 18 to 27 and 116 to 119 of its polypeptide chain. All six CDRs
of the antibody and two residues outside the CDRs but adjacent to the CDRs, Tyr 49 in VL and Thr
30 in VH, make contact with the lysozyme. Ten of the 17 contacting residues are in VH. Four of the 10
contacting VH residues and three of the seven contacting residues in VL are in the corresponding
CDR3s. Table 1 lists the residues on the anti-lysozyme and on lysozyme which are in contact. These
findings, if and to the extent applicable to anti-carbohydrate sites, with respect to interactions of side
chains on essentially flat surfaces could have substantial implications for our understanding of these
antigen-antibody interactions. However, the J539 site would appear to be some type of groove
complementary to a tetrasaccharide. Unfortunately thus far the crystal form has not allowed the ligand
to enter the site (64). Figure 5 is a stereoview of the lysozyme-antilysozyme carbon skeleton showing
the region of interaction.
`
Bence Jones dimer LOC (61) has a convex (male) binding site quite different from the usual Bence
Jones dimers MCG and REI. If such a male type combining site were to be found for an Fab, the
possibility would have to be considered that a reciprocal type of antigen-antibody interaction might occur
in which the side chains of the CDRs would fit into a groove or depression on the surface of an antigen.
The possibility has been noted that interactions might occur with the CDRs of the two faces of the
projecting convex site (61). RHE also has a quite distinct type of binding site based on a unique VL-VL
interaction. The basis for such differences is not understood but could contribute a new parameter
to site complementarity and diversity.
`
The sequence data may be used to make rough screens of a new sequence for homology with
the V-region. If the sequence to be compared is aligned with the large V-region summary tables, one
can ascertain whether any homology exists. If homology involves the less frequently occurring residues,
they can be found in the individual tables and homology evaluated.
`
The variable region a-group allotypes and allotype a-negative rabbit VH chains have been
correlated with certain amino acids in FR1 and FR3 as follows (69):
`
`Allotype
`
`Amino Acid Position
`
`FR1
`
`5
`
`8
`
`to
`
`12
`
`13
`
`16
`
`VHa1
`
`glu
`
`gly
`
`ARG val
`
`THR thr
`pro
`gly
`
`17
`
`pro
`gly
`ser
`
`65
`
`gly
`
`67
`
`70
`
`phe
`
`ser
`
`71
`
`lys
`
`FR3
`
`74
`
`thr
`
`75
`
`Ha
`
`76
`
`[—]
`
`84
`
`85
`
`87
`
`THFi GLU thr
`
`VH a2
`
`LYS GLU gly
`
`PHE lys
`
`ASP THR
`
`SER SER THR ARG ASN GLU asn ala
`GLN thr
`ser
`GLY ala
`ala
`ala
`
`glu
`VH a3”
`VHa1OO glu
`
`VHa—
`
`glu
`val
`
`gly
`gly
`
`gly
`
`ASP val
`gly
`val
`
`gly
`
`val
`
`lys
`gin
`
`gin
`
`ala
`ala
`
`gly
`glu
`thr
`
`ser
`ser
`
`gly
`ser
`
`gly
`
`gly
`
`phe
`thr
`
`phe
`
`ser
`ser
`
`ser
`
`lys
`lys
`
`ser
`
`thr
`ser
`
`ala
`
`[—l
`[-1
`
`H ala
`H
`?
`
`gin
`
`asn ala
`
`ala
`?
`
`ala
`
`thr
`MET
`
`thr
`
a Square brackets indicate gaps to maximize homology. Allotype related residues are in capitals.

b In some a3-like genes, codons for amino acids 75 and 76 were found (85).
`
`
In Figs. 1 and 2, the location of the allotypic regions may clearly be seen to be on the outside of
VH away from the combining site. Residues 13 and 65 of VH are numbered and will facilitate location
of the VHa allotypes. The few cDNA sequences (69) available have provided no evidence as yet that
germ line sequences encoding latent allotypes may exist in some rabbits. Antisera to rabbit VHa
allotypes crossreact with human IgG (86), various other species of IgM and IgG, and with the Galapagos
shark 7S immunoglobulin and correlate with the N-terminal amino acid sequence (87).
`
There are substantial species differences between the human, rat and rabbit Cκ allotypes.
The amino acid sequences of rabbit Cκ allotypic determinants K-1, b4, b5 and b9 differ at 47 of 106
positions, the differences occurring in clusters; the K-2 bas isotype differs at three additional positions
(88) whereas the human Cκ allotypes differ by two positions (89) and the rat R1-1a and R1-1b differ at
11 positions (90).
`
TABLE 1

Antibody Residues Involved in Contact with Lysozyme

              Antibody residues     Lysozyme residues in contact

Light Chain
  CDR1        His  30               Leu 129
              Tyr  32               Leu 25, Gln 121, Ile 124
  FR2         Tyr  49               Gly 22
  CDR2        Tyr  50               Asp 18, Asn 19, Leu 25
  CDR3        Phe  91               Gln 121
              Trp  92               Gln 121, Ile 124
              Ser  93               Gln 121

Heavy Chain
  FR1         Thr  30               Lys 116, Gly 117
  CDR1        Gly  31               Lys 116, Gly 117
              Tyr  32               Lys 116, Gly 117
  CDR2        Trp  52               Gly 117, Thr 118, Asp 119
              Gly  53               Gly 117
              Asp  54               Gly 117
  CDR3        Arg  99 (96)          Arg 21, Gly 22, Tyr 23
              Asp 100 (97)          Gly 22, Tyr 23, Ser 24, Asn 27
              Tyr 101 (98)          Thr 118, Asp 119, Val 120, Gln 121
                      (99)          Asn 19, Gly 22
`
`Sequence positions are numbered as in this book except for VH CDR3, where the numbers are
`given in parentheses; the others are sequential. (From (84). Amit, Mariuzza, Phillips, and Poljak (1986)
`Science 233:747-753; courtesy of Dr. Roberto Poljak and Science. Copyright 1986 by the AAAS.)
`
`
`
FIG. 5. Stereo diagram of the Cα skeleton of the complex. Fab is shown (upper right) with the heavy and light
chains with thick and thin bonds, respectively. The lysozyme active site is the cleft containing the label HEL.
Antibody-antigen interactions are most numerous between lysozyme and the heavy chain CDR loops. (From (84).
Amit, Mariuzza, Phillips, and Poljak (1986) Science 233:747-753; courtesy of Dr. Roberto Poljak and Science.
Copyright 1986 by the AAAS.)
`
`
It has proven extremely useful, except for mouse Vκ chains, to order the VL and VH sequences
into sets (68) such that all chains with identical FR1 are listed together, the set with the most members
being listed first. Chains differing in sequence from this set by a single residue are then listed in order
of substitution, beginning at residue 22 for light chains and residue 30 for heavy chains and proceeding
in decreasing position number to residue 1. These are then followed by chains with two amino acid
differences, again listing in the same decreasing order, and followed by chains with three amino acid
substitutions, etc. Amino acid residues differing from the major FR1 sequence are given in lower case
letters, so that one can readily see the pattern of substitution. In this ordering procedure, missing residues
are treated as potentially different from the main sequence. If residues are missing at position 23 in
the light chain and position 22 in the heavy chain, they are assumed to be Cys to preserve the essential
V-domain structure. Finally, sequences which are incomplete in FR1 are given. Within a given FR1
set, identical FR2 sets are also listed together.
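
The ordering procedure just described can be sketched in Python as follows. This is a simplified illustration, not the procedure used to build the tables: `order_by_fr1` is a hypothetical name, FR1 segments are assumed to be one-letter strings of equal length with "-" for a missing residue (counted as a difference), and the special handling of Cys, of incomplete FR1 sequences and of FR2 subsets is omitted. The five-residue strings in the example are invented.

```python
from collections import Counter


def order_by_fr1(chains):
    """Order chains by their FR1 segment.

    `chains` maps a chain name to its FR1 string. The most frequent FR1
    is taken as the major sequence; other chains follow by number of
    differences, then by substitution position from high to low.
    Returns (name, displayed FR1) pairs, with differing residues in
    lower case.
    """
    major, _ = Counter(chains.values()).most_common(1)[0]

    def sort_key(item):
        _, seq = item
        positions = [i + 1 for i, (a, b) in enumerate(zip(seq, major)) if a != b]
        # Fewer differences first; within a class, highest position first.
        return (len(positions), [-p for p in sorted(positions, reverse=True)])

    ordered = []
    for name, seq in sorted(chains.items(), key=sort_key):
        shown = "".join(a.lower() if a != b else a for a, b in zip(seq, major))
        ordered.append((name, shown))
    return ordered


# Invented five-residue fragments standing in for full FR1 segments.
example = {"A": "DIQMT", "B": "DIQMT", "C": "DIQLT", "D": "EIQMT", "E": "EIQLT"}
listing = order_by_fr1(example)
```

Here chains A and B form the major set, C (substitution at position 4) precedes D (position 1), and E, with two differences, comes last and is displayed as `eIQlT`.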
`
The human Vκ rearranged and germ-line genes of all four subgroups have been sequenced
(91-95). Human VκIV has but a single germ-line gene (92, 93); thus somatic mutation must play a
dominant role in the utilization of this gene. VκII genes are characterized by a much longer intron between
the signal and V-region (94) than the other subgroups. Unlike the mouse, the human VκI, VκII, and VκIII
genes are not separated in the genome, but a large section of the Vκ locus has been duplicated; both
sections exist as two non-allelic clusters, containing eight and six genes of all three subgroups (91,95).
All genes are in the 5'→3' orientation. These findings have necessitated reclassification of some
incomplete sequences (RPMI-6410'CL).
`
The tables of mouse Vκ light chains have been rearranged (96,97). In previous editions (1-3), mouse
Vκ light chains were listed in one table with the length from residue 1 to Trp 35 specified. They have
now been separated into eight tables. The first six tables vary from 41 to 34 residues by the different
lengths of CDR1 (residues 27A-F); the sixth also lacks residue 28 and the seventh is also missing residue
22 in FR1. If residues 1 through 35 have not been determined completely, the sequences are listed
in the eighth table, unless they show good homology to more complete sequences in one of the other
tables. In each table, the group number is given below the name of the chain. When residues 1 through
35 have not been determined, the earlier group designation based on residues 1-23 (96) is given in
square brackets [ ]. However, for each table the same principles have been used in ordering the chains.
In all instances, residues 1-35 for the largest group are given in capitals. Variations from this sequence
are in lower case, beginning at residue 34 and proceeding toward residue 1 as in the other tables.
`
The mouse VH sequences have been revised to take into account their division into families by
Dildrop (98) based on amino acid sequences and by Brodeur and Riblet (100) based on nucleotide
sequences of completely sequenced V-regions. We have, however, retained the earlier classification
of VH chains into subgroups but have subdivided the subgroups to list the families as follows:
`
Subgroup          Antibody specificities found

I A               Ars, DNP, HEL, DIG, poly-GA
I B               2-PHEOX, HEL

II A              α(1→3)α(1→6)DEX, RNA, Ars, α(1→6)DEX, DIG, HEL, IdAc38, GAT
II B              NP, GAT, DIG, PC, IdAc38, 2-PHEOX
II C              GAT, GA, H-2K-k

III A             PC, DNP, HEL, DNA
III B             β(1→6)GAL, α(1→6)DEX, NAcMAN, β(2→6)FRU, GAT, STR-A(β-DGlcNAc)
III C             β(2→1)FRU, STR-A, SRBC, GAT
III D             DIG, H-2K-k, SRBC

V A               ARS CRI+, ARS car.
V B               ARS, ARS CRI+

Miscellaneous     ARS, DNA, HEL, H-2K-k, GAT, DIG, 2-PHEOX, CEA

Family
Dildrop (98), Dildrop et al. (99); Brodeur and Riblet (100, 101), Winter et al. (102)
`
`3
`2
`8
`
`9
`
`1
`
`7
`
`4
`
`6
`1
`
`‘l
`
`5
`
`VH36~6O
`VHQ52
`vH3eo9,
`MV31
`
`VHJ558
`Vl\/lU—1,
`VGAM3—8
`
`VHJ606
`
`VHS1O7
`
`VHJ558,
`VHX24
`
`VHJESOG
`J558
`
`.1558
`
`VH7183
`
`For newly sequenced VH regions, the nucleotide or amino acid sequence homologies have been
`compared with the previously classified sequences.
`
`
`
`
`It is evident that antibody specificities for a given antigen fall into several of these subgroups and
`that many framework residues which are characteristic of the subgroups are sufficiently different to
`indicate that different germ-line genes may give rise to antibodies of a given specificity (103).
`
`The classification given is in better accord with the amino acids (98) than with the nucleotides
`(100), since the latter used probes whereas the former was based on complete sequences. Two rabbit
`germ-line a-negative VH clones showed the greatest nucleotide sequence homology, 70.4-79.4% to
`VH X24, VH 7183, VH J606, and VH S107 (100) and considerably less homology to the other families.
`
`Each table of mouse VI and VH chains is followed by a list of strains other than BALB/c.
`
`The members of identical FR and identical CDR sets are given in the notes. Members of individual
`FR1 sets may be associated with different FR2 sets, etc. (68), suggesting independent assortment of
`FR sets.
`
A sequence identical to the FR2 sequence of the light chain of McPC603 has been found in two
human VκI, one human VκIV, 31 mouse Vκ (14 NZB and 17 BALB/c), one each mouse VκI and Vκ
VI, and 15 rabbit Vκ sequences, and thus has been preserved for about 80 million years (104). It and
the corresponding loop of the heavy chain are seen at the bottom center of each of the four stereo figures
(Fig. 1); L40 and H41 are numbered to facilitate its location. Despite its preservation, there are 12 other
FR2 sets in the mouse and 8 in the rabbit with sequence variation which may involve 13 of the 15
positions, only Trp 35 and Gln 39 being invariant. The loop is in a relatively open position so that substitutions
are readily permitted (104,63). The evidence of assortment of FR segments (68) suggested the
hypothesis that the V-region was coded for by sets of minigenes for the FR and CDR segments and
that these minigenes were assembled somatically during embryogenesis. FR2 of VH also shows
substantial preservation, one set having six mouse and eight rabbit chains of identical amino acid
sequence. It is extraordinary that one human VHIII genomic clone, VH26 (105), and a rabbit cDNA clone
(106), were identical in nucleotide sequence of the codons for amino acids 36 through 47, differences
being seen in codons for amino acids 48 and 49.
`
Since only FR sets were used in demonstrating the assortment (68), it would be independent of
and would be seen whether or not any CDR residues assorted with any FR. The early cloning studies
(4,5) showed that the genes coding for residues comprising FR1 almost through CDR3 of mouse Vλ and
Vκ light chains were assembled in twelve-day-old mouse embryo DNA, and each was followed by an
intervening sequence; in one VλI clone, two residues of CDR3 were included with FR4 (107). In an adult
Vλ myeloma, the genes coding for the entire V-region were assembled (108). Thus, the minigene coding
for FR4 plus the last two residues of CDR3 (107), termed the J segment (108), had been joined to the
rest of the V-region between the twelfth day of embryonic life and the adult, and thus was added
somatically by recombination. Milstein (17), in his original description of human Vκ groups, had pointed out that
subgroup associated residues extended only through residue 94 and that it seemed as if frequent
crossing over occurred beyond residue 94. Weigert et al. (109), in studying the Vκ21 group in NZB myelomas,
assorted the last two or three residues of CDR3 together with FR4 and suggested that this would
contribute to the generation of diversity. With rabbit light chains, it was possible to assort the individual
FR and CDR segments considering FR4, plus the last two residues of CDR3, as a J-minigene (110).
`
`The subsequent demonstration and sequencing o