`
`
`Evolutionary families of peptidases
`Neil D. RAWLINGS and Alan J. BARRETT
`Department of Biochemistry, Strangeways Research Laboratory, Worts Causeway, Cambridge CB1 4RN, U.K.
`
`The available amino acid sequences of peptidases have been
`examined, and the enzymes have been allocated to evolutionary
`families. Some of the families can be grouped together in 'clans'
`that show signs of distant relationship, but nevertheless, it
`appears that there may be as many as 60 evolutionary lines of
`peptidases with separate origins. Some of these contain members
`with quite diverse peptidase activities, and yet there are some
`striking examples of convergence. We suggest that the classi-
`fication by families could be used as an extension of the current
`classification by catalytic type.
`
`INTRODUCTION
`Amino acid sequence data are now available for over 600
`peptidases (endopeptidases, exopeptidases and omega pepti-
`dases), and we have examined these in an attempt to establish
`what separate evolutionary lines exist. These take the form of
`families, or groups of related families ('clans'). The properties of
`the peptidases of each family have been considered from two
`main points of view. Firstly, we have asked how widely the
`enzymes have diverged in catalytic activity, and, secondly, we
`have asked to what extent peptidases from separate evolutionary
`lines have converged in properties. Finally, we have considered
`how compatible is a classification of peptidases based on their
`evolutionary relationships with the sort of classification that is
`currently in use, which depends upon the reaction catalysed by
`each enzyme and on the catalytic mechanism.
`
`METHODS
`Sources of data
`Protein sequence data were obtained from the SwissProt database
`[1] (release 21), and the PIR-Protein database [2] (release 32), and
`nucleic acid sequence data from the EMBL database [1] (release
`28 and daily updates). In addition, some sequences were obtained
`directly from the literature.
`
`Detection of evolutionary relationships
`The programs FASTP [3] and FASTA and TFASTA [4] were
`used to detect similarities between peptidases, and, on the basis
`of these, provisional assignments to a system of families were
`made. These assignments were refined by manual construction of
`optimized alignments. In many cases, the similarities between the
`sequences were so close that no further analysis was felt necessary,
`but whenever the similarity was questionable, the RDF program
`[3] was applied. This tests the statistical significance of a similarity
`between amino acid sequences by comparing the score for the
`alignment with those of random shuffles of the sequences. We
`took the value of six standard deviation units as that above which
`the similarity could be regarded as being significant. We assume
`that the significant similarities reflect evolutionary relationship,
`or homology as defined by Reeck et al. [5].
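The shuffle test described above can be sketched in a few lines of Python. This is only an illustration of the idea and not the RDF program itself: the k-tuple score is a toy stand-in for an alignment score, and the function names, the number of shuffles and the random seed are assumptions made for the example. Only the threshold of six standard deviation units is taken from the text.

```python
import random
import statistics


def ktuple_score(a: str, b: str, k: int = 2) -> int:
    """Toy similarity score (an assumption, not the RDF score):
    the number of k-tuples in `a` that also occur somewhere in `b`."""
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return sum(1 for i in range(len(a) - k + 1) if a[i:i + k] in words_b)


def shuffle_z_score(a: str, b: str, n_shuffles: int = 200, seed: int = 0,
                    score=ktuple_score) -> float:
    """Express the observed score in standard deviation units above the
    mean score obtained when `b` is randomly shuffled."""
    rng = random.Random(seed)
    observed = score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)  # keeps composition, destroys residue order
        shuffled_scores.append(score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        # All shuffles scored the same; no spread to measure against.
        return float("inf") if observed > mean else 0.0
    return (observed - mean) / sd


def is_significant(z: float, threshold: float = 6.0) -> bool:
    """Apply the cut-off used in the text: above six standard deviation units."""
    return z > threshold
```

Shuffling preserves amino acid composition, so a high value indicates similarity that depends on residue order rather than on composition alone.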
`
`Definition of terms
`The term type is used to refer to a set of peptidases distinguished
`according to the chemical groups responsible for catalysis, as in
`serine-type, cysteine-type, aspartic-type or metallo-type. The
`term family is used to describe a group of enzymes in which each
`member shows evolutionary relationship to at least one other,
`either throughout the whole sequence or at least in the part of the
`sequence responsible for catalytic activity. As an example of the
`need for this, bone morphogenetic protein 1 is a chimaeric
`protein that contains a catalytic domain related to that of
`astacin, but also contains segments that are clearly homologous
`with non-catalytic parts of C1r and C1s in the chymotrypsin
`family [6]. We place bone morphogenetic protein 1 in the family
`of astacin and not in that of chymotrypsin.
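Because a sequence belongs to a family as soon as it is significantly related to at least one other member, the definition amounts to single-linkage grouping. A minimal sketch follows, assuming the significant pairs have already been found; the function name and the sequence labels in the usage note are hypothetical and do not come from Table 1.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def group_into_families(sequences: Iterable[str],
                        significant_pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group sequences so that each member is linked to at least one
    other member by a significant similarity (connected components)."""
    parent = {s: s for s in sequences}

    def find(x: str) -> str:
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in significant_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    families = defaultdict(list)
    for s in parent:
        families[find(s)].append(s)
    return sorted(sorted(members) for members in families.values())
```

For example, `group_into_families(["a", "b", "c", "d"], [("a", "b"), ("b", "c")])` places `a`, `b` and `c` in one family even though `a` and `c` were never compared directly, and leaves `d` on its own.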
`A clan comprises a group of families for which there are
`indications of evolutionary relationship, despite the lack of
`statistically significant similarities in sequence. Such indications
`of distant relationship come primarily from the linear order of
`catalytic-site residues and the tertiary structure. Distinctive
`aspects of the catalytic activity such as specificity or inhibitor-
`sensitivity may also contribute occasionally.
`The symbol ' +' is used to indicate the scissile bond in a
`peptidase substrate.
`
`RESULTS AND DISCUSSION
`All of the amino acid sequences of peptidases that were available
`to us in July 1992 were examined for significant similarities as
`described in the Methods section, and grouped in families (Table
`1). Some of the families show evidence of distant relationships to
`others, and these we group together in single 'clans'; others seem
`quite unrelated.
`
`Serine peptidases
`Most of the members of the chymotrypsin (S1) family are endo-
`peptidases, which differ widely in specificity. No exopeptidase
`is known in this family, but it does contain several proteins that
`lack all peptidase activity: azurocidin, procarboxypeptidase A
`complex component III, the haptoglobins, apolipoprotein(a),
`hepatocyte growth factor and protein Z. The family includes
`many enzymes of the coagulation, fibrinolysis and complement
`systems that are found in blood plasma, and these are mostly
`chimaeric proteins with modules, some of which are also found
`in other proteins, inserted N-terminally to the site of proteolytic
`activation [27].
`Almost all of the known members of the chymotrypsin family
`have been found in animals, the only exceptions being two
`trypsins from actinomycetes. It is striking that no member of this
`otherwise very successful family has been encountered in proto-
`zoa, fungi or plants.
`The linear order of catalytic triad residues in the polypeptide
`
`AstraZeneca Exhibit 2094
`Mylan v. AstraZeneca
`IPR2015-01340
`
`Page 1 of 14
`
`
`
`
`Table 1 Evolutionary families of peptidases
`The peptidases are allocated to families as described in the text. Clans and families are labelled with the prefix S for serine peptidases, C for cysteine, A for aspartic, M for metallo- and U for
`unknown, and listed in this order. It should be noted, however, that these labels are temporary, simply being assigned consecutively through the Table. 'EC' is the enzyme nomenclature number
`[7], but for peptidases the initial '3.4.' has been omitted; '-' indicates that no EC number has been assigned; 'n.a.' indicates that the protein is not known to be an enzyme. Literature references
`to the individual proteins are generally to be found in the database entries for which the codes are given. Most of the codes are from the Swiss-Prot database (release 21), but a code in parentheses
`is an EMBL database accession number and 'PIR' indicates a code from the PIR database. Numbers in square brackets are references to sequences from journal articles. For some viral sequences,
`the code given is that of the viral polyprotein. For some viruses, numerous variants with only minor differences exist, and only a single example of each has been included.
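The EC column convention in this legend can be made explicit with a small helper. This is a sketch under stated assumptions: the function name is hypothetical, and treating a four-field entry (such as the 3.5.2.6 given for β-lactamase) as an already complete EC number is an inference from the table, not a rule stated in the legend.

```python
from typing import Optional


def expand_ec(entry: str) -> Optional[str]:
    """Expand an EC column entry from Table 1 to a full EC number.

    Returns None for '-' (no EC number assigned) and for 'n.a.'
    (protein not known to be an enzyme).
    """
    entry = entry.strip()
    if entry in {"-", "n.a."}:
        return None
    fields = entry.split(".")
    if len(fields) == 2:
        # Peptidase entry with the initial '3.4.' omitted.
        return "3.4." + entry
    if len(fields) == 4:
        # Assumed to be a complete EC number for a non-peptidase entry.
        return entry
    raise ValueError(f"unrecognised EC entry: {entry!r}")
```

With this helper, `expand_ec("21.4")` gives `"3.4.21.4"`, while `expand_ec("-")` and `expand_ec("n.a.")` both give `None`.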
`
`EC
`
`Database code
`
`SERINE PEPTIDASES
`
`Family S1: Chymotrypsin
`(Clan SA: His, Asp, Ser catalytic triad)
`Trypsin (includes forms I, II, III, IV,
`Va and Vb)
`21.4
`TRYP_SACER, TRYP_STRGR, TRYP_ASTFL, TRYP_DROME,
`TRYP_SQUAC, TRYP_XENLA, TRYP_BOVIN, TRY1_CANFA,
`TRY2_CANFA, TRY1_HUMAN, TRY2_HUMAN, TRY3_HUMAN,
`TRYP_MOUSE, TRYP_PIG, TRY1_RAT, TRY2_RAT, TRY3_RAT,
`TRY4_RAT, (M77814), (X59012), (X59013)
`Cercarial elastase (Schistosoma)
`CERC_SCHMA
`-
`Brachyurin
`21.32 COGS_UCAPU
`Factor C (Limulus)
`(D90271)
`-
`Proclotting enzyme (Tachypleus)
`PCE_TACTR
`-
`easter gene product (Drosophila)
`EAST_DROME
`-
`snake gene product (Drosophila)
`SNAK_DROME
`-
`Vitellin-degrading endopeptidase (Bombyx)
`[8]
`-
`Hypodermin C
`21.49 COGS_HYPU
`Serine proteases 1 and 2 (Drosophila)
`SER1_DROME
`-
`Achelase (Lonomia)
`ACH1_LONAC, ACH2_LONAC
`-
`Chymotrypsin (includes forms A, B, II and 2)
`21.1
`CTR2_VESCR, CTR2_VESOR, CTRA_BOVIN, CTRB_BOVIN,
`CTR2_CANFA, CTRB_HUMAN, CTRB_RAT
`Proteinase RVV-V (Russell's viper)
`(includes forms α and γ)
`RVVA_VIPRU, RVVG_VIPRU
`Flavoxobin (habu snake)
`Venombin A
`Crotalase
`Enteropeptidase
`Acrosin
`Seminin
`Tissue kallikrein
`
`Renal kallikrein
`Submandibular kallikrein
`
`7S nerve growth factor (includes α and
`γ chains)
`Epidermal growth factor-binding protein
`(includes forms 1, 2 and 3)
`Tonin
`Arginine esterase
`Pancreatic elastase I
`Pancreatic elastase II (includes forms A
`and B)
`Pancreatic endopeptidase E (includes
`forms A and B)
`Leukocyte elastase
`Medullasin
`Azurocidin
`
`-
`
`FLVB_TRIFL
`BATX_BOTAT, PTCA_AGKCO
`21.74
`21.74
`[9]
`21.9
`[10]
`21.10 ACRO_HUMAN, ACRO_MOUSE, ACRO_PIG
`PROS_HUMAN
`-
`21.35 KAG2_CAVPO, KAG1_HUMAN, KAG2_HUMAN, KAG_PIG,
`KAGP_RAT
`21.35 KAGR_MOUSE, (X17352)
`21.35 KAG1_MOUSE, KAG2_MOUSE, KAG3_MOUSE, KAG5_MOUSE,
`KAGB_MOUSE, KAG1_RAT, KAG3_RAT
`21.35 NGFA_MOUSE, NGFG_MOUSE
`
`21.35 EGBA_MOUSE, EGBB_MOUSE, EGBC_MOUSE
`
`21.35 TONI_RAT
`21.35 ESTA_CANFA
`EL1_PIG, EL1_RAT, (M27347)
`21.36
`EL2A_HUMAN, EL2B_HUMAN, EL2_MOUSE, EL2_PIG,
`21.71
`EL2_RAT
`21.70 EL3A_HUMAN, EL3B_HUMAN
`
`21.37 ELNE_HUMAN
`ELNE_HUMAN
`-
`CAP7_HUMAN, CAP7_PIG
`n.a.
`
`
`Cathepsin G
`Proteinase 3 (myeloblastin)
`Chymase (includes forms I and II)
`
`y-Renin
`Tryptase (includes forms 1, 2 and 3)
`
`Hepsin
`Granzyme A
`Natural killer cell protease 1
`Granzymes B, C, D, E, F, G and Y
`
`Carboxypeptidase A complex component III
`Complement factor D
`Complement factor B
`Complement factor I
`Complement component C1r
`Complement component C1s
`Calcium-dependent serine proteinase
`Complement component C2
`Haptoglobin (includes forms 1 and 2)
`Haptoglobin-related protein
`Plasmin
`
`Apolipoprotein(a)
`Hepatocyte growth factor
`Thrombin
`t-Plasminogen activator
`u-Plasminogen activator
`
`-
`-
`-
`-
`
`Table 1 (contd.)
`21.20 CATG_HUMAN
`MELB_HUMAN, PTN3_HUMAN
`-
`21.39 MCP1_CANFA, TRYM_CANFA, MCP1_MOUSE, MCP2_MOUSE,
`MCP1_RAT, MCP2_RAT, MCP4_MOUSE, (M69136), (M73759)
`21.54 RENG_MOUSE
`21.59 TRYT_CANFA, TRYA_HUMAN, TRYB_HUMAN, (M33493), (M30038),
`MCP6_MOUSE
`HEPS_HUMAN
`GRAA_HUMAN, GRAA_MOUSE, GRAX_MOUSE
`NKP1_RAT
`GRAB_MOUSE, GRAC_MOUSE, GRAD_MOUSE, GRAE_MOUSE,
`GRAF_MOUSE, GRAG_MOUSE, GRAB_HUMAN, GRAY_HUMAN
`CAC3_BOVIN
`n.a.
`21.46 CFAD_HUMAN, ADIP_MOUSE
`21.47 CFAB_HUMAN, CFAB_MOUSE
`21.45 CFAI_HUMAN
`21.41 C1R_HUMAN
`21.42 C1S_HUMAN
`CASP_MESAU
`-
`21.43 CO2_HUMAN, CO2_MOUSE
`HPT1_HUMAN, HPT2_HUMAN
`n.a.
`HPTR_HUMAN
`n.a.
`PLMN_BOVIN, PLMN_HUMAN, PLMN_MACMU, PLMN_MOUSE,
`21.7
`PLMN_PIG, (M62832)
`APOA_HUMAN, APOA_MACMU
`n.a.
`HGF_HUMAN, HGF_RAT
`n.a.
`THRB_BOVIN, THRB_HUMAN, THRB_MOUSE, THRB_RAT
`21.5
`21.68 UROT_HUMAN, UROT_MOUSE, UROT_RAT
`21.73 UROK_CHICK, UROK_HUMAN, UROK_MOUSE, UROK_PAPCY,
`UROK_PIG
`21.68 UROT_DESRO
`
`Salivary plasminogen activator
`(vampire bat)
`21.34 KAL_HUMAN, KAL_RAT, (M58588)
`Plasma kallikrein
`Coagulation factor VII
`FA7_BOVIN, FA7_HUMAN
`21.21
`21.22 FA9_BOVIN, FA9_CANFA, FA9_HUMAN, FA9_MOUSE
`Coagulation factor IX
`Coagulation factor X
`FA10_BOVIN, FA10_HUMAN
`21.6
`Coagulation factor XI
`FA11_HUMAN
`21.27
`Coagulation factor XII
`FA12_HUMAN
`21.38
`Protein C
`21.69 PRTC_BOVIN, PRTC_HUMAN
`PRTZ_BOVIN, PRTZ_HUMAN
`Protein Z
`n.a.
`Family S2: α-Lytic endopeptidase
`(Clan SA: His, Asp, Ser catalytic triad)
`α-Lytic endopeptidase
`21.12 PRLA_LYSEN
`Proteases A and B (Streptomyces griseus)
`PRTA_STRGR, PRTB_STRGR
`-
`Glutamyl endopeptidase (Strep. griseus)
`-
`[11]
`Family S3: Togavirus endopeptidase
`(Clan SA: His, Asp, Ser catalytic triad)
`POLS_EEEV, POLS_RRVN, POLS_SFV, POLS_SINDV, POLS_WEEV
`Polyprotein peptidase
`-
`Family S4: Glutamyl endopeptidase
`Glutamyl endopeptidase (Staphylococcus)
`Epidermolytic toxins A and B
`(Staphylococcus)
`"Metalloprotease" (Bacillus subtilis)
`Family S5: Lysyl endopeptidase
`Lysyl endopeptidase (Achromobacter)
`Family S6: IgA-specific endopeptidase
`IgA-specific serine endopeptidase
`
`21.19 STSP_STAAU
`ETA_STAAU, ETB_STAAU
`-
`
`-
`
`[12]
`
`21.50 API_ACHLY
`
`21.72 IGA_NEIGO, (X64357)
`
`
`Family S7: Flavivirus endopeptidase
`Nonstructural protein NS3
`
`Family S8: Subtilisin
`Tripeptidyl-peptidase II
`Subtilisin
`
`Table 1 (contd.)
`
`-
`
`POLG_DEN2J, POLG_JAEVJ, POLG_KUNJM, POLG_MVEV,
`POLG_TBEVS, POLG_WNV, POLG_YEFV1
`(Asp, His, Ser catalytic triad)
`14.10
`(M73047)
`21.62 SUBT_BACAM, SUBT_BACLI, SUBT_BACMS, SUBT_BACSA,
`SUBT_BACSD, SUBT_BACSU
`ELYA_BACSU
`-
`(PIR S11504)
`-
`ISP1_BACSU, (D00862), (D10730)
`-
`SUBF_BACSU
`-
`NPRE_BACAM, NPRE_BACSU
`-
`21.66 THET_THEVU
`SCPA_STRPY
`-
`P1P_LACLA, P2P_LACLA, P3P_LACLA
`-
`
`-
`-
`-
`
`AQL1_THEAQ
`PRTS_SERMA
`PROA_VIBAL
`
`PIR S11890
`-
`21.64 PRTK_TRIAL
`PRTR_TRIAL
`-
`-
`PRTT_TRIAL
`(M73795)
`-
`21.63 AEP_ASPOR, AEP_YARLI
`(Z11580)
`-
`21.48 PRTB_YEAST
`(M77197)
`-
`
`-
`-
`
`PIR JU0332
`PRCA_ANAVA
`
`Alkaline elastase (Bacillus)
`Serine endopeptidase (Bac. subtilis)
`Major intracellular endopeptidase (Bacillus)
`Bacillopeptidase F (Bac. subtilis)
`Neutral endopeptidase (Bacillus)
`Thermitase
`C5a peptidase (Streptococcus)
`Cell-wall associated endopeptidase
`(Lactococcus) (forms PI, PII, PIII)
`Aqualysin I (Thermus)
`Extracellular endopeptidase (Serratia)
`Calcium-dependent extracellular
`endopeptidase A (Vibrio)
`Extracellular endopeptidase (Xanthomonas)
`Endopeptidase K
`Endopeptidase R (Tritirachium)
`Endopeptidase T (Tritirachium)
`Cuticle-degrading protease (Metarhizium)
`Oryzin
`Alkaline protease (Aspergillus)
`Cerevisin
`Subtilisin-like protease III
`(Saccharomyces)
`Alkaline endopeptidase (Acremonium)
`Calcium dependent endopeptidase
`(Anabaena)
`KEX2_YEAST, KEX1_KLULA
`Kexin
`21.61
`FURI_HUMAN, FURI_MOUSE, FURI_RAT, (M81431)
`Furin
`-
`NEC1_MOUSE, NEC2_HUMAN, NEC2_MOUSE
`Pituitary convertase (includes PC1 and PC2) -
`(Asp, Ser, His or Ser, Asp, His catalytic triad)
`Family S9: Prolyl oligopeptidase
`DPP RAT, (X60708)
`14.5
`Dipeptidyl-peptidase IV
`DAP2_YEAST
`Dipeptidyl aminopeptidase B (Saccharomyces)-
`ACPH_PIG, ACPH_RAT
`19.1
`Acylaminoacyl-peptidase
`TLP_ECOLI
`Protease II (Escherichia coli)
`-
`21.26 PPCE_PIG, (M81461), (M61966)
`Prolyl oligopeptidase
`DNF1_HUMAN
`DNF15S2 protein (3p21 protein)
`n.a.
`Family S10: Serine-type carboxypeptidase
`(Ser, Asp, His catalytic triad)
`CBPY_YEAST, (D10199)
`Serine-type carboxypeptidase
`(Saccharomyces)
`Carboxypeptidase B-like peptidase
`Serine-type carboxypeptidase (forms I
`and III)
`Carboxypeptidase Y-like protein
`(Arabidopsis)
`Serine-type carboxypeptidase
`(Caenorhabditis)
`Serine-type carboxypeptidase (Aedes)
`Lysosomal carboxypeptidase A
`
`16.1
`
`16.1
`16.1
`
`-
`
`-
`
`KEX1_YEAST, CBP2_HORVU, CBP2_WHEAT,
`CBP1_HORVU, CBP3_HORVU, CBP3_WHEAT, (D10985)
`
`(M81130)
`
`(M75784)
`
`-
`16.1
`
`(M79452)
`PRTP_HUMAN, PRTP_MOUSE
`
`
`CLPP_MARPO, CLPP_TOBAC, CLPP_ORYSA, CLPP_WHEAT
`(D00530), (X14600)
`
`DPP_LACLA, DPP_LACLC
`
`21.53 LON_ECOLI, (D00863)
`
`Table 1 (contd.)
`Family S11: D-Ala-D-Ala carboxypeptidase (gene da) (Clan SB: Ser, Lys, Ser, Glu catalytic tetrad)
`DACA_BACSU, DACA_ECOLI, DACC_ECOLI, (X59965), (M37688)
`Serine-type D-Ala-D-Ala carboxypeptidase
`16.4
`Family S12: D-Ala-D-Ala carboxypeptidase (gene da) (Clan SB: Ser, Lys, Ser, Glu catalytic tetrad)
`DAC_STRSP
`Serine-type D-Ala-D-Ala carboxypeptidase
`16.4
`(M84523)
`D-Aminopeptidase (Ochrobactrum)
`-
`AMPC_CITFR, AMPC_ECOLI, AMPC_ENTCL, AMPC_SERMA
`β-Lactamase
`3.5.2.6
`FMDH_BACNO, FMDD_BACNO
`-
`Protein FIMD (Bacteroides)
`Family S13: Penicillin-binding protein 4 (Clan SB: Ser, Lys, Ser, Glu catalytic tetrad)
`[13]
`Serine-type D-Ala-D-Ala carboxypeptidase
`16.4
`Penicillin-binding protein 4
`PBP4_ECOLI
`16.4
`Family S14: ClpP (Ser, His catalytic residues (Asp not known))
`CLPP_ECOLI
`ATP-dependent endopeptidase (ClpP subunit) -
`(Escherichia coli)
`Chloroplast ATP-dependent endopeptidase
`-
`n.a.
`Potato leaf roll luteovirus genomic RNA
`Family S15: Lactococcus dipeptidyl peptidase IV
`Dipeptidyl peptidase IV (Lactococcus)
`14.5
`Family S16: Endopeptidase La
`Endopeptidase La
`Family S17: Bacteroides endopeptidase
`Extracellular endopeptidase (Bacteroides)
`Family S18: Escherichia coli protease VII
`Protease VII (Escherichia coli)
`-
`Coagulase/fibrinolysin (Yersinia)
`-
`Phosphoglycerate transport system activator -
`(Salmonella)
`Family S19: Coccidioides endopeptidase
`Chymotrypsin-like protease (Coccidioides)
`Family S20: Protease Do
`Protease Do (Salmonella)
`Family S21: Assemblin, herpesvirus
`Assemblin
`
`-
`
`-
`
`-
`
`-
`
`PRTE_BACNO
`
`OMPT_ECOLI
`COLY_YERPE
`PGTE_SALTY
`
`(X63114)
`
`(X54548)
`
`UL26_HSV11, VG33_VZVD, CP40_ LV, YEC3_EBV, UL80_HCMVA,
`(M64627)
`
`Family S22: Placental protein 11
`Placental protein 11
`
`CYSTEINE PEPTIDASES
`
`Family C1: Papain
`Dipeptidyl peptidase I
`Cysteine endopeptidases 1 (Haemonchus)
`Cysteine endopeptidases 1 (Haemonchus)
`Surface protective protein (Plasmodium)
`Circumsporozoite protein (Plasmodium)
`Cysteine endopeptidase (Entamoeba)
`Cysteine endopeptidase (Trypanosoma)
`Cruzipain (Trypanosoma)
`Cysteine endopeptidase (Theileria)
`Cysteine endopeptidase (Leishmania)
`Cysteine endopeptidases 1 and 2
`(Dictyostelium)
`Endopeptidase (baculovirus of Autographa)
`Papain
`Chymopapain
`Caricain
`
`-
`
`PP11_HUMAN
`
`(Clan CA: Gln, Cys, His, Asn active site residues)
`(D90404)
`14.1
`CYS1_HAECO,
`-
`(M80385)
`[14]
`CSP_PLACM
`(M27307), (M64712), (M64721)
`CYSP_TRYBR
`(M90067)
`CYSP_THEPA, (M86659)
`(X62163)
`CYS1_DICDI, CYS2_DICDI
`
`n.a.
`-
`-
`-
`-
`-
`-
`-
`
`(M67451)
`PAPA_CARPA
`PAP2_CARPA
`PAP3_CARPA
`
`22.2
`22.6
`22.30
`
`
`Glycyl endopeptidase
`Actinidain
`Cysteine endopeptidase (tomato)
`Thaumatopain (Thaumatococcus)
`Calotropin (Calotropis)
`Cysteine endopeptidase (Brassica napus)
`Cysteine endopeptidase (mung bean)
`Endopeptidase EP-C1 (Phaseolus vulgaris)
`Protein P34 (soya bean)
`Clone 15a protein (garden pea)
`Stem bromelain
`Aleurain (barley)
`Cysteine endopeptidases 2 and 3 (barley)
`Oryzain (includes forms α, β and γ) (rice)
`Cysteine protease (Caenorhabditis)
`Cysteine endopeptidases 1, 2 and 3
`(Homarus)
`Allergen (Dermatophagoides)
`Allergen (Euroglyphus)
`Cathepsin L
`Cathepsin S
`Cathepsin H
`Cathepsin B
`
`Table 1 (contd.)
`22.25 PAP4_CARPA
`22.14 ACTN_ACTCH
`CYSL_LYCES
`-
`THPA_THADA
`-
`CAL1_CALGI
`-
`[15]
`-
`SHEP_VIGMU
`-
`(X63102)
`-
`P34_SOYBN
`n.a.
`-
`[16]
`22.32 BROM_ANACO
`ALEU_HORVU
`-
`-
`[17]
`-
`[18]
`(M74797)
`(X63567), (X63568), (X63569)
`
`-
`
`-
`
`MMAL_DERPT
`(X60073)
`22.15 CATL_CHICK, CATL_HUMAN, CATL_MOUSE, CATL_RAT
`22.27 CATS_BOVIN, (M86553)
`22.16 CATH_HUMAN, CATH_RAT
`CATB_BOVIN, CATB_HUMAN, CATB_MOUSE, CATB_RAT,
`22.1
`(M75822), (M21309)
`(Clan CA: Gln, Cys, His, Asn active site residues)
`(M64084)
`-
`(M67499)
`22.17
`22.17 CAP1_CHICK, CAP1_HUMAN, CAP1_RABIT
`22.17 CAP2_HUMAN, CAP2_RABIT
`22.17 CAP3_HUMAN, CAP3_RAT
`22.17 CAP4_MOUSE
`
`Family C2: Calpain
`Sol gene product (Drosophila)
`Calpain (Schistosoma)
`Calpain I
`Calpain II
`Calpain P94
`Calcium-binding protein PMP41
`
`Family C3: Picornain
`Picornain 2A
`
`Picornain 3C
`
`Aphthovirus endopeptidase
`Cardiovirus endopeptidase
`Comovirus endopeptidase
`Family C4: Potyvirus endopeptidase 1
`48 kDa endopeptidase
`
`Family C5: Adenovirus endopeptidase
`Endopeptidase adenovirus
`
`(Clan CB: His, Asp or Glu, Cys catalytic triad)
`22.29 POLG_POL1M, POLG_COXA2, POLG_SVDVH, POLG_BOVEV,
`POLG_HRV14
`22.28 POLH_POL1M, POLG_COXA2, POLG_SVDVH, POLG_BOVEV,
`POLG_HRV14, POLG_ECHO9, POLG_TMEVD
`POLG_FMDVD
`POLG_EMCV
`VGNB_CPMV, (D00657)
`(Clan CB: His, Asp, Cys catalytic triad)
`POLG_PPVD, POLG_PPVRA, POLG_PPVYN, POLG_TEV,
`-
`POLG_TVMV, POLG_WMV2, POLG_OMV
`(Clan CB: His, Cys catalytic triad)
`VPRT_ADEB3, VPRT_ADEB7, VPRT_ADE02, VPRT_ADE03,
`-
`VPRT_ADE04, VPRT_ADE05, VPRT_ADE12, VPRT_ADE40,
`VPRT_ADE41, (M81056)
`
`POLG_PPVD, POLG_PVYN, POLG_TEV, POLG_TVMV
`
`POLN_SINDV, POLN_RRVN, POLN_SFV, POLN_ONNVG,
`
`Family C6: Potyvirus endopeptidase 2
`-
`29 kDa endopeptidase
`Family C7: Chestnut blight virus p29 endopeptidase
`(M57938)
`p29 Endopeptidase (Chestnut blight virus)
`-
`Family C8: Chestnut blight virus p48 endopeptidase
`p48 Endopeptidase (Chestnut blight virus)
`(M57938)
`-
`Family C9: Togavirus cysteine endopeptidase
`Togavirus cysteine endopeptidase
`
`
`Family C10: Streptopain
`Streptopain
`Family C11: Clostripain
`α-Clostripain
`Family C12: Ubiquitin hydrolase
`Ubiquitin carboxyl-terminal hydrolase
`Family C13: Haemoglobinase
`Haemoglobinase (Schistosoma)
`Family C14: Interleukin-1β converting enzyme
`Interleukin-1β converting enzyme
`
`ASPARTIC PEPTIDASES
`
`Table 1 (contd.)
`
`22.10 STCP_STRPY
`
`22.8
`
`CLOL_CLOHI
`
`-
`
`-
`
`-
`
`UBL1_HUMAN, UBL3_HUMAN, [19]
`
`HGLB_SCHMA
`
`[20]
`
`Family A1: Pepsin
`Aspergillopepsin I
`Penicillopepsin
`Rhizopuspepsin
`Endothiapepsin
`Mucorpepsin
`Candidapepsin
`
`Polyporopepsin
`Saccharopepsin
`"Barrier" protein (Saccharomyces)
`Aspartic proteinase (barley)
`Pepsin A
`
`Aspartic endopeptidase P1
`Gastricsin
`Chymosin
`Embryonic pepsin (chicken)
`Renin, submandibular
`Renin, renal
`Cathepsin D
`Cathepsin E
`Family A2: Retropepsin
`Retropepsin
`
`(Clan AA: Asp, Asp catalytic residues)
`23.18 PEPA_ASPAW
`23.20 PENP_PENJA
`CARP_RHICH, CARP_RHINI,
`23.21
`23.22 CARP_CRYPA
`23.23 CARP_RHIMI, CARP_RHIPU
`23.24 CARP_CANAL, (X61438), (Z11918), (M83663), (X56867),
`(Z11919)
`23.29 CARP_IRPLA
`23.25 CARP_SACFI, CARP_YEAST, (D10198)
`BAR1_YEAST
`-
`(X56136)
`-
`PEPA_CHICK, PEPA_BOVIN, PEPA_HUMAN, PEPA_MACFU,
`23.1
`PEPA_MACMU, PEPA_PIG,
`PIR JT0398
`-
`PEPC_HUMAN, PEPC_MACFU, PEPC_RAT
`23.3
`CHYM_BOVIN, CHYM_SHEEP
`23.4
`PEPE_CHICK
`-
`23.15 RENS_MOUSE
`23.15 RENI_HUMAN, RENI_MOUSE, RENI_RAT
`CATD_HUMAN, CATD_MOUSE, CATD_PIG, CATD_RAT
`23.5
`23.34 CATE_HUMAN
`(Clan AA: Asp, Asp catalytic residues)
`23.16 POL_HIV1A, POL_HIV2D, POL_SIVMK, POL_BIV06, POL_EIAV,
`POL_VILV, VPRT_MPMV, VPRT_MMTVB, GAG_RSVP, VPRT_BLV,
`POL_FLV, POL_GALV, VPRT_HTL1A, POL_MLVAV, VPRT_SMRVH,
`VPRT_SRV1
`VPRT_HUMAN
`(M25392)
`
`Retrovirus-related endopeptidase (human)
`-
`Retropepsin-like protein (vaccinia virus)
`-
`
`METALLO-PEPTIDASES
`
`Family M1: Alanyl aminopeptidase
`Membrane alanyl aminopeptidase
`
`Lysyl aminopeptidase (Lactococcus)
`Aminopeptidase yscII (Saccharomyces)
`BP-1/6C3 antigen, mouse
`Leukotriene A4 hydrolase
`Family M2: Peptidyl-dipeptidase A
`Peptidyl-dipeptidase A
`
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`AMPN_ECOLI, AMPN_HUMAN, AMPN_PIG, AMPN_RAT, (X51508),
`11.2
`(M75750)
`(X61230)
`11.15
`(X63998)
`-
`BP1_MOUSE
`-
`LKHA_HUMAN, (M63848)
`3.3.2.6
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`ACE_HUMAN, ACET_HUMAN, ACE_MOUSE, ACET_MOUSE,
`15.1
`ACE_RABIT, ACET_RABIT
`
`
`Family M3: Thimet oligopeptidase
`Peptidyl-dipeptidase, bacterial
`Oligopeptidase (Salmonella)
`Mitochondrial intermediate peptidase
`Saccharolysin
`Thimet oligopeptidase
`Family M4: Thermolysin
`Thermolysin
`Pseudolysin
`Neutral endopeptidase (Bacillus
`stearothermophilus)
`Bacillolysin
`
`Table 1 (contd.)
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`(X57947), (M84575)
`-
`(M84574)
`(M96633)
`-
`(X59720 - orf YCL57w)
`24.37
`24.15 MEPD_RAT
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`24.27 THER_BACST, THER_BACTH
`24.26 ELAS_PSEAE
`PIR B36706
`-
`
`24.28 THER_BACCE, THER_BACCL, NPRE_BACSU, (D00861), (K02497),
`(M64815), (X61380)
`PROA_LEGPN
`-
`(M64809), (M59466)
`-
`(M36651)
`-
`PROL_LISMO
`-
`(M37185)
`24.30
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`NPR_STRCI
`24.31
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`INA_BACTL
`-
`
`Metalloendopeptidase (Legionella)
`Vibriolysin (Vibrio)
`Extracellular endopeptidase (Erwinia)
`Metalloendopeptidase (Listeria)
`Coccolysin
`Family M5: Mycolysin
`Mycolysin
`Family M6: Immune inhibitor A
`Immune inhibitor A (Bacillus
`thuringiensis)
`Family M7: Streptomyces small neutral protease
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`Small neutral protease (Streptomyces)
`-
`(M81703), (M86606), (Z11929)
`Family M8: Leishmanolysin
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`Leishmanolysin
`24.36 GP63_LEICH, GP63_LEIDO, GP63_LEIMA, (X64394)
`Family M9: Microbial collagenase
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`Collagenase (Vibrio)
`24.3
`[21]
`Family M10: Interstitial collagenase
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`Serralysin
`24.40 PRTB_ERWCH, PRTC_ERWCH, PRTX_ERWCH, PRZN_SERSP
`Envelysin
`24.12 HE_PARLI
`24.23 COG7_HUMAN
`Matrilysin
`COG1_HUMAN, COG1_PIG, COG1_RABIT
`Interstitial collagenase
`24.7
`24.34 COG8_HUMAN
`Neutrophil collagenase
`24.17 COG3_HUMAN, COG3_RABIT, COG3_RAT
`Stromelysin 1
`Stromelysin 2
`24.22 COGX_HUMAN, COGX_RAT
`Stromelysin 3
`COGY_HUMAN
`-
`Gelatinase A
`24.24 COG2_HUMAN
`24.35 COG9_HUMAN
`Gelatinase B
`Family M11: Autolysin
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`Autolysin
`[22]
`24.38
`Family M12: Astacin
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`Metalloendopeptidase (Caenorhabditis)
`(M75746)
`-
`Blastula protease-10 (Paracentrotus)
`(X56224)
`-
`24.21 ASTA_ASTFL
`Astacin
`(M76976)
`tolloid gene product (Drosophila)
`-
`UVS.2 protein (Xenopus)
`[23]
`-
`24.48 HRT2_CRORU
`Ruberlysin
`24.42 HRTD_CROAT
`Atrolysin c
`24.53 HR2_TRIFL
`Trimerelysin II
`HR2a-endopeptidase (habu snake)
`HR2A_TRIFL
`-
`HR1B-endopeptidase (habu snake)
`HR1B_TRIFL
`-
`HRL2_LACMU
`Haemorrhagic factor LHFII
`-
`(bushmaster snake)
`Meprin A
`
`ase
`
`se
`
`24.18
`
`(M74897)
`
`
`PABA-peptide hydrolase
`Bone morphogenetic protein 1
`Family M13: Neprilysin
`Neprilysin
`Kell blood group protein
`Family M14: Carboxypeptidase A
`Zinc-carboxypeptidase (Streptomyces)
`Carboxypeptidase T (Thermoactinomyces)
`Carboxypeptidase B
`Carboxypeptidase A
`
`Table 1 (contd.)
`(M82962)
`24.18
`BMP1_HUMAN
`-
`(Clan MA: Peptidases with HEXXH zinc-binding motif)
`24.11 NEP_HUMAN, NEP_RABIT, NEP_RAT
`KELL_HUMAN
`-
`(HXXE zinc-binding motif)
`-
`CBPS_STRGR
`(X56901)
`-
`CBPB_ASTFL, CBPB_BOVIN, CBPB_RAT, (M75106)
`17.2
`CBPA_BOVIN, CBPC_HUMAN, CBPC_MOUSE, CBP1_RAT,
`17.1
`CBP2_RAT, (A25833)
`Lysine carboxypeptidase
`CBPN_HUMAN
`17.3
`17.10 CBPH_BOVIN, CBPH_HUMAN, CBPH_RAT, (X61232), [24]
`Carboxypeptidase H
`17.12 CBPM_HUMAN
`Carboxypeptidase M
`Family M15: Muramoyl-pentapeptide carboxypeptidase (HXH zinc-binding motif)
`Muramoyl-pentapeptide carboxypeptidase
`CBPM_STRGR
`17.8
`Family M16: Pitrilysin
`(HXXEH zinc-binding motif)
`Pitrilysin
`99.44 PTR_ECOLI
`pqqF gene product (Klebsiella)
`(X58778)
`-
`IDE_DROME, IDE_HUMAN
`Insulinase
`99.45
`99.41 MPP1_NEUCR, MPP1_YEAST, MPP1_RAT
`Mitochondrial processing peptidase
`MPP2_NEUCR, MPP2_YEAST
`Processing enhancing protein
`-
`UCR1_YEAST, UCR2_YEAST, UCR2_HUMAN
`Ubiquinol-cytochrome c reductase
`1.6.99.3
`core proteins 1 and 2
`Family M17: Leucyl aminopeptidase
`Leucyl aminopeptidase
`Aminopeptidase A (Escherichia coli)
`Family M18: Aminopeptidase yscI
`Aminopeptidase yscI (Saccharomyces)
`Family M19: Membrane dipeptidase
`Membrane dipeptidase
`Open reading frame X product (Klebsiella)
`Gene R product (Acinetobacter)
`Family M20: Carboxypeptidase G2
`Carboxypeptidase G2 (Pseudomonas)
`Peptidase T (Salmonella)
`Family M21: Gly-X carboxypeptidase
`Gly-X carboxypeptidase (Saccharomyces)
`Family M22: Al Glycoprotease
`Al Glycoprotease (Pasteurella)
`OrfX (Escherichia coli)
`OrfX (Salmonella)
`Family M23: β-Lytic endopeptidase
`β-Lytic endopeptidase
`LasA protein (Pseudomonas)
`Family M24: Methionyl aminopeptidase
`Methionyl aminopeptidase
`Aminopeptidase P (Escherichia coli)
`X-Pro dipeptidase
`Family M25: X-His dipeptidase
`X-His dipeptidase
`
`(Peptidases binding two zinc atoms: Lys, Glu, Asp, Asp, Glu)
`AMPL_BOVIN, (X63444)
`11.1
`AMPA_ECOLI, (M68966)
`-
`
`-
`
`AMPL_YEAST, LAP4_YEAST
`
`13.19 MDP4_HUMAN, MDP4_PIG
`(X58778)
`(X06452)
`
`-
`-
`
`CBPG_PSES6
`(M62725)
`
`17.4
`
`(X57316)
`
`-
`-
`-
`
`(M62364)
`YRUX_ECOLI
`(M14427)
`
`24.32 PRLB_LYSEN, (M60896)
`LASA_PSEAE
`-
`
`11.18
`
`13.9
`
`AMPM_BACSU, AMPM_ECOLI, AMPM_SALTY
`AMPP_ECOLI
`PEPQ_ECOLI, PEPD_HUMAN
`
`13.3
`
`PEPD_ECOLI
`
`PEPTIDASES OF UNKNOWN CATALYTIC TYPE
`
`Family U1: Aminopeptidase T
`Aminopeptidase T (Thermus)
`
`-
`
`AMPT_THEAQ
`
`
`Table 1 (contd.)
`
`-
`
`IAP_ECOLI
`
`Family U2: Aminopeptidase iap
`Alkaline phosphatase isozyme conversion
`protein (Escherichia coli)
`Family U3: Spore endopeptidase, Bacillus megaterium
`Spore endopeptidase (Bacillus megaterium)
`(M55262)
`-
`Family U4: Sporuation sigma factor processing peptidase
`Sporulation sigma factor processing
`SP2G_BACSU
`peptidase (Bacillus subtilis)
`Family U5: Tail-specific protease
`Tail-specific protease (Escherichia coli)
`Family U6: Murein endopeptidase
`Penicillin-insensitive murein endopeptidase
`(Escherichia coli)
`Family U7: Endopeptidase IV
`Endopeptidase IV (Escherichia coli)
`sohB gene product (E. coli)
`Minor capsid protein precursor C
`(bacteriophage lambda)
`Family U8: Bacteriophage endopeptidase
`Endopeptidase (bacteriophage)
`
`(M75634)
`
`MEPA_ECOLI
`
`SPPA_ECOLI, LICA_HAEIN
`(M73320)
`VCAC_LAMBD
`
`Family U9: Prohead endopeptidase
`Prohead endopeptidase (bacteriophage T4)
`Family U10: Leader peptidase
`Leader peptidase
`Mitochondrial inner membrane peptidase 1
`(Saccharomyces)
`Family U11: Premurein leader peptidase
`Premurein leader peptidase
`
`99.35
`
`-
`
`-
`
`-
`
`-
`
`Family U12: Prepilin leader peptidase
`Prepilin leader peptidase (Vibrio)
`Late competence protein (Bacillus)
`xpcA protein (Pseudomonas)
`Pullulanase secretion protein (Klebsiella)
`Family U13: Leader peptidase component 3-4
`Leader peptidase 21 kDa subunit (dog)
`99.36
`Leader peptidase 18 kDa subunit (dog)
`99.36
`Leader peptidase (sec11) (Saccharomyces) 99.36
`Family U14: Leader peptidase component 2
`Leader peptidase 22-23 kDa subunit (dog)
`99.36
`Microsomal leader peptidase (chicken)
`99.36
`Leader peptidase (Drosophila)
`99.36
`Family U15: Multicatalytic endopeptidase complex
`Multicatalytic endopeptidase subunits
`99.46
`
`SCL1 suppressor protein (Saccharomyces) 99.46
`Family U16: Thermopsin
`Thermopsin
`99.43
`Family U17: Ubiquitin-specific processing protease
`Ubiquitin-specific processing protease I
`(Saccharomyces)
`
`ENPP_BPPA2, ENPP_BPP22, ENPP_BPT3, ENPP_BPT7,
`ENPP_LAMBD
`
`PCPP_BPT4
`
`99.36
`-
`
`LEP_ECOLI, LEP_SALTY, (X56466), (Z11847)
`[25]
`
`LPSA_ECOLI, LPSA_ENTAE, LPSA_PSEFL, (M83994),
`(M84707)
`
`(M74708)
`COMC_BACSU
`PILD_PSEAE
`PULO_KLEPN
`
`SPC3_CANFA
`SPC4_CANFA
`[26]
`
`SPC2_CANFA
`(X60795)
`(M32022)
`
`PRCA_THEAC, (M83674), (J05358), PRC1_YEAST, PRC7_YEAST,
`(M63641), PRCD_YEAST, PRCB_YEAST, PRCX_YEAST,
`PRCZ_YEAST, PR28_DROME, PR29_DROME, PR35_DROME,
`PRC3_XENLA, (X62709), PRC2_RAT, PRC5_RAT, PRC3_RAT,
`PRC8_RAT, PRC9_RAT, (M64992), (D00760), (D00761), (D00762),
`(D00763), (D10729), (X64449)
`SCL1_YEAST
`
`THPS_SULAC
`
`(M63484)
`
`Table 1 (contd.)
`
`
`23.32 PRTB_SCYLI
`PRTA_ASPNG
`-
`
`Family U18: Scytalidiapepsin
`Scytalidiapepsin B
`Scytalidiapepsin (Aspergillus)
`Family U19: Pestivirus endopeptidase
`Endopeptidase (cattle viral diarrhoea virus)
`(M37795), (M62430)
`-
`Family U20: y-D-Glutamyl-L-diamino acid endopeptidase 11
`y-D-glutamyl-L-diamino acid endopeptidase
`-
`(X64809)
`II (Bacillus sphaericus)
`Family U21: Potyvirus endopeptidase 3
`35 kDa endopeptidase
`
`-
`
`POLG_PPVD, POLG_PPVRA, POLG_PPVYN, POLG_TEV,
`POLG_TVMV
`
`chains of the enzymically active members of family S1 is His,
`Asp, Ser. The same order of residues is seen in family S2 (α-lytic
`endopeptidase) and family S3 (togavirus endopeptidase), and
`members of these families also have tertiary structures similar to
`that of chymotrypsin [28,29]. This strongly suggests that they
`share a common evolutionary origin, despite the differences of
`sequence, and accordingly we group families S1, S2 and S3 in a
`single clan (SA). The evidence is less complete for families S4, S5,
`S6 and S7, but there are indications that these also may belong
`in this clan [30-33].
`The enzymes of the subtilisin (S8) family have a different order
`of catalytic-site residues from chymotrypsin, namely Asp, His,
`Ser, and also have different tertiary structures. It is therefore
`quite clear that the family represents a separate evolutionary line
`of serine peptidases [34]. The family contains an exopeptidase
`(tripeptidyl peptidase II) as well as endopeptidases with various
`specificities. Most of the microbial members of the family
`have specificities somewhat like that of chymotrypsin, but the
`eukaryote enzymes include the proprotein convertases such as
`kexin and furin, which are specific for substrates containing
`paired basic residues [35].
`We consider that the family of prolyl oligopeptidase (S9)
`reflects a further distinct evolutionary line of serine peptidases.
`In this family there is again a different order of catalytic residues,
`Ser554 and His680 being known for pig prolyl oligopeptidase [36].
`We have suggested that if an Asp residue completes a catalytic
`triad, Asp529 is the most likely [37]. There is evidence that prolyl
`oligopeptidase differs significantly in catalytic mechanism from
`the enzymes of families S1 and S8 [38,39]. The family contains
`two endopeptidases with the restricted specificity for substrate
`size that makes them oligopeptidases [37]; one of these cleaves
`prolyl bonds, whereas the other acts on bonds with a basic
`residue in the P1 position. The family also contains a dipeptidyl
`peptidase and an omega peptidase [37].
`The serine-type carboxypeptidases form family S10, in which
`the order of catalytic residues is Ser, Asp, His. The tertiary
`structure of these enzymes is unlike those known for other
`families, and they are unusual amongst serine-type hydrolases in
`being maximally active at about pH 5 [40]. There are similarities
`between the structures of the active sites of these enzymes and
`those of lipases [40] and acetylcholinesterases.
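The comparisons of catalytic-residue order made in the preceding paragraphs can be illustrated with a short sketch. The lookup table contains only the orders quoted in the text; the function names are hypothetical, and the Asp529 position for pig prolyl oligopeptidase is the authors' suggestion rather than an established residue.

```python
from typing import Dict, Optional, Tuple

# Linear orders of catalytic residues quoted in the text.
ORDER_TO_GROUP = {
    ("His", "Asp", "Ser"): "clan SA (families S1, S2, S3)",
    ("Asp", "His", "Ser"): "family S8 (subtilisin)",
    ("Ser", "Asp", "His"): "family S10 (serine-type carboxypeptidase)",
}


def linear_order(residues: Dict[int, str]) -> Tuple[str, ...]:
    """Return the catalytic residues in the order they occur along the chain.

    `residues` maps sequence position to residue name.
    """
    return tuple(residues[pos] for pos in sorted(residues))


def match_known_order(residues: Dict[int, str]) -> Optional[str]:
    """Return the group whose residue order matches, or None if none does."""
    return ORDER_TO_GROUP.get(linear_order(residues))


# Pig prolyl oligopeptidase: Ser554 and His680 are known; Asp529 is only
# the suggested third member of the triad.
prolyl_oligopeptidase = {554: "Ser", 680: "His", 529: "Asp"}
print(linear_order(prolyl_oligopeptidase))       # ('Asp', 'Ser', 'His')
print(match_known_order(prolyl_oligopeptidase))  # None
```

With the suggested Asp529, the order for prolyl oligopeptidase matches none of the three listed, which is consistent with treating family S9 as a separate line. A matching order alone would not establish relationship, since the text also relies on tertiary structure.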
`There are three distinct families of serine-type D-Ala-D-Ala
`carboxypeptidases, S11, S12 and S13, all confined to bacteri