`
`Carboxylic ester hydrolases: Classification
`and database derived from their primary,
`secondary, and tertiary structures
`
`Yingfei Chen,1 Daniel S. Black,2 and Peter J. Reilly1*
`
`1Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa 50011
`2Information Technology Services, Iowa State University, Ames, Iowa 50011
`
`Received 6 August 2016; Accepted 12 August 2016
`DOI: 10.1002/pro.3016
`Published online 17 August 2016 proteinscience.org
`
Abstract: We classified the carboxylic ester hydrolases (CEHs) into families and clans by use of multiple sequence alignments, secondary structure analysis, and tertiary structure superpositions. Our work for the first time has fully established their systematic structural classification. Family members have similar primary, secondary, and tertiary structures, and their active sites and reaction mechanisms are conserved. Families may be gathered into clans by their having similar secondary and tertiary structures, even though primary structures of members of different families are not similar. CEHs were gathered from public databases by use of Basic Local Alignment Search Tool (BLAST) and divided into 91 families, with 36 families being grouped into five clans. Members of one clan have standard α/β-hydrolase folds, while those of two other clans have similar folds but with different sequences of their β-strands. The other two clans have members with six-bladed β-propeller and three-α-helix bundle tertiary structures. Those families not in clans have a large variety of structures or have no members with known structures. At the time of writing, the 91 families contained 321,830 primary structures and 1378 tertiary structures. From these data, we constructed an accessible database: CASTLE (CArboxylic eSTer hydroLasEs, http://www.castle.cbe.iastate.edu).
`
Keywords: carboxylesterases; cholinesterases; cocaine esterases; cutinases; lysophospholipases; phospholipases; triacylglycerol lipases
`
Additional Supporting Information may be found in the online version of this article.

*Correspondence to: Peter J. Reilly, Department of Chemical and Biological Engineering, 2114 Sweeney Hall, Iowa State University, Ames, IA 50011-2230. E-mail: reilly@iastate.edu

PROTEIN SCIENCE 2016 VOL 25:1942–1953
Published by Wiley-Blackwell. © 2016 The Protein Society

Exhibit 2077

Introduction
Carboxylic ester hydrolases (CEHs) catalyze the hydrolysis of ester bonds into alcohols and carboxylic acids, and they are ubiquitous throughout life. Members of this enzyme group that attack different substrates, form different products, and have different names are listed as EC 3.1.1.1 to EC 3.1.1.98, with seven deleted entries.1 Carboxylesterases (EC 3.1.1.1), triacylglycerol lipases (EC 3.1.1.3), phospholipase A2s (EC 3.1.1.4), lysophospholipases (EC 3.1.1.5), acetylcholinesterases (EC 3.1.1.7), butyrylcholinesterases (EC 3.1.1.8), aminoacyl-tRNA hydrolases (EC 3.1.1.29), and cocaine esterases (EC 3.1.1.84) are the most extensively researched CEHs.
According to the CATH database,2 many CEHs have standard α/β hydrolase folds, which are composed of three α/β/α layers, with the second β-strand being antiparallel to generally seven others in the β-sheet.3,4 Other CEHs with other types of α/β hydrolase folds have different arrangements of their α-helices and β-strands. Some CEHs have six-bladed β-propeller folds, which consist of six β-sheet blades arranged around a central axis. Others have four-layer β-sandwich folds, where several antiparallel β-strands are arranged in two β-sheets. Three-solenoid folds are also found in CEH structures; they consist of many parallel β-strands arranged into three β-sheets. The outer-membrane CEHs are commonly found in β-barrel folds.
To this point, there is no systematic structural classification of the CEHs. Such a classification is likely to yield results very different from those found with Enzyme Commission (EC) numbering, because enzymes with different EC numbers and different names may have very similar amino acid sequences (primary structures) and three-dimensional (tertiary) structures. This has been demonstrated in the Carbohydrate-Active EnZyme (CAZy) and Thioester-Active EnzYme (ThYme) databases,5,6 as well as elsewhere.
Earlier research has partially covered this topic, but with many fewer primary structures than amassed in this project. The ESTerases and α/β-Hydrolase Enzymes and Relatives (ESTHER) database7,8 covers some CEHs, focusing on α/β hydrolase structures. It is not limited to CEHs, but includes other enzymes such as peptidases and thioesterases that have this fold. ESTHER classifies sequences into three levels: blocks, rank 1 families, and rank 2 families, where each block is based on conserved and characteristic parts of sequences. At the time of writing, it had over 40,000 primary structures classified into four blocks, 94 rank 1 families, and 93 rank 2 families further divided from 11 rank 1 families.
The CAZy database has classified 15 homologous families of carbohydrate esterases, which catalyze the de-O- or de-N-acylation of substituted saccharides, by their primary structures. Twelve of these families contain CEHs, mainly acetyl xylan esterases, and mainly with α/β hydrolase folds. At the time of writing, about 32,000 primary structures of carbohydrate esterases were included, of which almost 21,000 were CEHs.
The lipase engineering database (LED)9 classified three classes and 38 superfamilies with α/β hydrolase folds, of which 16 included lipases and 10 covered other CEHs, by their functions, sequences, and crystal structures. Its founders employed much smaller E-values in their use of Basic Local Alignment Search Tool (BLAST)10 to gather primary structures than used in this work, implying that the family members in LED were more similar to each other than here, perhaps leading to more families. It encompassed 112 homologous families and almost 25,000 primary and over 1100 tertiary structures. However, it has not been maintained since 2009.
The MELDB database sorted microbial carboxylesterases and triacylglycerol lipases by their primary structure similarities.11 It corresponded to parts of the LED database, but it appears to be no longer available.
The research reported in this article systematically classifies the CEHs by their primary, secondary, and tertiary structure similarities, as opposed to classifying them by their EC numbers. This will cast light on the various ways that CEHs with different structures catalyze the hydrolysis of ester bonds to yield carboxylic acids and alcohols.
`
`Potential Family Identification
CEH families were generally identified by the techniques used for classifying fatty acid synthesis enzymes.6,12,13 All the primary structures of CEHs, chosen by their EC numbers, with evidence at protein level in the UniProt database14 were collected. These totaled 752 sequences. The criterion of evidence at protein level ensured that wet-laboratory experiments had been conducted on these proteins to verify their functions as CEHs. This criterion ruled out most available CEH sequences, mainly those obtained from whole-genome projects, whose functions are putative because their sequences have been compared only with those of known CEHs rather than being verified experimentally.
The collected primary structures were checked on the Pfam database15 to obtain their catalytic domains only. BLAST was used consecutively to find primary structures similar to these catalytic domains (query sequences) from the National Center for Biotechnology Information's up-to-date nr database,16 which gathers nonredundant protein sequences from various sources such as the GenBank,17 Protein Data Bank (PDB),18 Protein Information Resource,19 Protein Research Foundation,20 RefSeq,16 and Swiss-Prot14 databases. The threshold E-value in BLAST was set to 0.001. Protein sequences with E-values lower than 0.001 were regarded as similar enough to the query sequence to be included in one potential family.10 In-house Python and shell scripts were implemented to automate the process of obtaining catalytic domains of query sequences in Pfam, to conduct BLAST consecutively, and to further analyze the results of structure comparison. All the scripts were run on the Google cloud platform with CentOS 7 Linux installed.
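The E-value criterion described above can be illustrated with a minimal Python sketch. It is not the authors' in-house script: the three-column, tab-separated hit layout, the identifiers, the E-values, and the function name `collect_potential_family` are all assumptions made for illustration. Only the 0.001 threshold comes from the text.

```python
# Hypothetical BLAST hits, one per line: query id, subject id, E-value.
# The identifiers and E-values are invented for illustration.
SAMPLE_HITS = (
    "query1\tsubjA\t1e-50\n"
    "query1\tsubjB\t0.0005\n"
    "query1\tsubjC\t0.2\n"
    "query2\tsubjA\t3e-12\n"
)

E_VALUE_THRESHOLD = 0.001  # threshold stated in the text


def collect_potential_family(hits_text, query_id, threshold=E_VALUE_THRESHOLD):
    """Return subject ids whose E-value against query_id is below threshold."""
    members = []
    for line in hits_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        query, subject, e_value = line.split("\t")
        if query == query_id and float(e_value) < threshold:
            members.append(subject)
    return members


if __name__ == "__main__":
    print(collect_potential_family(SAMPLE_HITS, "query1"))
```

With the sample data, the hit with E-value 0.2 is excluded from the potential family of `query1`, while the two hits below 0.001 are kept.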
`
`Family Verification
Multiple sequence alignment (MSA), secondary structure comparison, and tertiary structure superposition are the three main techniques to verify membership in the potential families, possibly merging or splitting them. It is assumed that all members of a family have the same protein ancestor.
A random sample of sequences in each potential family was used to perform MSA with ClustalX 2.1.21 The alignment ensured that these sequences were similar enough, with several positions of amino acid residues conserved along the entire sample. Different potential families gathered by BLAST were subjected to joint MSA to ascertain whether they could be merged into one family. Conversely, if no amino acid residue was conserved and if clear differences were observed in the MSA result, then the potential family was split into two or more families. Occasionally no residue is conserved in what clearly is a family because of a sequence error in one or a few of its members.
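The conservation check on an alignment can be sketched as follows. This is an illustrative helper only, assuming the aligned sequences are already available as equal-length strings with `-` for gaps; the function name `conserved_columns` and the toy sequences are hypothetical and do not come from ClustalX or from the authors' scripts.

```python
def conserved_columns(aligned):
    """Return 0-based alignment columns where every sequence has the same residue.

    Columns containing a gap character ('-') are never counted as conserved.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for col in range(length):
        residues = {seq[col] for seq in aligned}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


if __name__ == "__main__":
    # Invented toy alignment of three short sequences.
    toy_alignment = ["GDSAG-H", "GESAGAH", "GDSLG-H"]
    print(conserved_columns(toy_alignment))
```

A potential family whose sample yields an empty list would be a candidate for splitting, subject to the caveat in the text that a sequence error in a single member can remove all fully conserved positions.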
Up to 50 tertiary structures from each potential family, if available in the PDB, were superimposed by MultiProt.22 The monomer of each tertiary structure was extracted and compared. The root mean square deviation (RMSD) of the α-carbon atoms of the different tertiary structures was calculated, together with the Pavg, a measure of the percentage of these atoms that are close enough (<4.0 Å) to be compared.12,13
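As a rough illustration of these two quantities, the sketch below computes an RMSD and a Pavg-like percentage for two already superimposed sets of α-carbon coordinates. It rests on several assumptions that the text does not spell out: the atoms are paired one-to-one by position in the list, the RMSD is taken over the pairs closer than the cutoff, and the percentage is taken over all pairs. MultiProt's actual matching procedure is not reproduced, and the coordinates are invented.

```python
import math


def rmsd_and_pavg(coords_a, coords_b, cutoff=4.0):
    """Return (RMSD over close pairs, percentage of pairs closer than cutoff).

    coords_a and coords_b are equal-length lists of (x, y, z) tuples in
    angstroms, assumed to be superimposed and paired by index already.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, q) for p, q in zip(coords_a, coords_b)]
    close = [d for d in distances if d < cutoff]
    pavg = 100.0 * len(close) / len(distances)
    if not close:
        return float("nan"), pavg
    rmsd = math.sqrt(sum(d * d for d in close) / len(close))
    return rmsd, pavg


if __name__ == "__main__":
    # Invented coordinates: three pairs are 1 angstrom apart, one is 10 apart.
    structure_a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
    structure_b = [(0, 0, 1), (1, 0, 1), (2, 0, 1), (3, 0, 10)]
    print(rmsd_and_pavg(structure_a, structure_b))
```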
`A further criterion to verify family membership
`is that active sites of potential members should
`remain in similar positions within each family. Also,
`secondary structure elements, based on the DSSP
`database23,24 embedded in the PDB, were compared
`and analyzed to ensure that potential members of
`each family have almost the same elements.
`Finally, memberships of potential families were
`manually inspected to confirm that they held a sig-
`nificant number of entries with names and EC num-
`bers specific to CEHs.
`
`Clan Identification and Verification
Clans are composed of two or more different families, where their active sites, reaction mechanisms, and secondary and tertiary structures are conserved from family to family, although their primary structures may not be significantly similar from one family to the next. It is assumed that members of different families in the same clan are more distantly derived from the same protein ancestor than are members of the same family.
We used CATH-defined folds to first divide available tertiary structures in different families into separate groups. We then used two separate procedures to determine membership of families in clans. In the first, tertiary structure representatives from different families with similar tertiary structures were superimposed by MultiProt. Unlike previous methods, which calculated RMSD and Pavg values over members of all potential families in a clan, pairwise RMSD and Pavg values were calculated after superimposing representative tertiary structures from pairs of families. This change was made because the large number of families with similar folds made it difficult to distinguish them visually in PyMOL.25 The MultiProt superposition, along with RMSD and Pavg calculations, was implemented by Python scripts, and RMSD and Pavg values were recorded in matrices. To cluster similar structures into potential clans, the pairwise RMSD matrices were imported into MEGA 6.06,26 and neighbor-joining trees were produced as curved and circular trees. Although MEGA was intended to produce phylogenetic trees for the study of molecular evolution, the pairwise distance matrix used in MEGA is similar enough to be used for RMSD matrices. Potential clans were proposed according to the trees. Then the structures of potential clan members were superimposed and inspected in PyMOL.
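To make the clustering step more concrete, the sketch below builds a symmetric distance matrix from pairwise RMSD values and selects the first pair that the standard neighbor-joining criterion would join, that is, the pair minimizing Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k). It covers only this pair-selection step of the textbook algorithm, not MEGA's implementation, and the family labels and RMSD values are invented.

```python
from itertools import combinations

# Hypothetical pairwise RMSD values (angstroms) between family representatives.
PAIRWISE_RMSD = {
    ("F1", "F2"): 1.0, ("F1", "F3"): 3.0, ("F1", "F4"): 3.0, ("F1", "F5"): 4.0,
    ("F2", "F3"): 3.0, ("F2", "F4"): 3.0, ("F2", "F5"): 4.0,
    ("F3", "F4"): 1.2, ("F3", "F5"): 2.0,
    ("F4", "F5"): 2.0,
}
NAMES = ["F1", "F2", "F3", "F4", "F5"]


def build_matrix(names, pairwise):
    """Expand the pair dictionary into a symmetric nested-dict matrix."""
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for (a, b), value in pairwise.items():
        matrix[a][b] = value
        matrix[b][a] = value
    return matrix


def first_neighbor_joining_pair(names, matrix):
    """Return the pair minimizing the neighbor-joining Q criterion."""
    n = len(names)
    row_sums = {a: sum(matrix[a].values()) for a in names}

    def q(a, b):
        return (n - 2) * matrix[a][b] - row_sums[a] - row_sums[b]

    return min(combinations(names, 2), key=lambda pair: q(*pair))


if __name__ == "__main__":
    distance_matrix = build_matrix(NAMES, PAIRWISE_RMSD)
    print(first_neighbor_joining_pair(NAMES, distance_matrix))
```

In a full neighbor-joining run, the selected pair would be merged into a new node, the matrix updated, and the step repeated until the tree is complete.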
In the second procedure, similar secondary and tertiary structures from different families were grouped roughly into potential clans, and then their structures were superimposed by MultiProt and visually inspected in PyMOL. If the superposition was satisfactory, RMSD and Pavg values for single representatives of all the potential family members of a potential clan were calculated. The proposed classification was tuned until the structures superimposed in PyMOL were in good alignment and RMSD and Pavg values were minimized. Active sites were checked, if available, to see whether the catalytic residues were in similar positions to act on the substrates, and whether they shared the same mechanism in each clan.
`Interestingly, the initial pairwise alignment pro-
`cedure did not perform as well as the initial visual
`inspection procedure. Therefore, the latter technique
`was chosen to assign families to clans.
`
`Results
`BLAST searches using the 752 query sequences
`yielded 480,148 primary structures of CEHs and
`other enzymes. In addition, 2101 tertiary structures
`were gathered from the PDB. The primary struc-
`tures were classified into 130 potential families.
The membership of each potential family was verified by three methods: MSA of primary structures, secondary structure analysis, and tertiary structure superpositions. The 130 potential families obtained by BLAST became 91 families after MSA using ClustalX, secondary structure analysis, and tertiary structure superposition by MultiProt and PyMOL, and after noting that some potential families had no or very few CEH members (Table I). After these operations, 321,830 primary structures, 1490 sequences with evidence at protein level, and 1378 tertiary structures remained.
The ClustalX sequence alignment files, using 50 representative sequences of each family (or all that were available, if <50), are in Supporting Information Figure S1. The conserved amino acid counts in each family are summarized in Supporting Information Table SI. In the great majority of cases, at least one and usually substantially more amino acid residues are totally conserved over all primary structures. A roughly equal number of chemically similar but not identical amino acid residues are also conserved. The secondary structures of a representative of each family with a known tertiary structure are shown in Supporting Information Figure S2. Each family member has similar secondary structures in its core, but some members have either extra or missing α-helices or β-strands. RMSD and Pavg values obtained by tertiary structure superposition also appear in Supporting Information Table SI. In a large fraction of all cases where multiple tertiary structures in a single family are available, RMSD < 1.5 Å and Pavg > 90%.

Table I. Clans and Families of Carboxylic Ester Hydrolases

| Family | Number of sequences | Number of sequences with evidence at protein level | Number of known tertiary structures (representative PDB structures) | Producing organisms (a) | Dominant enzyme names | Dominant EC numbers |
|---|---|---|---|---|---|---|
| Clan A (α/β-hydrolase, three-layer α/β/α sandwich, Rossmann fold, second β-strand antiparallel with sequence 1, 2, 4, 3, 5, 6, 7, and 8) | | | | | | |
| 1 | 1216 | 5 | 1 (3QIT) | B, E | α/β-Hydrolase, esterase, thioesterase | 3.1.1.– |
| 2 | 31,202 | 41 | 131 (5ALM) | A, B, E | α/β-Hydrolase, 3-oxoadipate enol-lactonase | 3.1.1.24 |
| 3 | 26,277 | 69 | 53 (3L1J) | A, B, E | α/β-Hydrolase, acetylesterase, esterase/lipase | 3.1.1.– |
| 4 | 497 | 5 | 10 (4CG1) | B | Lipase, triacylglycerol lipase | 3.1.1.3 |
| 5 | 2191 | 5 | 14 (3FYU) | B | Acetyl xylan esterase | 3.1.1.72 |
| 6 | 1181 | 10 | 2 (3C5V) | B, E | Protein phosphatase methylesterase, peroxidase | 3.1.1.89, 1.11.1.– |
| 7 | 1437 | 8 | 6 (3D59) | B, E | Carboxylic ester hydrolase, platelet-activating factor acetylhydrolase | 3.1.1.–, 3.1.1.47 |
| 8 | 730 | 2 | 4 (4G4G) | B, E | Acetyl xylan esterase | 3.1.1.72 |
| 9 | 2432 | 13 | 2 (1K8Q) | E | Lysosomal acid lipase, lipase member M | 3.1.1.– |
| 10 | 3896 | 3 | 2 (3HXK) | B, E | Xylanase, pectin acetylesterase, esterase | 3.2.1.8, 3.1.1.– |
| 11 | 1359 | 4 | 14 (4UYU) | B, E | Pectin acetylesterase, protein notum homolog | 3.1.1.– |
| 12 | 24,560 | 169 | 280 (2JGJ) | A, B, E | Carboxylesterase, carboxylic ester hydrolase, acetylcholinesterase, cholinesterase, cocaine esterase | 3.1.1.8, 3.1.1.84 |
| 13 | 4538 | 6 | 19 (3ZI7) | B, E | Esterase | 3.1.1.– |
| Clan B (α/β-hydrolase, three-layer α/β/α sandwich, Rossmann fold, all β-strands parallel with sequence 2, 1, 3, 4, and 5) | | | | | | |
| 14 | 410 | 5 | 2 (1YQE) | A, E | D-Aminoacyl-tRNA deacylase | 3.1.1.96 |
| 15 | 5264 | 4 | 7 (1U8U) | B | GDSL-like lipase, arylesterase, acyl-CoA thioesterase | 3.1.1.–, 3.1.1.2, 3.1.2.2 |
| 16 | 1869 | 2 | 13 (1R50) | B, E | Lipase | 3.1.1.– |
| 17 | 2960 | 12 | 60 (1XZG) | B, E | Cutinase, acetyl xylan esterase | 3.1.1.74, 3.1.1.72 |
| 18 | 1463 | 14 | 10 (1BWQ) | B, E | GDSL-like lipase, acetylhydrolase | 3.1.1.– |
| 19 | 2262 | 6 | 6 (1DEO) | A, B, E | Rhamnogalacturonan acetylesterase, GDSL family lipase, carbohydrate esterase family 12 protein | 3.1.1.86, 3.1.1.– |
| 20 | 1717 | 15 | 43 (1EB8) | B, E | α/β-Hydrolase, esterase | 3.1.1.– |
| 21 | 1985 | 5 | 4 (1ESD) | B, E | GDSL family lipase, triacylglycerol lipase | 3.1.1.–, 3.1.1.3 |
| 22 | 2374 | 9 | 18 (1CVL) | B, E | Lipase, lactonizing lipase | 3.1.1.3 |
| 23 | 498 | 8 | 7 (4X92) | B, E | Phospholipase A2, lecithin-cholesterol acyltransferase | 3.1.1.4, 2.3.1.43 |
| 24 | 1537 | 17 | 15 (4X71) | B, E | Lipase, triacylglycerol lipase | 3.1.1.–, 3.1.1.3 |
| Clan C (α/β-hydrolase, three-layer α/β/α sandwich, Rossmann fold, first β-strand antiparallel with sequence 1, 3, 2, 4, 5, 6, and 7) | | | | | | |
| 25 | 6115 | 6 | 19 (1ZJ5) | A, B, E | Carboxymethylenebutenolidase, dienelactone hydrolase | 3.1.1.45 |
| 26 | 8649 | 9 | 14 (4ETW) | A, B, E | α/β-Hydrolase, hydrolase (biotin biosynthesis), carboxylesterase | 3.1.1.1 |
| 27 | 4643 | 15 | 5 (4UUQ) | A, B, E | α/β-Hydrolase, lysophospholipase | 3.1.1.5 |
| 28 | 5655 | 30 | 9 (3CN9) | B, E | Carboxylesterase, phospholipase | 3.1.1.1, 3.1.1.– |
| 29 | 4262 | 3 | 18 (1TQH) | A, B | Esterase, carboxylesterase | 3.1.1.–, 3.1.1.1 |
| 30 | 1818 | 4 | 2 (3BF8) | A, B, E | α/β-Hydrolase, 3-oxoadipate enol lactonase, 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase | 3.1.1.24, 4.2.99.20 |
| 31 | 314 | 1 | 14 (1LBS) | B, E | Lipase | 3.1.1.– |
| Clan D (six-bladed β-propeller) | | | | | | |
| 32 | 8864 | 16 | 42 (4GN9) | A, B, E | Gluconolactonase, SMP-30/gluconolactone/LRE-like region | 3.1.1.17 |
| 33 | 672 | 18 | 7 (3SRE) | B, E | Serum paraoxonase/arylesterase 2 | 3.1.1.2, 3.1.1.81 |
| Clan E (three α-helix bundle) | | | | | | |
| 34 | 2791 | 325 | 288 (1TG1) | E | Phospholipase A2 | 3.1.1.4 |
| 35 | 738 | 14 | 1 (1POC) | B, E | Phospholipase A2 | 3.1.1.4 |
| 36 | 269 | 5 | 3 (2WG7) | B, E | Phospholipase A2 | 3.1.1.4 |
| Not part of a clan (α/β-hydrolase, three-layer α/β/α sandwich, Rossmann fold, various β-strand arrangements) | | | | | | |
| 37 | 1715 | 15 | 8 (3ERJ) | A, B, E | Peptidyl-tRNA hydrolase | 3.1.1.29 |
| 38 | 7996 | 29 | 15 (3EB9) | B, E | 6-Phosphogluconolactonase | 3.1.1.31 |
| 39 | 4994 | 56 | 12 (1LPB) | B, E | Pancreatic glycerol lipase, phospholipase A1 | 3.1.1.3, 3.1.1.32 |
| 40 | 2593 | 32 | 35 (4GBG) | B, E | Lipase | 3.1.1.– |
| 41 | 1493 | 11 | 1 (2YIJ) | B, E | Phospholipase A1, lipase | 3.1.1.32 |
| 42 | 1059 | 11 | 1 (1CJY) | E | Lysophospholipase | 3.1.1.5 |
| 43 | 688 | 2 | 1 (3KVN) | B | Esterase | 3.1.1.– |
| 44 | 869 | 6 | 4 (1UWC) | E | Diacylglycerol lipase | 3.1.1.– |
| 45 | 11,783 | 17 | 43 (4HOY) | B, E | Peptidyl-tRNA hydrolase | 3.1.1.29 |
| 46 | 43 | 1 | 1 (1TIA) | E | Lipase | 3.1.1.– |
| 47 | 12,511 | 8 | 4 (1CHD) | A, B | Chemotaxis-specific regulator protein, protein-glutamate methylesterase | 3.1.1.61 |
| Not part of a clan (patatin-like fold) | | | | | | |
| 48 | 11,787 | 40 | 4 (4PKB) | B, E | Patatin, patatin-like phospholipase family | 3.1.1.– |
| Not part of a clan (α/β TIM barrel) | | | | | | |
| 49 | 8906 | 10 | 15 (3CL6) | A, B, E | Polysaccharide deacetylase | 3.1.1.58 |
| 50 | 3765 | 7 | 7 (4DI9) | B, E | 2-Pyrone-4,6-dicarboxylate hydrolase, amidohydrolase | 3.1.1.57, 3.5.1.–, 3.5.2.– |
| Not part of a clan (seven-bladed β-propeller) | | | | | | |
| 51 | 8410 | 8 | 5 (3FGB) | A, B, E | 6-Phosphogluconolactonase, 3-carboxy-muconate cyclase | 3.1.1.31, 5.5.1.5 |
| Not part of a clan (three-solenoid fold) | | | | | | |
| 52 | 8166 | 22 | 13 (2NSP) | A, B, E | Pectinesterase, pectin methylesterase | 3.1.1.11 |
| Not part of a clan (four-layer α/β/β/α fold) | | | | | | |
| 53 | 1325 | 2 | 3 (2WYM) | A, B | L-Ascorbate 6-phosphate lactonase, β-lactamase | 3.1.1.–, 3.5.2.6 |
| Not part of a clan (three-layer β/β/α fold) | | | | | | |
| 54 | 8877 | 14 | 19 (3KO9) | B, E | D-Tyrosyl-tRNA(Tyr) deacylase, D-aminoacyl-tRNA deacylase | 3.1.–.– |
| Not part of a clan (β-barrel) | | | | | | |
| 55 | 1024 | 2 | 1 (2ERV) | B | Outer membrane enzyme, deacylase, lipid A 3-O-deacylase | 3.1.1.77 |
| 56 | 3296 | 2 | 7 (1ILZ) | B, E | Phospholipase A1 | 3.1.1.32 |
| Not part of a clan (two-layer sandwich fold) | | | | | | |
| 57 | 3136 | 94 | 4 (1J26) | B, E | Peptidyl-tRNA hydrolase, peptide chain release factor I | 3.1.1.29 |
| Not part of a clan (β-sandwich) | | | | | | |
| 58 | 2129 | 16 | 3 (1BCI) | B, E | Cytosolic phospholipase A2 | 3.1.1.4 |
| Not part of a clan (fold not described) | | | | | | |
| 59 | 5037 | 4 | 5 (2RTX) | B, E | Peptide chain release factor 1, peptidyl-tRNA hydrolase | 3.1.1.29 |
| 60 | 395 | 10 | 9 (4C7W) | V | Hemagglutinin-esterase, E3 glycoprotein | 3.1.1.53 |
| 61 | 521 | 3 | 9 (2JZ7) | B | Lipase, polyurethanase, hemolysin E | 3.1.1.– |
| 62 | 386 | 4 | 1 (4NFU) | E | Senescence-associated carboxylesterase | 3.1.1.1 |
| 63 | 3312 | 5 | 1 (3WMT) | B, E | Feruloyl esterase, tannase | 3.1.1.20, 3.1.1.73 |
| 64 | 944 | 14 | 5 (3FBX) | B, E | Phospholipase B-like 2 | 3.1.1.– |
| Not part of a clan (no known tertiary structure) | | | | | | |
| 65 | 13 | 1 | 0 | E | Phospholipase A2 | 3.1.1.4 |
| 66 | 438 | 7 | 0 | E | Putative peptidyl-tRNA hydrolase | 3.1.1.29 |
| 67 | 11 | 11 | 0 | E | Phospholipase A1, phospholipase A2 | 3.1.1.4, 3.1.1.32 |
| 68 | 728 | 8 | 0 | E | Groups XIIA and XIIB secretory phospholipase A2 | 3.1.1.4 |
| 69 | 1234 | 4 | 0 | B, E | Poly(3-hydroxybutyrate) depolymerase, feruloyl esterase | 3.1.1.75, 3.1.1.73 |
| 70 | 1334 | 1 | 0 | B, E | Carboxymethylenebutenolidase, dienelactone hydrolase | 3.1.1.45 |
| 71 | 34 | 1 | 0 | B | Poly(3-hydroxyoctanoate) depolymerase | 3.1.1.76 |
| 72 | 2686 | 10 | 0 | A, B, E | Polyhydroxybutyrate depolymerase family, acetylxylan esterase, feruloyl esterase, carbohydrate esterase family 1 | 3.1.1.75, 3.1.1.72, 3.1.1.73 |
| 73 | 335 | 1 | 0 | B | Phospholipase A1 | 3.1.1.32 |
| 74 | 3091 | 1 | 0 | B, E | Lysophospholipase L2 | 3.1.1.5 |
| 75 | 7331 | 14 | 0 | B, E | GDSL family esterase/lipase | 3.1.1.– |
| 76 | 261 | 4 | 0 | B, E | Chlorophyllase | 3.1.1.14 |
| 77 | 284 | 1 | 0 | E | Triacylglycerol lipase | 3.1.1.3 |
| 78 | 688 | 2 | 0 | B, E | Acetylhydrolase, esterase, lipase, α/β-hydrolase | 3.1.1.– |
| 79 | 514 | 3 | 0 | E | Triglyceride lipase, cholesterol esterase, lysosomal acid lipase | 3.1.1.–, 3.1.1.13 |
| 80 | 1828 | 10 | 0 | B, E | Patatin-like phospholipase domain | 3.1.1.– |
| 81 | 212 | 1 | 0 | E | ATG15 protein, triacylglycerol lipase | 3.1.1.3 |
| 82 | 192 | 2 | 0 | E | Steroyl esterase, YEH1p, YEH2p | 3.1.1.13 |
| 83 | 2953 | 3 | 0 | B, E | Sialate O-acetylesterase, 9-O-acetylesterase | 3.1.1.53 |
| 84 | 332 | 3 | 0 | E | Acyloxyacyl hydrolase | 3.1.1.77 |
| 85 | 55 | 3 | 0 | B | EstP (carboxylesterase) | 3.1.1.1 |
| 86 | 799 | 1 | 0 | B, E | Lipase | 3.1.1.– |
| 87 | 439 | 2 | 0 | B | Lipase | 3.1.1.– |
| 88 | 2378 | 34 | 0 | E | Phospholipase DDHD | 3.1.1.– |
| 89 | 465 | 3 | 0 | B | D-(−)-3-hydroxybutyrate oligomer hydrolase, cytochrome C1 | 3.1.1.22 |
| 90 | 1666 | 17 | 0 | B, E | Patatin-like phospholipase domain protein, triacylglycerol lipase | 3.1.1.–, 3.1.1.3 |
| 91 | 862 | 14 | 0 | B, E | Phospholipase B1 | 3.1.1.– |

Most prevalent producers are bolded.
(a) A, archaea; B, bacteria; E, eukaryota; V, virus.

A large majority of families have >1000 primary structures, and most of the rest have 100–1000 sequences (Table I). As expected, all families have at least one sequence with evidence at protein level. Among them, 64 of the 91 families have known tertiary structures, with 27 families having none.
Many families have members produced by organisms in all three life kingdoms (Table I). Members of other families are produced by organisms in two kingdoms, usually bacteria and eukaryota, or in a single kingdom (Table I).
A total of 36 families were grouped into five clans by having secondary and tertiary structures that could be closely superimposed (Table I), even though their primary structures may not be significantly similar. Supporting Information Table SII shows RMSD and Pavg values and protein folds of each clan. In all cases, RMSD < 2.5 Å. However, Pavg values cover a wide range. Tertiary structures of families within clans are more variable than those of family members, as the former do not share similar primary structures while the latter do.
Each clan has characteristic tertiary folds (Fig. 1). Clan A CEHs all have standard α/β hydrolase folds, in which the second β-strand is antiparallel to the others in the β-sheet.3,4 The β-sheet generally has eight β-strands, which are found in the order 1, 2, 4, 3, 5, 6, 7, and 8, based on their amino acid residue numbers. Clan B members have tertiary structures similar to those in clan A; however, all their β-strands are parallel to each other, proceeding in the same direction, and they are arranged in the order 2, 1, 3, 4, and 5. A sixth β-strand, if present, may be found before or after the fifth one. Clan C enzymes have α/β hydrolase-like tertiary structures as well, but their first rather than their second β-strands are antiparallel to the others on the same β-sheet. The β-strand order is 1, 3, 2, 4, 5, 6, and 7. The tertiary structures of clan D members are six-bladed β-propeller folds, where each blade, a four-stranded β-sheet, is arranged around a central axis. Clan E enzymes have three-α-helix up-down bundle tertiary structures.

Figure 1. Tertiary structures of representative families from the five CEH clans. Clan A: family 2, Burkholderia xenovorans 3-oxoadipate enol-lactonase, PDB 2XUA. Clan B: family 17, Trichoderma reesei cutinase, PDB 4PSD. Clan C: family 30, Escherichia coli esterase, PDB 3BF8. Clan D: family 32, mouse SMP30/GNL hydrolase, PDB 4GN9. Clan E: family 34, Daboia russelii phospholipase A2, PDB 1TG1.

At present, over 60% of the CEH families are not part of clans, either because they have no known tertiary structures or because their members cannot be superimposed well on the tertiary structures of the members of existing clans or with each other, even when they share much the same fold. A total of 22 of these families have known tertiary structures designated by CATH or the Structural Classification of Proteins (SCOP) database,27 such as three-layer α/β/α sandwiches (Rossmann folds), α/β-hydrolase structures, patatin-like folds, α/β-TIM barrels, seven-bladed β-propellers, three-solenoid folds, three-layer β/β/α sandwiches, β-barrels, two-layer sandwich folds, and β-sandwiches (Table I). A further six families have tertiary structures that are not classified by either CATH or SCOP.
Of the most studied CEHs, large numbers of named carboxylesterases or EC 3.1.1.1 designations are found in families 12 (clan A), 26, 28, 29 (clan C), 62, and 85 (no clan) (Table I). Families 4 (clan A), 21, 24 (clan B), 39, 77, 79, 81, and 90 (no clan) have triacylglycerol lipases (EC 3.1.1.3). Phospholipase A2s (EC 3.1.1.4) exist in families 23 (clan B), 34–36 (clan E), 58, 65, 67, and 68 (no clan). Lysophospholipases (EC 3.1.1.5) are in families 27 (clan C), 42, and 74 (no clan). Acetylcholinesterases (EC 3.1.1.7) and butyrylcholinesterases (often simply called cholinesterases) (EC 3.1.1.8) are found only in family 12 (clan A), as are cocaine esterases (EC 3.1.1.84). Cutinases (EC 3.1.1.74) are in family 17 (clan B). Enzyme name and EC number assignments tend to be somewhat elastic and arbitrary, causing some spurious variation, and some families now not assigned to clans will eventually find their way into them once they have known tertiary structures. Therefore, these lists are subject to future revision. However, it is evident from the number of sequences in each family that carboxylesterases are mainly in families 26, 28, and 29 (clan C), triacylglycerol lipases exist largely in families 21, 24 (clan B), and 39 (no clan), and phospholipase A2s are found predominantly throughout clan E and in family 58 (no clan).
`
`CEH Mechanisms
Table II presents catalytic residues from representative members of each of the 64 families that have known tertiary structures, gathered from the articles corresponding to the PDB structures listed there. Clans A, B, and C all have catalytic residues characteristic of serine protease mechanisms, with

Table II. Catalytic Residues of Carboxylic Ester Hydrolase Families with Known Tertiary Structures

| Clan | Family | PDB designation | Producing species and enzyme | Catalytic residues |
|---|---|---|---|---|
| A | 1 | 1QIT | Moorea producens (Lyngbya majuscula) decarboxylating thioesterase | S100/E124/H266 |
| A | 2 | 4CCY | Bacillus subtilis carboxylesterase CesB | S130/E245/H274 |
| A | 3 | 4KRX | Escherichia coli acetyl esterase Aes | S165/D262/H292 |
| A | 4 | 4CG1 | Thermobifida fusca polyethylene terephthalate degrading hydrolase | S130/D176/H208 |
| A | 5 | 2XLB | Bacillus pumilus (Bacillus mesentericus) acetyl xylan esterase | S181/D269/H298 |
| A | 6 | 3C5V | Human PP2A-specific methylesterase | S156/D181/H349 |
| A | 7 | 3D59 | Human plasma platelet-activating factor acetylhydrolase | S273/D296/H351 |
| A | 8 | 4G4G | Sporotrichum thermophile (Myceliophthora thermophila) glucuronyl esterase | S213/E236/H346 |
| A | 9 | 1K8Q | Dog gastric lipase | S153/D324/H353 |
| A | 10 | – | – | – |
| A | 11 | 4UYU | Human WNT deacylase notum | S232/D340/H389 |
| A | 12 | 2JGJ | Mouse acetylcholinesterase | S203/E334/H447 |
| A | 13 | 1JJF | Clostridium thermocellum feruloyl esterase | S172/D230/H260 |
| B | 14 | – | – | – |
| B | 15 | 1U8U | Escherichia coli thioesterase/protease/lysophospholipase L1 | S10/D154/H157 |
| B | 16 | 1I6W | Bacillus subtilis lipase | S77/D133/H156 |
| B | 17 | 4PSD | Trichoderma reesei cutinase | S164/D216/H229 |
| B | 18 | 1VYH | Mouse PAF-AH holoenzyme | S48/D193/H196 |
| B | 19 | 1DEO | Aspergillus aculeatus rhamnogalacturonan acetylesterase | S9/D192/H195 |
| B | 20 | 1EB8 | Manihot esculenta hydroxynitrile lyase | S80/D208/H236 |
| B | 21 | 4HYQ | Streptomyces albidoflavus phospholipase A1 | S11/H218 |
| B | 22 | 1CVL | Chromobacterium viscosum lipase | S87/D263/H285 |
| B | 23 | 4X92 | Human lysosomal phospholipase A2 | S165/D327/H359 |
| B | 24 | 4FDM | Bacillus L2 lipase | S113/D317/H358 |
| C | 25 | 1DIN | Pseudomonas knackmussii dienelactone hydrolase | C123/D171/H202 |
| C | 26 | 4ETW | Shigella flexneri enzyme/ACP substrate gatekeeper | S82/D207/H235 |
| C | 27 | 3HJU | Human monoglyceride lipase | S121/D239/H269 |
| C | 28 | 3CN9 | Pseudomonas aeruginosa carboxylesterase | S113/D166/H197 |
| C | 29 | 1TQH | Geobacillus stearothermophilus carboxylesterase Est30 | S94/D193/H223 |
| C | 30 | 3BF7 | Escherichia coli esterase | S89/D113/S206/H234 |
| C | 31 | 1LBS | Moesziomyces antarcticus triacylglycerol hydrolase | S105/D187/H224 |
| D | 32 | 2DG0 | Staphylococcus aureus lactonase | D138/D236 |
| D | 33 | 1V04 | Human/rabbit/mouse/rat serum paraoxonase | H115/H134 |
| E | 34 | 1PPA | Agkistrodon piscivorus lysine 49 phospholipase A2 | H48/Y73/D99 |
| E | 35 | 1POC | Apis mellifera venom phospholipase A2 | H34/D35 |
| E | 36 | 2WG7 | Rice class X1b phospholipase A2 | H61/D62 |
| – | 37 | 1RZW | Archaeoglobus fulgidus peptidyl-tRNA hydrolase | H20/D93/H113 |
| – | 38 | 3EB9 | Trypanosoma brucei 6-phosphogluconolactonase | D163/H165 |
| – | 39 | 1RP1 | Dog pancreatic lipase related protein | S152/D176/H263 |
| – | 40 | 1TGL | Rhizomucor miehei triacylglycerol lipase | S144/D203/H257 |
| – | 41 | – | – | – |
| – | 42 | 1CJY | Human cytosolic phospholipase A2 | S228/D549 |
| – | 43 | 3KVN | Pseudomonas aeruginosa autotransporter EstA | S14/D286/H289 |
| – | 44 | 1UWC | Aspergillus niger feruloyl esterase | S133/D194/H247 |

`
`Table II. Continued
`
`Clan
`
`Family
`
`PDB
`designation
`
`Producing species and enzyme
`
`Catalytic residues
`
`–
`
`–
`–
`
`–
`–
`–
`
`–
`–
`
`–
`
`–
`
`–
`–
`–
`–
`–
`
`–
`
`–
`–
`–
`–
`
`45
`
`46
`47
`
`48
`49
`50
`
`51
`52
`
`53
`
`54
`
`55
`56
`57
`58
`59
`
`60
`
`61
`62
`63
`64
`
`4HOY
`
`1TIA
`1CHD
`
`4PKB
`3CL6
`4DI9
`
`–
`2NSP
`
`2WYM
`
`3KO9
`
`2ERV
`1ILZ
`–
`1CJY
`2JY9
`
`1FLC
`
`2Z8X
`4NFU
`3WMT
`–
`
Acinetobacter baumannii peptidyl-tRNA
`hydrolase
`Penicillium camemberti