REVIEW
`
Carboxylic ester hydrolases: Classification and database derived from their primary, secondary, and tertiary structures

Yingfei Chen,¹ Daniel S. Black,² and Peter J. Reilly¹*

¹Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa 50011
²Information Technology Services, Iowa State University, Ames, Iowa 50011

Received 6 August 2016; Accepted 12 August 2016
DOI: 10.1002/pro.3016
Published online 17 August 2016 proteinscience.org
`
Abstract: We classified the carboxylic ester hydrolases (CEHs) into families and clans by use of multiple sequence alignments, secondary structure analysis, and tertiary structure superpositions. This work establishes, for the first time, a systematic structural classification of the CEHs. Family members have similar primary, secondary, and tertiary structures, and their active sites and reaction mechanisms are conserved. Families may be gathered into clans when they have similar secondary and tertiary structures, even though the primary structures of members of different families are not similar. CEHs were gathered from public databases by use of the Basic Local Alignment Search Tool (BLAST) and divided into 91 families, with 36 families grouped into five clans. Members of one clan have standard α/β-hydrolase folds, while those of two other clans have similar folds but different orders of their β-strands. The remaining two clans have members with six-bladed β-propeller and three-α-helix bundle tertiary structures. Families not in clans have a large variety of structures or have no members with known structures. At the time of writing, the 91 families contained 321,830 primary structures and 1378 tertiary structures. From these data, we constructed an accessible database: CASTLE (CArboxylic eSTer hydroLasEs, http://www.castle.cbe.iastate.edu).
`
Keywords: carboxylesterases; cholinesterases; cocaine esterases; cutinases; lysophospholipases; phospholipases; triacylglycerol lipases
`
Introduction
Carboxylic ester hydrolases (CEHs) catalyze the hydrolysis of ester bonds into alcohols and carboxylic acids, and they are ubiquitous throughout life. Members of this enzyme group that attack different substrates, form different products, and have different names are listed as EC 3.1.1.1 to EC 3.1.1.98, with seven deleted entries.1 Carboxylesterases (EC 3.1.1.1), triacylglycerol lipases (EC 3.1.1.3), phospholipase A2s (EC 3.1.1.4), lysophospholipases (EC 3.1.1.5), acetylcholinesterases (EC 3.1.1.7), butyrylcholinesterases (EC 3.1.1.8), aminoacyl-tRNA hydrolases (EC 3.1.1.29), and cocaine esterases (EC 3.1.1.84) are the most extensively researched CEHs.
According to the CATH database,2 many CEHs have standard α/β hydrolase folds, which are composed of three α/β/α layers, with the second β-strand antiparallel to the other strands (generally seven) in the β-sheet.3,4 Other CEHs have other types of α/β hydrolase folds, with different arrangements of their α-helices and β-strands. Some CEHs have six-bladed β-propeller folds, which consist of six β-sheet blades arranged around a central axis. Others have four-layer β-sandwich folds, in which several antiparallel β-strands are arranged in two β-sheets. Three-solenoid folds are also found in CEH structures; they consist of many parallel β-strands arranged into three β-sheets. Outer-membrane CEHs commonly have β-barrel folds.

Additional Supporting Information may be found in the online version of this article.
*Correspondence to: Peter J. Reilly, Department of Chemical and Biological Engineering, 2114 Sweeney Hall, Iowa State University, Ames, IA 50011-2230. E-mail: reilly@iastate.edu

1942
PROTEIN SCIENCE 2016 VOL 25:1942–1953
Published by Wiley-Blackwell. © 2016 The Protein Society
Exhibit 2077
Page 01 of 12
To date, there is no systematic structural classification of the CEHs. Such a classification is likely to yield very different results from Enzyme Commission (EC) numbering, because enzymes with different EC numbers and different names may have very similar amino acid sequences (primary structures) and three-dimensional (tertiary) structures. This has been demonstrated in the Carbohydrate-Active enZYmes (CAZy) database and the Thioester-Active EnzYme (ThYme) database,5,6 as well as elsewhere.
Earlier research has partially covered this topic, but with many fewer primary structures than were amassed in this project. The ESTerases and α/β-Hydrolase Enzymes and Relatives (ESTHER) database7,8 covers some CEHs, focusing on α/β hydrolase structures. It is not limited to CEHs but also includes other enzymes with this fold, such as peptidases and thioesterases. ESTHER classifies sequences into three levels: blocks, rank 1 families, and rank 2 families, where each block is based on conserved and characteristic parts of sequences. At the time of writing, it had over 40,000 primary structures classified into four blocks, 94 rank 1 families, and 93 rank 2 families, the latter subdividing 11 of the rank 1 families.
The CAZy database has classified 15 homologous families of carbohydrate esterases, which catalyze the de-O- or de-N-acylation of substituted saccharides, by their primary structures. Twelve of these families contain CEHs, mainly acetyl xylan esterases, and mainly with α/β hydrolase folds. At the time of writing, about 32,000 primary structures of carbohydrate esterases were included, of which almost 21,000 were CEHs.
The lipase engineering database (LED)9 defined three classes and 38 superfamilies with α/β hydrolase folds, based on their functions, sequences, and crystal structures; 16 of these superfamilies included lipases and 10 covered other CEHs. Its founders used much smaller E-values with the Basic Local Alignment Search Tool (BLAST)10 to gather primary structures than were used in this work, implying that the family members in LED were more similar to each other than here, perhaps leading to more families. It encompassed 112 homologous families and almost 25,000 primary and over 1100 tertiary structures. However, it has not been maintained since 2009.
The MELDB database sorted microbial carboxylesterases and triacylglycerol lipases by their primary structure similarities.11 It corresponded to parts of the LED database, but it no longer appears to be available.
The research reported in this article systematically classifies the CEHs by the similarities of their primary, secondary, and tertiary structures, as opposed to classifying them by their EC numbers. This should shed light on the various ways that CEHs with different structures catalyze the hydrolysis of ester bonds to yield carboxylic acids and alcohols.
`
Potential Family Identification
CEH families were generally identified by the techniques used for classifying fatty acid synthesis enzymes.6,12,13 All CEH primary structures in the UniProt database14 that had evidence at protein level were collected, selected by their EC numbers. These totaled 752 sequences. The criterion of evidence at protein level ensures that wet-laboratory experiments have verified these proteins' functions as CEHs. This criterion ruled out most available CEH sequences, mainly those obtained from whole-genome projects, whose functions are putative because their sequences have only been compared with those of known CEHs rather than verified experimentally.
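The protein-level criterion can be illustrated offline. UniProtKB FASTA headers carry a protein existence tag, where PE=1 denotes evidence at protein level, so a minimal sketch, assuming the candidate sequences have already been downloaded in UniProt FASTA format, simply keeps the records whose header reports PE=1. The function names and the example records are hypothetical; this is not the authors' script.

```python
import re

# UniProtKB FASTA headers end with tags such as "PE=1 SV=1".
# PE=1 means "evidence at protein level".
PE_TAG = re.compile(r"\bPE=(\d)\b")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def protein_level_only(text):
    """Keep only records whose header reports protein existence level 1."""
    kept = []
    for header, seq in parse_fasta(text):
        match = PE_TAG.search(header)
        if match and match.group(1) == "1":
            kept.append((header, seq))
    return kept


# Made-up records for illustration only.
example = """>sp|FAKE01|EST1_TEST Example esterase OS=Test organism OX=1 PE=1 SV=1
MKTAYIAK
QRQISFVK
>tr|FAKE02|FAKE02_TEST Uncharacterized protein OS=Test organism OX=1 PE=4 SV=1
MSSGHSAG
"""
print([h.split()[0] for h, _ in protein_level_only(example)])  # ['sp|FAKE01|EST1_TEST']
```

Selecting by EC number is a separate step; the sketch assumes that filter has already been applied.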
The collected primary structures were checked against the Pfam database15 so that only their catalytic domains were retained. BLAST was then run with each of these catalytic domains in turn as the query sequence against the National Center for Biotechnology Information's up-to-date nr database,16 which gathers nonredundant protein sequences from various sources such as the GenBank,17 Protein Data Bank (PDB),18 Protein Information Resource,19 Protein Research Foundation,20 RefSeq,16 and Swiss-Prot14 databases. The BLAST E-value threshold was set to 0.001: protein sequences with E-values lower than 0.001 were regarded as similar enough to the query sequence to be included in one potential family.10 In-house Python and shell scripts automated retrieval of the catalytic domains of query sequences from Pfam, the consecutive BLAST runs, and further analysis of the structure comparison results. All scripts were run on the Google Cloud Platform under CentOS 7 Linux.
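To illustrate the E-value cutoff, here is a minimal sketch that groups BLAST hits into potential families, one per query. It assumes the searches were saved in BLAST+ tabular format (-outfmt 6, whose eleventh column is the E-value); the function name and all identifiers are hypothetical, and the actual pipeline also extracted Pfam catalytic domains before searching.

```python
import csv
import io
from collections import defaultdict

E_VALUE_CUTOFF = 1e-3  # threshold given in the text

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def potential_families(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Map each query ID to the subject IDs hit with an E-value below `cutoff`."""
    families = defaultdict(set)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        if float(hit["evalue"]) < cutoff:  # strictly lower, as in the text
            families[hit["qseqid"]].add(hit["sseqid"])
    return dict(families)


# Made-up hits: s2 (E = 0.5) and s4 (E = 0.001) fall outside the cutoff.
hits = "\n".join([
    "q1\ts1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-50\t200",
    "q1\ts2\t30.0\t100\t70\t2\t1\t100\t1\t100\t0.5\t20",
    "q2\ts3\t45.0\t120\t60\t1\t1\t120\t5\t124\t2e-10\t80",
    "q2\ts4\t28.0\t90\t65\t3\t1\t90\t2\t91\t0.001\t35",
])
print(potential_families(hits))  # {'q1': {'s1'}, 'q2': {'s3'}}
```

These per-query groups are only potential families; merging and splitting happen later, during verification.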
`
Family Verification
Multiple sequence alignment (MSA), secondary structure comparison, and tertiary structure superposition are the three main techniques used to verify membership in the potential families, possibly merging or splitting them. All members of a family are assumed to share the same protein ancestor.
A random sample of sequences in each potential family was aligned with ClustalX 2.1,21 to ensure that these sequences are sufficiently similar, with several amino acid positions conserved across the entire sample. Different potential families gathered by BLAST were subjected to joint MSA to ascertain whether they could be merged into one family. Conversely, if no amino acid residue was conserved and clear differences were observed in the MSA, the potential family was split into two or more families. Occasionally no residue is conserved in what is clearly a family, because of a sequence error in one or a few of its members.
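The conservation test used to merge or split potential families can be approximated by finding alignment columns that are identical in every sequence. This sketch assumes an alignment already produced (for example by ClustalX) and loaded as equal-length strings with '-' for gaps; it counts only strictly identical columns, not the chemically similar positions that ClustalX also marks, and the helper names are hypothetical.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns with the same residue in every sequence.

    Columns containing a gap ('-') are never counted as conserved.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must all have the same length")
    conserved = []
    for col in range(width):
        residues = {seq[col].upper() for seq in alignment}
        if len(residues) == 1 and "-" not in residues:
            conserved.append(col)
    return conserved


def flag_for_split(alignment):
    """True when no column is fully conserved, prompting a closer look at a split.

    A single sequence error can also remove all conservation, so this is a
    prompt for inspection, not a decision.
    """
    return not conserved_columns(alignment)


# A toy alignment around the G-x-S-x-G motif common in alpha/beta hydrolases.
toy = ["GHSMG", "GHSAG", "GYSQG"]
print(conserved_columns(toy))  # [0, 2, 4]
```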
Up to 50 tertiary structures from each potential family, if available in the PDB, were superimposed by MultiProt.22 The monomer of each tertiary structure was extracted and compared. The root mean square deviation (RMSD) of the α-carbon atoms of the different tertiary structures was calculated, together with Pavg, a measure of the percentage of these atoms that are close enough (<4.0 Å) to be compared.12,13
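RMSD and a Pavg-like percentage can be illustrated with a short NumPy sketch. MultiProt finds the residue correspondence itself; this example instead assumes the matched α-carbon pairs are already known, superimposes them with the Kabsch algorithm, and, as a simplifying assumption, takes Pavg as the percentage of matched pairs closer than 4.0 Å after superposition. That is not necessarily MultiProt's exact definition, and the function names are hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Center both N x 3 coordinate sets and rotate `mobile` onto `target`."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    mob = mobile - mobile.mean(axis=0)
    tgt = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(mob.T @ tgt)      # SVD of the 3 x 3 covariance
    d = np.sign(np.linalg.det(vt.T @ u.T))     # avoid an improper rotation
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return mob @ rot.T, tgt


def rmsd_and_pavg(mobile, target, cutoff=4.0):
    """RMSD (Å) over matched C-alpha pairs and % of pairs within `cutoff` Å."""
    fitted, ref = kabsch_superpose(mobile, target)
    dist = np.linalg.norm(fitted - ref, axis=1)
    rmsd = float(np.sqrt(np.mean(dist ** 2)))
    pavg = 100.0 * float(np.mean(dist < cutoff))
    return rmsd, pavg


# Synthetic check: a rotated, translated copy should give RMSD ~ 0 and Pavg = 100.
rng = np.random.default_rng(0)
ca = rng.normal(scale=10.0, size=(50, 3))
t = np.deg2rad(30.0)
rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t), np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
moved = ca @ rz.T + np.array([5.0, -3.0, 12.0])
print(rmsd_and_pavg(moved, ca))
```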
A further criterion for verifying family membership is that the active sites of potential members should remain in similar positions within each family. Also, secondary structure elements, based on the DSSP database23,24 embedded in the PDB, were compared and analyzed to ensure that potential members of each family have almost the same elements.
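The secondary structure comparison can be approximated from DSSP assignments. The sketch below assumes per-residue DSSP codes are available as a string, reduces them to three states with one common convention (H, G, and I to helix; E and B to strand; everything else to coil), and compares how many helices and strands of at least three residues two structures contain. The three-residue minimum and the function names are assumptions for illustration, not parameters from the study.

```python
from itertools import groupby

# One common 8-to-3-state reduction of DSSP codes; other conventions exist.
THREE_STATE = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_dssp(codes):
    """Map DSSP codes to H (helix), E (strand), or C (coil/other)."""
    return "".join(THREE_STATE.get(c, "C") for c in codes)


def count_elements(codes, min_len=3):
    """Count helices and strands as runs of at least `min_len` residues."""
    counts = {"H": 0, "E": 0}
    for state, run in groupby(reduce_dssp(codes)):
        if state in counts and sum(1 for _ in run) >= min_len:
            counts[state] += 1
    return counts


def same_element_counts(codes_a, codes_b, min_len=3):
    """Crude check that two structures have the same numbers of helices and strands."""
    return count_elements(codes_a, min_len) == count_elements(codes_b, min_len)


member = "--EEEE--HHHHHHH-TT-EEEEE--GGG-"
print(count_elements(member))  # {'H': 2, 'E': 2}
```

Counting elements ignores their order and positions, so in practice it would complement, not replace, visual comparison of the DSSP profiles.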
Finally, the memberships of potential families were manually inspected to confirm that they held a significant number of entries with names and EC numbers specific to CEHs.
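The final manual check can be supported by a simple tally. This sketch assumes each family member carries at most one EC annotation string (or None) and reports the fraction of annotated members whose EC number falls in EC 3.1.1, counting unspecified "3.1.1.-" entries as CEHs. What counted as a significant number was a curator's judgment in the study, so no threshold is built in; the function name is hypothetical.

```python
import re

# EC 3.1.1.x (carboxylic ester hydrolases), including unspecified "3.1.1.-".
CEH_EC = re.compile(r"^3\.1\.1\.(\d+|-)$")


def ceh_fraction(ec_annotations):
    """Fraction of annotated members whose EC number lies in EC 3.1.1."""
    annotated = [ec.strip() for ec in ec_annotations if ec]
    if not annotated:
        return 0.0
    hits = sum(1 for ec in annotated if CEH_EC.match(ec))
    return hits / len(annotated)


# Made-up annotations for one potential family; None marks a member with no EC number.
family_ecs = ["3.1.1.3", "3.1.1.-", "3.2.1.8", None, "3.1.1.72"]
print(ceh_fraction(family_ecs))  # 0.75
```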
`
Clan Identification and Verification
Clans are composed of two or more families whose active sites, reaction mechanisms, and secondary and tertiary structures are conserved from family to family, although their primary structures may not be significantly similar from one family to the next. Members of different families within a clan are assumed to be more distantly derived from a common protein ancestor than are members of the same family.
We used CATH-defined folds to first divide the available tertiary structures in different families into separate groups. We then used two separate procedures to determine the membership of families in clans. In the first, representative tertiary structures from different families with similar tertiary structures were superimposed by MultiProt. Unlike previous methods, which calculated RMSD and Pavg values over members of all potential families in a clan, this procedure calculated pairwise RMSD and Pavg values after superimposing representative tertiary structures from pairs of families. This change was needed because the large number of families with similar folds made it difficult to distinguish them visually in PyMOL.25 The MultiProt superpositions, along with the RMSD and Pavg calculations, were driven by Python scripts, and RMSD and Pavg values were recorded in matrices. To cluster similar structures into potential clans, the pairwise RMSD matrices were imported into MEGA 6.06,26 and neighbor-joining trees were produced as curved and circular trees. Although MEGA is intended to produce phylogenetic trees for the study of molecular evolution, its pairwise distance matrix is similar enough to an RMSD matrix to be used for this purpose. Potential clans were proposed according to the trees. The structures of potential clan members were then superimposed and inspected in PyMOL.
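Because MEGA treats the RMSD matrix as an ordinary distance matrix, the clustering step amounts to standard neighbor joining (Saitou and Nei's method). The plain-Python sketch below shows that algorithm on a symmetric matrix; it is not MEGA's implementation, the labels are hypothetical, and the example matrix is a common five-taxon textbook example rather than RMSD values from this study.

```python
def neighbor_joining(labels, matrix):
    """Neighbor joining on a symmetric distance (e.g. pairwise RMSD) matrix.

    Returns joins as (child_a, child_b, branch_a, branch_b, parent). The last
    join places the root at the midpoint of the final edge, an arbitrary choice
    because neighbor-joining trees are unrooted.
    """
    d = {}
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            d[a, b] = float(matrix[i][j])
    nodes = list(labels)
    joins = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        totals = {a: sum(d[a, b] for b in nodes) for a in nodes}
        # Choose the pair that minimizes the Q criterion.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a, b] - totals[a] - totals[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        branch_a = d[a, b] / 2 + (totals[a] - totals[b]) / (2 * (n - 2))
        branch_b = d[a, b] - branch_a
        parent = f"node{next_id}"
        next_id += 1
        d[parent, parent] = 0.0
        for c in nodes:
            if c not in (a, b):
                # Distance from the new internal node to every remaining node.
                d[parent, c] = d[c, parent] = (d[a, c] + d[b, c] - d[a, b]) / 2
        nodes = [c for c in nodes if c not in (a, b)] + [parent]
        joins.append((a, b, branch_a, branch_b, parent))
    a, b = nodes
    joins.append((a, b, d[a, b] / 2, d[a, b] / 2, "root"))
    return joins


labels = ["a", "b", "c", "d", "e"]
dist = [[0, 5, 9, 9, 8],
        [5, 0, 10, 10, 9],
        [9, 10, 0, 8, 7],
        [9, 10, 8, 0, 3],
        [8, 9, 7, 3, 0]]
print(neighbor_joining(labels, dist)[0])  # ('a', 'b', 2.0, 3.0, 'node0')
```

In the clan setting, each label would be one family's representative structure and each entry a pairwise RMSD.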
In the second procedure, similar secondary and tertiary structures from different families were grouped roughly into potential clans, and their structures were then superimposed by MultiProt and visually inspected in PyMOL. If the superposition was satisfactory, RMSD and Pavg values were calculated for single representatives of all the potential member families of a potential clan. The proposed classification was tuned until the structures superimposed in PyMOL were in good alignment, RMSD values were minimized, and Pavg values were maximized. Where available, active sites were checked to see whether the catalytic residues are in similar positions to act on the substrates and whether they share the same mechanism within each clan.
Interestingly, the first (pairwise-alignment) procedure did not perform as well as the second (visual-inspection) procedure. Therefore, the second procedure was chosen to assign families to clans.
`
Results
BLAST searches using the 752 query sequences yielded 480,148 primary structures of CEHs and other enzymes. In addition, 2101 tertiary structures were gathered from the PDB. The primary structures were classified into 130 potential families.
The membership of each potential family was verified by three methods: MSA of primary structures, secondary structure analysis, and tertiary structure superposition. The 130 potential families obtained by BLAST became 91 families after MSA using ClustalX, secondary structure analysis, and tertiary structure superposition by MultiProt and PyMOL, and after excluding potential families that had no or very few CEH members (Table I). After these operations, 321,830 primary structures, 1490 sequences with evidence at protein level, and 1378 tertiary structures remained.
The ClustalX sequence alignment files, using 50 representative sequences of each family (or all that were available, if <50), are in Supporting Information Figure S1. The conserved amino acid counts in each family are summarized in Supporting Information Table SI. In the great majority of cases, at least one and usually substantially more amino acid residues are totally conserved over all primary structures. A roughly equal number of chemically similar but not identical amino acid residues are also conserved. The secondary structures of a representative of each family with a known tertiary structure are shown in Supporting Information Figure S2. Each family member has similar secondary structures in its core, but some members have extra or missing α-helices or β-strands. RMSD and Pavg values obtained by tertiary structure superposition also appear in Supporting Information Table SI. In a large fraction of all cases where multiple tertiary structures in a single family are available, RMSD < 1.5 Å and Pavg > 90%.

Table I. Clans and Families of Carboxylic Ester Hydrolases

Family | Number of sequences | Number of sequences with evidence at protein level | Number of known tertiary structures (representative PDB structure) | Producing organisms(a) | Dominant enzyme names | Dominant EC numbers

Clan A (α/β-hydrolase, three-layer α/β/α sandwich, Rossmann fold, second β-strand antiparallel, with strand order 1, 2, 4, 3, 5, 6, 7, and 8)
1 | 1216 | 5 | 1 (3QIT) | B, E | α/β-Hydrolase, esterase, thioesterase | 3.1.1.–
2 | 31,202 | 41 | 131 (5ALM) | A, B, E | α/β-Hydrolase, 3-oxoadipate enol-lactonase | 3.1.1.24
3 | 26,277 | 69 | 53 (3L1J) | A, B, E | α/β-Hydrolase, acetylesterase, esterase/lipase | 3.1.1.–
4 | 497 | 5 | 10 (4CG1) | B | Lipase, triacylglycerol lipase | 3.1.1.3
5 | 2191 | 5 | 14 (3FYU) | B | Acetyl xylan esterase | 3.1.1.72
6 | 1181 | 10 | 2 (3C5V) | B, E | Protein phosphatase methylesterase, peroxidase | 3.1.1.89, 1.11.1.–
7 | 1437 | 8 | 6 (3D59) | B, E | Carboxylic ester hydrolase, platelet-activating factor acetylhydrolase | 3.1.1.–, 3.1.1.47
8 | 730 | 2 | 4 (4G4G) | B, E | Acetyl xylan esterase | 3.1.1.72
9 | 2432 | 13 | 2 (1K8Q) | E | Lysosomal acid lipase, lipase member M | 3.1.1.–
10 | 3896 | 3 | 2 (3HXK) | B, E | Xylanase, pectin acetylesterase, esterase | 3.2.1.8, 3.1.1.–
11 | 1359 | 4 | 14 (4UYU) | B, E | Pectin acetylesterase, protein notum homolog | 3.1.1.–
12 | 24,560 | 169 | 280 (2JGJ) | A, B, E | Carboxylesterase, carboxylic ester hydrolase, acetylcholinesterase, cholinesterase, cocaine esterase | 3.1.1.8, 3.1.1.84
13 | 4538 | 6 | 19 (3ZI7) | B, E | Esterase | 3.1.1.–

Clan B (α/β-hydrolase, three-layer α/β/α sandwich, Rossmann fold, all β-strands parallel, with strand order 2, 1, 3, 4, and 5)
14 | 410 | 5 | 2 (1YQE) | A, E | D-Aminoacyl-tRNA deacylase | 3.1.1.96
15 | 5264 | 4 | 7 (1U8U) | B | GDSL-like lipase, arylesterase, acyl-CoA thioesterase | 3.1.1.–, 3.1.1.2, 3.1.2.2
16 | 1869 | 2 | 13 (1R50) | B, E | Lipase | 3.1.1.–
17 | 2960 | 12 | 60 (1XZG) | B, E | Cutinase, acetyl xylan esterase | 3.1.1.74, 3.1.1.72
18 | 1463 | 14 | 10 (1BWQ) | B, E | GDSL-like lipase, acetylhydrolase | 3.1.1.–
19 | 2262 | 6 | 6 (1DEO) | A, B, E | Rhamnogalacturonan acetylesterase, GDSL family lipase, carbohydrate esterase family 12 protein | 3.1.1.86, 3.1.1.–
20 | 1717 | 15 | 43 (1EB8) | B, E | α/β-Hydrolase, esterase | 3.1.1.–
21 | 1985 | 5 | 4 (1ESD) | B, E | GDSL family lipase, triacylglycerol lipase | 3.1.1.–, 3.1.1.3
22 | 2374 | 9 | 18 (1CVL) | B, E | Lipase, lactonizing lipase | 3.1.1.3
23 | 498 | 8 | 7 (4X92) | B, E | Phospholipase A2, lecithin-cholesterol acyltransferase | 3.1.1.4, 2.3.1.43
24 | 1537 | 17 | 15 (4X71) | B, E | Lipase, triacylglycerol lipase | 3.1.1.–, 3.1.1.3

Clan C (α/β-hydrolase, three-layer α/β/α sandwich, Rossmann fold, first β-strand antiparallel, with strand order 1, 3, 2, 4, 5, 6, and 7)
25 | 6115 | 6 | 19 (1ZJ5) | A, B, E | Carboxymethylenebutenolidase, dienelactone hydrolase | 3.1.1.45
26 | 8649 | 9 | 14 (4ETW) | A, B, E | α/β-Hydrolase, hydrolase (biotin biosynthesis), carboxylesterase | 3.1.1.1
27 | 4643 | 15 | 5 (4UUQ) | A, B, E | α/β-Hydrolase, lysophospholipase | 3.1.1.5
28 | 5655 | 30 | 9 (3CN9) | B, E | Carboxylesterase, phospholipase | 3.1.1.1, 3.1.1.–
29 | 4262 | 3 | 18 (1TQH) | A, B | Esterase, carboxylesterase | 3.1.1.–, 3.1.1.1
30 | 1818 | 4 | 2 (3BF8) | A, B, E | α/β-Hydrolase, 3-oxoadipate enol-lactonase, 2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate synthase | 3.1.1.24, 4.2.99.20
31 | 314 | 1 | 14 (1LBS) | B, E | Lipase | 3.1.1.–

Clan D (six-bladed β-propeller)
32 | 8864 | 16 | 42 (4GN9) | A, B, E | Gluconolactonase, SMP-30/gluconolactone/LRE-like region | 3.1.1.17
33 | 672 | 18 | 7 (3SRE) | B, E | Serum paraoxonase/arylesterase 2 | 3.1.1.2, 3.1.1.81

Clan E (three α-helix bundle)
34 | 2791 | 325 | 288 (1TG1) | E | Phospholipase A2 | 3.1.1.4
35 | 738 | 14 | 1 (1POC) | B, E | Phospholipase A2 | 3.1.1.4
36 | 269 | 5 | 3 (2WG7) | B, E | Phospholipase A2 | 3.1.1.4

Not part of a clan
(α/β-Hydrolase, three-layer α/β/α sandwich, Rossmann fold, various β-strand arrangements)
37 | 1715 | 15 | 8 (3ERJ) | A, B, E | Peptidyl-tRNA hydrolase | 3.1.1.29
38 | 7996 | 29 | 15 (3EB9) | B, E | 6-Phosphogluconolactonase | 3.1.1.31
39 | 4994 | 56 | 12 (1LPB) | B, E | Pancreatic glycerol lipase, phospholipase A1 | 3.1.1.3, 3.1.1.32
40 | 2593 | 32 | 35 (4GBG) | B, E | Lipase | 3.1.1.–
41 | 1493 | 11 | 1 (2YIJ) | B, E | Phospholipase A1, lipase | 3.1.1.32
42 | 1059 | 11 | 1 (1CJY) | E | Lysophospholipase | 3.1.1.5
43 | 688 | 2 | 1 (3KVN) | B | Esterase | 3.1.1.–
44 | 869 | 6 | 4 (1UWC) | E | Diacylglycerol lipase | 3.1.1.–
45 | 11,783 | 17 | 43 (4HOY) | B, E | Peptidyl-tRNA hydrolase | 3.1.1.29
46 | 43 | 1 | 1 (1TIA) | E | Lipase | 3.1.1.–
47 | 12,511 | 8 | 4 (1CHD) | A, B | Chemotaxis-specific regulator protein, protein-glutamate methylesterase | 3.1.1.61
(Patatin-like fold)
48 | 11,787 | 40 | 4 (4PKB) | B, E | Patatin, patatin-like phospholipase family | 3.1.1.–
(α/β TIM barrel)
49 | 8906 | 10 | 15 (3CL6) | A, B, E | Polysaccharide deacetylase | 3.1.1.58
50 | 3765 | 7 | 7 (4DI9) | B, E | 2-Pyrone-4,6-dicarboxylate hydrolase, amidohydrolase | 3.1.1.57, 3.5.1.–, 3.5.2.–
(Seven-bladed β-propeller)
51 | 8410 | 8 | 5 (3FGB) | A, B, E | 6-Phosphogluconolactonase, 3-carboxy-muconate cyclase | 3.1.1.31, 5.5.1.5
(Three-solenoid fold)
52 | 8166 | 22 | 13 (2NSP) | A, B, E | Pectinesterase, pectin methylesterase | 3.1.1.11
(Four-layer α/β/β/α fold)
53 | 1325 | 2 | 3 (2WYM) | A, B | L-Ascorbate 6-phosphate lactonase, β-lactamase | 3.1.1.–, 3.5.2.6
(Three-layer β/β/α fold)
54 | 8877 | 14 | 19 (3KO9) | B, E | D-Tyrosyl-tRNA(Tyr) deacylase, D-aminoacyl-tRNA deacylase | 3.1.–.–
(β-barrel)
55 | 1024 | 2 | 1 (2ERV) | B | Outer membrane enzyme, deacylase, lipid A 3-O-deacylase | 3.1.1.77
56 | 3296 | 2 | 7 (1ILZ) | B, E | Phospholipase A1 | 3.1.1.32
(Two-layer sandwich fold)
57 | 3136 | 94 | 4 (1J26) | B, E | Peptidyl-tRNA hydrolase, peptide chain release factor I | 3.1.1.29
(β-sandwich)
58 | 2129 | 16 | 3 (1BCI) | B, E | Cytosolic phospholipase A2 | 3.1.1.4
(Not described)
59 | 5037 | 4 | 5 (2RTX) | B, E | Peptide chain release factor 1, peptidyl-tRNA hydrolase | 3.1.1.29
60 | 395 | 10 | 9 (4C7W) | V | Hemagglutinin-esterase, E3 glycoprotein | 3.1.1.53
61 | 521 | 3 | 9 (2JZ7) | B | Lipase, polyurethanase, hemolysin E | 3.1.1.–
62 | 386 | 4 | 1 (4NFU) | E | Senescence-associated carboxylesterase | 3.1.1.1
63 | 3312 | 5 | 1 (3WMT) | B, E | Feruloyl esterase, tannase | 3.1.1.20, 3.1.1.73
64 | 944 | 14 | 5 (3FBX) | B, E | Phospholipase B-like 2 | 3.1.1.–
(No known tertiary structure)
65 | 13 | 1 | 0 | B | Phospholipase A2 | 3.1.1.4
66 | 438 | 7 | 0 | B, E | Putative peptidyl-tRNA hydrolase | 3.1.1.29
67 | 11 | 11 | 0 | B, E | Phospholipase A1, phospholipase A2 | 3.1.1.4, 3.1.1.32
68 | 728 | 8 | 0 | E | Groups XIIA and XIIB secretory phospholipase A2 | 3.1.1.4
69 | 1234 | 4 | 0 | E | Poly(3-hydroxybutyrate) depolymerase, feruloyl esterase | 3.1.1.75, 3.1.1.73
70 | 1334 | 1 | 0 | E | Carboxymethylenebutenolidase, dienelactone hydrolase | 3.1.1.45
71 | 34 | 1 | 0 | E | Poly(3-hydroxyoctanoate) depolymerase | 3.1.1.76
72 | 2686 | 10 | 0 | A, B, E | Polyhydroxybutyrate depolymerase family, acetylxylan esterase, feruloyl esterase, carbohydrate esterase family 1 | 3.1.1.75, 3.1.1.72, 3.1.1.73
73 | 335 | 1 | 0 | B, E | Phospholipase A1 | 3.1.1.32
74 | 3091 | 1 | 0 | B, E | Lysophospholipase L2 | 3.1.1.5
75 | 7331 | 14 | 0 | B | GDSL family esterase/lipase | 3.1.1.–
76 | 261 | 4 | 0 | B, E | Chlorophyllase | 3.1.1.14
77 | 284 | 1 | 0 | E | Triacylglycerol lipase | 3.1.1.3
78 | 688 | 2 | 0 | B, E | Acetylhydrolase, esterase, lipase, α/β-hydrolase | 3.1.1.–
79 | 514 | 3 | 0 | E | Triglyceride lipase, cholesterol esterase, lysosomal acid lipase | 3.1.1.–, 3.1.1.13
80 | 1828 | 10 | 0 | B, E | Patatin-like phospholipase domain | 3.1.1.–
81 | 212 | 1 | 0 | E | ATG15 protein, triacylglycerol lipase | 3.1.1.3
82 | 192 | 2 | 0 | E | Sterol esterase, YEH1p, YEH2p | 3.1.1.13
83 | 2953 | 3 | 0 | B, E | Sialate O-acetylesterase, 9-O-acetylesterase | 3.1.1.53
84 | 332 | 3 | 0 | E | Acyloxyacyl hydrolase | 3.1.1.77
85 | 55 | 3 | 0 | B | EstP (carboxylesterase) | 3.1.1.1
86 | 799 | 1 | 0 | B, E | Lipase | 3.1.1.–
87 | 439 | 2 | 0 | B | Lipase | 3.1.1.–
88 | 2378 | 34 | 0 | E | Phospholipase DDHD | 3.1.1.–
89 | 465 | 3 | 0 | B | D-(−)-3-Hydroxybutyrate oligomer hydrolase, cytochrome C1 | 3.1.1.22
90 | 1666 | 17 | 0 | B, E | Patatin-like phospholipase domain protein, triacylglycerol lipase | 3.1.1.–, 3.1.1.3
91 | 862 | 14 | 0 | B, E | Phospholipase B1 | 3.1.1.–

Most prevalent producers are bolded.
(a) A, archaea; B, bacteria; E, eukaryota; V, virus.
A large majority of families have >1000 primary structures, and most of the rest have 100–1000 sequences (Table I). As expected, all families have at least one sequence with evidence at protein level. Of the 91 families, 64 have known tertiary structures and 27 have none. Many families have members produced by organisms in all three domains of life (Table I). Members of other families are produced by organisms in two domains, usually bacteria and eukaryota, or in a single domain (Table I).
A total of 36 families were grouped into five clans because their secondary and tertiary structures could be closely superimposed (Table I), even though their primary structures may not be significantly similar. Supporting Information Table SII shows RMSD and Pavg values and protein folds of each clan. In all cases, RMSD < 2.5 Å; however, Pavg values cover a wide range. Tertiary structures vary more among the families within a clan than among the members of a single family, because different families do not share similar primary structures while members of the same family do.
Each clan has characteristic tertiary folds (Fig. 1). Clan A CEHs all have standard α/β hydrolase folds, in which the second β-strand is antiparallel to the others in the β-sheet.3,4 The β-sheet generally has eight β-strands, which are found in the order 1, 2, 4, 3, 5, 6, 7, and 8, numbered by their positions in the amino acid sequence. Clan B members have tertiary structures similar to those in clan A; however, all their β-strands are parallel to each other, proceeding in the same direction, and they are arranged in the order 2, 1, 3, 4, and 5. A sixth β-strand, if present, may be found before or after the fifth one. Clan C enzymes also have α/β hydrolase-like tertiary structures, but their first rather than their second β-strands are antiparallel to the others in the same β-sheet. The β-strand order is 1, 3, 2, 4, 5, 6, and 7. The tertiary structures of clan D members are six-bladed β-propeller folds, in which each blade, a four-stranded β-sheet, is arranged around a central axis. Clan E enzymes have three-α-helix up-down bundle tertiary structures.

Figure 1. Tertiary structures of representative families from the five CEH clans. Clan A: family 2, Burkholderia xenovorans 3-oxoadipate enol-lactonase, PDB 2XUA. Clan B: family 17, Trichoderma reesei cutinase, PDB 4PSD. Clan C: family 30, Escherichia coli esterase, PDB 3BF8. Clan D: family 32, mouse SMP30/GNL hydrolase, PDB 4GN9. Clan E: family 34, Daboia russelii phospholipase A2, PDB 1TG1.
At present, over 60% of the CEH families (55 of 91) are not part of clans, either because they have no known tertiary structures or because their members cannot be superimposed well on the tertiary structures of the members of existing clans or on each other, even when they share much the same fold. A total of 22 of these families have known tertiary structures designated by CATH or the Structural Classification of Proteins (SCOP) database,27 such as three-layer α/β/α sandwiches (Rossmann folds), α/β-hydrolase structures, patatin-like folds, α/β-TIM barrels, seven-bladed β-propellers, three-solenoid folds, three-layer β/β/α sandwiches, β-barrels, two-layer sandwich folds, and β-sandwiches (Table I). A further six families have tertiary structures that are not classified by either CATH or SCOP.
Of the most studied CEHs, large numbers of named carboxylesterases or EC 3.1.1.1 designations are found in families 12 (clan A), 26, 28, and 29 (clan C), and 62 and 85 (no clan) (Table I). Families 4 (clan A), 21 and 24 (clan B), and 39, 77, 79, 81, and 90 (no clan) have triacylglycerol lipases (EC 3.1.1.3). Phospholipase A2s (EC 3.1.1.4) exist in families 23 (clan B), 34–36 (clan E), and 58, 65, 67, and 68 (no clan). Lysophospholipases (EC 3.1.1.5) are in families 27 (clan C) and 42 and 74 (no clan). Acetylcholinesterases (EC 3.1.1.7) and butyrylcholinesterases (often simply called cholinesterases) (EC 3.1.1.8) are found only in family 12 (clan A), as are cocaine esterases (EC 3.1.1.84). Cutinases (EC 3.1.1.74) are in family 17 (clan B). Enzyme name and EC number assignments tend to be somewhat elastic and arbitrary, causing some spurious variation, and some families not now assigned to clans will eventually join them once their tertiary structures are known. Therefore, these lists are subject to future revision. However, the numbers of sequences in each family make it evident that carboxylesterases are mainly in families 26, 28, and 29 (clan C), triacylglycerol lipases are largely in families 21 and 24 (clan B) and 39 (no clan), and phospholipase A2s are found predominantly throughout clan E and in family 58 (no clan).
`
CEH Mechanisms
Table II presents the catalytic residues of a representative of each of the 64 families that have known tertiary structures, gathered from the articles corresponding to the PDB structures listed there. Clans A, B, and C all have catalytic residues characteristic of serine protease mechanisms, with
`Table II. Catalytic Residues of Carboxylic Ester Hydrolase Families with Known Tertiary Structures
`
`Clan
`
`Family
`
`PDB
`designation
`
`Producing species and enzyme
`
`Catalytic residues
`
`A
`
`A
`A
`A
`
`A
`
`A
`A
`
`A
`
`A
`A
`A
`A
`A
`
`B
`B
`
`B
`B
`B
`B
`
`B
`B
`
`B
`B
`B
`C
`
`C
`
`C
`C
`
`C
`
`C
`C
`
`D
`D
`
`E
`
`E
`E
`–
`
`–
`
`–
`–
`–
`–
`–
`
`–
`
`1
`
`2
`3
`4
`
`5
`
`6
`7
`
`8
`
`9
`10
`11
`12
`13
`
`14
`15
`
`16
`17
`18
`19
`
`20
`21
`
`22
`23
`24
`25
`
`26
`
`27
`28
`
`29
`
`30
`31
`
`32
`33
`
`34
`
`35
`36
`37
`
`38
`
`39
`40
`41
`42
`43
`
`44
`
`1QIT
`
`4CCY
`4KRX
`4CG1
`
`2XLB
`
`3C5V
`3D59
`
`4G4G
`
`1K8Q
`–
`4UYU
`2JGJ
`1JJF
`
`–
`1U8U
`
`1I6W
`4PSD
`1VYH
`1DEO
`
`1EB8
`4HYQ
`
`1CVL
`4X92
`4FDM
`1DIN
`
`4ETW
`
`3HJU
`3CN9
`
`1TQH
`
`3BF7
`1LBS
`
`2DG0
`1V04
`
`1PPA
`
`1POC
`2WG7
`1RZW
`
`3EB9
`
`1RP1
`1TGL
`–
`1CJY
`3KVN
`
`1UWC
`
Moorea producens (Lyngbya majuscula) decarboxylating thioesterase
Bacillus subtilis carboxylesterase CesB
Escherichia coli acetyl esterase Aes
Thermobifida fusca polyethylene terephthalate-degrading hydrolase
Bacillus pumilus (Bacillus mesentericus) acetyl xylan esterase
Human PP2A-specific methylesterase
Human plasma platelet-activating factor acetylhydrolase
Sporotrichum thermophile (Myceliophthora thermophila) glucuronyl esterase
Dog gastric lipase
–
Human WNT deacylase notum
Mouse acetylcholinesterase
Clostridium thermocellum feruloyl esterase
–
Escherichia coli thioesterase/protease/lysophospholipase L1
Bacillus subtilis lipase
Trichoderma reesei cutinase
Mouse PAF-AH holoenzyme
Aspergillus aculeatus rhamnogalacturonan acetylesterase
Manihot esculenta hydroxynitrile lyase
Streptomyces albidoflavus phospholipase A1
Chromobacterium viscosum lipase
Human lysosomal phospholipase A2
Bacillus L2 lipase
Pseudomonas knackmussii dienelactone hydrolase
Shigella flexneri enzyme/ACP substrate gatekeeper
Human monoglyceride lipase
Pseudomonas aeruginosa carboxylesterase
Geobacillus stearothermophilus carboxylesterase Est30
Escherichia coli esterase
Moesziomyces antarcticus triacylglycerol hydrolase
Staphylococcus aureus lactonase
Human/rabbit/mouse/rat serum paraoxonase
Agkistrodon piscivorus lysine 49 phospholipase A2
Apis mellifera venom phospholipase A2
Rice class XIb phospholipase A2
Archaeoglobus fulgidus peptidyl-tRNA hydrolase
Trypanosoma brucei 6-phosphogluconolactonase
Dog pancreatic lipase-related protein
Rhizomucor miehei triacylglycerol lipase
–
Human cytosolic phospholipase A2
Pseudomonas aeruginosa autotransporter EstA
Aspergillus niger feruloyl esterase
`
`S100/E124/H266
`
`S130/E245/H274
`S165/D262/H292
`S130/D176/H208
`
`S181/D269/H298
`
`S156/D181/H349
`S273/D296/H351
`
`S213/E236/H346
`
`S153/D324/H353
`–
`S232/D340/H389
`S203/E334/H447
`S172/D230/H260
`
`–
`S10/D154/H157
`
`S77/D133/H156
`S164/D216/H229
`S48/D/193/H196
`S9/D192/H195
`
`S80/D208/H236
`S11/H218
`
`S87/D263/H285
`S165/D327/H359
`S113/D317/H358
`C123/D171/H202
`
`S82/D207/H235
`
`S121/D239/H269
`S113/D166/H197
`
`S94/D193/H223
`
`S89/D113/S206/H234
`S105/D187/H224
`
`D138/D236
`H115/H134
`
`H48/Y73/D99
`
`H34/D35
`H61/D62
`H20/D93/H113
`
`D163/H165
`
`S152/D176/H263
`S144/D203/H257
`–
`S228/D549
`S14/D286/H289
`
`S133/D194/H247
Table II. Continued

| Clan | Family | PDB designation | Producing species and enzyme | Catalytic residues |
|---|---|---|---|---|
| – | 45 | 4HOY | Acinetobacter baumannii peptidyl-tRNA hydrolase | |
| – | 46 | 1TIA | Penicillium camemberti | |
| – | 47 | 1CHD | | |
| – | 48 | 4PKB | | |
| – | 49 | 3CL6 | | |
| – | 50 | 4DI9 | | |
| – | 51 | – | | |
| – | 52 | 2NSP | | |
| – | 53 | 2WYM | | |
| – | 54 | 3KO9 | | |
| – | 55 | 2ERV | | |
| – | 56 | 1ILZ | | |
| – | 57 | – | | |
| – | 58 | 1CJY | | |
| – | 59 | 2JY9 | | |
| – | 60 | 1FLC | | |
| – | 61 | 2Z8X | | |
| – | 62 | 4NFU | | |
| – | 63 | 3WMT | | |
| – | 64 | – | | |
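To make the catalytic-residue notation of Table II concrete (a one-letter amino acid code plus a sequence position, with residues separated by slashes, e.g. S130/D176/H208), the Python sketch below parses such an entry and flags those that contain the serine nucleophile, aspartate or glutamate acid, and histidine base of the classic Ser/His/Asp-or-Glu catalytic triad associated with serine protease mechanisms. It is an illustrative reading aid only, not part of the authors' classification procedure or of CASTLE. The function names and the `SAMPLE` dictionary are hypothetical, and the check considers only which residue types are present, not their three-dimensional arrangement.

```python
import re
from typing import NamedTuple


class Residue(NamedTuple):
    aa: str        # one-letter amino acid code, e.g. "S" for serine
    position: int  # sequence position, e.g. 130


# A residue token is one of the 20 standard one-letter codes followed by a position.
_TOKEN = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)")


def parse_residues(field: str) -> list[Residue]:
    """Split a Table II-style field such as 'S130/D176/H208' into residues.

    A dash (no entry) returns an empty list; a malformed token raises ValueError.
    """
    field = field.strip()
    if field in {"", "-", "–"}:
        return []
    residues = []
    for token in field.split("/"):
        match = _TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognized residue token {token!r} in {field!r}")
        residues.append(Residue(match.group(1), int(match.group(2))))
    return residues


def has_ser_acid_his(residues: list[Residue]) -> bool:
    """Heuristic: True if Ser, His, and at least one of Asp/Glu are all present.

    Only residue identities are checked, not their spatial arrangement.
    """
    kinds = {r.aa for r in residues}
    return "S" in kinds and "H" in kinds and bool(kinds & {"D", "E"})


# Hypothetical sample: a few Table II entries keyed by family number.
SAMPLE = {
    1: "S100/E124/H266",   # clan A
    16: "S77/D133/H156",   # clan B
    21: "S11/H218",        # clan B, Ser/His only
    25: "C123/D171/H202",  # clan C, Cys instead of Ser
    35: "H34/D35",         # clan E
    41: "–",               # no entry
}

if __name__ == "__main__":
    for family, field in SAMPLE.items():
        # Prints the family number and whether a Ser/acid/His set was found.
        print(family, has_ser_acid_his(parse_residues(field)))
```

Under this heuristic only families 1 and 16 in the sample are flagged; the Ser/His pair of family 21, the Cys-containing set of family 25, and the His/Asp pair of family 35 are not, so the check separates the canonical Ser/acid/His entries from the variants without implying anything further about mechanism.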
