`
`Thioesterases: A new perspective based
`on their primary and tertiary structures
`
`David C. Cantu, Yingfei Chen, and Peter J. Reilly*
`
`Department of Chemical and Biological Engineering, Iowa State University, Ames, Iowa 50011
`
`Received 19 April 2010; Accepted 7 May 2010
`DOI: 10.1002/pro.417
`Published online 17 May 2010 proteinscience.org
`
`Abstract: Thioesterases (TEs) are classified into EC 3.1.2.1 through EC 3.1.2.27 based on their
`activities on different substrates, with many remaining unclassified (EC 3.1.2.–). Analysis of primary
`and tertiary structures of known TEs casts a new light on this enzyme group. We used strong
`primary sequence conservation based on experimentally proved proteins as the main criterion,
`followed by verification with tertiary structure superpositions, mechanisms, and catalytic residue
`positions, to accurately define TE families. At present, TEs fall into 23 families almost completely
`unrelated to each other by primary structure. It is assumed that all members of the same family
`have essentially the same tertiary structure; however, TEs in different families can have markedly
`different folds and mechanisms. Conversely, the latter sometimes have very similar tertiary
`structures and catalytic mechanisms despite being only slightly or not at all related by primary
`structure, indicating that they have common distant ancestors and can be grouped into clans. At
`present, four clans encompass 12 TE families. The new constantly updated ThYme (Thioester-
`active enzYmes) database contains TE primary and tertiary structures, classified into families and
`clans that are different from those currently found in the literature or in other databases. We
`review all types of TEs, including those cleaving CoA, ACP, glutathione, and other protein
`molecules, and we discuss their structures, functions, and mechanisms.
`
`Keywords: clan; primary structure; protein family; tertiary structure; thioesterases; ThYme
`
Additional Supporting Information may be found in the online version of this article.

Grant sponsor: U.S. National Science Foundation; Grant number: EEC-0813570.

*Correspondence to: Peter J. Reilly, Department of Chemical and Biological Engineering, 2114 Sweeney Hall, Iowa State University, Ames, IA 50011-2230. E-mail: reilly@iastate.edu.

Published by Wiley-Blackwell. © 2010 The Protein Society

PROTEIN SCIENCE 2010 VOL 19:1281–1295

Exhibit 2076
Page 01 of 15

Introduction
The thioesterases (TEs), or thioester hydrolases, comprise a large enzyme group whose members hydrolyze the thioester bond between a carbonyl group and a sulfur atom. They are classified by the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) into EC (enzyme commission) 3.1.2.1 to EC 3.1.2.27, as well as EC 3.1.2.– for unclassified TEs.1 Substrates of 15 of these 27 groupings contain coenzyme A (CoA), two contain acyl carrier proteins (ACPs), four have glutathione or its derivatives, one has ubiquitin, and two contain other moieties. In addition, three groupings have been deleted.
The EC classification system is based on enzyme function and substrate identity, and it was first formulated when very few amino acid sequences (primary structures) and three-dimensional (tertiary) structures of enzymes were available. Another way to classify enzymes is by primary structure into families and by tertiary structure into clans or superfamilies. Some databases are built this way: Pfam2 has a collection of protein families and domains, and SCOP3 classifies protein structures into classes, folds, families, and superfamilies. Other databases treat certain enzyme groups more specifically. For instance, MEROPS4 is a major database for peptidases, and CAZy5 covers carbohydrate-active enzymes.
`It is common to observe that members of more
`than one EC grouping are found in one enzyme fam-
`ily based on similar amino acid sequences, implying
`that they have a common ancestor, mechanism, and
`tertiary structure. Conversely, members of a single
`EC grouping may be located in more than one
`enzyme family, being totally or almost totally unre-
`lated in primary structure and potentially in mecha-
`nism and tertiary structure.
`A further observation is that members of two
`different enzyme families may have very similar ter-
`tiary structures and mechanisms even though their
`primary structures are very different. This may
`imply that they are members of the same clan or
`superfamily, descended from a more distant common
`ancestor.
`In this work, TE primary and tertiary struc-
`tures will be analyzed to conclude how TEs are di-
`vided (and united) into families and clans. Struc-
`tures, mechanisms, and catalytic
`residues are
`compared between families and clans. We compare
`our findings with existing databases such as Pfam
`and SCOP. Results also appear in a new continu-
`ously updated database, ThYme (Thioester-active
`enzYmes, http://www.enzyme.cbirc.iastate.edu) that
`includes families and clans of enzyme groups that
`are part of the fatty acid synthesis cycle, TEs among
`them.
`
`Family identification
`Family members must have strong (>15%, but typi-
`cally >30%) sequence similarity and near-identical
`tertiary structures, and they must share general
`mechanisms as well as catalytic residues located in
`the same position.
`In general, TE families were identified in the
`following way:
`(1) experimentally confirmed TE
`sequences were used as queries, (2) a series of suc-
`cessive Basic Local Alignment Search Tool (BLAST)6
`searches and comparison among results reduced
`query sequences to a few representative ones, (3) the
`catalytic domains of representative query sequences
`were subjected to BLAST to populate the families,
`(4) experimentally confirmed TEs were surveyed to
`search for missing potential TE families, and (5) the
`uniqueness of the families was confirmed by multi-
`ple sequence alignments (MSAs), by tertiary struc-
`
`ture superposition and comparison, and by catalytic
`residue positions. Methods are detailed in Support-
`ing Information.
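
The sequence-similarity criterion for family membership can be illustrated with a short sketch. It assumes that similarity is measured as percent identity over the aligned, non-gap columns of a pairwise alignment; the measure actually used is defined in the Supporting Information and may differ. The function names and the band labels are hypothetical and only mirror the >15% and >30% figures quoted above.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment and use
    '-' as the gap character.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def similarity_band(identity: float) -> str:
    """Place a percent identity in the bands quoted in the text (hypothetical labels)."""
    if identity > 30.0:
        return "typical family-level similarity"
    if identity > 15.0:
        return "minimum family-level similarity"
    return "below family threshold"
```

Sequence similarity alone does not establish family membership here: near-identical tertiary structures, a shared general mechanism, and coinciding catalytic residue positions are also required.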
`
`Clan identification
`Two or more families are grouped into a clan if all
`the sequences within them show some (<15%)
`sequence similarity, if their structures are strongly
`similar (narrowing the search to families with the
`same fold), and if they share similar active sites and
`general mechanisms. To consider all aspects of clan
`classification criteria, several methods are used to
`combine sequence and structural analysis. In addi-
`tion, catalytic mechanisms of members of each fam-
`ily were gathered from the literature, and positions
`of catalytic residues were determined to verify that
`they coincided. A more detailed description of these
`methods is found in the Supporting Information.
`
`ThYme database
`All the sequences in each family are displayed on
`the ThYme database website (http://www.enzyme.
`cbirc.iastate.edu). These sequences are taken, using
`a series of scripts, from the BLAST results of the
`catalytic domains
`of
`the
`representative query
`sequences. Matching accessions, taxonomical data,
`protein names, and EC numbers are taken from
`UniProt7 and GenBank8 databases. Each TE family
`is shown on a page where sequences are arranged
`into archaea, bacteria, and eukaryota, then alpha-
`betically by species. In each row, a single sequence
`or group of sequences with 100% identical catalytic
`domains are shown with their protein name and
`UniProt and/or GenBank accession codes. EC num-
`bers are shown only when they appear in a sequen-
`ce’s UniProt or GenBank annotation. If a crystal
`structure is known, the Protein Data Bank (PDB,
`http://www.rcsb.org) accession code also appears.
`ThYme will be continuously updated: the content of
`each family will grow as GenBank, UniProt, and
`PDB do. However, to create a new family, or to
`merge or delete existing ones, human judgment and
`manual changes will be necessary.
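
The row layout of a family page described above can be sketched as follows. This is a minimal illustration, not the scripts behind ThYme: the record fields, the function name, and the choice to merge identical catalytic domains only within a species are all assumptions made for the example.

```python
from collections import defaultdict

# Display order of the three domains of life on a family page.
DOMAIN_ORDER = {"archaea": 0, "bacteria": 1, "eukaryota": 2}


def build_family_rows(records):
    """Group sequences with identical catalytic domains and order the rows.

    records: list of dicts with the hypothetical keys 'accession', 'domain',
    'species', and 'catalytic_domain'. Sequences of the same species whose
    catalytic domains are 100% identical share one row (an assumption).
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["domain"], rec["species"], rec["catalytic_domain"])
        groups[key].append(rec["accession"])

    rows = [
        {"domain": domain, "species": species, "accessions": sorted(accessions)}
        for (domain, species, _sequence), accessions in groups.items()
    ]
    # Archaea, then bacteria, then eukaryota; alphabetical by species within each.
    rows.sort(key=lambda r: (DOMAIN_ORDER[r["domain"]], r["species"].lower(), r["accessions"]))
    return rows
```

Creating, merging, or deleting families is deliberately outside the scope of such a routine, since the text states that those steps need human judgment.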
`
`Thioesterase families
`Use of BLAST with TE query sequences followed by
`construction of MSAs and superposition of tertiary
`structures yielded 23 families almost completely
`unrelated by primary structure (Table I).
`Enzymes in families TE1–TE13 hydrolyze sub-
`strates with various acyl moieties and CoA, those
`in TE14–TE19 attack bonds between acyl groups
`and ACP, and those in TE20 and TE21 cleave the
`bonds between acyl groups and proteins. Members
`of TE22 and TE23 break bonds between acyl groups
`and glutathione and its derivatives (Table II). The
sulfur-carrying moiety in CoA and ACP is a pantetheine residue, whereas glutathione itself carries the sulfur moiety, and in non-ACP proteins, the sulfur-carrying moiety is built up mainly from a cysteine residue.

Table I. Thioesterase Families and Common Names of their Members

Family | Producing organisms (a) | Genes and/or other names of family members
TE1 | A, B, E | Ach1
TE2 | A, B, E | Acot1–Acot6, BAAT thioesterase
TE3 | A, B | tesA, acyl-CoA thioesterase I, protease I, lysophospholipase L1
TE4 | B, E | tesB, acyl-CoA thioesterase II, Acot8
TE5 | B | tesC (ybaW), acyl-CoA thioesterase III
TE6 | A, B, E | Acot7 (BACH), Acot11 (BFIT, Them1), Acot12 (CACH), YciA
TE7 | B, E | Acot9, Acot10
TE8 | A, B, E | Acot13 (Them2)
TE9 | B | YbgC
TE10 | B | 4HBT-I
TE11 | B | 4HBT-II, EntH (YbdB)
TE12 | B, E | DNHA-CoA hydrolase
TE13 | A, B | paaI, paaD
TE14 | B, E | FatA, FatB
TE15 | B | Thioesterase CalE7
TE16 | A, B, E | TE domain of FAS (Thioesterase I), TE domain of PKS or NRP (type I thioesterase (TE I))
TE17 | B | TE domain of PKS
TE18 | B, E | Thioesterase II, type II thioesterase (TE II)
TE19 | B | luxD
TE20 | E | ppt1, ppt2, palmitoyl-protein thioesterase
TE21 | A, B, E | apt1, apt2, acyl-protein thioesterase, phospholipase, carboxylesterase
TE22 | A, B, E | S-formylglutathione hydrolase, esterase A, esterase D
TE23 | A, B, E | Hydroxyacylglutathione hydrolase, glyoxalase II

(a) A, archaea; B, bacteria; E, eukaryota. The most prevalent producers are bolded in the original table.
`
All tertiary structures within each family have almost identical cores and very strong overall resemblance (Table III), shown by RMSDave values of <1.8 Å and Pave values of >75% (see Supporting Information for definitions), with two exceptions. TE4 has a Pave value of 33.3% because it has only two crystal structures, of which one monomer (1C8U) is a double HotDog, whereas another monomer (1TBU) is incomplete with only a single HotDog. Similarly, in TE16 the Pave value is 65.8% because the TE domain of one structure (2VSQ) is smaller than the rest.

Table II. Thioesterase Functions and Substrate Specificities

Family | General function | EC number | Preferred substrate specificity (if known)
TE1 | Acyl-CoA hydrolase | 3.1.2.1, 2.8.3.– | Acetyl-CoA
TE2 | Acyl-CoA hydrolase | 3.1.2.–, 3.1.2.2, 2.3.1.65 | Palmitoyl-CoA, bile-acid-CoA
TE3 | Acyl-CoA hydrolase | 3.1.2.–, 3.1.2.20, 3.1.1.2, 3.1.1.5 | Medium- to long-chain acyl-CoA
TE4 | Acyl-CoA hydrolase | 3.1.2.–, 3.1.2.2, 3.1.2.27 | Short- to long-chain acyl-CoA, palmitoyl-CoA, choloyl-CoA
TE5 | Acyl-CoA hydrolase | 3.1.2.– | Long-chain acyl-CoA, 3,5-tetradecadienoyl-CoA
TE6 | Acyl-CoA hydrolase | 3.1.2.–, 3.1.2.1, 3.1.2.2, 3.1.2.18, 3.1.2.19, 3.1.2.20 | Short- to long-chain acyl-CoA, C4–C18
TE7 | Acyl-CoA hydrolase | 3.1.2.–, 3.1.2.1, 3.1.2.2, 3.1.2.20 | Short- to long-chain acyl-CoA
TE8 | Acyl-CoA hydrolase | 3.1.2.– | Short- to long-chain acyl-CoA, C6–C18
TE9 | Acyl-CoA hydrolase | 3.1.2.–, 3.1.2.18 | Short- to long-chain acyl-CoA, 4-hydroxybenzoyl-CoA
TE10 | Acyl-CoA hydrolase | 3.1.2.23 | 4-Hydroxybenzoyl-CoA
TE11 | Acyl-CoA hydrolase | 3.1.2.– | 4-Hydroxybenzoyl-CoA
TE12 | Acyl-CoA hydrolase | 3.1.2.– | 1,4-Dihydroxy-2-naphthoyl-CoA
TE13 | Acyl-CoA hydrolase | 3.1.2.– | Short and medium-chain acyl-CoA, several hydroxyphenylacetyl-CoA substrates
TE14 | Acyl-ACP hydrolase | 3.1.2.–, 3.1.2.14 | Short- to long-chain acyl-ACP, C8–C18
TE15 | Acyl-ACP hydrolase | – | –
TE16 | Acyl-ACP hydrolase | 3.1.2.14 (a) | Long-chain acyl-ACP, various polyketides and nonribosomal peptides
TE17 | Acyl-ACP hydrolase | 3.1.2.14 (b) | Several polyketides
TE18 | Acyl-ACP hydrolase | 3.1.2.–, 3.1.2.14 | Medium-chain acyl-ACP, various polyketides and nonribosomal peptides
TE19 | Acyl-ACP hydrolase | 2.3.1.– | Myristoyl-ACP
TE20 | Protein-palmitoyl hydrolase | 3.1.2.–, 3.1.2.22 | Palmitoyl-protein
TE21 | Protein-acyl hydrolase | 3.1.2.–, 3.1.1.1 | –
TE22 | Glutathione hydrolase | 3.1.2.12, 3.1.1.1, 3.1.1.6 | S-Formylglutathione
TE23 | Glutathione hydrolase | 3.1.2.6 | D-Lactoylglutathione

(a) TE domain. FASs, PKSs, and NRPs can have several EC numbers such as 2.3.1.85, 2.3.1.94, 2.3.1.–, 2.7.7.–, and 5.1.1.–.
(b) TE domain of PKSs.

Table III. Thioesterase Folds

Family | Fold | RMSDave (Å) | Pave (%) | PDB files
TE1 | NagB | 1.25 | 96.4 | 2G39, 2NVV
TE2 | α/β-Hydrolase | 1.00 | 96.6 | 3HLK, 3K2I
TE3 | Flavodoxin-like | 0.58 | 96.6 | 1IVN, 1J00, 1JRL, 1U8U, 1V2G, 3HP4
TE4 | HotDog | 0.90 | 33.3 | 1C8U, 1TBU
TE5 | HotDog | – | – | 1NJK
TE6 | HotDog | 1.39 | 75.9 | 3B7K, 2Q2B, 2V1O, 2QQ2, 1YLI, 3BJK, 3D6L
TE7 | HotDog | – | – | –
TE8 | HotDog | 0.58 | 88.3 | 2H4U, 3F5O, 2F0X, 2CY9
TE9 | HotDog | 1.19 | 88.8 | 2PZH, 1S5U, 3HM0, 1Z54
TE10 | HotDog | 0.67 | 97.1 | 1BVQ, 1LO7, 1LO8, 1LO9
TE11 | HotDog | 0.87 | 93.9 | 1Q4S, 1Q4T, 1Q4U, 1VH9, 2B6E, 1SC0, 3LZ7
TE12 | HotDog | – | – | 2VEU
TE13 | HotDog | 0.43 | 94.6 | 2FS2, 1PSU, 2DSL0, 1J1Y, 1WLU, 1WLV, 1WM6, 1WN3
TE14 | HotDog | 1.65 | 87.7 | 2OWN, 2ESS
TE15 | HotDog | – | – | 2W3X
TE16 | α/β-Hydrolase | 1.51 | 66.9 | 2VZ8 (a), 2VZ9 (a), 2PX6, 1XKT, 2ROQ (b), 2CB9, 2CBG, 2VSQ, 1JMK
TE17 | α/β-Hydrolase | 1.67 | 82.4 | 1MO2, 1KEZ, 1MN6, 2H7X, 2H7Y, 2HFK, 2HFJ, 1MNA, 1MNQ
TE18 | α/β-Hydrolase | 0.83 | 97.2 | 3FLA, 3FLB, 2RON (b), 2K2Q (b)
TE19 | α/β-Hydrolase | – | – | 1THT
TE20 | α/β-Hydrolase | 1.41 | 91.2 | 1EH5, 3GRO, 1EI9, 1EXW, 1PJA
TE21 | α/β-Hydrolase | 0.82 | 96.7 | 1FJ2, 1AUO, 1AUR, 3CN7, 3CN9
TE22 | α/β-Hydrolase | 1.69 | 78.9 | 3FCX, 3C6B, 2UZ0, 1PV1, 3I6Y, 3E4D, 3LS2
TE23 | Lactamase | 1.67 | 78.5 | 2QED, 1XM8, 2P18, 2GCU, 2Q42, 1QH3, 1QH5, 2P1E

(a) 2VZ8 and 2VZ9 have TE domains in their FASTA format. Therefore, these were picked up by BLAST, but their PDB files do not include the TE domain, and they were not included in the RMSD calculation.
(b) NMR-resolved structures not included in RMSD calculation.
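
The two structural measures quoted above can be illustrated with a small sketch. The paper defines RMSDave and Pave in its Supporting Information, which is not reproduced here, so the definitions below are assumptions: RMSD is taken as the root-mean-square distance over matched atom pairs that lie within a distance cutoff after superposition, P as the percentage of matched pairs within that cutoff, and the family values as plain averages over all pairs of structures. Function names and the pre-matched coordinate lists are hypothetical.

```python
import math
from itertools import combinations


def rmsd_and_percent(coords_a, coords_b, cutoff):
    """Return (RMSD over pairs within cutoff, percent of pairs within cutoff).

    coords_a, coords_b: equal-length lists of (x, y, z) tuples for atoms that
    have already been matched and superimposed (an assumption of this sketch).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    squared = [
        sum((p - q) ** 2 for p, q in zip(atom_a, atom_b))
        for atom_a, atom_b in zip(coords_a, coords_b)
    ]
    kept = [d for d in squared if d <= cutoff ** 2]
    percent = 100.0 * len(kept) / len(squared)
    rmsd = math.sqrt(sum(kept) / len(kept)) if kept else float("nan")
    return rmsd, percent


def family_averages(structures, cutoff):
    """Average RMSD and P over every pair of structures in one family."""
    results = [rmsd_and_percent(a, b, cutoff) for a, b in combinations(structures, 2)]
    if not results:
        raise ValueError("at least two structures are required")
    n = len(results)
    return sum(r for r, _ in results) / n, sum(p for _, p in results) / n
```

Under these assumed definitions, a structure that lacks part of the fold lowers P sharply without necessarily raising RMSD, which is the pattern the text reports for TE4 and TE16.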
`Of the families whose members hydrolyze acyl-
`CoAs, all have HotDog9,10 folds (Table III, Figs. 1
`and 2) except for TE1, TE2, and TE3. TE1 enzymes
`have NagB folds, and they have acetyl-CoA hydro-
`lase (EC 3.1.2.1) activity as well as acetate or succi-
`nate-CoA transferase (EC 2.8.3.–) activity. They are
`found mainly in bacteria and fungi, although they
`are also present in archaea. Enzymes coded by the
`acetyl-CoA hydrolase ACH1 gene from Saccharomy-
`ces cerevisiae are present in TE1.11 Fungal enzymes
`in this family are involved with acetate levels and
`CoA transfer in mitochondria.12
TE2 enzymes have α/β-hydrolase13 folds (Figs. 3 and 4). They are mainly found in eukaryotes (animals), but they are also present in bacteria. They have mostly palmitoyl-CoA hydrolase (EC 3.1.2.2) and bile acid-CoA:amino acid N-acyltransferase (BAT) (EC 2.3.1.65) activities. The acyl-CoA TE (Acot) enzymes ACOT1, ACOT2, ACOT4, and ACOT6 from Homo sapiens are present in this family, as well as the Acot1 through Acot6 enzymes from Mus musculus, Rattus norvegicus, and similar species.14 Also in TE2 are the BAAT TEs that transfer bile acid from bile acid-CoA to amino acids in the liver; these conjugates later solvate fatty acids in the gastrointestinal tract.15
`Enzymes in TE3 are part of the SGNH hydro-
`lase superfamily with a flavodoxin-like fold. They
`are mainly found in bacteria and have acyl-CoA hy-
`drolase (EC 3.1.2.20), arylesterase (EC 3.1.1.2), and
`lysophospholipase (EC 3.1.1.5) activities. Some TE3
`enzymes come from the tesA gene, and they are
`located in the periplasm and are involved in fatty
`acid synthesis.16 TE3 enzymes are also called acyl-
`CoA thioesterase I, protease I, and lysophospholi-
`pase L1, and the genes that code for them, tesA,
`apeA, and pldC, respectively, are nearly identical.17
`The rest of the acyl-CoA hydrolase families have
`HotDog folds. TE4 enzymes, present in bacteria and
`eukaryotes, are acyl-CoA hydrolases as well as palmi-
`toyl-CoA (EC 3.1.2.2) and choloyl-CoA (EC 3.1.2.27)
hydrolases. The Acot8 gene encodes peroxisomal
`TEs,18 which are found in TE4. Also in this family
`are acyl-CoA thioesterase II enzymes, encoded by the
`tesB gene, that can hydrolyze a broad range of me-
`dium- to long-chain acyl-CoA thioesters, but whose
`physiological function is not known.19
`TE5 acyl-CoA enzymes, also known as thioester-
`ase IIIs, are present in bacteria. They are encoded
`by the tesC (or ybaW) gene and are long-chain acyl-
`CoA TEs preferring 3,5-tetradecadienoyl-CoA as a
`substrate.20

Figure 1. Superimposed tertiary structures of single representatives of each TE family in a clan: TE-A acyl-CoA hydrolases from Escherichia coli (TE5) (green), Helicobacter pylori (TE9) (red), Pseudomonas sp. (TE10) (yellow), and Prochlorococcus marinus (TE12) (blue).

Figure 2. Superimposed tertiary structures of single representatives of each TE family in a clan: TE-B acyl-CoA hydrolases from Homo sapiens (TE8) (blue), Arthrobacter sp. (TE11) (red), and E. coli (TE13) (yellow).

Figure 3. Superimposed tertiary structures of single representatives of each TE family in a clan: TE-C acyl-ACP hydrolases from Homo sapiens (TE16) (blue), Saccharopolyspora erythraea (TE17) (red), and Amycolatopsis mediterranei (TE18) (yellow).

TE6 members, present in eukaryotes, bacteria, and archaea, have acyl-CoA hydrolase activities with various specificities. Acot enzymes 7, 11, and 12, present in eukaryotes, are found in TE6. Acot7 enzymes (also known as BACH: brain acyl-CoA hydrolases) are expressed mainly in brain tissue and preferentially attack C8–C18 acyl-CoA chains.21 Acot11 (also known as BFIT: brown fat inducible TE, or Them1: TE superfamily member 1) enzymes are specific toward medium- and long-chain acyl-CoA molecules, and they may be involved with obesity in humans.22 Acot12 (also known as CACH: cytoplasmic acyl-CoA hydrolase) enzymes in humans hydrolyze acetyl-CoA.23 Many bacterial TE6 sequences are YciA TEs that hydrolyze a wide range of acyl-CoA thioesters and may help to form membranes.24 They preferentially attack butyryl, hexanoyl, lauroyl, and palmitoyl-CoA substrates.25
`TE7 enzymes are acyl-CoA TEs found in eukar-
`yota and bacteria. In this family are the Acot9 and
`Acot10 enzymes (previously known as MT-ACT48),
`which are expressed in the mitochondria and have
`short- to long-chain acyl-CoA TE activity, showing
`preference for C14 chains.26
`
`Figure 4. Superimposed tertiary structures of single representatives of each TE family in a clan: TE-D protein-acyl hydrolases
`from Bos taurus (TE20) (blue) and Homo sapiens (TE21) (yellow).
`
`
`
`
`Most TE8 members, mainly present in eukar-
`yota but also in bacteria, are acyl-CoA thioesterase
`13 (Acot13) enzymes, also known as TE superfamily
`member 2 (Them2). Enzymes in this family hydro-
`lyze short-to-long acyl-CoA (C4–C18) chains, prefer-
`ring the latter.27
TE9 members are found only in bacteria, and they have acyl-CoA hydrolase activity, mostly unclassified (EC 3.1.2.–), although ADP-dependent short-chain acyl-CoA hydrolases (EC 3.1.2.18) and 4-hydroxybenzoyl-CoA hydrolases (EC 3.1.2.23) are also found. The YbgC TEs are found in this family; some hydrolyze primarily short-chain acyl-CoA thioesters,28 whereas others prefer long-chain acyl-CoA thioesters.29 Also, the TE domain of methylketone synthase, MKS2, recently discovered in tomato, is found in TE9.30
The enzymes in TE10 and TE11 are found only in bacteria, and most have 4-hydroxybenzoyl-CoA hydrolase (EC 3.1.2.23) activity. They, along with other enzymes, convert 4-chlorobenzoate to 4-hydroxybenzoate in soil-dwelling bacteria.31 Also in TE11 are the EntH (YbdB) TEs, involved with enterobactin (an iron chelator) biosynthesis in Escherichia coli.32 This is a unique example of a HotDog-fold enzyme involved in nonribosomal peptide biosynthesis.
Most TE12 enzymes are 1,4-dihydroxy-2-naphthoyl (DNHA)-CoA hydrolases, involved in vitamin K1 biosynthesis,33 and they are found mostly in bacteria. TE13 enzymes occur in archaea and bacteria.
`Most are either PaaI or PaaD enzymes in the phe-
`nylacetic acid degradation pathway, and they are
`part of the paa gene cluster.34
TE14–TE19 enzymes hydrolyze acyl-ACP thioesters, with those in TE14 and TE15 having HotDog folds, whereas the rest have α/β-hydrolase folds. TE14 enzymes are found in bacteria and plants; they have acyl-ACP hydrolase (EC 3.1.2.14) activity. Many plant enzymes in this family have been experimentally characterized: they include FatA and FatB enzymes and can hydrolyze C8–C18 acyl-ACP thioesters.35 All TE14 bacterial sequences come from genomic or structural genomic studies.
`TE15 is a small family whose enzymes are pres-
`ent mainly in bacteria. Among them is the TE
`CalE7 involved with enediyne biosynthesis. After
substrate-ACP hydrolysis, these enzymes decarboxylate the product before release.36 Enzymes in this family are among the few TEs with HotDog domains involved with polyketide biosynthesis.
`TE16 enzymes occur in both eukaryotes and
`bacteria, and they have oleoyl-ACP hydrolase (EC
`3.1.2.14) activity. They include the TE domains of
`fatty acid synthases (FASs), also known as Thioes-
`terase I, that terminate fatty acid synthesis,37 and
`the TE domain of polyketide synthases (PKSs) and
`nonribosomal peptide synthases (NRPs), also known
`
as Type I thioesterases (TE I), that terminate polyketide biosynthesis,38 or nonribosomal peptide biosynthesis.39 In the case of NRPs, a peptidyl carrier protein (PCP) is used as the carrier molecule instead of an ACP.
`TE17 enzymes are only found in bacteria,
`mainly in Streptomyces. They are the TE domains of
`various PKSs. FASs, PKSs, and NRPs are large mul-
`timodular enzymes with many domains having dif-
`ferent functions. Only the TE domains were used to
`identify these family members.
Enzymes in TE18 are present in eukaryotes and bacteria and mainly have oleoyl-ACP hydrolase (EC 3.1.2.14) activity. Some enzymes in this family are S-acyl fatty acid synthetases/thioester hydrolases (Thioesterase II).40 They work with FASs to produce medium-chain (C8–C12) fatty acids in milk.41 The Type II thioesterases (TE IIs) are found in TE18; these enzymes play an important role in polyketide and nonribosomal peptide biosynthesis by removing aberrant acyl chains from multimodular polyketide synthases and nonribosomal peptide synthases.42,43 TE18 enzymes are independent TEs, not integrated into the multimodular FASs, PKSs, or NRPs.
`TE19 enzymes are classified as acyltransferases
`(EC 2.3.1.–), but they hydrolyze acyl-ACP molecules,
`mainly myristoyl-ACP.44 These enzymes divert fatty
`acids to the luminescent system in certain bacteria.
`TE20 members, found only in eukaryotes, are
`palmitoyl-protein TEs (EC 3.1.2.22) encoded by PPT
`genes. They hydrolyze the thioester bond between a
`palmitoyl group and a cysteine residue in proteins.45
`Mutations in PPT enzymes have been linked to neu-
`ronal ceroid lipofuscinosis, a genetic neurodegenera-
`tive disorder.46
TE21 enzymes were originally identified as lysophospholipases,47 but they are also acyl-protein TEs (APT1).48 They hydrolyze thioester bonds between acyl chains and cysteine residues on proteins. Many proteins in this family also have carboxylesterase (EC 3.1.1.1) activity.
Among TE22 enzymes are S-formylglutathione hydrolases (EC 3.1.2.12) catalyzing formaldehyde detoxification; they hydrolyze S-formylglutathione into formate and glutathione.49 Also in TE22 are enzymes with acetyl esterase (EC 3.1.1.6) and carboxylesterase (EC 3.1.1.1) activity.
TE23 members are hydroxyacylglutathione hydrolases (EC 3.1.2.6), also known as glyoxalase II enzymes, that hydrolyze S-D-lactoyl-glutathione to glutathione and lactic acid in methylglyoxal detoxification.50 TE23 enzymes occur in archaea, bacteria, and eukaryotes and have a metallo-β-lactamase fold.51
`
Table IV. Thioesterase Core Secondary Structure Elements

Clan | Family | Secondary structural element (a)
HotDog fold
TE-A | TE5 | β-α-β-β-β-β
TE-A | TE9 | α-β-β-β-β
TE-A | TE10 | β-α-β-β-β-β
TE-A | TE12 | β-α-β-β-β-β
TE-B | TE8 | β-β-α-β-β-β-β
TE-B | TE11 | β-β-α-β-β-β-β
TE-B | TE13 | β-β-α-β-β-β-β
– | TE4 | α-β-β-β-β-β-β-α-β-β-β-β
– | TE6 | β-α-β-β-β-β
– | TE14 | β-α-β-β-β-β-β-α-β-β-β-β
– | TE15 | β-α-β-β-β-β
α/β-Hydrolase fold
TE-C | TE16 | β-α-β-α-β-α-β-β-α-β-α
TE-C | TE17 | β-α-β-α-β-α-β-β-α-β-α
TE-C | TE18 | β-β-α-β-α-β-α-β-β-α-β-α
TE-D | TE20 | β-α-β-α-α-β-α-β-α-β-α-β-α
TE-D | TE21 | β-α-β-β-α-β-α-β-β-α-β-α
– | TE2 | β-β-β-α-β-α-β-α-β-β-α-β-α
– | TE19 | β-β-β-α-β-α-β-α-β-α-β-α-β-α-α
– | TE22 | β-β-β-β-α-β-α-β-α-β-α-β-α-β-α

(a) α, α-helix; β, β-strand.

Correspondence to EC groupings
These TE families bear rather limited resemblance to EC numbers representing TEs. For instance, acetyl-CoA hydrolases (EC 3.1.2.1) occur in TE1, TE6, and TE7; palmitoyl-CoA hydrolases (EC 3.1.2.2) are found in TE2, TE4, TE6, and TE7; oleoyl-ACP hydrolases (EC 3.1.2.14) occur in TE14 and TE16–TE18; and acyl-CoA hydrolases (EC 3.1.2.20) are found in TE3, TE6, and TE7. Conversely, of the 24 EC numbers remaining after three deletions, only 11 (EC 3.1.2.1, 3.1.2.2, 3.1.2.6, 3.1.2.12, 3.1.2.14, 3.1.2.18, 3.1.2.19, 3.1.2.20, 3.1.2.22, 3.1.2.23, and 3.1.2.27), along with unclassified TEs (EC 3.1.2.–), occur in significant numbers among the 23 TE families. Of course, further EC numbers characteristic of TEs will likely appear as more TEs are sequenced and characterized.
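
The many-to-many relation between EC numbers and TE families can be made concrete by inverting a family-to-EC mapping. The sketch below uses only a subset of families, transcribed from Table II as read from this text; the dictionary and function names are hypothetical, and an ASCII hyphen stands in for the dash in partial EC numbers such as 3.1.2.–.

```python
from collections import defaultdict

# Subset of Table II (family -> EC numbers); "-" marks an unclassified position.
FAMILY_TO_EC = {
    "TE1": ["3.1.2.1", "2.8.3.-"],
    "TE2": ["3.1.2.-", "3.1.2.2", "2.3.1.65"],
    "TE3": ["3.1.2.-", "3.1.2.20", "3.1.1.2", "3.1.1.5"],
    "TE4": ["3.1.2.-", "3.1.2.2", "3.1.2.27"],
    "TE6": ["3.1.2.-", "3.1.2.1", "3.1.2.2", "3.1.2.18", "3.1.2.19", "3.1.2.20"],
    "TE7": ["3.1.2.-", "3.1.2.1", "3.1.2.2", "3.1.2.20"],
    "TE14": ["3.1.2.-", "3.1.2.14"],
    "TE16": ["3.1.2.14"],
    "TE17": ["3.1.2.14"],
    "TE18": ["3.1.2.-", "3.1.2.14"],
}


def invert(family_to_ec):
    """Build the reverse mapping: EC number -> families, in insertion order."""
    ec_to_families = defaultdict(list)
    for family, ec_numbers in family_to_ec.items():
        for ec in ec_numbers:
            ec_to_families[ec].append(family)
    return dict(ec_to_families)


def shared_ec_numbers(family_to_ec):
    """Fully specified EC numbers that appear in more than one family."""
    return {
        ec: families
        for ec, families in invert(family_to_ec).items()
        if len(families) > 1 and not ec.endswith("-")
    }
```

Calling `shared_ec_numbers(FAMILY_TO_EC)` on this subset recovers the four examples given in the paragraph above, which shows that one EC grouping can span several sequence-based families.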
`
`Other thioesterases
`Ubiquitin carboxyl-terminal hydrolases (EC 3.1.2.15)
`cleave a wide variety of products from the C-termi-
`nal glycine residue of ubiquitin. They were first
`identified as thiolesterases because they cleave dithi-
`othreitol from ubiquitin, and they were thought to
`also hydrolyze ubiquitin-glutathione and other ubiq-
`uitin thiolesters.52 It was later shown that they hy-
`drolyze amides and other groups from ubiquitin.53
`These enzymes belong to a larger class of peptidases
`called deubiquitinating enzymes that hydrolyze ly-
`sine-glycine amide bonds in ubiquitinated proteins.54
`
`Several families of these enzymes can be found in
`MEROPS, the peptidase database. We identified 11
`ubiquitin thiolesterase families by the methods
`described above, but we have not included them
`here or in the ThYme database, as peptidase activity
`is their main function, and they can be found in
`MEROPS.
`Certain acyl transferases (EC 2.3.1.–), for exam-
`ple, 2.3.1.9, 2.3.1.16, 2.3.1.38, and 2.3.1.39 among
`others, can hydrolyze acyl-CoA or acyl-ACP sub-
`strates and later join the liberated acyl group to
`another acyl-CoA or acyl-ACP molecule. Although
`they hydrolyze thioesters, this is not their main
function, and therefore, we also decided not to include these enzymes here.
`
`Thioesterase clans
TE families 4–6 and 8–15, all with members having HotDog crystal structures, were subjected to the methods described above and two clans were found: TE-A comprising families TE5, TE9, TE10, and TE12; and TE-B with TE8, TE11, and TE13.

Table V. RMSD Analysis of TE Clan Members

Clan | RMSDmin (Å) | RMSDave (Å) | RMSDmax (Å) | Pmin (%) | Pave (%) | Pmax (%) | Cutoff (Å)
TE-A | 1.14 | 1.33 | 1.53 | 77.5 | 87.1 | 90.9 | 3.81
TE-B | 0.11 | 0.97 | 2.02 | 72.3 | 86.8 | 100.0 | 3.80
TE-C | 1.81 | 1.94 | 2.13 | 52.6 | 58.3 | 75.2 | 3.82
TE-D | 0.44 | 1.45 | 2.00 | 67.0 | 80.9 | 100.0 | 3.79

PSI–BLAST6 analysis suggested that TE5, TE9, TE10, and TE12 should be grouped into one clan and TE8, TE11, and TE13 into another, because slight sequence similarities among these families were found. Secondary structure element analysis of the structures pointed to TE5, TE6, TE10, TE12, and TE15 (having five β-strands) being placed in one clan and TE8, TE11, and TE13 (having six β-strands) being placed in another (Table IV); visual inspection suggested the same two groupings, with the first also including TE9. All crystal structures in candidate families of both possible clans were tested with superpositions and RMSD analysis (Figs. 1 and 2, Table V). These different tests led to the two clans being defined. Members of TE-A are all acyl-CoA hydrolases active on many substrates including short, long, branched, and aromatic acyl chains. Catalytic residues (see below) in TE6 are placed differently than those of other TE-A families, and therefore, TE6 was not included in this clan. The different substrate specificities, catalytic residues, and mechanism (see below) of TE15 members suggested that it also be excluded from TE-A. TE-B enzymes are also acyl-CoA hydrolases, except for the YbdB TEs in TE11 involved with enterobactin biosynthesis. TE4, TE7 (which has no known tertiary structure), and TE14 enzymes are sufficiently different from members of TE-A and TE-B that they were not considered for placement in either clan; the first two are acyl-CoA hydrolases, whereas the third is an acyl-ACP hydrolase.
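
The secondary-structure screen described above amounts to counting β-strands in the core element strings of Table IV. The sketch below does that for the single-HotDog families, writing `a` for an α-helix and `b` for a β-strand; the dictionary and function names are hypothetical, and the strings are transcribed from Table IV as read from this text.

```python
from collections import defaultdict

# Core secondary structure elements of single-HotDog families (Table IV).
# 'a' = alpha-helix, 'b' = beta-strand.
HOTDOG_ELEMENTS = {
    "TE5": "b-a-b-b-b-b",
    "TE6": "b-a-b-b-b-b",
    "TE8": "b-b-a-b-b-b-b",
    "TE9": "a-b-b-b-b",
    "TE10": "b-a-b-b-b-b",
    "TE11": "b-b-a-b-b-b-b",
    "TE12": "b-a-b-b-b-b",
    "TE13": "b-b-a-b-b-b-b",
    "TE15": "b-a-b-b-b-b",
}


def strand_count(elements: str) -> int:
    """Number of beta-strands in a dash-separated element string."""
    return elements.split("-").count("b")


def group_by_strand_count(table):
    """Map strand count -> families, ordered by family number."""
    groups = defaultdict(list)
    for family, elements in table.items():
        groups[strand_count(elements)].append(family)
    return {
        count: sorted(families, key=lambda name: int(name[2:]))
        for count, families in groups.items()
    }
```

The strand count is only a first screen. It places TE6 and TE15 with the five-strand families and leaves TE9 apart, whereas the final TE-A clan includes TE9 and excludes TE6 and TE15 on the basis of superpositions, catalytic residues, and mechanisms.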
TE families 2 and 16 through 22, whose members all have α/β-hydrolase crystal structures, include two clans: TE-C comprising TE16, TE17, and TE18, and TE-D with TE20 and TE21.
Both sequence analysis and secondary structure element arrangement suggested only one clan of TE16, TE17, and TE18 (Table IV). Visual inspection suggested the two clans described above, and they were confirmed by superpositions, RMSD analysis, and the position of catalytic residues (Figs. 3 and 4, Table V). Families in TE-C contain acyl-ACP hydrolases present in multidomain FASs, PKSs, and NRPs, as well as independent acyl-ACP TEs involved in those pathways. TE-D enzymes hydrolyze palmitoyl and other acyl groups from protein surfaces. TE2, an acyl-CoA hydrolase, TE19, a myristoyl-ACP hydrolase, and TE22, active on glutathione-activated molecules, are not part of either clan.
`
`TE tertiary structures, catalytic residues, and
`mechanisms
`Catalytic mechanisms and residues of each TE fam-
`ily were gathered from crystal structure articles.
`The PDB files, proposed catalytic residues, and pro-
`ducing organisms of the relevant TEs are listed in
`Table VI.
HotDog-fold enzymes lack defined nonsolvated binding pockets and conserved catalytic residues;24 thus, a variety of catalytic residues and mechanisms exist.
`In TE-A, only TE9 and TE10 can be further an-
`alyzed, as TE5 and TE12 at present have only one
`crystal structure each with no corresponding refer-
`eed article. In TE9, the YbgC structure 2PZH is a
`tetramer of two dimers. After comparing this struc-
`ture to 1LO9 in TE10 and other YbgC crystals, the
`authors proposed that His18, Tyr7, and Asp11 play
`important roles in catalysis.29
`TE10 4-hydroxybenzoyl-CoA TEs have homote-
`trameric quaternary structures. It was suggested
`from structures 1LO7, 1LO8, and 1LO9 that hydro-
`gen bonds and the positive end of a helix dipole
`moment make the thioester carbonyl group more
`susceptible to a nucleophilic attack by Asp17
`through an acyl-enzyme intermediate.55
`TE-B families include TE8, TE11, and TE13.
`Members of TE8 are tetramers composed of two Hot-
`Dog dimers. Based on a crystal structure of a human
`Them2 enzyme (3F5O), it was proposed that Gly57
`and Asn50 bind and polarize the thioester carbonyl
`group, whereas Asp65 and Ser85 orient and activate
`the water nucleophile.56
`In TE11, Arthrobacter sp. strain SU 4-hydroxy-
`benzoyl-CoA TE crystal structures reveal a tetra-
`meric enzyme with a dimer of dimers. Structures
`
`1Q4S, 1Q4T, and 1Q4U led to the proposal that
`Gly65 polarizes the carbonyl group for a nucleophilic
`attack carried out by Glu73.57
`Both TE10 and TE11 are 4-hydroxybenzoyl-CoA
`TEs of similar substrate specificities and metabolic
`functions; however, their tertiary and quaternary
`structures are different and they use different active-
`site regions and residues for catalysis. This supports
`placing these two families in two different clans.
TE13 PaaI TE from Thermus thermophilus HB8 yielded homotetrameric quaternary structures 1WLU, 1J1Y, 1WM6, 1WLV, and 1WN3. From those
`structures, a study proposed that these enzymes use
`an induced-fit mechanism to hydrolyze the substrate
`via an Asp48-activated water nucleophile.58 Compar-
`ison of the structure of another PaaI, from E. coli
`(2FS2) with the Arthrobacter TE11 structures, as
`well as site-directed mutagenesis, pointed to a mech-
`anism similar to that in TE11: Gly53 prepares the
thioester for a nucleophilic attack from Asp61.59 4-Hydroxybenzoyl-CoA enzymes from TE11 and the
`PaaI enzymes from TE13 catalyze two different reac-
`tions in different organisms, and