`
`Database, 2021, 1–20
`doi:10.1093/database/baab012
`Review
`
`Post-translational modifications in proteins:
`resources, tools and prediction methods
`Shahin Ramazi1,† and Javad Zahiri1,2,3,*,†
`1Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of
`Biological Sciences Tarbiat Modares University, Jalal Ale Ahmad Highway, P.O. Box: 14115-111,
`Tehran, Iran, 2Department of Neuroscience, University of California San Diego, La Jolla, CA, USA and
`3Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
`
`*Corresponding author: Email: Zahiri@modares.ac.ir
`†These authors contributed equally to this work.
`Citation details: Ramazi, S., Zahiri, J. Post-translational modifications in proteins: resources, tools and prediction methods.
`Database (2021) Vol. 2021: article ID baab012; doi:10.1093/database/baab012
`
`Received 12 July 2020; Revised 20 February 2021
`
`Abstract
`Posttranslational modifications (PTMs) refer to amino acid side chain modification in
`some proteins after their biosynthesis. There are more than 400 different types of PTMs
`affecting many aspects of protein functions. Such modifications act as crucial
`molecular regulatory mechanisms that control diverse cellular processes. These processes
`have a significant impact on the structure and function of proteins. Disruption in PTMs
`can lead to the dysfunction of vital biological processes and hence to various diseases.
`High-throughput experimental methods for discovery of PTMs are very laborious and
`time-consuming. Therefore, there is an urgent need for computational methods and
`powerful tools to predict PTMs. There are vast amounts of PTM data, which are publicly
`accessible through many online databases. In this survey, we comprehensively reviewed
`the major online databases and related tools. The current challenges of computational
`methods were reviewed in detail as well.
`
`Introduction
`Posttranslational modifications (PTMs) are covalent processing
`events that change the properties of a protein by
`proteolytic cleavage or by adding a modifying group, such
`as acetyl, phosphoryl, glycosyl and methyl, to one or
`more amino acids (1). PTMs play a key role in numerous
`biological processes by significantly affecting the structure
`and dynamics of proteins (2, 3). Generally, a PTM
`can be reversible or irreversible (4). The reversible reactions
`comprise covalent modifications, and the irreversible
`ones, which proceed in one direction, include proteolytic
`modifications (5). PTMs occur in a single type of amino
`acid or multiple amino acids and lead to changes in the
`chemical properties of modified sites (6). PTMs usually
`are seen in the proteins with important structures/functions
`such as secretory proteins, membrane proteins and his-
`tones. These modifications affect a wide range of protein
`behaviors and characteristics, including enzyme function
`and assembly (7), protein lifespan, protein–protein interactions
`(8), cell–cell and cell–matrix interactions, molecular
`trafficking, receptor activation, protein solubility
`(9–14), protein folding (15) and protein localization (16).
`
`© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
`Exhibit 2057
`Downloaded from https://academic.oup.com/database/article/doi/10.1093/database/baab012/6214407 by Arnold and Porter user on 17 January 2022
`
`
`Database, Vol. 00, Article ID baab012
`
`Therefore, these modifications are involved in various bio-
`logical processes such as signal transduction, gene expres-
`sion regulation, gene activation, DNA repair and cell cycle
`control (17–19). PTMs occur in various cellular organelles
`including the nucleus, cytoplasm, endoplasmic reticulum
`and Golgi apparatus (5).
`Proximity ligation assay (PLA) is a novel immunoassay
`technology that can be used to study PTMs (20). In addi-
`tion to PLA, immunoprecipitation (IP) is utilized in several
`different PTM detection assays (21). However, the com-
`bination of mass spectrometry with IP strategy is a more
`effective method (22). Nevertheless, large-scale detection of
`PTMs is very costly and challenging. In recent years, computational
`methods for predicting PTMs have attracted
`considerable attention (5, 16, 17, 23–26).
`The rest of this paper is structured as follows. In the
`section ‘The 10 most studied PTMs’, the 10 most studied
`PTMs will be described. Major PTM databases will
`be reviewed in the section ‘Main PTM databases’
`as well. In the section ‘Involvement of PTMs in diseases
`and biological processes’, involvement of PTMs in diseases
`and biological processes will be discussed. Then, compu-
`tational methods for predicting PTMs will be described in
`the section ‘Computational methods for predicting PTMs’.
`Finally, tools for PTM prediction will be reviewed in the
`section ‘Tools for PTM prediction’.
`
`The 10 most studied PTMs
`There are more than 400 different types of PTMs (27)
`affecting many aspects of protein functions. According
`to the dbPTM (6), one of the most comprehensive PTM
`databases, there are 24 major PTMs, each with more than 80
`experimentally verified reported modified sites. Figure 1
`provides a visualized summary of the current major PTM
`data according to the dbPTM. Figure 1 shows that some of
`these major PTMs occur more frequently and have been
`studied much more than the others. The three main
`PTMs, based on the dbPTM database, are phosphorylation,
`acetylation and ubiquitination, which comprise more
`than 90% (∼827 000 sites out of ∼908 000) of all the
`reported PTMs. According to this figure, each amino acid undergoes at
`least three different PTMs, and Lys undergoes the largest
`number of PTMs (15 PTM types). Moreover, based on
`the whole dbPTM data, Cys and Ser are also modified
`with at least 10 PTM types. Finally, one can see that
`phosphorylation on Ser is the most reported PTM type.
`Figure 1A shows a clustergram that divides the PTMs into
`four clusters. Phosphorylation and acetylation each form
`a separate cluster because of their distinct patterns of
`modification on the amino acids. On the other hand,
`ubiquitination, methylation and amidation are the PTMs with
`many different target residues and have been clustered as a
`group. According to the clustergram, amino acids have been
`divided into five clusters. Lys has the most distinctive PTM
`pattern among the amino acids.
`Panels B and C in Figure 1 show the frequency of PTM
`types and amino acids in the dbPTM database in log scale,
`respectively. According to Figure 1, it is observed that phos-
`phorylation, acetylation and ubiquitination are the most
`frequent PTMs.
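
The proportions quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the rounded counts given in the text (about 827 000 sites for the three main PTMs, about 908 000 sites in total, and the 80-site threshold for a major PTM); the variable and function names are illustrative and do not come from dbPTM.

```python
import math

# Rounded counts quoted in the text (dbPTM figures as reported in this review).
TOP_THREE_SITES = 827_000    # phosphorylation + acetylation + ubiquitination
ALL_SITES = 908_000          # all experimentally verified sites
MIN_SITES_MAJOR_PTM = 80     # threshold used to call a PTM "major"


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


top_three_share = share(TOP_THREE_SITES, ALL_SITES)

# Orders of magnitude between the smallest major PTM and the whole database.
# A spread this wide is the usual reason for plotting counts on a log scale.
log_span = math.log10(ALL_SITES) - math.log10(MIN_SITES_MAJOR_PTM)

print(f"top-three share: {top_three_share:.1f}%")
print(f"log10 span: {log_span:.2f}")
```

The counts span roughly four orders of magnitude, which is consistent with the use of log-scaled frequencies in Figure 1.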
`Roughly speaking, according to the type of modification,
`these PTMs can be categorized into three main
`groups. The first and second groups are those PTMs that
`add chemical groups and complex groups to
`the target residue, respectively. The first group includes
`PTMs such as phosphorylation, acetylation and methylation, and the
`second group includes glycosylation, prenylation, myristoylation
`and palmitoylation. Those PTMs that add
`polypeptides to the target residue comprise the
`last group; these PTMs are ubiquitylation and SUMOylation.
`Figure 2 shows a graphical timeline for the discovery
`of these major PTMs. The timeline also depicts the organisms in
`which each PTM was first discovered.
`In the following subsections, the 10 most
`studied PTMs, out of these major ones, are described in
`more detail.
`
`Phosphorylation
`Protein phosphorylation was first reported in 1906 by
`Phoebus Levene with the discovery of phosphate in the
`protein vitellin (phosvitin) (28). However, it took nearly
`50 years before Eugene Kennedy described the first enzymatic
`phosphorylation of proteins (43). This process is an
`important reversible regulatory mechanism that plays a key
`role in the activities of many enzymes, membrane chan-
`nels and many other proteins in prokaryotic and eukaryotic
`organisms (44, 45). Phosphorylation target sites are Ser,
`Thr, Tyr, His, Pro, Arg, Asp and Cys residues (6), but
`this modification mainly happens on Ser, Thr, Tyr and His
`residues (46). This PTM involves the transfer of a phosphate
`group from adenosine triphosphate to the acceptor residues
`by kinase enzymes (Figure 3A). Conversely, dephosphorylation,
`the removal of a phosphate group, is an enzymatic
`reaction catalyzed by different phosphatases (47). Phosphorylation
`is the most studied PTM and one of the essential
`types of PTM, and it often happens on target proteins in
`the cytosol or nucleus (48). This modification can change
`the function of proteins in a short time in one of two
`principal ways: by allostery or by binding to interaction
`domains (49).
`
`Figure 1. Summarized information of major PTMs (24 PTMs with more than 80 experimentally verified reported modified sites) according to the
`dbPTM databank (October 2020). All frequencies are shown in log scale. (A) Clustergram indicating the frequency of each PTM on different amino
`acids. (B) Frequency of major PTMs. (C) Frequency of each amino acid that was reported as a modified site.
`
`Phosphorylation has a vital role in significant cellular
`processes such as replication, transcription, environmental
`stress response, cell movement, cell metabolism, apop-
`tosis and immunological responsiveness (12, 50, 51). It
`has been shown that disruption in the pathway of phos-
`phorylation can lead to various diseases such as can-
`cer, Alzheimer’s disease, Parkinson’s disease and heart
`disease (24, 52, 53).
`
`Acetylation
`The first acetylation modification in proteins was discov-
`ered by V.G. Allfrey in 1964 in isolated calf thymus nuclei
`in vitro (31). Acetylation is catalyzed via lysine acetyl-
`transferase (KAT) and histone acetyltransferase (HAT)
`enzymes. Acetyltransferases use acetyl CoA as a cofac-
`tor for adding an acetyl group (COCH3) to the ε-amino
`group of lysine side chains, whereas histone deacetylases (HDACs)
`
`Figure 2. Schematic PTM discovery timeline for 10 major PTMs: phosphorylation (28), methylation (29), sulfation (30), acetylation (31), ubiqui-
`tylation (32), prenylation (33), myristoylation (34), SUMOylation (35), palmitoylation (36), different types of glycosylation (N-glycosylation (37),
`O-glycosylation (38), C-glycosylation (39) and S-glycosylation (40)), phosphoglycosylation (41) and glycosylphosphatidylinositol (GPI anchored)
`(42). For each PTM, target residue(s) and the organism in which the related PTM was discovered for the first time are shown.
`
`remove an acetyl group from lysine side chains (Figure 3B)
`(54). There are three forms of acetylation: Nα-acetylation,
`Nε-acetylation and O-acetylation. Nα-acetylation is an
`irreversible modification, and the other two types of acety-
`lation are reversible (55). These three forms of acetylation
`occur on Lys, Ala, Arg, Asp, Cys, Gly, Glu, Met, Pro,
`Ser, Thr and Val residues with different frequencies (6),
`although acetylation is most often reported on lysine residues.
`Nε-acetylation is more biologically significant compared to
`the other types of acetylation (55).
`Acetylation has an essential role in biological processes
`such as chromatin stability, protein–protein interaction,
`cell cycle control, cell metabolism, nuclear transport and
`actin nucleation (56–58). According to the available evi-
`dence, acetylated lysine is vital for cell development, and
`its dysregulation would lead to serious diseases such as
`cancer, aging,
`immune disorders, neurological diseases
`(Huntington’s disease and Parkinson’s disease) and cardio-
`vascular diseases (56, 59, 60, 61).
`
`Ubiquitylation
`Ubiquitylation is one of the most important reversible
`PTMs. This modification was first studied in 1975 by
`Gideon Goldstein (32). It is a versatile PTM
`and can occur on all 20 amino acids (Figure 2). However,
`it occurs on lysine more frequently. This PTM has a major
`role in the degradation of intracellular proteins via the
`ubiquitin (Ub)–proteasome pathway in all tissues (62). In
`ubiquitylation, a covalent bond forms between the C-terminus
`of an active ubiquitin protein (a polypeptide of 76 amino
`acids) and Nε of a lysine residue of the protein (63).
`Ubiquitin can be attached to substrate proteins in mono- or
`poly-ubiquitinated forms through specific isopeptide bonds,
`which are recognized by receptors containing ubiquitin-binding
`domains. Ubiquitylation is catalyzed by an enzyme complex that
`contains ubiquitin-activating (E1), ubiquitin-conjugating (E2) and
`ubiquitin ligase (E3) enzymes (Figure 3C). Ubiquitinated
`proteins may be acetylated on Lys, or phosphorylated on
`Ser, Thr or Tyr residues, which can dramatically alter
`the signaling outcome (64). Ubiquitylation
`of substrate proteins can be removed by several specialized
`families of proteases called deubiquitinases (64).
`Ubiquitination plays important roles in stem cell preservation
`and differentiation by regulating pluripotency
`(65). Ubiquitylation also plays a vital role in many
`cell activities such as proliferation, regulation of
`transcription, DNA repair, replication, intracellular trafficking
`and virus budding, the control of signal transduction,
`protein degradation, innate immune signaling,
`autophagy and apoptosis (12, 66, 67). Dysfunction in
`the ubiquitin pathway can lead to diverse diseases such
`as different cancers, metabolic syndromes, inflammatory
`disorders, type 2 diabetes and neurodegenerative diseases
`(68–70).
`
`Methylation
`Research on methylation dates back to 1939 (29). Nonetheless,
`only recently, with the identification of new methyltransferases
`(such as protein arginine methyltransferases
`(PRMTs) and histone lysine methyltransferases (HKMTs)),
`has it attracted more and more attention (71). Methylation is
`a reversible PTM, which often occurs in the cell nucleus and
`on the nuclear proteins such as histone proteins (1, 72).
`Methylation occurs on the Lys, Arg, Ala, Asn, Asp, Cys,
`Gly, Glu, Gln, His, Leu, Met, Phe and Pro residues in tar-
`get proteins (6). However, lysine and arginine are the two
`main target residues in methylation, at least in eukaryotic
`cells (73, 74). One of the most biologically important roles
`of methylation is in histone modification. Histone proteins,
`after synthesis of their polypeptide chains, are methylated
`
`
`Figure 3. Schematic illustration of the 10 most studied PTMs including Phosphorylation (A), Acetylation (B), Ubiquitylation (C), Methylation (D),
`N-glycosylation (E), O-glycosylation (F), SUMOylation (G), S-palmitoylation (H), N-myristoylation (I), Prenylation (J), and Sulfation (K).
`
`at Lys, Arg, His, Ala or Asn residues (75). Nε-lysine methy-
`lation is one of the most abundant histone modifications
`in eukaryotic chromatin, which includes transferring the
`methyl groups from S-adenosylmethionine to histone pro-
`teins via methyltransferase enzyme (Figure 3D). In eukary-
`otes, methylated arginine has been observed in histone and
`non-histone proteins (76).
`Recent studies have shown that methylation is associated
`with fine tuning of various biological processes ranging
`from transcriptional regulation to epigenetic silencing via
`heterochromatin assembly (77). Defects in this modification
`can lead to various diseases such as cancer, mental
`retardation (Angelman syndrome), diabetes mellitus,
`lipofuscinosis and occlusive disease (12, 78, 79).
`
`Glycosylation
`One of the most complex PTMs in the cell is glycosylation,
`which is a reversible enzyme-directed reaction
`(12). Glycosylation occurs in multiple subcellular locations,
`such as the endoplasmic reticulum, the Golgi apparatus,
`the cytosol and the sarcolemma (80). Glycosylation
`occurs in eukaryotic and prokaryotic membrane and
`secreted proteins, and nearly 50% of the plasma proteins
`are glycosylated (14). In this modification, oligosaccharide
`chains are linked to specific residues by a covalent
`bond (see Figures 3E and F). This enzymatic process,
`which is catalyzed by a glycosyltransferase enzyme, usually
`occurs in the side chain of residues such as Trp,
`Ala, Arg, Asn, Asp, Ile, Lys, Ser, Thr, Val, Glu, Pro,
`Tyr, Cys and Gly (6); however, it occurs more frequently
`on Ser, Thr, Asn and Trp residues in proteins and
`lipoproteins (13). According to the target residues, glycosylation
`can be classified into six groups: N-glycosylation,
`O-glycosylation, C-glycosylation, S-glycosylation,
`phosphoglycosylation and glypiation (GPI-anchored) (5, 12).
`N-glycosylation and O-glycosylation are the two major types
`of glycosylation and have important roles in the maintenance
`of protein conformation and activity (81).
`Glycosylation plays a major role in many important biological
`processes such as cell adhesion, cell–cell and cell–matrix
`interactions, molecular trafficking, receptor activation,
`protein solubility, protein folding, signal
`transduction, protein degradation, and protein intracellular
`trafficking and secretion (9–14). It has been shown
`that defects in this process have a significant effect on
`the development of various diseases such as cancer, liver cirrhosis,
`diabetes, HIV infection, Alzheimer’s disease and
`atherosclerosis (12, 14, 82).
`
`SUMOylation
`Small Ubiquitin-Related Modifier (SUMO) protein was first
`discovered in 1996 by Rohit Mahajan in the Ran
`GTPase-activating protein (RanGAP) (35). SUMOylation
`takes place via SUMO (83), which has a three-dimensional
`structure similar to that of ubiquitin and has been
`discovered in a wide range of eukaryotic organisms (84).
`SUMOylation can occur in both cytoplasm and nucleus on
`lysine residues (85). The SUMO family has three isoforms in
`mammals, four isoforms in humans, two isoforms in yeasts
`and eight isoforms in plants (1). SUMO is attached to
`the ε-amino group of lysine residues in the target
`protein through a multi-enzymatic cascade (86). In this
`reaction, SUMO is connected to a lysine residue in the substrate
`protein by covalent linkage via three enzymes, namely
`activating (E1), conjugating (E2) and ligase (E3). It is
`removed from the target protein by a SUMO-specific
`protease (Figure 3G) (87). Often, SUMOylation
`occurs at a consensus motif ΨKxE (where Ψ
`represents Leu, Ile, Val or Phe and x any amino acid) (88).
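
A consensus motif of this kind can be searched for with a regular expression. The sketch below is illustrative only: it takes the hydrophobic position Ψ to be Leu, Ile, Val or Phe (an assumption made for this example), and the function name and toy sequence are hypothetical. A motif match only flags a candidate lysine; it is not evidence that the site is modified.

```python
import re

# Psi-K-x-E consensus. Psi is taken here as L, I, V or F (an assumption),
# K is the candidate SUMO acceptor lysine and x is any residue.
# The lookahead lets overlapping motifs be reported as well.
SUMO_MOTIF = re.compile(r"(?=([LIVF]K[A-Z]E))")


def find_sumo_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based lysine position, matched tetrapeptide) pairs."""
    hits = []
    for match in SUMO_MOTIF.finditer(sequence.upper()):
        # +1 to step from Psi to K, +1 to convert to 1-based numbering.
        lysine_position = match.start() + 2
        hits.append((lysine_position, match.group(1)))
    return hits


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MSTLKQEAAGVKDEPPIKAE"
print(find_sumo_motifs(toy_sequence))
```

On the toy sequence, the lysines at positions 5, 12 and 18 are reported as candidates.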
`
`SUMOylation plays a major role in many basic cellular
`processes such as transcription control, chromatin organization,
`accumulation of macromolecules in cells, regulation
`of gene expression and signal transduction (89, 90). It is
`also necessary for the conservation of genome integrity
`(91). In addition, there are many reports on the major role of
`SUMOylation in the development of a variety of human
`diseases including cancer, Alzheimer’s disease, Parkinson’s
`disease, viral infections, heart diseases and diabetes
`(83, 91–93).
`
`Palmitoylation
`An important class of PTMs, called lipidation, includes
`covalent attachment of lipids to proteins. The first report of
`the covalent modification of proteins with lipids dates back
`to 1951 (94). These PTMs involve a great variety
`of lipids such as octanoic acid, myristic acid, palmitic acid,
`palmitoleic acid, stearic acid and cholesterol. Myristoylation,
`palmitoylation and prenylation can be considered as
`the three main types of these lipid modifications (95, 96).
`Palmitoylation is described in this subsection, and the
`other two important ones are described in the subsequent
`subsections.
`Palmitoyltransferases (PATs) were first identified in yeast
`in 1999 by Doug J. Bartels (36). Palmitoylation is the
`covalent attachment of fatty acids, such as palmitic acid, to
`Cys, Gly, Ser, Thr and Lys residues (6). S-palmitoylation is
`the reversible covalent addition of a 16-carbon fatty
`acid chain, palmitate, to a cysteine via a thioester linkage
`(Figure 3H) (97). Palmitate from palmitoyl-CoA (the lipid substrate) is
`attached to the target protein by a PAT and removed via
`acyl protein thioesterases (98).
`Mostly, S-palmitoylation occurs in eukaryotic cells and
`plays critical roles in many different biological processes
`including protein function regulation, protein–protein
`interaction, membrane–protein associations, neuronal
`development, signal transduction, apoptosis and mitosis
`(98–100). Dysfunction of palmitoylation has been linked
`to many diseases including neurological diseases (Hunting-
`ton’s disease, schizophrenia and Alzheimer’s disease) and
`different cancers (101–105).
`
`Myristoylation
`Myristoylation (N-myristoylation) was discovered by Alastair
`Aitken in 1982 in bovine brain (34). Although
`myristoylation is often referred to as a PTM, it usually occurs
`co-translationally (106). This modification is an irreversible
`PTM that occurs mainly on cytoplasmic eukaryotic proteins.
`Myristoylation has been reported in some integral
`membrane proteins as well (107). Myristoylation occurs
`in approximately 0.5–1.5% of eukaryotic proteins (108).
`
`In myristoylation, after removal of the initiating Met, a
`14-carbon saturated fatty acid, called myristic acid, is
`attached to the N-terminal glycine residue via a covalent
`bond (Figure 3I) (109). This attachment is often observed
`in the Met-Gly-X-X-X-Ser/Thr motif and is catalyzed by an
`N-myristoyl transferase (NMT) (there are at least two
`types of NMT enzymes, NMT1 and NMT2, in humans)
`(109, 110). Myristoylation occurs more frequently on Gly
`and less frequently on Lys residues (6).
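
The N-terminal motif described above lends itself to a simple pattern check. The following is a minimal sketch, assuming the sequence is given with its initiating Met still present; the function name and the toy sequences are hypothetical, and a match indicates only that the motif is present, not that the protein is myristoylated.

```python
import re

# N-terminal pattern from the text: Met-Gly-X-X-X-Ser/Thr.
# The pattern is anchored at the start because the glycine has to become
# the N-terminal residue once the initiating Met is removed.
MYRISTOYLATION_MOTIF = re.compile(r"^MG[A-Z]{3}[ST]")


def has_myristoylation_motif(sequence: str) -> bool:
    """Return True if the sequence starts with Met-Gly-X-X-X-Ser/Thr."""
    return MYRISTOYLATION_MOTIF.match(sequence.upper()) is not None


# Hypothetical toy sequences, not real proteins.
print(has_myristoylation_motif("MGAAKSLLE"))   # motif present
print(has_myristoylation_motif("MGAAKALLE"))   # sixth residue is not Ser/Thr
print(has_myristoylation_motif("MAGAAKSLL"))   # Gly is not in position 2
```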
`Proteins that undergo this PTM play critical roles in regulating
`the cellular structure and many biological processes
`such as stabilization of protein structure, maturation, signaling,
`extracellular communication, metabolism and regulation
`of the catalytic activity of enzymes (109, 110). The
`role of myristoylation has been demonstrated in the development
`and progression of various diseases such as cancer, epilepsy,
`Alzheimer’s disease, Noonan-like syndrome, and viral and
`bacterial infections (111).
`
`Prenylation
`The first study on prenylation was done in 1978 by Yuji
`Kamiya et al. in yeast (33). It is another important lipid-based
`PTM, which occurs after translation as an irreversible
`covalent linkage mainly in the cytosol (112). This
`reaction occurs on cysteine near the carboxyl-terminal
`end of the substrate protein (113). Prenylation has two
`main forms: farnesylation and geranylgeranylation (114). These
`two forms involve the addition of two different types of
`isoprenoids to cysteine residues: farnesyl pyrophosphate
`(15-carbon) and geranylgeranyl pyrophosphate (20-carbon),
`respectively. In prenylated proteins, one can find a consensus
`motif at the C-terminus; the motif is CAAX, where C is
`cysteine, A is an aliphatic amino acid and X is any amino
`acid (115). This process is catalyzed by three prenyltransferase
`enzymes: farnesyltransferase (FT) and two geranylgeranyl
`transferases (GT1 and GT2) (Figure 3J) (48).
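
The CAAX motif can be located in the same way as the other sequence motifs. In the sketch below, the set of aliphatic residues (Ala, Val, Leu, Ile) is an assumption made for illustration, as are the function name and the toy sequences; the text itself only states that A is an aliphatic amino acid.

```python
import re
from typing import Optional

# Residues treated as "aliphatic" in this sketch (an assumption).
ALIPHATIC = "AVLI"

# C-A-A-X anchored at the C-terminus: cysteine, two aliphatic residues,
# then any residue as the last position of the sequence.
CAAX_MOTIF = re.compile(rf"C[{ALIPHATIC}]{{2}}[A-Z]$")


def prenylation_cysteine(sequence: str) -> Optional[int]:
    """Return the 1-based position of the CAAX cysteine, or None."""
    match = CAAX_MOTIF.search(sequence.upper())
    return match.start() + 1 if match else None


# Hypothetical toy sequences, not real proteins.
print(prenylation_cysteine("MKTAGSCVLS"))  # CAAX box at the C-terminus
print(prenylation_cysteine("MKTAGSCVKS"))  # second A position is not aliphatic
print(prenylation_cysteine("CVLSMKTAGS"))  # motif is not C-terminal
```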
`Prenylation is known as a crucial physiological
`process that facilitates many cellular processes such as
`protein–protein interactions, endocytosis regulation, cell
`growth, differentiation, proliferation and protein trafficking
`(115–117). Observations have shown that disruption of
`this modification plays crucial roles in the pathogenesis
`of cancer (114), cardiovascular and cerebrovascular disorders,
`bone diseases, progeria, metabolic diseases and
`neurodegenerative diseases (118, 119).
`
`Sulfation
`Sulfation was first discovered by Bruno Bettelheim in
`bovine fibrinopeptide B in 1954 (120). Residues Tyr, Cys
`and Ser have been identified as target residues in sulfated
`proteins (6). Often, the target residue of this PTM
`is tyrosine, and the modification happens in the trans-Golgi network.
`N-sulfation or O-sulfation includes the addition of a negatively
`charged sulfate group, via nitrogen or oxygen, to an
`exposed tyrosine residue on the target protein (121, 122).
`Currently, protein tyrosine sulfation (PTS) is observed mainly in secreted and
`transmembrane proteins in multicellular eukaryotes and has
`not yet been observed in nuclear and cytoplasmic proteins
`(121). This reaction is catalyzed by two transmembrane
`enzymes, tyrosyl protein sulfotransferases 1 and 2 (TPST1
`and TPST2) (30). TPSTs govern the transfer of an activated
`sulfate from 3′-phosphoadenosine 5′-phosphosulfate
`to tyrosine residues within acidic motifs of polypeptides
`(Figure 3K) (121).
`Recently, it has been observed that PTS has vital roles
`in many biological processes such as protein–protein interactions,
`leukocyte rolling on endothelial cells, visual functions
`and viral entry into cells (123). This PTM is involved in many
`diseases such as autoimmune diseases, HIV, lung diseases and
`multiple sclerosis (12).
`
`Involvement of PTMs in diseases and
`biological processes
`PTMs have a vital role in almost all biological processes
`and fine-tune numerous molecular functions. Therefore,
`the footprints of disruption in PTMs can be seen in many
`diseases. Figure 4A shows a tripartite network of PTM
`involvement in diseases and biological processes for the 10
`abovementioned PTMs. This network contains 97 diseases
`and 153 biological processes. Panels B and C in Figure 4
`show the biological processes with degree ≥3 (those bio-
`logical processes that interact with at least three different
`PTMs) and diseases with degree ≥2, respectively.
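
The degree of a biological process in such a network is simply the number of distinct PTMs linked to it. The sketch below illustrates the calculation on a small subset of PTM and process links mentioned in the preceding subsections; it is not the full network of Figure 4A, and the function and variable names are hypothetical.

```python
from collections import defaultdict

# A few PTM-process links mentioned in this review (illustrative subset,
# not the full network of Figure 4A).
ptm_process_links = [
    ("phosphorylation", "apoptosis"),
    ("ubiquitylation", "apoptosis"),
    ("S-palmitoylation", "apoptosis"),
    ("ubiquitylation", "signal transduction"),
    ("glycosylation", "signal transduction"),
    ("SUMOylation", "signal transduction"),
    ("S-palmitoylation", "signal transduction"),
    ("ubiquitylation", "DNA repair"),
    ("acetylation", "cell cycle control"),
]


def process_degrees(links):
    """Map each process to the number of distinct PTMs linked to it."""
    ptms_per_process = defaultdict(set)
    for ptm, process in links:
        ptms_per_process[process].add(ptm)
    return {process: len(ptms) for process, ptms in ptms_per_process.items()}


degrees = process_degrees(ptm_process_links)

# Keep the processes that interact with at least three different PTMs,
# mirroring the degree >= 3 filter described for Figure 4B.
high_degree = sorted(p for p, d in degrees.items() if d >= 3)
print(high_degree)
```

The same counting applied to the disease side of the network, with a threshold of two, would correspond to the filter described for Figure 4C.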
`As shown in Figure 4C, neurodegenerative diseases
`(Alzheimer’s disease, Parkinson’s disease and Huntington’s
`disease) form the major group of diseases affected by the
`disruption in PTMs. Besides, one can see that cancer
`is also one of the most affected diseases. Consistent with
`this observation, the biological processes related to cancer
`are among the high-degree nodes (signaling, DNA repair,
`control of replication and apoptosis). Processes related to
`apoptosis, protein–protein interaction, signaling, cell cycle
`control, chromatin assembly, organization and stability,
`DNA repair, protein degradation, protein trafficking and
`targeting, regulation of gene expression and transcription
`control are the other high-degree biological processes.
`Moreover, we can say that ubiquitylation, prenylation,
`glycosylation, S-palmitoylation and SUMOylation have the
`most involvement in diseases. On the other hand, the PTMs
`with the highest number of interactions with biological processes
`are phosphorylation, ubiquitylation, methylation,
`acetylation and SUMOylation. Taken together, we can
`
`Figure 4. Involvement of PTMs in diseases and biological processes. (A) Tripartite network of PTM involvement in diseases and biological processes
`for the 10 major PTMs. (B) The degree of the biological processes with degree ≥3 in the tripartite network. (C) The degree of the diseases with
`degree ≥2 in the tripartite network. (D) Involvement of PTMs in disease and biological processes.
`
`conclude that the disruption in the pathways of these five
`PTMs has a great impact on the normal functioning of the
`cell and, as a result, on the organism.
`
`Main PTM databases
`Due to the considerable cost and difficulties of experimental
`methods for identifying PTMs, recently many computa-
`tional methods have been developed for predicting PTMs
`(124). Almost all of these methods need a set of experimen-
`tally validated PTMs to build a prediction model. There-
`fore, the availability of valid public databases of PTMs
`is the first step toward this end. There are a variety of
`such public databases that could be utilized easily by the
`scientific community for developing computational methods
`(17, 124).
`According to the scope and diversity of the covered
`PTMs, these databanks can be classified into two main
`groups: general databases and specific databases. The gen-
`eral databases contain different types of PTMs, regardless
`of target residue and organisms. These databases provide
`a broad scope of information for various PTMs. On the
`other hand, specific databases focus on
`certain types of PTMs, certain characteristics of
`PTMs and/or specific target residues.
`
`Figure 5. Bubble chart for PTM databases. The chart was drawn based on three parameters for the databases: the number of stored modified proteins,
`the number of modified sites and the number of covered PTM types.
`
`The current public PTM databases differ greatly
`in the number of stored modified proteins, the number
`of modified sites and the number of covered PTM types.
`Figure 5 shows a bubble chart of the main PTM databases
`according to these three parameters. As is evident from
`the figure, due to the extensive number of studies on
`phosphorylation, the specific databases are mainly focused on
`phosphorylation. From this point of view, glycosylation
`is the PTM of second greatest interest. In the following, the
`five largest databases are described briefly. Also, Table 1
`summarizes the current main public PTM databases.
`The EPSD (Eukaryotic Phosphorylation Site Database)
`contains the largest number of PTM sites. EPSD contains
`more than 1 600 000 experimental phosphorylation sites in
`more than 209 000 phosphoproteins across 68 eukaryotes,
`including 18 animals, 7 protists, 24 plants and 19 fungi
`(125).
`dbPTM (Database Post-translational modification) is a
`comprehensive database that has collected experimental
`PTM data from 30 public databases and 92 648 research
`articles. dbPTM contains ∼908 000 experimentally verified
`sites for more than 130 types of PTMs from different organisms
`(6). This database is the largest in terms of
`the number of recorded proteins and also in terms of the
`number of stored PTM types (Figure 5).
`
`BioGRID (The Biological General Repository for Interaction
`Datasets) is another major open access PTM
`database. In addition to protein and genetic interactions,
`it also holds data on ∼726 000 phosphorylation sites in
`∼72 000 proteins, which were extracted from 4742 publications
`for 71 major model organisms (126).
`PSP (PhosphoSitePlus) is an online resource for studying
`experimentally observed PTMs such as phosphorylation,
`ubiquitylation and acetylation. PSP comprises
`∼484 000 PTM sites for more than 7 PTM types from
`26 species. However, most of its data are
`extracted from human, mouse and rat (127).
`The qPTM database collects and integrates ∼296 900 sites
`of 10 types of PTMs in more than 19 600 proteins under 661
`conditions (128).
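
The figures quoted for the five largest databases can be put side by side in a small data structure. The sketch below uses only the approximate numbers stated in the text above; fields for which the text gives no figure are left as `None`, values given as "more than" are stored as lower bounds, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PtmDatabase:
    name: str
    sites: int                  # approximate number of modified sites
    proteins: Optional[int]     # None where the text gives no figure
    ptm_types: Optional[int]    # lower bound where the text says "more than"


# Approximate figures as quoted in this section.
DATABASES = [
    PtmDatabase("EPSD", 1_600_000, 209_000, 1),   # phosphorylation only
    PtmDatabase("dbPTM", 908_000, None, 130),
    PtmDatabase("BioGRID", 726_000, 72_000, None),
    PtmDatabase("PSP", 484_000, None, 7),
    PtmDatabase("qPTM", 296_900, 19_600, 10),
]


def sites_per_protein(db: PtmDatabase) -> Optional[float]:
    """Average number of recorded sites per protein, if both are known."""
    return None if db.proteins is None else db.sites / db.proteins


# Rank the databases by the number of recorded sites.
by_sites = sorted(DATABASES, key=lambda db: db.sites, reverse=True)
for db in by_sites:
    ratio = sites_per_protein(db)
    ratio_text = "n/a" if ratio is None else f"{ratio:.1f}"
    print(f"{db.name:8s} sites={db.sites:>9,d} sites/protein={ratio_text}")
```

Keeping unknown values explicit, rather than filling them with zero, avoids misleading comparisons when the databases are ranked on more than one parameter.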
`
`Computational methods for predicting PTMs
`Generally speaking, any computational method for predicting
`a specific type of PTM has four main steps: data
`gathering, feature extraction, learning the predictor and
`performance assessment. These steps are shown schematically
`in Figure 6. In the following, these steps
`are described in detail, and the related challenges and
`problems in each step are discussed.
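
The four steps can be illustrated end to end on toy data. The sketch below is not a method from the literature reviewed here: the sequences and labels are hypothetical, the features are plain amino acid composition of a window around the candidate site, and the learner is a nearest-centroid classifier chosen only because it fits in a few lines. All function names are illustrative.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window(sequence, position, flank=3):
    """Residues around a 1-based position, padded with '-' at the ends."""
    padded = "-" * flank + sequence + "-" * flank
    centre = position - 1 + flank
    return padded[centre - flank: centre + flank + 1]


def composition(fragment):
    """Step 2, feature extraction: frequency of each standard residue."""
    return [fragment.count(aa) / len(fragment) for aa in AMINO_ACIDS]


def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def train(examples):
    """Step 3, learning: one centroid per class (nearest-centroid model)."""
    by_label = {0: [], 1: []}
    for sequence, site, label in examples:
        by_label[label].append(composition(window(sequence, site)))
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, sequence, site):
    features = composition(window(sequence, site))
    return min(model, key=lambda label: squared_distance(model[label], features))


def accuracy(model, examples):
    """Step 4, performance assessment on held-out examples."""
    correct = sum(predict(model, s, p) == y for s, p, y in examples)
    return correct / len(examples)


# Step 1, data gathering: (sequence, 1-based site, label) triples.
# Hypothetical toy data; real studies take labelled sites from PTM databases.
TRAIN_SET = [
    ("AAEESEEDAA", 5, 1), ("GGDESDEEGG", 5, 1),
    ("AALLSLVIAA", 5, 0), ("GGVISLLVGG", 5, 0),
]
TEST_SET = [("KKEDSEDEKK", 5, 1), ("KKLVSILLKK", 5, 0)]

model = train(TRAIN_SET)
print(f"accuracy on toy test set: {accuracy(model, TEST_SET):.2f}")
```

With such a small, cleanly separated toy set the result says nothing about real performance; the point is only the separation of the four steps, with training and assessment carried out on different examples.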
`