`
`Gene 299 (2002) 185-193
`
GENE
An International Journal on Genes and Genomes
www.elsevier.com/locate/gene
`
`Identification and characterization of human DPP9, a novel homologue of
`dipeptidyl peptidase IV*
`
Christina Olsen, Nicolai Wagtmann*

Department of Cloning Technology and Immunology, Novo Nordisk A/S, Novo Allé 6B2.98, DK-2880 Bagsvaerd, Denmark

Received 23 July 2002; received in revised form 13 September 2002; accepted 2 October 2002
Received by R. DiLauro
`
`Abstract
`
We used an in silico approach to identify new cDNAs with homology to dipeptidyl peptidase IV (DPP IV). DPP IV (EC 3.4.14.5) is a
serine protease with a rare enzyme activity having an important role in the regulation of various processes, such as blood glucose control and
immune responses. Here, we report the identification and characterization of a novel DPP IV-like molecule, termed dipeptidyl peptidase-like
protein 9 (DPP9). The deduced amino acid sequence of DPP9 has a serine protease motif, GWSYG, identical to that found in DPP IV. The
presence of this motif, together with a conserved order and spacing of the Ser, Asp, and His residues that form the catalytic triad in DPP IV,
places DPP9 in the 'DPP IV gene family'. Northern blots showed that DPP9 is ubiquitously expressed, with the highest expression levels in
skeletal muscle, heart, and liver, and the lowest in brain. In vitro translation of the cloned full-length DPP9 cDNA resulted in a DPP9 protein
product that migrated in sodium dodecyl sulfate-polyacrylamide gel electrophoresis at a position similar to the predicted protein size of 98
kDa. Consistent with the lack of predicted transmembrane domains and a signal sequence, DPP9 was found in a soluble, putative cytosolic
form. A DPP9 orthologue in mice was identified by expressed sequence tag database searches and verified by cDNA cloning. © 2002
Elsevier Science B.V. All rights reserved.
`
`Keywords: Bioinformatics; DPP IV gene family; S9 family; Serine protease motif
`
`1. Introduction
`
Dipeptidyl peptidase IV (DPP IV, CD26) is a non-classical serine peptidase that cleaves off N-terminal
dipeptides from proteins having a proline or alanine at amino acid position two (Marguet et al., 1992). In
mammals, DPP IV is expressed on a variety of epithelial, endothelial, and lymphoid cell types and is also
found as a soluble form in biological fluids. Many chemokines, neuropeptides, and peptide hormones are
substrates of DPP IV and it has been suggested that DPP IV participates in regulating various physiological
processes by cleaving these substrates (Mentlein, 1999). For example, by cleaving peptide hormones involved
in stimulating insulin secretion, DPP IV contributes to physiological blood glucose regulation (Marguet
et al., 2000). There is clinical evidence for the efficacy of DPP IV inhibition in lowering blood glucose
levels in type II diabetics (Ahren et al., 2002), and other therapeutic benefits of DPP IV inhibition may be
found in the future.

Abbreviations: aa, amino acid(s); bp, base pair(s); cDNA, DNA complementary to RNA; DPP IV, dipeptidyl
peptidase IV; DPP8, dipeptidyl peptidase 8; DPP9, dipeptidyl peptidase-like protein 9; EST, expressed
sequence tag; FAP, fibroblast-activation protein; nt, nucleotide(s); ORF, open reading frame; QPP, quiescent
cell proline dipeptidase; RACE, rapid amplification of cDNA ends; SDS-PAGE, sodium dodecyl
sulfate-polyacrylamide gel electrophoresis; UTR, untranslated region.
The nucleotide sequence of the DPP9 cDNA has been submitted to GenBank under accession number AF452102.
* Corresponding author. Tel.: +45-4442-6972; fax: +45-4449-0555.
E-mail address: wagt@novonordisk.com (N. Wagtmann).

Since the initial discovery of the DPP IV enzyme activity (Hopsu-Havu and Glenner, 1966) and the molecular
cloning of its cDNA (Marguet et al., 1992), biochemical evidence has suggested the existence of other DPP
IV-like enzymes. DPP IV-like enzyme activities can be measured in some lymphocytes that are negative for DPP
IV expression (Bühling et al., 1995; Ruiz et al., 1998; Blanco et al., 1998). Recently, it has been observed
that some DPP IV inhibitors cause apoptosis in resting DPP IV-negative lymphocytes, which suggests that a
DPP IV-like enzyme plays an important role in the survival of resting cells (Underwood et al., 1999;
Chiravuri et al., 1999).
`
0378-1119/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved.
PII: S0378-1119(02)01059-4
`
`AstraZeneca Exhibit 2179
`Mylan v. AstraZeneca
`IPR2015-01340
`
`
Fischer 344/CRJ rats lack catalytically active DPP IV molecules, due to an active-site mutation
(Gly633 → Arg) (Watanabe et al., 1987). Although the DPP IV enzyme activity is significantly reduced in
Fischer 344/CRJ rats, Tavares et al. (2000) reported the presence of 40% DPP IV-like enzyme activity in the
plasma from these rats compared to wild-types. Similarly, analyses of DPP IV-/- mice have shown that they
possess about 20% DPP IV-like activity in plasma (Marguet et al., 2000). These findings of residual DPP
IV-like activity in mutant animals suggest the presence of a DPP IV-like enzyme in the plasma of these
animals. Thus, it appears that distinct DPP IV-like enzymes exist, which have a substrate and inhibitor
profile similar to DPP IV.
DPP IV is a member of the S9b family (also called the 'DPP IV gene family'), which is a subfamily of the S9
family of serine peptidases. The S9 family contains cytosolic as well as membrane-bound peptidases. The
cytosolic enzymes include prolyl oligopeptidase, acylaminoacyl-peptidase and dipeptidyl peptidase 8 (DPP8)
(Barrett and Rawlings, 1992; Abbott et al., 2000b). It has been suggested that all the members of the S9
family have similar three-dimensional structures containing a β-propeller domain and an α/β-hydrolase
domain (Abbott et al., 2000a). Two other members of the S9b subfamily have been reported to have the same
rare enzyme specificity as DPP IV, namely fibroblast-activation protein (FAP) (Scanlan et al., 1994) and
DPP8 (Abbott et al., 2000b). These enzymes all share the motif GWSYG surrounding the catalytic serine
residue. Furthermore, proteases in other enzyme families, like quiescent cell proline dipeptidase (QPP,
DPPII), have DPP IV-like enzyme activities and appear sensitive to some of the DPP IV inhibitors despite
lack of primary homology to DPP IV (Underwood et al., 1999; Fukasawa et al., 2001).
The characterization of novel DPP IV-like proteins that have substrate specificities similar to DPP IV, or
that cross-react with DPP IV inhibitors, raises the possibility that some of the functions suggested for
DPP IV could in fact have been due to other molecules. To clarify this, it is necessary to identify the
genes and produce recombinant forms of these novel DPP IV-like molecules. Therefore, the aim of this study
was to identify novel DPP IV-related genes that may encode DPP IV-like enzymes.
We have identified and characterized a novel member of the S9b subfamily and named the molecule DPP9, for
dipeptidyl peptidase-like protein 9. Sequence homology to members of the S9b family suggests that DPP9 may
have DPP IV-like activity.
`
2. Materials and methods

2.1. Bioinformatics analysis

A protein query sequence, corresponding to the α/β-hydrolase domain of human DPP IV, was used for searches
against translations of all reading frames (tBLASTn search) in dbEST sequences
(http://www.ncbi.nlm.nih.gov/blast). Sequence data were analysed using the Vector NTI program package
(version 6, InforMax). The 5'-flanking sequence of DPP9 (AC005783) was analysed for predicted CpG islands
using GrailEXP version 3.31 available at http://compbio.ornl.gov, and putative transcription factor-binding
sites were found using NSITE available at http://genomic.sanger.ac.uk/gf/gf.shtml. Exon and promoter
predictions were conducted using the programs available at http://cbs.dtu.dk/services,
http://www.expasy.ch/tools/dna.html and http://argon.cshl.org/genefinder.
Analysis of the DPP9 protein sequence and detection of conserved protein domains were done with PFAM,
PROSITE, SMART, BLOCKS, PRODOM and PREDICTPROTEIN available at http://www.motif.genome.ad.jp,
http://www.expasy.ch/prosite, http://www.cbs.dtu.dk/services, and http://coot.embl-heidelberg.de/SMART.
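
The tBLASTn search described above compares a protein query against all six reading-frame translations of each nucleotide entry. The following is a minimal sketch of that idea only: it is not the BLAST algorithm and not the authors' code. It translates a short DNA fragment in six frames and looks for the GXSXG serine protease consensus. The 24-nt fragment is transcribed from the region around the GWSYG motif as printed in Fig. 1 and should be checked against GenBank AF452102; the function names are hypothetical.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return translations of the three forward and three reverse frames."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def frames_with_motif(dna, pattern=r"G.S.G"):
    """Map frame name to the first match of the motif, for frames that contain it."""
    hits = {}
    for name, protein in six_frame_translations(dna).items():
        match = re.search(pattern, protein)
        if match:
            hits[name] = match.group(0)
    return hits


# Fragment around the serine protease motif, as read from Fig. 1 (assumption).
FRAGMENT = "ATCCATGGCTGGTCCTACGGGGGC"
print(frames_with_motif(FRAGMENT))
```

For this fragment only the first forward frame carries the motif (GWSYG); a real search would score alignments rather than match a fixed pattern.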
`
2.2. RACE PCR and cDNA cloning

A modified procedure of the 5'-RACE technique, SMART RACE (Clontech), was used to identify the
transcription start site of DPP9. Briefly, poly(A)+ RNA from human skeletal muscle or from the human chronic
myelogenous leukemia K-562 cell line (Clontech) was reverse-transcribed with ThermoScript reverse
transcriptase (Gibco-BRL). The Advantage-GC 2 kit (Clontech) was used for the SMART 5'-RACE PCR. The
reactions were carried out in a 50 µl reaction mixture containing 2.5 µl of the synthesized SMART 5'-RACE
cDNA, 0.5 M GC-melt, 10 µl 5X GC-buffer, 0.2 mM dNTPs, 0.5 µM of primers and 1 µl of Advantage GC 2
polymerase. The sequences of the DPP9-specific reverse primers were: 5'-TACTTGCGGCTGCCGTGGAT-3',
5'-CACGAGTGCTTCTGCACCTGG-3' and 5'-CGGTGGTGGCCATCCTCTCA-3'. The thermal cycling was conducted using the
following program: an initial denaturation step at 95 °C for 1 min was followed by ten cycles (95 °C for
5 s, 68 °C for 30 s, 72 °C for 1 min) and by 30 cycles (95 °C for 5 s, 66 °C for 30 s, 68 °C for 1 min); a
final elongation was done at 68 °C for 10 min. The resulting PCR products, after gel purification with a
GFX kit (Pharmacia Amersham), were cloned using a TOPO TA cloning kit (Invitrogen). All the cloned 5'-RACE
products were fully sequenced.
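
The cycling program and reaction mix above can be written down as data, which makes simple checks easy. This is an illustrative sketch, not part of the published protocol: the structure and function names are hypothetical, the temperatures and times are taken from the text as printed, and the total counts only programmed hold times (ramping between temperatures is ignored).

```python
# Each stage lists (temperature in deg C, hold time in seconds) per step,
# as read from the text of Section 2.2.
PROGRAM = [
    {"name": "initial denaturation", "cycles": 1, "steps": [(95, 60)]},
    {"name": "first stage", "cycles": 10, "steps": [(95, 5), (68, 30), (72, 60)]},
    {"name": "second stage", "cycles": 30, "steps": [(95, 5), (66, 30), (68, 60)]},
    {"name": "final elongation", "cycles": 1, "steps": [(68, 600)]},
]


def total_hold_seconds(program):
    """Sum the programmed hold times over all stages and cycles."""
    return sum(
        stage["cycles"] * sum(seconds for _, seconds in stage["steps"])
        for stage in program
    )


def final_concentration(stock, stock_volume_ul, total_volume_ul):
    """Dilution arithmetic: C_final = C_stock * V_stock / V_total."""
    return stock * stock_volume_ul / total_volume_ul


minutes, seconds = divmod(total_hold_seconds(PROGRAM), 60)
print(f"programmed hold time: {minutes} min {seconds} s")
# 10 ul of 5X GC-buffer in a 50 ul reaction gives a 1X working concentration.
print(f"buffer strength: {final_concentration(5, 10, 50)}X")
```

Under these assumptions the program holds for 74 min 20 s in total, and the buffer ends up at 1X, consistent with the stated volumes.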
Full-length DPP9 transcripts were PCR amplified using total first-strand cDNAs from human skeletal muscle
and from the human K-562 cell line as templates. cDNAs were sequenced and compared to genomic DPP9 sequences
(AC005783 and AC005594). The following primers were used to amplify the ORF: forward primer
(5'-ACCGTGAGGCGCCGCTGGA-3') and reverse primer (5'-ATACCTCTGAGCCTGCCCA-3'), generating a 2836 bp DPP9 cDNA
product.
`
`
`2.3. Northern blotting
`
Human multiple tissue Northern blots (Clontech) containing mRNA from various tissues and cell lines were
hybridized with a DPP9-specific cDNA probe. A human β-actin cDNA fragment was used as control probe. Probes
were labeled using the Prime-It II random primer labeling kit (Stratagene). The membranes were hybridized
with the 32P-labeled probe in QuikHyb solution (Stratagene) for 2 h at 68 °C. Thereafter, the membranes were
washed with 2X SSC and 0.1% SDS at 42 °C for 30 min, with 0.2X SSC and 0.1% SDS at 65 °C for 30 min and
once again with 0.2X SSC and 0.1% SDS at 65 °C for about 20 min.
`
`2.4. DNA sequencing
`
The sequences of both strands of cloned fragments were determined by use of a MegaBACE 1000 (Pharmacia
Amersham), using synthetic oligonucleotide primers (T-A-G, Copenhagen, Denmark).
`
2.5. In vitro transcription/translation of DPP9 cDNA

The TNT rabbit reticulocyte lysate system (Promega) was used, following the manufacturer's instructions,
for synthesis of radiolabeled DPP9 protein. This system couples transcription and translation of cDNAs
inserted downstream from a T7 promoter. In vitro translation of DPP9 was performed in the presence of
[35S]methionine (1000 Ci/mmol) (Pharmacia Amersham). The protein product was resolved by SDS-PAGE using a
7% Tris-Acetate mini gel (NuPage, Invitrogen), and was visualized by autoradiography.
`
[Fig. 1 sequence listing: the annotated nucleotide and deduced amino acid sequence is not legible in this copy; see GenBank accession number AF452102.]

Fig. 1. Nucleotide sequence of the DPP9 cDNA and the deduced 863-amino-acid sequence (GenBank accession number AF452102). The nucleotides are
numbered from the 5'- to the 3'-end, with the most 5' transcription start site at position +1. The termination codon and the putative initiation codons are
indicated in bold italic. Three putative polyadenylation AATAAA signals (two of them are overlapping) are underlined. The serine protease consensus motif
(GWSYG) is indicated by bold characters and the putative catalytic triad residues of DPP9 (Ser730, Asp808, His840) are indicated by bold underlining.
`
`
`3. Results and discussion
`
3.1. In silico identification of human DPP9

The amino acid sequence of the catalytic domain of human DPP IV was used to search for similar sequences in
a human EST database (http://www.ncbi.nlm.nih.gov). This search revealed a set of overlapping ESTs, which
could be assembled into a cDNA contig. The deduced amino acid sequence of this partial cDNA had 38% identity
and 52% similarity to the hydrolase domain of DPP IV. The order and spacing of the Ser, Asp, and His
residues that form the catalytic triad in DPP IV were completely conserved. Furthermore, the motif GWSYG
that conforms to the consensus sequence GXSXG proposed for serine proteases (Barrett and Rawlings, 1992) was
conserved in the deduced amino acid sequence of this novel sequence. A combination of 5'-RACE and standard
PCR resulted in a full-length cDNA clone (Fig. 1; see Section 3.2). Due to its homology with DPP IV, the
novel gene sequence described here was given the name DPP9, for dipeptidyl peptidase-like protein 9. A
BLASTn search of GenBank revealed that the identified cDNA contig corresponded to exons in two overlapping
genomic clones (AC005783 and AC005594), localized on human chromosome 19p13.3.
`
3.2. Analysis of the DPP9 cDNA and its predicted protein product

Determination of the transcription initiation site was accomplished by a modified SMART 5'-RACE procedure,
which allowed isolation of 5' DPP9 cDNA sequences more consistently than other versions of the same method
(see Section 2.2). The templates used in these reactions were total cDNA synthesized from poly(A)+ RNA
isolated from either human skeletal muscle tissue or from the human chronic myelogenous leukemia K-562 cell
line. By this method, seven putative transcription initiation sites were identified that clustered in two
regions separated by ~1.3 kb (Fig. 2).
The 5'-RACE reactions also resulted in the identification of a DPP9 5'-UTR splice variation that appeared
among some, but not all, of the PCR products amplified from the K-562 derived cDNA. The alternative splicing
resulted in the lack of exon 2 in some of the 5'-RACE PCR products. The splicing between exons 1 and 3
generated an upstream stop codon (amber) due to a frame-shift in the 5'-UTR. This stop codon is located in
the beginning of exon 1 in the alternatively spliced mRNA. Since the splice variation is in the 5'-UTR of
DPP9, both of the identified mRNAs should code for the same protein. In order to test whether the identified
5' DPP9 splice variant was differentially expressed in different tissue types, RT-PCR was conducted with
5' DPP9-specific primers, using templates from a range of normal adult and fetal tissue types. In this
experiment, all the identified PCR fragments corresponded to the exon 2-positive DPP9 mRNA (data not shown).
Thus, the smaller sized DPP9 mRNA was only cloned from the cDNA that originated from the K-562 cell line.
However, further experiments should indicate if splicing of DPP9 is regulated in a cell-type-specific
manner. The mRNA transcript resulting from alternative splicing of DPP9 would only be 53 bp shorter than the
longer transcript, and this difference was undetectable on Northern blots (Fig. 3). Alternative splicing is
not uncommon, as an EST-based study showed alternative splicing to take place in 35% of genes in the TIGR human
[Fig. 2 sequence listing: the annotated 5'-flanking sequence (positions -539 to about +1360) is not legible in this copy. Transcription factor site labels that remain readable include Sp1, Pit-1a, GATA, AP1, GCF, CCAAT, c-Ets-1, BRE, AP4 and a TATA-box.]

Fig. 2. Sequence of the 5'-flanking region of the human DPP9 gene. Arrows indicate transcription initiation sites identified by 5'-RACE. The positions are
numbered from the most 5' transcription start site identified, which is numbered +1. Potential regulatory elements and the locations of the consensus sequences
for the binding sites of certain transcription factors are underlined. A computer-predicted CpG island (-115 to +391) containing 75% GC residues is indicated
in italics and the first and second exons are shown in uppercase letters. The genomic DPP9 sequence was obtained from GenBank (AC005783).
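
The caption above reports a computer-predicted CpG island with 75% GC residues. How GrailEXP scores islands is not described in the text, so the sketch below shows only the two quantities commonly used to describe such regions: the GC fraction and the observed/expected CpG ratio. The function names and the toy sequence are illustrative assumptions, not data from the paper.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_observed_expected(seq):
    """Observed CpG count divided by the count expected from base composition.

    expected = (#C * #G) / length, so the ratio is #CG * length / (#C * #G).
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c_count * g_count)


# Toy sequence for illustration only (not taken from the DPP9 promoter).
toy = "CGCGCGAT"
print(gc_fraction(toy))            # 6 of 8 bases are G or C
print(cpg_observed_expected(toy))  # 3 CpG dinucleotides observed
```

Applied to the -115 to +391 window of AC005783, `gc_fraction` would be expected to return a value near the 75% quoted in the caption, although that has not been checked here.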
`
`
Fig. 3. Expression analyses of human DPP9. Human Northern blots (Clontech) containing 2 µg per lane of
poly(A)+ RNA were hybridized with a 32P-labeled DPP9 probe originating from nt 106-497 of the cloned cDNA.
One major ~4.4 kb transcript was observed in all lanes. Levels of β-actin mRNA are shown and kilobase
molecular size markers are indicated on the right.
`
gene index. Most of the splicing events occurred within the 5'-UTRs, which was interpreted as an indication
of alternative regulatory mechanisms (Mironov et al., 1999). RT-PCR performed on RNA from different human
tissues indicated that DPP9 transcripts do not contain alternatively spliced CDS sequences and are therefore
expected to yield a single polypeptide (data not shown). The DPP9 3'-UTR, found in several ESTs (e.g.
BC000970 and BC017008), contains three consensus polyadenylation signals AATAAA (Fig. 1). However, no
poly(A) sequence has been found in the 3'-UTR. The 3'-UTR sequence of the assembled DPP9 cDNAs was
confirmed by RT-PCR.
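
The exon 2 skipping described in this section can be checked with simple arithmetic: a skipped exon shifts the downstream reading frame unless its length is a multiple of three. The sketch below uses the sizes of exons 1 to 3 (190, 53 and 91 bp) as listed in Table 1; the function names are hypothetical and the calculation is illustrative only.

```python
# Sizes of the first three exons in bp, as listed in Table 1.
EXON_LENGTHS = [190, 53, 91]


def spliced_length(exon_lengths, skipped=()):
    """Total length of the listed exons, omitting any 1-based exon numbers in `skipped`."""
    return sum(
        length
        for number, length in enumerate(exon_lengths, start=1)
        if number not in skipped
    )


def frame_shift(exon_length_bp):
    """Nucleotides by which the downstream frame is displaced (0 means in frame)."""
    return exon_length_bp % 3


full = spliced_length(EXON_LENGTHS)
without_exon2 = spliced_length(EXON_LENGTHS, skipped={2})
print(full - without_exon2)          # size difference between the two transcripts
print(frame_shift(EXON_LENGTHS[1]))  # non-zero: skipping exon 2 shifts the frame
```

The 53 bp difference matches the text, and 53 is not divisible by three, which is consistent with the frame-shift reported for the exon 1 to exon 3 junction.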
There are two possible translation initiation sites in the deduced amino acid sequence of DPP9, one at nt
position 280-282 and one at nt position 367-369 (Fig. 1). No in-frame stop codon was found upstream from the
two ATG codons in the exon 2-positive DPP9 sequence. The second, but not the first, ATG codon is arranged
within a Kozak sequence and is therefore assumed to be the start codon. The sequence that surrounds this
start codon has a 'strong' consensus sequence for initiation of translation by eukaryotic ribosomes, with an
adenine at position -3 and a guanine at position +4 (Kozak, 1989). In addition, in vitro translation
reactions support that the second ATG codon is the start codon (data not shown). Thus, the translation might
initiate from the second ATG, in a so-called 'leaky' scanning model (Kozak, 1991). The ORF starting from the
second methionine codon and ending at the TGA terminator codon encodes an 863-amino-acid protein with a
calculated molecular mass of 98 kDa.
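
The Kozak argument above rests on two positions: a purine at -3 and a guanine at +4, counting the A of the ATG as +1. The sketch below applies that rule to the two candidate start codons. The 12-nt fragments are transcribed from Fig. 1 as printed in this copy and should be verified against GenBank AF452102; the function names are hypothetical, and the 'strong' criterion coded here (purine at -3 and G at +4) is the usual reading of Kozak's rule, not a procedure described by the authors.

```python
def kozak_context(seq, atg_index):
    """Return the bases at positions -3 and +4 relative to the A (+1) of an ATG."""
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if atg_index < 3 or atg_index + 3 >= len(seq):
        raise ValueError("not enough flanking sequence")
    return seq[atg_index - 3], seq[atg_index + 3]


def is_strong_kozak(seq, atg_index):
    """Strong context: purine (A or G) at -3 and G at +4."""
    minus3, plus4 = kozak_context(seq, atg_index)
    return minus3 in "AG" and plus4 == "G"


# Fragments around the two candidate ATGs, as read from Fig. 1 (assumption).
FIRST_ATG_REGION = "GGTTAATGGGGA"   # ATG reported at nt 280-282
SECOND_ATG_REGION = "GAGAGGATGGCC"  # ATG reported at nt 367-369

for label, region in (("first", FIRST_ATG_REGION), ("second", SECOND_ATG_REGION)):
    index = region.find("ATG")
    print(label, kozak_context(region, index), is_strong_kozak(region, index))

# Positional arithmetic from the text: both ATGs lie in the same frame,
# and the ORF of 863 codons plus a stop codon spans 2592 nt.
offset_nt = 367 - 280
print(offset_nt % 3 == 0, offset_nt // 3, (863 + 1) * 3)
```

With these fragments the first ATG has a T at -3 and the second has an A at -3 and a G at +4, in line with the text. The 87-nt offset between the two ATGs corresponds to 29 codons.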
Analysis of the primary structure revealed that DPP9 has the highest identity to DPP8, with 58% overall
identity and 72% identity in the 200-amino-acid fragments containing the presumed catalytic regions in the
predicted α/β-hydrolase domain. This high identity suggests that DPP9 and DPP8 could be a result of gene
duplication. This notion is strengthened by Carim-Todd et al. (2000), who recently reported a conservation
of clusters of paralogous genes between human chromosomes 15q and 19p13.3-p12, where DPP8 and DPP9 are
located, respectively.
By bioinformatics analysis, the deduced 863-amino-acid sequence of DPP9 appears devoid of transmembrane
segments and lacks a signal sequence for co-translational insertion into the endoplasmic reticulum,
suggesting the protein is expressed as a soluble cytosolic form. If initiation of DPP9 can also take place
from the first ATG codon, the 29 aa longer protein product would still lack transmembrane segments and a
signal sequence. DPP9 has a GWSYG motif, which is a signature for the catalytically active members of the
S9b subfamily of serine peptidases. Furthermore, DPP9 has two potential N-glycosylation sites at positions
N205 and N653.
`
`3.3. Expression pattern of human DPP9
`
Northern blots showed that the DPP9 gene encodes one major 4.4 kb mRNA transcript (Fig. 3). The specificity
of the DPP9 probe corresponding to nt 106-497 of the cloned cDNA was confirmed by hybridization to a
Southern blot (data not shown). The Northern blots showed that all of the tested tissues and cell lines
expressed DPP9; however, the DPP9 expression levels were highest in skeletal muscle, heart, and liver and
lowest in brain tissue (Fig. 3). The probe used for Northern blotting was also hybridized to an array RNA
blot (Clontech), which contained mRNA from 61 adult human tissues, eight human tumor cell lines, seven human
fetal tissues and eight controls immobilized in dots (data not shown). This RNA blot produced the same
ubiquitous expression pattern as seen on the Northern blots, and comparison between the amounts of DPP9 mRNA
in cancer cell lines, adult and fetal tissues showed no obvious differences. To further address functional
aspects of DPP9, a more detailed knowledge of its tissue-specific regulation would be useful.
Preliminary Western blots, using a polyclonal antibody raised against a DPP9 peptide, showed a presumed DPP9
protein expression in various cell lines, including the human T lymphoblastoid cell line C-8166 (data not
shown). Interestingly, C-8166 does not express DPP IV; however, it has been suggested that lysates from this
cell line can hydrolyse DPP IV substrates (Blanco et al., 1998).
`
`
Table 1
Exon-intron structure of the human DPP9 gene

Exon (bp)      Intron (bp)
1 (190)        1 (1122)
2 (53)         2 (2557)
3 (91)         3 (5513)
4 (257)        4 (8110)
5 (113)        5 (1553)
6 (174)        6 (76)
7 (169)        7 (1169)
8 (114)        8 (447)
9 (129)        9 (1811)
10 (62)        10 (2564)
11 (101)       11 (1995)
12 (17?)       12 (554)
13 (16?)       13 (11??)
14 (80)        14 (3703)
15 (153)       15 (677)
16 (136)       16 (2985)
17 (146)       17 (816)
18 (147)       18 (1033)
19 (153)       19 (638)
20 (143)       20 (2749)
21 (112)       21 (3178)
22 (1431)

[The 5' splice donor, intron class and 3' splice acceptor columns are not legible in this copy and are omitted; "?" marks an illegible digit.]

The exon-intron boundaries of DPP9 follow the GT-AG splice rule. The nucleotides in the exons are indicated in uppercase letters, whereas the flanking
nucleotides in the introns are in lowercase. The class of each intron and the sizes of the introns and exons (in base pairs) are indicated. Amino acids encoded
by the nucleotides flanking the intron-exon border are indicated by their letter symbol. An asterisk indicates the stop codon.
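
The two properties summarized in Table 1, the GT-AG rule and the intron class, are simple to state in code. This is a sketch under stated assumptions: the function names are hypothetical, the example intron is a placeholder built from a donor (gtgag) and an acceptor (cgcag) of the kind listed in the table, and the class is computed with the usual phase convention (0 between codons, 1 after the first base of a codon, 2 after the second).

```python
def follows_gt_ag(intron):
    """True if the intron starts with GT and ends with AG."""
    intron = intron.lower()
    return intron.startswith("gt") and intron.endswith("ag")


def intron_class(coding_nt_upstream):
    """Intron class (phase) from the number of coding nucleotides before the intron.

    0: the intron lies between two codons
    1: the intron falls after the first base of a codon
    2: the intron falls after the second base of a codon
    """
    return coding_nt_upstream % 3


# Placeholder intron: real donor and acceptor ends, arbitrary interior.
example_intron = "gtgag" + "n" * 20 + "cgcag"
print(follows_gt_ag(example_intron))
print([intron_class(n) for n in (90, 91, 92)])
```

In practice `coding_nt_upstream` would be counted from the start codon through the end of the preceding exon, so introns within the 5'-UTR have no class.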
`
3.4. Gene structure of human DPP9

Two overlapping cosmid clones found in GenBank (AC005783 and AC005594) contained the entire genomic DPP9
sequence. The introns were identified by aligning the full-length PCR-amplified DPP9 cDNA with the genomic
contig. Non-aligned spaces were presumed to be introns. All the exon/intron boundaries satisfy the GT/AG
splice rule (Table 1). The introns were shown to be mostly class 0 (the intron is located between codons)
and 1 (the intron interrupts the first and second bases of t