`
`Incorporating Mouse Genome
`© Springer-Verlag New York Inc. 1999
`
`Genomic sequence comparison of the human and mouse adenosine
`deaminase gene regions
`Anthony G. Brickner,1 Ben F. Koop,2 Bruce J. Aronow,1 Dan A. Wiginton1
`
`1Department of Pediatrics, Division of Developmental Biology, University of Cincinnati College of Medicine and Children’s Hospital Research
`Foundation, Cincinnati, Ohio 45229, USA
`2Center for Environmental Health-Biology, University of Victoria, Victoria, BC, Canada V8W-272
`
`Received: 27 March 1998 / Accepted: 22 September 1998
`
`Abstract. A challenge for mammalian genetics is the recognition
`of critical regulatory regions in primary gene sequence. One ap-
`proach to this problem is to compare sequences from genes ex-
`hibiting highly conserved expression patterns in disparate organ-
`isms. Previous transgenic and transfection analyses defined con-
`served regulatory domains in the mouse and human adenosine
`deaminase (ADA) genes. We have thus attempted to identify re-
`gions with comparable similarity levels potentially indicative of
`critical ADA regulatory regions. On the basis of aligned regions of
`the mouse and human ADA gene, using a 24-bp window, we find
`that similarity overall (67.7%) and throughout the noncoding se-
`quences (67.1%) is markedly lower than that of the coding regions
`(81%). This low overall similarity facilitated recognition of more
`highly conserved regions. In addition to the highly conserved ex-
`ons, ten noncoding regions >100 bp in length displayed >70%
`sequence similarity. Most of these contained numerous 24-bp win-
`dows with much higher levels of similarity. A number of these
`regions, including the promoter and the thymic enhancer, were
`more similar than several exons. A third block, located near the
`thymic enhancer but just outside of a minimally defined locus
`control region, exhibited stronger similarity than the promoter or
`thymic enhancer. In contrast, only fragmentary similarity was ex-
`hibited in a region that harbors a strong duodenal enhancer in the
`human gene. These studies show that comparative sequence analy-
`sis can be a powerful tool for identifying conserved regulatory
`domains, but that some conserved sequences may not be detected
by certain functional analyses, such as those in transgenic mice.
`
the T-cell receptor Cα/Cδ region (Koop et al. 1994; Koop and
Hood 1994), the β-globin gene cluster (reviewed in Hardison and
Miller 1993; Hardison et al. 1994), the α- and β-myosin heavy
chain genes (Liew et al. 1990; Epp et al. 1993, 1995; GenBank r84,
Koop 1994), and the γ-crystallin gene cluster (den Dunnen et al.
`1989). The remaining large human–rodent comparisons are from
`two DNA repair genes, XRCC1 and ERCC2 (Lamerdin et al. 1995,
`1996) and the Bruton’s tyrosine kinase locus (Oeltjen et al. 1997).
`ADA is a key enzyme of purine metabolism that catalyzes the
`deamination of adenosine or deoxyadenosine. Its deficiency in
`humans results in an accumulation of metabolites toxic to lym-
`phocytes, causing severe combined immunodeficiency (SCID;
`Giblett et al. 1972). ADA-deficient homozygous mice die perina-
`tally owing to disturbed purine metabolism, severe liver cell im-
`pairment, epithelial cell death in the small intestine, and atelectasis
`(Migchielsen et al. 1995; Wakamiya et al. 1995). Although ex-
`pressed ubiquitously in mammals, ADA levels vary widely ac-
`cording to cell type, species, and stage of development or differ-
`entiation (Witte et al. 1991; Chinsky et al. 1990). Its expression
`profile thus differs somewhat from that of most mammalian genes,
`which are usually expressed either ubiquitously at relatively low
`levels or in a restricted set of cell types at moderate to high levels.
`In human tissues, expression varies over a 1000-fold range, with
`highest levels in thymus and duodenum (Aronow et al. 1989). In
`mice, the highest levels are in tongue, esophagus, forestomach,
`maternal decidua, and fetal placenta, as well as in thymus and
`duodenum (Knudsen et al. 1988; Chinsky et al. 1990; Mohamme-
`deli et al. 1993). ADA expression in human cells is regulated
`mainly at the level of transcription initiation (Lattier et al. 1989),
`but in some cell types transcriptional arrest may play a regulatory
`role (Chen et al. 1990).
`Modular organization of cis-regulatory elements has been ob-
`served in a number of genes, especially those expressed in com-
`plex temporal and/or spatial patterns (reviewed in Kirchhamer et
`al. 1996). Mounting evidence indicates that distinct regulatory
`modules also govern the diverse expression pattern of human and
`mouse ADA. A GC-rich TATAAA-box-deficient promoter is re-
`quired for basal transcription in both species (Valerio et al. 1985;
`Ingolia et al. 1986; Rauth et al. 1990; Innis et al. 1991; Dusing and
`Wiginton 1994). T-cell-specific expression of ADA is primarily
`regulated by a potent enhancer identified and characterized in hu-
`man intron 1 (Aronow et al. 1989, 1992) and subsequently con-
`firmed in mouse intron 1 (Brickner et al. 1995; Winston et al.
`1995, 1996). Sequences flanking the human thymic enhancer are
`required for position-independent, copy-proportional thymic ex-
`pression in transgenic mice, but not in transient transfection of
`human T-cell lines. These elements, termed facilitators, are appar-
`ently necessary in establishing a proper chromatin configuration
`for thymic enhancer function (Aronow et al. 1992, 1995). A 3.3-kb
`fragment of mouse intron 1 harboring the thymic enhancer addi-
`Correspondence to: D.A. Wiginton
`SEQUENOM EXHIBIT 1094
`Sequenom v. Stanford
`IPR2013-00390
`
`Introduction
`
`Large-scale comparisons of the genes of humans and disparate
`species provide insights into gene structure, function, and evolu-
`tion not easily obtained from the analysis of human sequences
`alone. Combinatorial complexity inherent in multifactor binding
`site determination of enhancers and promoters (reviewed in Ar-
`none and Davidson 1997) ensures that sequence-based recognition
`of regulatory regions will remain a difficult challenge. Rodent
`species generally provide a well-characterized, practical model for
`testing hypotheses that may result from large-scale comparisons
`(Koop 1995). Their genetic similarity to humans should facilitate
`identification of homologous loci, yet their evolutionary distance
`from us should permit conserved and nonconserved sequences
`within a locus to be distinguished (Hood et al. 1992). However,
`only eight human–rodent comparisons >20 kb have been pub-
`lished, as few genes have been sequenced from more than one
species. Five of these are in clustered multigene families: the
immunoglobulin heavy chain JH-Cμ-Cδ region (Koop et al. 1996),
`
`
`A.G. Brickner et al.: Sequence comparison of ADA gene regions
`
`tionally activated basal levels of chloramphenicol acetyltransferase
`(CAT) reporter expression in all transgenic mouse tissues tested;
`this ubiquitously activating element was found to be separable
`from the thymic enhancer (Winston et al. 1996). Similar ubiqui-
`tous activation was also shown in the 2.3-kb fragment harboring
`the human thymic enhancer and facilitators (Aronow et al. 1989).
`Recent studies in our laboratory have identified a 3.4-kb human
`ADA gene segment centered in intron 2 capable of driving high-
`level CAT expression in transgenic mouse duodenal epithelium
`(Dusing et al. 1997). Functional characterization of this element is
ongoing. Separate segments within 6.5 kb of mouse ADA gene 5′
`flanking sequences regulate expression postnatally in the fore-
`stomach and prenatally in the placenta (Winston et al. 1992, 1995).
`This upstream fragment was also used to drive placental expres-
`sion from an ADA minigene construction that rescued homozy-
`gous ADA-deficient mice from perinatal lethality (Blackburn et al.
`1995). However, ADA expression in rescued mice was limited to
`the gastrointestinal tract, primarily the forestomach, and was ac-
`companied by lymphoid-specific metabolic disturbances (Black-
`burn et al. 1996). Modules regulating high-level expression in
`other tissues such as maternal decidua and mouse duodenum,
`tongue, and esophagus have yet to be identified. ADA-deficient
`mice rescued by introduction of a human ADA gene-containing
`transgene into mouse zygotes reflected a human expression pat-
`tern, expressing only low-level human ADA in the upper alimen-
`tary tract, as opposed to high-level endogenous mouse ADA. Thus,
`the human gene apparently lacks regulatory elements necessary for
`high-level ADA expression in the mouse upper alimentary tract
`(Migchielsen et al. 1996).
`In this study, we have attempted to identify conserved regions,
`particularly in noncoding sequence, that correlate with ADA gene
`segments that may play important functional or structural roles.
`Despite low-level similarity throughout the sequence compared,
`we identified several distinct, highly conserved blocks of noncod-
`ing sequence. Two, the promoter and thymic enhancer, have pre-
`viously demonstrated functional significance. Several conserved
`regions were observed to which no function has yet been ascribed,
`as well as at least one functional region that is apparently not well
`preserved between mouse and human.
`
`Materials and methods
`
`Analysis of human and mouse ADA sequences. General patterns of
`similarity between the human and mouse ADA sequences were initially
`plotted on x and y axes by dot matrix analyses as implemented by Inherit
`GeneAssist (v1.1, Perkin-Elmer/ABD, Foster City, Calif.) and Dotter (Son-
`nhammer and Durbin 1995). In GeneAssist, plotted dots represent 16
`matches in a window of 20 bp. Adjacent windows overlapped by 10 bp.
`Diagonal lines reflect colinear dots of similarity between the two se-
`quences. Parameters chosen for presentation showed a moderate back-
`ground of dots with regions of similarity clearly indicated. A global align-
`ment of these sequences was determined with the MAP program (Huang
1994). Parameters used to determine the alignment were: match = 10,
mismatch penalty = −9, gap open penalty = 27, gap extend penalty = 10
`(to a maximum of 10 base positions). Of the range of parameters examined,
`the values used in the final alignment attempted to minimize the number of
`gaps and maximize overall similarity. Thus, the overall similarity value of
`aligned regions may be overestimated. A plot of local similarity was based
on overlapping 24-bp windows where gaps were counted as 1 difference over
1 position, irrespective of gap length (window overlap was 12 bp).
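
The local similarity measure described above can be illustrated with a short Python sketch. It is a minimal reconstruction under stated assumptions, not the program used in the original analysis: the function names, the use of "-" as the gap character, and the choice to collapse any run of consecutive gap columns (in either sequence) into one scored position are all illustrative.

```python
from typing import List, Tuple

GAP = "-"  # assumed gap character in the aligned strings


def scored_positions(human_aln: str, mouse_aln: str) -> List[bool]:
    """Return one True/False (match/difference) flag per scored position.

    A run of consecutive gap columns is collapsed to a single position
    scored as one difference, irrespective of the gap length.
    """
    if len(human_aln) != len(mouse_aln):
        raise ValueError("aligned sequences must have equal length")
    flags: List[bool] = []
    in_gap = False
    for h, m in zip(human_aln, mouse_aln):
        if h == GAP or m == GAP:
            if not in_gap:  # first column of a gap run
                flags.append(False)
            in_gap = True
        else:
            flags.append(h.upper() == m.upper())
            in_gap = False
    return flags


def window_similarity(flags: List[bool], window: int = 24,
                      step: int = 12) -> List[Tuple[int, float]]:
    """Percent similarity for each window, as (start position, percent)."""
    return [
        (start, 100.0 * sum(flags[start:start + window]) / window)
        for start in range(0, len(flags) - window + 1, step)
    ]
```

With the defaults, each 24-position window shares 12 positions with its neighbor, matching the stated 12-bp window overlap.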
`
`Results
`
`This study arose as an extension of our previous comparison of the
`mouse and human ADA gene thymic enhancers (Brickner et al.
`1995). We initially sequenced a 3.3-kb mouse intron 1 fragment
`(GenBank U72392) containing the thymic enhancer for compari-
`son with the human sequence (GenBank M13792). Subsequent
`
`determination of the mouse ADA gene sequence by Rodney Kel-
`lems’ laboratory (GenBank U73107) has resulted in the more com-
`plete comparison reported here.
`The ADA gene spans ∼23 kb in mouse and ∼32 kb in humans.
`The human and mouse genes are similarly organized, contain 12
`exons and 11 introns, and encode proteins of 363 and 352 amino
`acids, respectively, which are 83% identical (Wiginton et al. 1984;
`Yeung et al. 1985). Mouse ADA lacks the 11 C-terminal amino
`acid residues of human ADA owing to an additional stop codon in
exon 11 (Al-Ubaidi et al. 1990). The murine ADA 1056-bp open
`reading frame shares 81% sequence similarity with its human
`counterpart.
`Figure 1 depicts the human and mouse ADA gene maps, show-
`ing the similar organization of the genes (top) and a dot matrix
`overview of similarity between the genes (bottom). Lengths of
`exons and the smaller introns are highly conserved. Intron 1 com-
`prises nearly half the length of either gene. Alu repeats in the
`human gene account for most interspecies differences in intronic
`size, especially in introns 1, 2, and 6. Within the human gene, 23
`Alu repeats were identified (Wiginton et al. 1986), which comprise
`18% of the total sequence at a density of 0.62 Alu/kb, exceeding
`the average density (0.25 per kb) predicted for the rest of the
`genome (Moyzis et al. 1989).
In the dot matrix comparison in Fig. 1 (bottom), a strong diagonal
occurs 3′ of exon 4, a region containing closely spaced
exons and small, well-conserved introns. The 3′ half of intron 1
and most of intron 2 comprise a large region of low similarity. The
insertion of ten Alu repeats over ∼13 kb of human sequence contributes
to the discontinuity of the diagonal in this region. Similarity
5′ of the promoter is also low, mainly owing to two nearly
contiguous Alu repeats and a hybrid O-type/Alu repeat in the
human sequence (Wiginton et al. 1986). Low similarity 3′ of exon
1 results from five Alu repeats in the human sequence as well as
two B-type repeats in the mouse sequence (GenBank U73107).
Conserved noncoding sequences in Fig. 1 include the promoter
region, the thymic enhancer, and an adjacent region just 3′ within
intron 1, and a region central to intron 3.
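
The dot criterion used for this comparison (at least 16 matches in a 20-bp window, with windows advanced by 10 bp) can be sketched in Python as follows. This is an illustrative approximation only: how GeneAssist or Dotter actually scan the two sequences is not described in the text, so the function name and the assumption that both sequences are stepped on the same 10-bp grid are hypothetical.

```python
from typing import List, Tuple


def dot_matrix(seq_a: str, seq_b: str, window: int = 20,
               min_matches: int = 16, step: int = 10) -> List[Tuple[int, int]]:
    """Return (i, j) window start positions that would be plotted as dots."""
    a, b = seq_a.upper(), seq_b.upper()
    dots: List[Tuple[int, int]] = []
    for i in range(0, len(a) - window + 1, step):
        win_a = a[i:i + window]
        for j in range(0, len(b) - window + 1, step):
            # Count identical bases at corresponding window positions.
            matches = sum(x == y for x, y in zip(win_a, b[j:j + window]))
            if matches >= min_matches:
                dots.append((i, j))
    return dots
```

Runs of dots along a diagonal (i and j increasing together) correspond to the extended regions of similarity discussed above. The nested loop is quadratic in the number of windows, so for sequences of tens of kilobases a vectorized or indexed approach would be preferable in practice.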
`A local alignment was constructed from the human and mouse
`ADA sequences (data not shown; alignment provided upon re-
`quest) to more precisely determine the degree of significant se-
`quence conservation. On the basis of 24-bp windows and 12-bp
`overlapping window shifts, local levels of similarity were deter-
`mined and plotted (Fig. 2A) across the entire aligned sequence. As
`indicated by the dashed lines in the top map in Fig. 1, the mouse
gene contains about 1.9 kb and 1.2 kb of 5′ and 3′ flanking
sequence, respectively, extending beyond the alignment with the
`human gene.
`A critical parameter to establish in the sequence comparison of
`two species is the level of similarity due to random incorporation
`of mutations in nontranscribed DNA. Determination of this level
`facilitates identification of regions actively conserved by forces
`such as natural selection (Koop 1994). Our comparison revealed an
`average level of 67.7% sequence similarity over the ADA gene
`region, based on aligned regions only. Since a basal divergence
`level between mouse and human noncoding DNA sequences has
`not been established, we utilized this 67.7% value (indicated as a
`solid line in Fig. 2A) as a background similarity level in assigning
`significance to high similarity. Significant conservation in Fig. 2 is
`indicated not only by high percentage similarity in a given window
`(height of peak), but also by extended similarity over consecutive
`windows (width of peak). Large gaps in the plot indicate consecu-
`tive windows in which alignment scores fail to exceed 50% simi-
`larity or sequence is not present in one of the two sequences
`compared. These gaps often indicate the presence of repetitive
`elements, depicted by arrows beneath the plot, but some sequences
`could be neither aligned nor identified as repetitive DNA. To
`facilitate recognition of conserved regions, we adjusted the align-
`ment plot in Fig. 2B so that the baseline similarity (y-axis) equals
`
`
`
`
76.6%, the mean overall similarity plus one standard deviation (SD
= 8.9%). The plots clearly illustrate a set of distinctly conserved
`regions.
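
The baseline adjustment is simple arithmetic on the reported mean (67.7%) and standard deviation (8.9%). The Python sketch below reproduces the 76.6% baseline and shows how window values could be filtered against such a cutoff; the function names and the example window values in the usage note are illustrative assumptions, not data from the study.

```python
from typing import Iterable, List, Tuple

MEAN_SIMILARITY = 67.7  # overall mean similarity (%), as reported in the text
SD_SIMILARITY = 8.9     # standard deviation (%), as reported in the text


def threshold(n_sd: float, mean: float = MEAN_SIMILARITY,
              sd: float = SD_SIMILARITY) -> float:
    """Similarity level lying n_sd standard deviations above the mean."""
    return round(mean + n_sd * sd, 1)


def windows_above(windows: Iterable[Tuple[int, float]],
                  cutoff: float) -> List[Tuple[int, float]]:
    """Keep the (position, percent) windows whose similarity exceeds cutoff."""
    return [(pos, pct) for pos, pct in windows if pct > cutoff]
```

For example, `threshold(1)` gives 76.6 and `threshold(2)` gives 85.5; calling `windows_above` on made-up values `[(0, 70.8), (12, 79.2), (24, 91.7)]` with the one-SD cutoff keeps the last two windows.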
`Conservation of the ADA coding sequence is obvious in Fig. 2
`and is substantially higher (81%) than the overall similarity of
`67.7%. Similarity throughout the aligned noncoding sequence
`(67.1%) is only slightly lower than the 67.7% overall similarity,
`since coding sequences comprise only a small portion (<3.5%) of
`either gene. Exons 4 and 5 are especially conserved, the latter
`containing only three mismatches over 41 bp. Twelve segments
`exceeded 90% similarity over at least 24 bp, eight of which cor-
`respond to exons 2 through 6, 8, 9, and 10. The remaining four lie
`in noncoding regions (see below). Murine exon 12 contains only
untranslated and polyadenylation sequences (Al-Ubaidi et al. 1990)
and is not highly similar to human exon 12, which contains the
final three amino acid codons and 3′ untranslated region (Wiginton
`et al. 1986).
`Several noncoding areas in the alignment plots exhibit ex-
`tended similarity and coincide with those observed in Fig. 1. Two
`correspond to known functional domains shared by mice and hu-
`mans, the promoter and thymic enhancer. Our previous compari-
`son of the human and murine thymic enhancer regions indicated
71.1% similarity overall, four subregions with ≥80% similarity
over ≥24 bp, and several conserved transcription factor consensus
sequences within these subregions (Brickner et al. 1995). We
subsequently determined similarity in this region over a wider
window, as shown in Fig. 3. In a comparison of 1.1 kb of mouse ADA
promoter/exon 1 sequence to its human counterpart, Al-Ubaidi et
al. (1990) identified 10 domains exhibiting ≥85% similarity over
≥15 bp. In this previous comparison, Domains I–IV, which encompass
the proximal promoter region through the first 69 bp of
mouse intron 1, exhibit similarity (73.8% over 233 bp) closely
comparable to that observed in the thymic enhancer despite the
inclusion of 33 bp of coding sequence. Domains from this previous
comparison that lie further 5′ of the transcription initiation site
exhibit less extensive similarity. The extensive similarity (>70% over
`>200 bp) exhibited by the promoter and thymic enhancer supports
`
`Fig. 1. Top, exonic map of human (GenBank M13792) and
`mouse (GenBank U73107) ADA genes. Dashed lines indicate
`boundaries of alignment; vertical bars, positions of exons;
`arrows, thymic enhancer region. Bottom, overall gene dot
matrix analysis. Each dot indicates a ≥16-bp match in a 20-bp
window. Windows are moved to adjacent positions in increments of
0.5 times the window size (10 bp). Dots forming diagonal
`lines represent extended areas of similarity.
`
`the idea of using comparative analysis to identify conserved regu-
`latory regions within ADA noncoding sequence. Comparable lev-
`els of noncoding sequence similarity for a functional regulatory
`element (70% over 370 bp) were established in a comparison of
`human and mouse T-cell-specific enhancers of the T-cell receptor
Cδ gene (Koop et al. 1992). A placenta-specific regulatory element
identified in the mouse ADA gene (Shi et al. 1997) lies 5′ of the
`mouse sequence used in this study. Since the analogous region of
`the human gene has not been sequenced, we could not evaluate the
`conservation of this element. Comparison of the mouse placental
`enhancer sequence to the entire known human ADA sequence
`yielded no significant similarities (data not shown).
`Since the promoter and thymic enhancer are clearly evident in
`Figs. 1 and 2, we attempted to detect additional conserved ele-
`ments having potential functionality. Regions of extended similar-
`ity in Fig. 2B were tested by utilizing the alignment to determine
whether their similarities exceeded 70% over >100 bp of human
`sequence. This level was chosen to permit detection of regions
`with similarities comparable to, yet less extensive than that exhib-
`ited by the thymic enhancer and promoter regions. Table 1 sum-
`marizes the lengths, similarity, and positions of 19 conserved re-
`gions (CRs) that meet these criteria. All CRs contain multiple
`24-bp regions with much higher levels of similarity than 70%.
`Most CRs have spikes in Fig. 2 that exceed 2 SD (85.5%). CRs
`containing exons include conserved adjacent sequence, some of
`which is associated with consensus splicing signals. All exons but
`exon 12 exhibited enough similarity for inclusion as a CR. Exons
`7 and 8, sequences flanking these two exons, and intron 7 were
`grouped as CR13 according to our criteria.
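
The inclusion test for a candidate region (similarity above 70% over more than 100 bp of human sequence) can be expressed as a small Python check on an aligned segment. This is a sketch under assumptions: the function names are hypothetical, "-" is assumed to mark gaps, and gap runs are scored as a single difference, following the convention stated in Materials and methods. The text does not specify exactly how region boundaries were chosen, so only the final test is shown.

```python
from typing import Tuple

GAP = "-"  # assumed gap character in the aligned strings


def region_stats(human_aln: str, mouse_aln: str) -> Tuple[int, float]:
    """Return (human bases in region, percent similarity) for an aligned segment."""
    human_bp = sum(1 for h in human_aln if h != GAP)
    scored = matches = 0
    in_gap = False
    for h, m in zip(human_aln, mouse_aln):
        if h == GAP or m == GAP:
            if not in_gap:  # a whole gap run counts as one difference
                scored += 1
            in_gap = True
        else:
            in_gap = False
            scored += 1
            matches += h.upper() == m.upper()
    pct = 100.0 * matches / scored if scored else 0.0
    return human_bp, pct


def is_conserved_region(human_aln: str, mouse_aln: str,
                        min_pct: float = 70.0, min_bp: int = 100) -> bool:
    """True if the segment exceeds both the similarity and length cutoffs."""
    human_bp, pct = region_stats(human_aln, mouse_aln)
    return pct > min_pct and human_bp > min_bp
```

Both comparisons are strict, so a region of exactly 100 bp of human sequence would not qualify, whereas one of 101 bp with sufficient similarity would.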
`CRs in noncoding sequence in Fig. 2 are also presented in
`Table 1. Some, particularly CRs 3 and 4, are conserved over re-
`gions as extensive as those containing the exons. The promoter/
`exon1 (CR1a) and thymic enhancer regions (CR3) are obvious in
`Fig. 2B. Since coding sequence from exon 1 comprises only about
14% (33 bp) of CR1a, we designated it as a noncoding CR. CRs 1a
`and 1b are separated by two copies of a 31-bp direct repeat in the
`mouse sequence (Maa et al. 1990), precluding their designation as
`
`
`
`
`Fig. 2. Human and mouse ADA gene local similarity analysis. An align-
`ment was constructed from the human and mouse ADA sequences. Local
`levels of sequence similarity were calculated from a 24-bp window that
`shifted along the alignment in overlapping 12-bp intervals. Level of simi-
`larity (vertical axis) is plotted against alignment position (horizontal axis).
`Positions of exons and three HS within intron 1 (see text) are indicated
`above graph, as well as the promoter/exon 1 region, designated as P-1. (A)
`Average similarity between human and mouse aligned sequences, indicated
`by solid line, is 67.7% (gap counted as 1 difference over 1 alignment
`position). Large arrows below graph indicate positions of Alu repeats in
`human sequence; small arrows indicate positions of B1 and B2 repeats in
`
`mouse. Large and small stippled arrows indicate the positions of a hybrid
`O-type/Alu repeat in the human sequence (Wiginton et al. 1986) and a
`repeat region similar to an unidentified repeat in the Chinese hamster
`rhodopsin gene locus (Gale et al. 1992), respectively. Bracket below CR2
`through CR5 indicates region included in the mouse/human comparison
`from our initial 3.3 kb of mouse intron 1 sequence, represented in greater
`detail in Fig. 3. (B) Graph emphasizing regions of high similarity. Graph
`identical in all respects to Fig. 2A except that minimum value on vertical
axis equals average similarity plus one standard deviation (76.6%, SD =
`8.9%). CRs (see text and Table 1) within noncoding sequence are indicated
`by arrowheads below the graph.
`
`Fig. 3. Detailed map of high similarity in intron 1 corresponding to brack-
`eted region in Fig. 2. Sequence from a 3.3-kb mouse ADA intron 1 frag-
`ment (GenBank U72392; numbered according to GenBank U73107) was
`compared with the analogous human ADA sequence (GenBank M13792).
Black rectangles represent blocks of high similarity (≥70% over ≥20 bp
`of human sequence); CRs are labeled accordingly. Repetitive elements
`
`indicated by horizontal arrows or bars above or below maps; ?RPT is
`repeat region similar to an unidentified repeat in the Chinese hamster
`rhodopsin gene locus (Gale et al. 1992). Positions of three DNase I HS in
`human intron 1 (see text) are indicated at top of map. Alignment param-
`eters used are identical to those in Fig. 4.
`
a single CR. CR4 lies 3′ of the thymic enhancer and exhibits
similarity (70.5% over 336 bp of human sequence) more extensive
`than that of the thymic enhancer and possibly consistent with a
`functional role (see Fig. 4). Although CR4’s function is unknown,
`it corresponds to the fourth of an array of six thymus-specific
`DNase I hypersensitive sites (HS) previously identified within hu-
`man ADA intron 1 (Aronow et al. 1989). HS regions are thought
`to represent either inducible, constitutive, or tissue-specific regions
`
`of open chromatin that allow transcription factors access to their
`binding sites within regulatory regions in the process of functional
`activation (reviewed in Elgin 1988). The human thymic enhancer
`segment within CR3 is associated with HS III, the only HS of the
`six with clear functional significance (Aronow et al. 1992). HS II
`is also conserved (CR2), but over a markedly shorter span (71.3%
`over 101 bp). The remaining three HSs in intron 1 were not well
`conserved. The bracket beneath CRs 2 through 5 in Fig. 2 indicates
`
`
`
`
`Table 1. Conserved regions identified in a comparison of the mouse and human ADA gene regions.
`
CR     Corresponding region      Human ADA #s  Mouse ADA #s  Length (bp)  % Similarity
CR1a   Promoter/exon 1           3905–4137     4261–4483     233          73.8%
CR1b   3′ of exon 1              4269–4388     4686–4808     120          70.1%
CR2    HS II                     8670–8770     8600–8687     101          71.3%
CR3    HS III (thymic enhancer)  9169–9452     9041–9313     284          70.7%
CR4    HS IV                     10821–11156   9943–10269    336          70.5%
CR5    3′ of HS IV               11674–11834   10825–11002   161          72.7%
CR6    5′ of exon 2              17824–17931   14884–14996   108          70.3%
CR7    Exon 2                    19220–19414   16239–16422   195          71.3%
CR8    Exon 3                    26318–26627   18977–19274   311          72.0%
CR9    Mid-intron 3              27772–27875   20652–20756   104          76.0%
CR10   Exon 4                    28840–29088   21416–21661   246          82.5%
CR11   Exon 5                    29745–29945   22032–22228   201          80.6%
CR12   Exon 6                    31023–31353   23091–23415   331          76.7%
CR13   Exons 7 and 8             32404–32740   23905–24241   337          70.3%
CR14   Exon 9                    32799–33096   24402–24631   298          71.5%
CR15   Exon 10                   34345–34537   26222–26418   226          72.6%
CR16   Exon 11                   35036–35176   26776–26908   141          70.2%
CR17a  3′ of exon 12             36383–36632   28248–28496   250          72.8%
CR17b  3′ of exon 12             36638–36741   28530–28624   104          72.1%
`
`Fig. 4. Sequence alignment of human (top) and mouse (bottom) CR4, the
`most extensive noncoding CR determined in this comparison. Consensus
`matches for transcription factor binding sites are shown (overlined, human;
`underlined, mouse). Alignment corresponds to human bps. 10821–11156
`
`and mouse bps. 9943–10269 from same GenBank accession Nos. as Fig. 3.
Alignment parameters are: match = +1; mismatch penalty = 0.9, gap
penalty = 1 + 1.7 * gap length (to a maximum of 10). HSIV = center of
`HS IV region in human intron 1 (Aronow et al. 1989).
`
`the region of the mouse/human comparison from our initial 3.3 kb
`of mouse intron 1 sequence (GenBank U72392), shown in greater
`detail in Fig. 3.
`CRs 6, 9, 17a, and 17b also exhibit high similarity in Fig. 2.
`CR6 lies within a 13-kb human ADA gene segment that drives
enhanced duodenal CAT activity in transgenic mice, but lies 5′ of
a smaller 3.4-kb segment sufficient for duodenal expression (Dusing
et al. 1997). We have yet to establish any relevance of CR6 to
aspects of duodenal expression. CR9 lies within intron 3 in both
species, and CRs 17a and 17b lie 3′ of the polyadenylation
sequences. Interestingly, GenBank U73107 notes that mouse bps
28320–28777, included within CRs 17a and 17b, are similar to
GenBank expressed sequence tags (ESTs) W71509 and AA031028.
`CR5 was the only CR detected that did not exhibit obvious con-
`servation in Fig. 2, yet met our criteria for inclusion in Table 1.
`In Fig. 2, several short stretches of high similarity in noncoding
`regions did not meet the length criteria for inclusion as CRs. The
5′-most spike exceeding 90% similarity lies in a region with only
`four mismatches over 40 bp. The spike between exons 4 and 5
`contains 22 bp with 100% similarity. These peaks correspond to
`bps 1720–1759 and 29500–29521 of the human ADA GenBank
`sequence, respectively. Some sharp peaks in Fig. 2 may be attrib-
`uted to alignment artifacts associated with regions adjacent to
`gaps. Caution must be exercised in interpreting the significance of
`such conserved short stretches considering the size of the ADA
`locus. Still, we cannot exclude the possibility that some may have
`a functional or architectural role. Most of the larger gaps in Fig. 2
`represent the insertion of known mammalian short interspersed
elements (SINEs), which we have identified in the sequences of
`both species (data not shown).
`
`Discussion
`
`In order to elucidate the mechanisms of eukaryotic gene regula-
`tion, thus furthering fundamental understanding of normal and
`aberrant biological processes, diverse avenues of investigation
`must be pursued. Searching for evolutionarily conserved se-
`quences is an approach that could facilitate the identification and
`characterization of a gene’s transcriptional regulatory elements.
`For instance, one of the first tissue-specific enhancers identified,
`the immunoglobulin kappa enhancer, was initially noted as a
`highly conserved region within an intron (Emorine et al. 1983). In
`this study we have compared 36.7 kb of the human ADA gene with
`29.8 kb of its mouse counterpart. We attempted to gain insight into
`the overall level of conservation, and possible functionality, in this
`region by comparing the corresponding sequences of the two spe-
`cies.
`A mosaic model of genome evolution was proposed by Koop
`(1995) based on three general patterns of noncoding DNA se-
`quence conservation observed in large (>20 kb) human–rodent
`sequence comparisons. These consist of a divergent pattern, with
`high-level sequence similarity limited almost exclusively to the
`coding regions; a conserved pattern, with high similarity present in
`both coding and noncoding regions; and a third, mixed pattern of
`similarity in which conserved and divergent sequences are adja-
`cent to one another within noncoding sequences. These patterns of
`similarity exhibit fast, slow, and mixed rates of incorporation of
`mutations, respectively, indicating that different genomic regions
`evolve at different rates (Koop 1995; Koop et al. 1996). The di-
`vergent pattern, initially observed in human–rodent comparisons
of the β-globin (reviewed in Hardison and Miller 1993) and
`
`
`
`
γ-crystallin (den Dunnen et al. 1989) gene clusters, exhibits scant
similarity outside the coding regions and immediate 5′ flanking
`regulatory regions.
`The mainly divergent pattern of noncoding sequence conser-
`vation observed in the present comparison, as well as the similar
`structural organizations of the mouse and human genes, facilitated
recognition of conserved noncoding regions in the ADA gene. Determination
of high similarity levels (>70% over >200 bp of human
`sequence) in the two shared noncoding regulatory elements of the
`mouse and human ADA genes, the promoter and the thymic en-
`hancer (Al-Ubaidi et al. 1990; Brickner et al. 1995), further aided
`identification of potentially functional CRs in this study. Similarity
`levels, comparable to but less extensive than those observed in
`these two known functional CRs, were used to gauge conserved
`regions delineated in our local alignment plots (Fig. 2). The se-
`quence alignment between the human and mouse genes identified
`conserved coding and known regulatory regions, as well as a num-
`ber of noncoding CRs for which the function is yet unknown. The
`noncoding CRs shown in Fig. 2B and Table 1 that meet these
`criteria have remained relatively conserved since the evolutionary
`separation of mouse and human lineages ∼65–80 million years ago
`and thus have at least some potential for a functional or structural
`role.
`Interestingly, the human ADA duodenal enhancer region, cen-
`tered in intron 2 (Dusing et al. 1997), exhibits considerably less
`extensive conservation than the thymic enhancer, promoter, or any
`of the CRs noted in this study. However, it is possible that non-
`adjacent or fragmentary short stretches of similarity, such as those
`observable in intron 2 (Fig. 2B), may be sufficient for mouse
`duodenal enhancer function. Figure 1 (top) shows that intron 2 is
`about 2.5-fold larger in humans, a size disparity greater than that
`of any other ADA intron. Evolutionary insertion of DNA into
`human intron 2 may thus account for the fragmentary, low-level
`similarity seen in this enhancer region. Mouse duodenal ADA
`expression may also be governed by an element not detected in our
`comparison, perhaps because it lies outside of the mouse sequence
`in this comparison or is simply not evolutionarily conserved. Fur-
`ther studies are necessary to determine whether a similar duodenal
`enhancer is located in an analogous position in the mouse gene.
`The identification in this study of noncoding CRs comparable
`to those of known ADA gene regulatory regions may lead to
`readily testable hypotheses that were not immediately apparent
`without this comparative alignment. Comparative analysis of the
`ADA gene, as well as other genes, could potentially be exploited
`to identify candidate sequences for functional assays of gene regu-
`lation. This in turn may lead to the identification of novel regula-
`tory elements or regions important to the maintenance of chroma-
`tin structure, as well as their cognate DNA binding factors. Such
`information may greatly enhance our understanding of ADA regu-
`lation and gene regulation in general, and may provide insight into
`the molecular evolution of other gene regions. Potentially, other
`genes with expression patterns similar to that of ADA, such as
`coordinately regulated enzymes of purine catabolism expressed in
`the gastrointestinal and postimplantational reproductive tracts
`(Witte et al. 1991), may harbor regulatory elements somewhat
`similar to those within the ADA gene. Studies of the regulation of
`such genes may thus benefit from information presented in this
`study. Moreover, understanding the structure and regulation of the
`human ADA gene may enhance the success of future attempts at
`human gene therapy, whereby the synthesis of ADA or other gene
`products could conceivably be targeted to specific tissues.
`