throbber
compared with chromosomal duplication con-
`tent (R2 ⫽ 0.16). The correlation was due to
`intrachromosomal duplications (fig. S5; R2 ⫽
`0.20; P ⫽ 0.04; F test) and was absent for
`interchromosomal duplications (R2 ⫽ 0.002).
`The three most gene-rich chromosomes showed
`high levels of duplication, and the seven most
`gene-poor chromosomes were among the least
`duplicated chromosomes.
`To determine what role recent segmental
`duplications have played in current gene evo-
`lution, we characterized the gene content in
`our filtered set of duplicated genomic se-
`quence. We analyzed a highly curated set of
`13,351 mRNAs assigned to the human ge-
`nome assembly (RefSeq, www.ncbi.nlm.nih-
`.gov/LocusLink/refseq.html). We partitioned
`exons from each gene into a unique or dupli-
`cated sequence on the basis of their map
`position (⬎90% sequence identity). We iden-
`tified a total of 7777 exons as being tran-
`scribed from recently duplicated sequence,
`corresponding to 6.1% of all RefSeq exons
`(128,467). This is slightly greater than the
`genomic representation of segmental duplica-
`tion (5.2%), which confirms that gene-poor
`regions have not been preferentially duplicat-
`ed. In many cases, a complete complement of
`exons was not duplicated. These incomplete
`duplicated genes were often found adjacent to
`other duplicated cassettes that originated
`from elsewhere in the genome. By comparing
`our data with human expressed sequence tag
`databases, we found evidence for “chimeric”
`or fusion transcripts that emerged from the
`physical juxtaposition of incomplete segmen-
`tal duplications. Although the mechanism for
`recent segmental duplications is not under-
`stood, the existing data suggest the process
`may play a role in exon shuffling associated
`with expanding protein diversity. A complete
`list of all genes with one more exons within
`duplicated genomic sequence is available (8).
`To further assess whether specific kinds of
`genes or biological processes have been prefer-
`entially duplicated, we compared all RefSeq
`mRNAs on the basis of their INTERPRO pro-
`tein domain classification (Table 2) (table S7)
`(23). In this analysis, we considered a gene
`duplicated only if all its exons were contained
`within a duplicated genomic region. Our anal-
`ysis suggests a nonrandom distribution of seg-
`mental duplications within the proteome. Genes
`associated with immunity and defense (natural
`killer receptors, defensins, interferons, serine
`proteases, cytokines), membrane surface inter-
`actions (galectins, HLA, lipocalins, carcinoem-
`bryonic antigens), drug detoxification (cyto-
`chrome P450), and growth/development (soma-
`totropins, chorionic gonadotropins, pregnancy-
`specific
`glycoproteins) were
`particularly
`enriched. It should be emphasized that our gene
`analysis is restricted to genomic segments that
`show ⱖ90% sequence identity. On the basis of
`neutral expectation of divergence, this corre-
`
`R E P O R T S
`
`sponds to duplications that have emerged over
`the last ⬃40 million years of human evolution
`(24). Gene duplication followed by functional
`specialization has long been considered a major
`evolutionary force for gene innovation (25).
`Therefore, these genes embedded within recent
`genomic duplications may be considered excel-
`lent candidates for adaptations specific to pri-
`mate evolution.
`
`1.
`
`References and Notes
`International Human Genome Sequencing Consor-
`tium, Nature 409, 860 (2001).
`2. J. A. Bailey, A. M. Yavor, H. F. Massa, B. J. Trask, E. E.
`Eichler, Genome Res. 11, 1005 (2001).
`3. J. C. Venter et al., Science 291, 1304 (2001).
`4. S. Ohno, U. Wolf, N. Atkin, Hereditas 59, 169 (1968).
`5. E. E. Eichler, Trends Genet. 17, 661 (2001).
`6. P. Stankiewicz, J. R. Lupski, Trends Genet. 18, 74
`(2002).
`7. E. E. Eichler, Genome Res. 11, 653 (2001).
`8. See supporting data on Science Online.
`9. V. E. Cheung et al., Nature 409, 953 (2001).
`10. The sequence and underlying test statistics for all
`duplicated regions of the genome are available at
`http://humanparalogy.cwru.edu/SDD. WGAC com-
`parisons of the human genome assembly (UCSC,
`August freeze, 2001; http://genome.ucsc.edu) were
`done as described (2). The WSSD-filtered set of
`WGAC duplications can be interactively searched
`(http://humanparalogy.cwru.edu/SDD). This includes
`extracted sequence files, the actual alignments, the
`location of the alignments within the assembly, and
`whole-chromosomal views comparing WGAC and
`WSSD duplication patterns. An updated WSSD based
`on the analysis of 39,298 clones from April 2002,
`detecting an additional 36 Mb of duplicated se-
`quence, is also available.
`
`11. T. H. Shaikh et al., Hum. Mol. Genet. 9, 489 (2000).
`12. J. A. Bailey et al., Am. J. Hum. Genet. 70, 83 (2002).
`13. L. Edelmann, R. K. Pandita, B. E. Morrow, Am. J. Hum.
`Genet. 64, 1076 (1999).
`14. R. Mazzarella, D. Schlessinger, Genome Res. 8, 1007
`(1998).
`15. J. R. Lupski, Trends Genet. 14, 417 (1998).
`16. S. T. Sherry et al., Nucleic Acids Res. 29, 308 (2001).
`17. K. Chen et al., Nature Genet. 17, 154 (1997).
`18. S. L. Christian, J. A. Fantes, S. K. Mewborn, B. Huang,
`D. H. Ledbetter, Hum. Mol. Genet. 8, 1025 (1999).
`19. D. E. Jenne et al., Am. J. Hum. Genet. 69, 516 (2001).
`20. T. Kuroda-Kawaguchi et al., Nature Genet. 29, 279
`(2001).
`21. H. C. Mefford, B. J. Trask, Nature Rev. Genet. 3, 91
`(2002).
`22. J. Guy et al., Hum. Mol. Genet. 9, 2029 (2000).
`23. M. Ashburner et al., Nature Genet. 25, 25 (2000).
`24. W. Li, Molecular Evolution (Sinauer Associates, Sun-
`derland, MA, 1997).
`25. S. Ohno, Evolution by Gene Duplication (Springer-
`Verlag, Berlin, 1970).
`26. We thank L. Christ, M. Eichler, and U. Neuss for
`technical assistance, and H. Willard, J. Nadeau, T.
`Hassold, D. Locke, and J. Horvath for helpful com-
`ments. Supported by NIH grants GM58815 and
`HG002318 and U.S. Department of Energy grant
`ER62862 (E.E.E.), NIH Career Development Program
`in Genomic Epidemiology of Cancer (CA094816)
`and Medical Scientist Training Grant ( J.A.B.), the
`W. M. Keck Foundation, and the Charles B. Wang
`Foundation.
`
`Supporting Online Material
`www.sciencemag.org/cgi/content/full/297/5583/1003/
`DC1
`Materials and Methods
`Tables S1 to S7
`Figs. S1 to S5
`
`20 March 2002; accepted 7 June 2002
`
`Predictive Identification of
`Exonic Splicing Enhancers in
`Human Genes
`William G. Fairbrother,1,2* Ru-Fang Yeh,1* Phillip A. Sharp,1,2
`Christopher B. Burge1†
`
`Specific short oligonucleotide sequences that enhance pre-mRNA splicing when
`present in exons, termed exonic splicing enhancers (ESEs), play important roles
`in constitutive and alternative splicing. A computational method, RESCUE-ESE,
`was developed that predicts which sequences have ESE activity by statistical
`analysis of exon-intron and splice site composition. When large data sets of
`human gene sequences were used, this method identified 10 predicted ESE
`motifs. Representatives of all 10 motifs were found to display enhancer activity
`in vivo, whereas point mutants of these sequences exhibited sharply reduced
`activity. The motifs identified enable prediction of the splicing phenotypes of
`exonic mutations in human genes.
`
`Human genes are generally transcribed as
`much longer precursors, typically tens of ki-
`lobases in length, from which large introns
`
`1Department of Biology, 2Center for Cancer Research,
`Massachusetts Institute of Technology, Cambridge,
`MA 02139, USA.
`
`*These authors contributed equally to this work.
`†To whom correspondence should be addressed. E-
`mail: cburge@mit.edu
`
`must be precisely removed and flanking ex-
`ons precisely ligated to create the mRNA that
`will direct protein synthesis. Sequences
`around the splice junctions—the 5⬘ and 3⬘
`splice sites (5⬘ss and 3⬘ss)—are clearly im-
`portant for splice site recognition. However,
`these signals appear to contain only about
`half of the information required for exon and
`intron recognition in human transcripts (1).
`The sequence or structure context in the vi-
`
`www.sciencemag.org SCIENCE VOL 297 9 AUGUST 2002
`
`1007
`
`Triplet EX1069
`
`1
`
`

`

`cinity of the 5⬘ss and 3⬘ss motifs is known to
`play an important role in splice site recogni-
`tion (2– 4). ESE sequences, which enhance
`
`splicing at nearby sites (5), are an important
`component of this context.
`Exonic enhancers have been identified
`
`R E P O R T S
`
`Fig. 1. Schematic of RESCUE-ESE approach. Exon-intron structures of human genes are derived by
`spliced alignment of cDNAs to the assembled genomic sequence, and splice sites are scored as
`described (17). Values of ⌬EI (scaled difference in frequency between exons and introns) and ⌬WS
`(scaled difference in frequency between weak and strong exons) are calculated as described for
`each of the 4096 possible hexanucleotides (17). Each hexamer is then represented by a colored
`letter at the point (⌬EI, ⌬WS) in the scatterplot. The letters are chosen to reflect the base
`composition of the hexamer according to IUPAC nomenclature (e.g., hexamers containing only A
`and G are represented by the letter “r”). Hexamers containing homonucleotide runs of three or
`more bases (e.g., AAA) are represented by capital letters, all other hexamers by lowercase letters.
`Each letter is colored proportional to the relative content of A (red), C (green), G (blue), and T
`(black) of the hexamer. Hexamers (6mers) satisfying ⌬EI ⬎ 2.5 and ⌬WS ⬎ 2.5 (upper right portion
`of first quadrant) are predicted to have ESE activity. As a test of ESE activity, a 19-base “extended
`exemplar” sequence containing the hexamer in its natural context in a weak exon is chosen and
`inserted into the SXN splicing reporter construct as indicated. SXN is a ␤-globin– derived minigene
`with deleted translation start codon. A point mutant predicted to disrupt ESE activity is also chosen,
`generally the single-base mutant that is farthest to the left and below the predicted ESE hexamer
`in the scatterplot. Transient transfection of the reporter construct followed by quantitative RT-PCR
`with flanking primers is used to assay inclusion of the test exon for the candidate ESE and its
`mutant.
`
`through the analysis of disease alleles (6), by
`site-directed mutagenesis of minigene con-
`structs, and by protocols based on SELEX
`(Systematic Evolution of Ligands by EXpo-
`nential enrichment)
`to identify sequences
`with enhancer activity from a pool of random
`sequences (7–11). These methods initially
`characterized ESEs as purine-rich sequences,
`but additional classes of AC-rich motifs and
`pyrimidine-rich motifs have since emerged
`(7, 10).
`Our strategy for identifying human ESE
`sequences was to first develop a statistical/
`computational method to predict the ESE activ-
`ity of oligonucleotide sequence motifs, to apply
`this method to large data sets of human genom-
`ic sequences, and then to test representatives of
`each predicted motif by means of an in vivo
`splicing assay. At the heart of this approach is a
`sequence
`analysis method that we
`call
`RESCUE (Relative Enhancer and Silencer
`Classification by Unanimous Enrichment).
`RESCUE identifies the set of oligonucleotide
`motifs that enhance or repress a particular bio-
`chemical process; it consists of four steps: (i)
`Identify two or more statistical “attributes” that
`should be manifested by sequences that en-
`hance (or, alternatively, repress) the biochemi-
`cal activity of interest. (ii) Use a statistical
`power calculation to determine an oligonucleo-
`tide “word” size k appropriate for the amount of
`data available. Then represent all possible oli-
`gonucleotides of size k by points in a multidi-
`mensional space, the axes of which represent
`the attributes chosen in the previous step. (iii)
`Define a region in this space corresponding to
`“unanimous enrichment” (i.e.,
`significantly
`high values of all of the chosen attributes) and
`identify clusters of similar sequences that fall in
`this region. (iv) Align the sequences in each
`cluster to produce motifs, and test representa-
`tive sequence(s) from each motif and appropri-
`ate point mutants with the use of a suitable
`functional assay.
`A large body of work suggests that ESEs
`are located in the general vicinity of splice
`sites (12). Unlike transcriptional enhancers,
`ESEs function in a strongly position-depen-
`dent manner,
`enhancing splicing when
`present downstream of a 3⬘ss and/or upstream
`of a 5⬘ss (13), but often repressing splicing
`when present in intronic locations (14, 15).
`These observations suggest that, as one at-
`tribute, ESE sequences should be strongly
`selected for in constitutively spliced exons
`and generally avoided in intronic sequences
`near splice sites.
`Moreover, ESEs can compensate for the
`presence of “weak” (nonconsensus) 5⬘ or 3⬘
`splice signals in exons, and strengthening of
`the splice sites of an enhancer-dependent
`exon generally eliminates enhancer depen-
`dence (16). Therefore, we conjecture that ex-
`ons with nonconsensus splice sites (“weak
`exons”) are under much stronger selective
`
`1008
`
`9 AUGUST 2002 VOL 297 SCIENCE www.sciencemag.org
`
`2
`
`

`

`pressure to retain ESEs than are exons with
`consensus splice sites (“strong exons”), re-
`sulting in a significantly higher frequency of
`ESEs in weak exons than in strong exons.
`Available full-length cDNA sequences
`were aligned to the assembled human ge-
`nome by means of the spliced alignment al-
`gorithm that
`is part of the “Genoa” gene
`annotation script (17). Reliable full-length
`alignments were obtained with this approach
`for 4817 human genes containing 31,463 in-
`trons and 28,933 internal exons. Position-
`specific log-odds score matrices were then
`used to score the 5⬘ss and 3⬘ss of these exons,
`and the distributions of 5⬘ss and 3⬘ss scores
`were used to partition exons into categories
`on the basis of the strength of their splice
`sites: “weak 5⬘ exons” (bottom 25% of 5⬘ss
`scores), “strong 5⬘ exons” (top 25% of 5⬘ss
`scores), with “weak 3⬘ exons” and “strong 3⬘
`exons” defined analogously.
`Application of the RESCUE-ESE method
`to this set of human genes is illustrated in Fig.
`1. A power calculation dictated the use of a
`word size of six nucleotides, which is com-
`parable in size to the binding sites of many
`known RNA binding factors (17). In step
`two, each of the 4096 oligonucleotides of
`length six was assigned two scores: ⌬EI, the
`scaled difference between the frequency of
`occurrence of the hexamer in exons and the
`frequency of occurrence near splice sites in
`introns (scaled in standard deviation units);
`and ⌬5WS, the scaled difference between the
`frequency of occurrence of the hexamer in
`weak 5⬘ exons and its frequency in strong 5⬘
`exons (SD units), with ⌬3WS defined analo-
`gously for weak 3⬘ exons versus strong 3⬘
`exons. Each hexamer was then represented
`by a point in the plane with coordinates (⌬EI,
`⌬5WS) for identification of sequences that
`enhance 5⬘ss recognition (5⬘ESEs) (Fig. 2A).
`Alternatively, each hexamer was represented
`by the point (⌬EI, ⌬3WS) for identification
`of sequences that enhance 3⬘ss recognition
`(3⬘ESEs) (Fig. 2B). A statistical significance
`threshold of 2.5 standard deviations above
`the mean (corresponding to a P value of
`⬃0.01) was then applied to each axis inde-
`pendently; that is, any hexamer for which
`both ⌬EI ⬎ 2.5 and ⌬5WS ⬎ 2.5 is predicted
`to be a 5⬘ESE, and any hexamer with both
`⌬EI ⬎ 2.5 and ⌬3WS ⬎ 2.5 is predicted to be
`a 3⬘ESE (hexamers in the upper right portion
`of the first quadrant in the scatterplots). The
`requirement
`that
`each hexamer
`exceed
`thresholds in two separate dimensions, both
`with P ⬃ 0.01, represents essentially a Bon-
`ferroni-type correction for multiple compari-
`sons: Because 4096 different tests are being
`performed, the combined P value is set to
`⬃(0.01)2 ⫽ 10⫺4, giving an expectation of
`less than one false positive hexamer.
`These criteria identified 103 different hex-
`amers as candidate 5⬘ESEs and 198 hexamers
`
`R E P O R T S
`as candidate 3⬘ESEs. These two sets overlap
`fairly extensively, with 63 of the 103 predict-
`ed 5⬘ESEs also contained in the set of pre-
`dicted 3⬘ESEs, which suggests that many en-
`hancers may be capable of acting at both
`splice sites [e.g., (13)]. The total number of
`hexamers predicted to display either 5⬘ or 3⬘
`ESE activity was 238 out of the 4096 possible
`hexamers, about 6% of the total, consistent
`with the notion that ESEs are quite common.
`In step three of the RESCUE procedure,
`predicted 5⬘ESE and 3⬘ESE hexamers were
`clustered on the basis of sequence similarity,
`and the hexamers in each cluster were multiply
`aligned using CLUSTALW (18) to identify
`candidate enhancer motifs (fig. S3) (17). This
`procedure yielded a total of five 5⬘ESE motifs
`(Fig. 2A) and eight 3⬘ESE motifs (Fig. 2B).
`Three of the five 5⬘ESE motifs—5A, 5B, and
`5C—are significantly similar to 3⬘ESE motifs
`3G, 3A, and 3D, respectively, so the three pairs
`5A/3G, 5B/3A, and 5C/3D were considered to
`
`represent just three distinct classes, each com-
`prising the union of the pair of similar hexamer
`clusters (17). The total number of distinct can-
`didate enhancer motifs identified by RESCUE-
`ESE was therefore 10.
`In the final step of the RESCUE proce-
`dure, representatives of these candidate en-
`hancer motifs were tested for ESE activity in
`a splicing reporter construct. For each cluster
`of predicted ESEs, a representative hexamer
`was chosen—referred to as the “exemplar” of
`the class. To place each exemplar hexamer in
`its natural context, we screened our human
`spliced gene database for an occurrence of
`each exemplar hexamer in a weak 5⬘ exon
`(bottom 10% of 5⬘ss scores) or weak 3⬘ exon
`(bottom 10% of 3⬘ss scores), as appropriate.
`A slightly longer region of sequence centered
`on the exemplar—referred to as the “extend-
`ed exemplar”—was then chosen from this
`exon and inserted into the reporter construct
`described below. The extended exemplar
`
`Fig. 2. RESCUE-ESE prediction of 5⬘ and 3⬘ ESEs in human genes. (A) Scatterplot for prediction of
`5⬘ESE activity. Hexamers are represented by colored letters as described in Fig. 1. Simplified
`dendrogram shows clustering of 5⬘ESE hexamers (total of 103 hexamers with ⌬EI ⬎ 2.5 and
`⌬5WS ⬎ 2.5) into five clusters of four or more hexamers. (B) Scatterplot for prediction of 3⬘ESE
`activity. Simplified dendrogram shows clustering of 3⬘ESE hexamers (total of 198 hexamers with
`⌬EI ⬎ 2.5 and ⌬3WS ⬎ 2.5) into eight clusters of four or more hexamers. Complete dendrograms
`of all hexamers are shown in fig. S3. The aligned sequences in each cluster are represented as
`Pictograms (http://genes.mit.edu/pictogram.html). Cluster labels (e.g., 3B, 5A/3G) are listed to the
`right of each Pictogram, with the total number of hexamers in the cluster indicated in parentheses.
`Clustering and alignment were performed as described (17).
`
`www.sciencemag.org SCIENCE VOL 297 9 AUGUST 2002
`
`1009
`
`3
`
`

`

`sequences comprise the 19-base region ex-
`tending from six bases 5⬘ of the exemplar
`hexamer to seven bases 3⬘ of the exemplar
`hexamer (Fig. 1).
`The splicing enhancer activity of each ex-
`tended exemplar sequence was then assessed by
`measuring its ability to “rescue” splicing of
`exon 2 of the reporter construct, pSXN (7).
`SXN exon 2 is only 32 bases long, including the
`19-base insert. Previously, this exon was ob-
`served to be predominantly skipped for most
`random insert sequences tested. This failure to
`be included is reversed when the exon is length-
`ened, when a splicing enhancer is present, or
`when the 5⬘ss, the branch point, or the poly-
`pyrimidine tract is improved (fig. S1) (19–21).
`Because strengthening either the 5⬘ss or 3⬘ss
`consensus sequence of SXN exon 2 or inserting
`an ESE causes exon inclusion, we reasoned that
`this exon would be a suitable reporter system
`for testing the activity of candidate 5⬘ESEs as
`well as candidate 3⬘ESEs.
`It was of particular interest to assess the
`ability of the RESCUE approach to predict
`ESE-disrupting mutations. Therefore,
`for
`each exemplar hexamer, a single-base mutant
`was chosen that was predicted to lack en-
`hancer activity [i.e., did not fall in the ex-
`treme upper right (“unanimous enrichment”)
`region of the scatterplot]. Typically, the sin-
`gle-point mutant farthest “southwest of ” (to
`the left and below) the exemplar in the scat-
`terplot was chosen. A “mutant” extended ex-
`emplar sequence containing just this single
`base change was then generated for each
`extended exemplar and inserted into the same
`cloning site in the SXN minigene. Constructs
`containing the extended exemplars and mu-
`tants were transiently transfected into HeLa
`cells, and the splicing phenotype was assayed
`by quantitative reverse-transcription poly-
`merase chain reaction (RT-PCR) (see fig. S2
`for protocol and quantitation curves).
`An initial set of experiments evaluated
`the robustness of the approach with respect
`to differences in the local context of the
`exemplar hexamer. For this purpose we
`focused on a
`representative hexamer,
`GAAGAA, chosen from the large purine-
`rich 5C/3D cluster of predicted enhancers
`(Fig. 2). The consensus sequences for these
`clusters and the chosen exemplar are simi-
`lar to the classical “GARGAR” enhancer
`(R represents either purine nucleotide, A or
`G). Occurrences of GAAGAA were identi-
`fied in three exons with weak splice sites,
`generating extended exemplar sequences GAA-
`GAA.1, GAAGAA.2, and GAAGAA.3, which
`lack appreciable similarity other than the
`shared hexamer GAAGAA (Fig. 3). All three
`extended exemplars conferred high levels of
`inclusion on the test exon, ranging from
`⬃50% for GAAGAA.3 to ⬃70% for GAA-
`GAA.1 (see fig. S4 for representative gels).
`All three of these extended exemplars con-
`
`R E P O R T S
`
`tained additional purine-rich hexamers
`overlapping the central GAAGAA hex-
`amer, which were also predicted to have
`enhancer activity by RESCUE-ESE (indi-
`cated by the vertical blue bars in Fig. 3).
`Next, the mutation G4⬎T was introduced
`into each extended exemplar [i.e., each cen-
`tral GAAGAA hexamer was mutated to
`GAATAA, a hexamer that falls far “south-
`west” of GAAGAA in the scatterplots and is
`predicted to lack ESE activity (see Fig. 4,
`motif 5C/3D)]. This mutation also disrupts
`many or all (for GAAGAA.3) of the overlap-
`ping RESCUE-ESE hexamers. As predicted,
`this mutation produced sharply reduced lev-
`els of inclusion in each of the three contexts,
`ranging from ⬃5% to ⬃30% of the wild-type
`level (Fig. 3). Taken together, these data sug-
`gest that different occurrences of the same
`exemplar tend to be qualitatively similar in
`their ability to enhance splicing and in their
`response to specific point mutations, but that
`the precise level of ESE activity depends
`on local sequence context. Another muta-
`tion predicted to disrupt ESE activity of
`GAAGAA, A2⬎T (Fig. 3, M2), also gave
`
`reduced levels of exon inclusion in the
`context of GAAGAA.3. On the other hand,
`the mutation A5⬎C (Fig. 3, M3) is predict-
`ed to preserve ESE activity because it con-
`verts GAAGAA to GAAGCA, another pre-
`dicted ESE hexamer, and this mutation
`slightly increases exon inclusion in the con-
`text of GAAGAA.3 (Fig. 3). These data
`anecdotally suggest that RESCUE-ESE can
`accurately predict which mutations will
`disrupt the enhancing activity of an ESE;
`some evidence for this conclusion is dis-
`cussed below.
`To assess the degree to which different
`exemplar hexamers from the same cluster
`would have similar ESE activity, we chose a
`quite different exemplar, AGAAAC, from the
`same 5C/3D cluster as GAAGAA. The ex-
`tended exemplar AGAAAC.1 also displayed
`ESE activity in the range observed for the
`different extended exemplars of GAAGAA
`(Fig. 3). However, the mutation G2⬎T, pre-
`dicted to disrupt the activity of AGAAAC,
`gave only a moderate (⬃27%) reduction in
`exon inclusion, from ⬃75% to ⬃55%. This
`remaining ESE activity might be attributable
`
`Fig. 3. Analysis of ESE activity for predicted enhancers of class 5C/3D. Upper panel: Extended
`exemplar sequences for three occurrences of the GAAGAA exemplar and one occurrence of the
`AGAAAC exemplar. All extended exemplars derive from arbitrarily selected occurrences of the
`exemplar in human exons with weak splice sites, as described in the text. Gene name and exon
`number are listed above each sequence. GenBank accession numbers for the mRNAs are as follows:
`XM_046769 (GAAGAA.1), XM_010365 (GAAGAA.2), AF212232 (GAAGAA.3), and BC020651
`(AGAAAC.1). Predicted ESE hexamers in each extended exemplar are indicated by blue bars above
`the sequence. Point mutations introduced into these sequences are shown in red, with predicted
`ESE hexamers in the mutant sequence shown by blue bars below the sequence. Each mutant is
`labeled by a red M if the mutation is predicted to disrupt ESE activity, or by a blue M if the mutant
`sequence is predicted to retain ESE activity. Total RNA extracted from HeLa cells was amplified by
`RT-PCR after transient transfection with the SXN reporter containing the indicated insert. Radio-
`labeled products were analyzed by polyacrylamide gel electrophoresis and visualized using a
`phosphorimager. (Representative autoradiographs are shown in fig. S4.) Bottom panel: Percent
`inclusion for each construct was calculated as the ratio of the intensity of the upper band (including
`exon 2) to the sum of the intensities of the upper and lower bands. All transfections were
`performed at least twice. The height of the colored bar indicates the average of all measurements;
`horizontal black lines indicate the minimum and maximum inclusion values observed.
`
`1010
`
`9 AUGUST 2002 VOL 297 SCIENCE www.sciencemag.org
`
`4
`
`

`

`to the retention of two predicted ESE hexa-
`mers in the mutated AGAAAC.1 sequence
`(Fig. 3), although this was not tested. This
`example underscores the difficulty in inter-
`preting the results of mutations in sequences
`containing additional predicted ESE hexa-
`mers. To rigorously test the predictions of the
`RESCUE-ESE method, we used sequences
`specifically chosen to contain exactly one
`predicted ESE hexamer (or zero, in the case
`of mutant sequences) in all other experiments
`reported here.
`One exemplar and a corresponding ex-
`tended exemplar were chosen from each of
`the 10 motifs for testing in the reporter con-
`struct. Only 19-nucleotide oligomers contain-
`ing a single RESCUE-ESE–predicted hex-
`amer in the middle were considered (table
`S1). Although this restriction might result in
`a bias toward selection of weaker enhancers,
`it was considered essential in order to avoid
`complications in interpreting the splicing
`phenotypes of sequences containing overlap-
`ping or adjacent predicted ESEs. For each
`motif, a single-base mutant predicted to dis-
`rupt the ESE activity of the exemplar hex-
`amer was chosen as described above, intro-
`duced into the extended exemplar sequence,
`and cloned into the reporter construct. The 10
`predicted enhancer and mutant constructs
`were transiently transfected and assayed for
`splicing as before (Fig. 4).
`All 10 of the predicted enhancers (blue
`bars) displayed ESE activity in the reporter
`system,
`ranging from weakly enhancing
`(⬃20% inclusion for 5B/3A and 3B)
`to
`strongly enhancing (⬃60 to 80% inclusion,
`for 5D and 3E). In addition, for 9 of 10
`classes of enhancer tested, the predicted ESE
`
`R E P O R T S
`
`sequence gave a significantly higher level of
`inclusion than the mutant (blue bar higher
`than red bar), motif 3F being the only excep-
`tion (see fig. S5 for representative gels).
`These results demonstrate the effectiveness
`of RESCUE-ESE for prediction of the effects
`of single base changes on ESE activity. The
`different point mutant sequences exhibited
`varying levels of inclusion. Mutant 3F gave
`comparable inclusion to wild-type 3F en-
`hancer. Three other mutants, 3C, 3E, and 3H,
`gave about two-thirds the level of inclusion
`of the wild-type sequence,
`indicating that
`ESE activity had been only partially im-
`paired. On the other hand, the remaining six
`mutants all had 10 to 50% of the wild-type
`level of inclusion. In absolute terms, these six
`mutants had inclusion levels below 20% and
`often less than 10%, comparable to that seen
`by others for typical random inserts in this
`context (7).
`In a large set of human exons, slightly
`more than 10% of all the hexanucleotides
`were found to match RESCUE-ESE hexa-
`mers (22), often in overlapping clumps as in
`Fig. 3, suggesting that ESEs are very com-
`mon in human genes. Counting each overlap-
`ping clump as a single enhancer, we found an
`average of 5.2 predicted enhancers per exon,
`with most exons containing between three
`and seven ESEs (20th and 80th percentiles,
`respectively). The hexamers in each cluster
`typically occurred more frequently in exons
`than in introns by a factor of 1.5 to 2 and
`more frequently in weak exons than in strong
`exons by a factor of 1.3 to 1.4. The average
`frequencies of hexamers in each of the 10
`RESCUE-ESE motif clusters were compara-
`ble or slightly lower in a database of more
`
`than 2000 alternative (skipped) exons than in
`our database of constitutively spliced exons
`(22); this finding suggests that the motifs we
`have identified are involved in recognition of
`both constitutively and alternatively spliced
`exons.
`Some sequences that display ESE activity
`were missed by the RESCUE method in its
`current form. For example, mutant 3F and
`one of the three predicted “neutral” sequenc-
`es tested (fig. S5) displayed enhancer activ-
`ity but did not contain any RESCUE-pre-
`dicted ESE hexamers. Analysis of the three
`predicted neutral sequences—19-base seg-
`ments that
`lack predicted ESE hexamers
`chosen from exons with weak splice sites—
`suggests a possible modification of the cut-
`offs used in the RESCUE-ESE protocol.
`Specifically, it was found that neutral se-
`quence N3 contained a hexamer that was
`close to the cutoff for ESEs: The hexamer
`CTACGC had ⌬EI ⫽ 16.9 (⬎⬎ 2.5) and
`⌬5WS ⫽ 2.2, just below the cutoff. By
`contrast, no hexamer in neutral sequence
`N1 or N2 had both ⌬EI ⬎ 2.5 and ⌬5WS or
`⌬3WS ⬎ 1.5, and no hexamer in any of the
`neutral sequences had ⌬5WS or ⌬3WS ⬎
`2.5. These and other data (22) suggest that
`altering the cutoffs used in RESCUE-ESE,
`perhaps by increasing the ⌬EI cutoff while
`simultaneously reducing the ⌬WS cutoff to
`1.5 or 2, might result in improved detection
`of ESEs.
`A database of published mutationally char-
`acterized natural ESE sequences was construct-
`ed, and these sequences were searched for
`occurrences of the hexamers in each cluster
`(tables S2 to S4). Five of the RESCUE-ESE
`clusters (5C/3D, 5E, 3C, 3E, and 3F) resemble
`
`Fig. 4. Analysis of ESE activity for 10 classes of predicted enhancers. HeLa
`cells were transfected with SXN splicing reporter construct containing
`inserts representing all 10 classes of predicted ESEs and point mutants of
`these sequences. The extended exemplar sequences used are listed in
`table S1. Upper panel: schematic representing ⌬EI and ⌬WS values for
`each tested exemplar hexamer (blue E) and point mutant hexamer (red
`M) from Fig. 2A or 2B, as appropriate. The label of the predicted ESE
`cluster from Fig. 2 is indicated above. Lower panel: Percent inclusion for
`
`each construct, calculated as in Fig. 3. (Representative autoradiographs
`are shown in fig. S5.) All transfections were performed at least twice. The
`height of the colored bar (blue for predicted ESE, red for mutant
`predicted to disrupt ESE activity) indicates the average of all measure-
`ments; horizontal black lines indicate the minimum and maximum in-
`clusion values observed. The predicted ESE hexamer and point mutant
`sequence are shown below the blue and red bars, respectively, with the
`mutated base shown in the corresponding color.
`
`www.sciencemag.org SCIENCE VOL 297 9 AUGUST 2002
`
`1011
`
`5
`
`

`

`R E P O R T S
`
`Fig. 5. Correlation between predicted
`ESEs and exon skipping mutations in
`human HPRT gene. (A) Exon skipping
`mutations were analyzed in terms of
`the set of hexamers affected; mutation
`2 (G88⬎T) is shown here as an exam-
`ple. All hexamers affected by the mu-
`tation are shown, with matches to
`RESCUE-predicted ESEs shown in blue
`and represented by a plus sign. The
`location of the mutation is indicated by
`a red arrow. (B) Summary of human
`HPRT gene mutations known to cause
`exon skipping [from (23, 24)]. Base
`changes that occurred within five nu-
`cleotides of a splice junction were ex-
`cluded, as they may alter the splice site
`signals. The first column lists the na-
`ture of the mutation (del ⫽ deletion,
`X⬎Y ⫽ substitution of base Y for base
`X), with coordinates listed relative to
`the translation start site of the HPRT
`cDNA (GenBank accession number
`NM_000194). The last two columns list
`the number of predicted ESE hexamers
`in the affected region of the wild-type
`and mutant sequences,
`respectively.
`(C) Locations of all mutations listed in
`(B) are indicated relative to the exon-
`intron structure of the HPRT gene by
`red arrows. Exon sizes are to scale;
`intron sizes are not. Mutations that
`alter RESCUE-predicted ESEs are shown
`below the
`exon-intron schematic,
`numbered according to (B). Mutations
`3
`labeled E
`X disrupt predicted ESE
`3
`hexamers; those labeled X
`E create pre

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket