`© 1993 Kluwer Academic Publishers. Printed in the Netherlands.
`
Molecular cloning of BRCA1: a gene for early onset familial breast and
ovarian cancer
`
`Anne M. Bowcock
`Department of Pediatrics, University of Texas Southwestern Medical Center, Dallas TX, USA
`
`Key words: chromosome 17q mapping, familial breast/ovarian cancer gene, gene mapping, molecular
`biology techniques, tumor suppressor genes, yeast artificial chromosomes (YACs)
`
`Summary
`
Molecular analyses allow one to determine genetic lesions occurring early in the development of tumors.
With positional cloning approaches we are searching for a gene involved in the development of early onset
familial breast and ovarian cancer that maps to human chromosome 17q21 and is termed BRCA1. This
involves localizing the region genetically within families with multiply affected members, capturing the
region identified by genetic analyses in YACs (yeast artificial chromosomes), converting those YACs to
smaller manipulable pieces (such as cosmids), and searching for genes via a variety of approaches such
as direct screening of cDNA libraries with genomic clones, direct selection by hybridization, "exon
trapping", and CpG island rescue. Once identified, candidate genes will be screened for mutations in
affected family members in whom breast cancer segregates with the locus on 17q21. The frequency of
the predisposing allele has been calculated to be 0.0033; from this the incidence of carriers, i.e. those
carrying such a predisposition, is one in 150 women. The isolation of BRCA1 and the elucidation of the
mutations resulting in breast and ovarian cancer predisposition will allow identification of women who
have inherited germ-line mutations in BRCA1. In families known to harbor a germ-line BRCA1 mutation,
diagnosis of affected members will be rapid. It is possible that one will also be able to detect alterations
of the second copy of this gene early in tumor development in individuals carrying a germ-line mutation.
It is not yet known how frequently somatic BRCA1 mutations predispose to breast and ovarian carcinoma
in the general female population. If, as in other genetic diseases, new germ-line mutations occur in some
women and thus contribute to the development of breast cancer, it may be feasible to screen women in the
general population for predisposing mutations. In addition, if acquired genetic mutations of the BRCA1
gene are involved as early events in the development of non-familial forms of the disease, early detection
of possible breast carcinoma may become feasible in biopsy of breast tissue.
`
`Introduction
`
Although the incidence of breast cancer is estimated to be 1/9 for a woman over her
lifetime, certain women appear to be at an increased risk. These women harbor germ-line
mutations that predispose to breast cancer, and in general develop the disease at an
earlier age. A characteristic of familial cancers is that besides being of earlier
onset than normal, the
`
`Address for correspondence and offprints: Dr. A.M. Bowcock, Department of Pediatrics, McDermott Center for Human Growth
`and Development, 6000 Harry Hines Blvd., Dallas, TX 75235-8591; Tel: 214-648-1600; Fax: 214-648-1666
`
`GeneDX 1006, pg. 1
`
`
`
`
cancer is often bilateral. Some of the women who are at increased risk of developing breast and
ovarian cancer harbor a mutation in a gene termed BRCA1 (see Table 1 for list of abbreviations
used). Over their lifetime, the likelihood that these women will develop breast or ovarian
cancer is approximately 90%.

Table 1. Table of abbreviations

Alu-PCR:       (see IRS-PCR)
APC:           adenomatosis polyposis coli
BAC:           bacterial artificial chromosome
BRCA1:         gene for familial early onset breast and ovarian cancer
cDNA:          DNA copied off an mRNA template by reverse transcriptase
cM:            centimorgan, or 1/100th of a Morgan
COL1A1:        collagen, type I, alpha 1
DCC:           deleted in colorectal carcinoma
DGGE:          denaturing gradient gel electrophoresis
D17S74:        74th single copy segment of DNA to be isolated from human chromosome 17
               (term for a locus)
EDH17B2:       estradiol 17-beta dehydrogenase I
ERBB2:         avian erythroblastic leukemia viral v-erb-B2
FAP:           familial adenomatosis polyposis coli
FISH:          fluorescence in-situ hybridization
GIP:           gastric inhibitory polypeptide
kb:            kilo-base (1,000 bases)
HER2/neu:      see ERBB2
HOX2:          homeo box region 2
HTF islands:   HpaII tiny fragment islands
IRS-PCR:       inverse-repeated sequence PCR
LOH:           loss of heterozygosity
Mb:            mega-base (1,000,000 bases)
Morgan:        a unit of recombination (there are 33 Morgans in the human genome)
MEN1:          multiple endocrine neoplasia I
MEN2:          multiple endocrine neoplasia II
NF2:           neurofibromatosis 2 (bilateral acoustic neuroma)
NME1:          non-metastatic cells 1, expressing NM23 protein
NME2:          non-metastatic cells 2, expressing NM23 protein
PHB:           prohibitin
RARA:          retinoic acid receptor, alpha
RFLP:          restriction fragment length polymorphism
RB1:           retinoblastoma 1 (including osteosarcoma)
PCR:           polymerase chain reaction
SSCP:          single-strand conformation polymorphism
STS:           sequence tagged site
THRA1:         thyroid hormone receptor alpha 1 (avian erythroblastic leukemia viral (v-erbA)
               oncogene homolog 1, formerly ERBA1)
VNTR:          variable number of tandem repeats
YAC:           yeast artificial chromosome

Tumor development and progression is accompanied by a series of events that occur in a single
clone of cells as a result of molecular lesions in a specific set of genes. Breast cancer, like other
cancers, is likely to occur as a result of aberrant gene expression. Some of this is due to loss of
expression of genes such as those that normally regulate or suppress cell growth (tumor suppressor
genes or anti-oncogenes), while some is due to an increase in gene expression, e.g. the activation of
growth-promoting factors such as proto-oncogenes. Chromosomal rearrangements have identified
several chromosomal regions or genes likely to be involved in either the development or the
progression of breast cancer. One can also localize genes predisposing to disease on the basis of
co-segregation with DNA markers in multiply affected families. The gene we are searching for
was identified by Mendelian genetics, and localized to human chromosome 17q21.

Current molecular genetic technology is being used to isolate this gene; termed positional
cloning, it relies on localizing the disease gene genetically, capturing the region physically,
searching for the genes in the region, and identifying BRCA1 among them. Although one
would expect women from a large proportion of families where breast cancer is segregating as a
Mendelian trait to harbor an alteration at BRCA1, it is estimated that women with no family history
may also harbor an altered BRCA1 gene. It has been estimated that 1/150 to 1/500 women are at
increased risk of developing breast and ovarian cancer due to an alteration of BRCA1 in their
germ-line DNA. This review describes our approaches to isolating BRCA1, and outlines
positional cloning approaches in general.
`
`Segregation analysis to dissect diseases
`genetically
`
One risk factor for the development of breast cancer is a family history of the disease [26].
One can perform a "segregation analysis" to determine the best genetic model for a disease.
These studies require the ascertainment of large numbers of affected individuals from a single
population and provide information on the hypothetical genetic component of the disease, e.g.
whether the disease is likely to be inherited in a recessive or dominant fashion, and the penetrance
of the gene (percent of members with the defective gene who will develop the disease).

A large study of 4,730 histologically confirmed breast cancer cases between the ages of 20
and 54 along with 4,688 controls has provided evidence for the existence of a rare autosomal
dominant allele with a frequency of 0.0033, that leads to an increased susceptibility to breast
cancer [10]. The lifetime risk for a woman with such a susceptibility allele is predicted to be 92%,
in contrast to the cumulative lifetime risk of non-carriers, which is estimated to be approximately
10%. This study agreed with smaller, earlier ones such as that by Newman et al [35], who studied
1,579 cases of breast cancer and predicted that women with the susceptibility allele had a lifetime
risk of developing breast cancer of 82%, versus 8% for the general population. The study by
Newman et al suggested that 4% of cases are due to an inherited predisposition. Other studies
suggest that more than one locus may predispose to familial breast cancer. It has been shown that
< 1% of women, who develop breast cancer at a very young age and who often have children who
develop sarcomas, carry a germ-line mutation in the tumor suppressor gene p53 [30].
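
The step from an allele frequency of 0.0033 to a carrier incidence of roughly one in 150 can be checked with a few lines of arithmetic. The sketch below assumes Hardy-Weinberg proportions, which the text does not state explicitly; the function name is illustrative only.

```python
def carrier_frequency(q: float) -> float:
    """Fraction of individuals carrying at least one copy of an allele
    with population frequency q, assuming Hardy-Weinberg proportions
    (an assumption made for this sketch)."""
    p = 1.0 - q
    heterozygotes = 2.0 * p * q
    homozygotes = q * q
    return heterozygotes + homozygotes


if __name__ == "__main__":
    q = 0.0033  # allele frequency reported from the segregation analysis
    freq = carrier_frequency(q)
    print(f"carrier frequency: {freq:.5f} (about 1 in {1 / freq:.0f})")
```

Because the allele is rare, almost all carriers are heterozygotes, so the carrier frequency is close to 2q, or about one in 152.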
`
17q21 linkage
`
Since segregation analyses suggested that some forms of breast cancer predisposition can be
accounted for by a single gene, it was reasonable to attempt to map it. Gene mapping is currently
performed by linkage analysis with DNA markers in multiply affected families, followed by
positional cloning approaches to isolate the gene.

Linkage analysis relies on the identification of a marker or markers that segregate with disease
predisposition. Markers were originally protein polymorphisms, but have been replaced by DNA
markers such as RFLPs (restriction fragment length polymorphisms) and VNTRs (variable number
of tandem repeat polymorphisms). There are many different classes of repeats that differ in the
number of copies present at any one site. Some of the most useful are variable numbers of "di, tri
and tetra-nucleotide repeats". Commonly called "polymorphic microsatellites", they have
revolutionized linkage analysis since they are ubiquitous, with a microsatellite occurring
approximately every 40kb. Polymorphic loci containing microsatellites are highly variable, so
most individuals are heterozygous. This is indispensable for linkage analysis since nearly
every individual is informative and one can determine which allele is inherited with the
disease gene most of the time, with the result that little information is lost from the rare but
important families in which breast cancer segregates as a Mendelian trait. Microsatellites can be
typed with the polymerase chain reaction, which is fast and requires approximately 100 times less
DNA than RFLP-based linkage analysis (e.g. 30ng instead of 3-5µg). One additional advantage of
microsatellites for a disease such as breast cancer is that, since affected members have often died
by the time the family is genotyped, archival tissue such as microscope slides or paraffin blocks
containing the patients' normal tissue can act as an invaluable source of DNA with which to
reconstruct their genotypes. This can be performed with PCR-based typing; we routinely obtain
sufficient DNA from a 10µm section for approximately 100 PCR reactions. We have also obtained
sufficient DNA from tumor tissue scraped off microscope slides for approximately 25 PCR
reactions.

In 1990, it was shown that a VNTR marker on chromosome 17q (D17S74, or cMM86) segregated
with breast cancer predisposition in seven out of 23 families (40%) where the onset of the
disease occurred before the age of 46 [22]. Here the two-point lod score in the early-onset families
was high enough above the threshold value of 3.0 to be strong evidence for linkage. This analysis
also demonstrated that the disease is genetically heterogeneous (i.e. that breast cancer
predisposition did not always segregate with this marker and may be linked to other susceptibility
genes in other families). Subsequent analyses showed that a confounding influence in the
"late-onset" families is the co-occurrence of the disease in relatives due to non-germ-line
alterations [31]. In the case of this predisposing gene BRCA1, D17S74 was initially shown to lie
at a distance of 10% recombination from it. This represents a map distance of approximately 10cM
(1% recombination corresponds to a map distance of approximately 1cM). In terms of physical
distance, 1cM represents 1,000kb, on average, although this distance can vary widely (for
example, 1cM may correspond to 100kb in recombination hot spots, where there is more
recombination than the average, and to 10,000kb in recombination cold spots, where it is less than
the average).
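
To make the spread between genetic and physical distance concrete, the conversion factors quoted above can be applied to the 10cM estimate for D17S74. This is a minimal sketch; the three factors come from the text, while the dictionary and function names are illustrative.

```python
# Conversion factors quoted in the text (kb of DNA per centimorgan).
KB_PER_CM = {
    "hot spot": 100,       # more recombination than average
    "average": 1_000,
    "cold spot": 10_000,   # less recombination than average
}


def physical_distance_kb(distance_cm):
    """Physical distance implied by a genetic distance under each regime."""
    return {regime: distance_cm * kb for regime, kb in KB_PER_CM.items()}


if __name__ == "__main__":
    # D17S74 was initially placed about 10cM from BRCA1.
    for regime, kb in physical_distance_kb(10).items():
        print(f"{regime:>9}: {kb:,} kb")
```

The same 10cM interval could therefore correspond to anything from about 1,000kb to about 100,000kb of DNA, which is why a genetic distance alone gives only a rough idea of the cloning effort required.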

The results of Hall et al [22] were quickly confirmed by Narod et al [34], who studied five
families where both breast and ovarian cancer were segregating and who demonstrated that
breast/ovarian cancer predisposition was linked to D17S74 in three families. Here the combined
lod score was 2.20 at a recombination fraction θ of 0.20 (or approximately 20cM).
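
For readers unfamiliar with lod scores, the following sketch shows how a two-point lod score is computed in the simplest possible case: fully informative, phase-known meioses in which recombinants can be counted directly. Real pedigree analyses, including those cited above, require likelihood calculations over whole pedigrees with penetrance and phase unknown; the counts used here (2 recombinants in 20 meioses) are hypothetical and are not data from any study cited in this review.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known meioses:
    log10[ theta^r * (1 - theta)^(n - r) / 0.5^n ]."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # Any observed recombinant is impossible at theta = 0.
        return meioses * math.log10(2) if recombinants == 0 else float("-inf")
    return (recombinants * math.log10(theta)
            + non_recombinants * math.log10(1.0 - theta)
            + meioses * math.log10(2))


def best_theta(recombinants: int, meioses: int) -> float:
    """Recombination fraction on a 0.01 grid that maximizes the lod score."""
    grid = [i / 100 for i in range(51)]
    return max(grid, key=lambda t: lod_score(recombinants, meioses, t))


if __name__ == "__main__":
    r, n = 2, 20  # hypothetical counts, for illustration only
    theta = best_theta(r, n)
    print(f"max lod = {lod_score(r, n, theta):.2f} at theta = {theta:.2f}")
```

With these hypothetical counts the lod score peaks at θ = 0.10 with a value just above 3.0, the conventional threshold for declaring linkage.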

In an attempt to confirm the previously published linkage results, to localize the disease
locus more definitively, to examine the extent of genetic heterogeneity, and to estimate the
penetrance of the BRCA1 gene, a joint analysis of data from 13 groups was performed with a total
of 214 families with apparent hereditary predisposition to breast and/or ovarian cancer [15].
This localized BRCA1 to an 8.3cM interval (18cM in females) between D17S588 and D17S250,
with odds of 66:1. When families with only breast cancer were considered, breast cancer
predisposition was linked to this locus in approximately 45% of the families. When families
with both breast and ovarian cancer were considered, cancer predisposition was linked in
nearly all cases. This suggests that a gene(s) on chromosome 17q accounts for most families with
both early-onset breast and ovarian cancer, but that there exist other genes predisposing to breast
cancer. In the linked families, the risk associated with inheritance of the defective gene was
estimated to be 59% at age 50 and 82% at age 70 [15].

An example of a hypothetical family in which breast cancer predisposition in females segregates
with a highly informative DNA marker is shown in Figure 1.
`
[Figure 1: pedigree diagram, not reproduced. Genotype labels recovered from the figure: 1,2; 3,4; 2,5; 4,5; 4,6; 2,5; 4,6. Ages at diagnosis: 37 and 34.]

Figure 1. Example of family with early onset breast cancer in which breast cancer predisposition
is segregating with allele 4 of a linked locus. Genotypes at the linked locus are shown underneath
the pedigree symbols. Circles: females; squares: males; shaded circles: affected females; Dx: age
at which breast cancer was diagnosed. In this pedigree, males with allele 4 are unaffected. This
resembles the situation for BRCA1, where males harboring a linked allele are unaffected.
`
Evidence that BRCA1 is a tumor suppressor gene
`
There is some evidence that the BRCA1 gene is a tumor suppressor gene [54]. A hallmark of
tumor suppressors is the finding of nearby allele losses, reflecting regions of chromosomal loss at
the suppressor gene locus in tumor DNA [44]. In the case of a suppressor gene involved in
inherited predisposition, these allele losses would be expected to occur on the chromosome
containing the wild-type allele, thereby inactivating this allele (inactivation of the first allele
having been inherited). When loss of heterozygosity studies are performed in tumors of affected
members in multiply affected breast and ovarian cancer families shown by linkage analysis to
harbor a germ-line BRCA1 mutation, it has consistently been observed by us and others [54] that
the chromosome 17 which is lost is the one which carries the wild-type BRCA1 gene. The
chromosome 17 retained in the tumors is the one containing the mutant BRCA1. This suggests that
tumor predisposition in these cases is due to loss of a normal BRCA1 gene, and provides evidence
that BRCA1 is a tumor suppressor. In the family in Figure 1, one would expect that tumors
exhibiting LOH of human chromosome 17 would retain allele 4.
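
The expectation for the family in Figure 1 can be written as a small classification rule. The sketch below compares the alleles seen in a patient's normal (germ-line) DNA with those seen in her tumor at a linked marker; the function name and return labels are illustrative, and the example genotype (4,6) with linked allele 4 follows the hypothetical pedigree rather than real data.

```python
def classify_loh(germline, tumor, linked_allele):
    """Classify loss of heterozygosity at a marker linked to the disease gene.

    germline: alleles in normal tissue, e.g. (4, 6)
    tumor: alleles detected in tumor DNA, e.g. (4,)
    linked_allele: the marker allele segregating with predisposition
    """
    germline_set, tumor_set = set(germline), set(tumor)
    # A homozygous marker, or one lacking the linked allele, cannot be scored.
    if len(germline_set) < 2 or linked_allele not in germline_set:
        return "uninformative"
    if tumor_set == germline_set:
        return "no LOH"
    if tumor_set == {linked_allele}:
        # Pattern expected for a tumor suppressor gene.
        return "wild-type chromosome lost"
    if len(tumor_set) == 1 and tumor_set < germline_set:
        return "linked chromosome lost"
    return "unexpected pattern"


if __name__ == "__main__":
    # Affected woman from Figure 1 with genotype 4,6; allele 4 is linked.
    print(classify_loh((4, 6), (4,), linked_allele=4))
```

A tumor that retains only allele 4 is scored as having lost the wild-type chromosome, which is the outcome a tumor suppressor model predicts.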
`
`Familial vs sporadic forms of breast cancer
`
Breast cancer attributable to lesions at BRCA1 may be similar to other malignancies that occur
both as a sporadic form and a familial form: e.g. renal-cell carcinomas and von Hippel-Lindau
disease [51], colon tumors and familial adenomatous polyposis (FAP) and Gardner's syndrome [2,
21,23,36], and acoustic neuromas which can be found sporadically or in individuals with a genetic
predisposition due to neurofibromatosis type 2 (NF2) [16].

In addition to the linkage of breast cancer predisposition to BRCA1 in some families
(approximately 60% of families with three or more members with breast cancer), there is also some
evidence that alterations of a gene at 17q21 occur in tumors of women with no family history. This
is based primarily on studies of sporadic breast tumors. In one instance, 40.8% of premenopausal
and 32.5% of postmenopausal breast carcinomas had undergone LOH at 17q21.3 [49]. Similar
results have also been observed by Futreal et al [18], who describe a common region of deletion
that lies between D17S250 and D17S579 at 17q11.2-q21. Other chromosomal regions implicated
in the etiology of breast cancer are 3p13-14.3 (the segment where breakpoints are often
seen in renal cell carcinomas) [11,27,43], 11p, 13q, 16q22-q23, and 17p13 [47,48]. An association
has also been demonstrated between LOH on 17p and 17q and amplification of the erbB2
oncogene [48], which has previously been shown to have predictive value for recurrence of breast
cancer [53]. Interestingly, tumors which had lost chromosome 17p, the locale of p53, had also lost
chromosome 13q, the locale of RB1. These tumors more frequently had highly malignant
histopathological features [48].

[Figure 2: flow diagram, not reproduced. Steps shown: localization with linkage in multiply affected families (10-20cM; 10,000-20,000kb); refinement of localization with recombinants in linked families (1-2cM; 1,000-2,000kb); physical mapping by isolation of overlapping clones (contig) across the region (YACs: 200-2,000kb; cosmids: 40kb); cDNAs (approx. 1 every 30kb); BRCA1; mutational analysis of affecteds and tumors.]

Figure 2. Outline of approaches used in positional cloning, beginning with localization of a disease locus to a chromosomal
region and refinement of the region genetically, followed by cloning of the region, identification of genes in the region, and
ultimately, identification of the predisposing gene on the basis that it is altered in affected members of linked families.

Since LOH at 17q21 has been observed in breast and ovarian carcinomas, an attempt is
being made to localize BRCA1 on the basis of the smallest region that is lost. This approach
facilitated the mapping of neurofibromatosis type 2 (NF2) and MEN1, as well as the cloning of the
RB1 and DCC genes (see review by Ponder [8,17,44]). However, it is not always straightforward
to detect chromosomal regions on the basis of a commonly deleted region (e.g. FAP and MEN1
[44]). This may be so if the loss-of-activity mutations are not truly recessive at the cellular
level, so that tumors may arise if only one allele is altered. For this reason this type of approach
is being employed with reservation.
`
Positional cloning of BRCA1
`
Positional cloning has been used effectively in the isolation of several important human disease
genes, such as those for cystic fibrosis, familial adenomatous polyposis, and Huntington's disease
[21,24,46]. An outline of this approach is shown in Figure 2. It relies on first localizing a disease
gene to a chromosome or chromosomal region, usually by linkage analysis, refining the region
genetically by obtaining closer markers, and, once the region is small enough, cloning it in a large
but manipulable form. Several different types of recombinant molecules can accommodate large
segments of DNA: YACs (yeast artificial chromosomes, which accommodate segments of generally
between 200kb and 2Mb), P1 phage (accommodating segments of approximately 100kb), and
BACs (bacterial artificial chromosomes). These large DNA segments are usually converted to
smaller overlapping ones such as cosmids (which have inserts of 40kb) in order to isolate genes.
We have obtained a series of ordered, overlapping YAC clones (a YAC contig) of the region
spanning BRCA1, and have identified cosmids hybridizing to the YACs. The overlapping cosmid
clones are now being used to screen for genes within the region that has been defined
genetically. Genes that are candidates for the disease gene can then be tested for alterations in
patients and in their breast and ovarian tumors.
`
`Refinement of the gene location genetically
`
Once a gene has been localized by linkage to a chromosomal region, it is likely to lie several
thousand kilobases from the linked marker. The distance D17S74 was estimated to lie from the
gene (10cM) corresponded to approximately 10,000kb. In order to home in on BRCA1, it was
first necessary to refine the region genetically by identifying closer markers. This can be done with
families in which breast cancer predisposition segregates with a locus at 17q21 (presumably
BRCA1), utilizing individuals in these linked families with a chromosome carrying the altered
BRCA1 allele that has undergone recombination between the disease and a closely linked marker.
These recombinant individuals become crucial in the refinement of disease genes. The limit to
which the region containing the disease gene can be refined genetically is dependent upon these
recombinants. One caveat in the study of a common disease such as breast cancer is that one
cannot discriminate between bona-fide recombinant individuals and individuals who do not carry
an altered BRCA1 but develop a non-familial (sporadic) form of the disease. For this reason,
refinement of the BRCA1 region genetically must be treated with some caution. With a study of
recombination breakpoints in linked families, we refined BRCA1 to a region of less than 4cM,
flanked by THRA1 on the centromeric side and D17S183 (SCG43) on the telomeric side [6].
This eliminated many of the genes on 17q that were candidates for being the BRCA1 gene (e.g.
HER2/neu, THRA1, WNT3, HOX2, prohibitin [PHB], COL1A1, NME1, and NME2). The two
that remained were RARA, the gene for the alpha-subunit of the retinoic acid receptor, and
EDH17B2, which encodes estradiol 17β-hydroxysteroid dehydrogenase II. Subsequently these
have been excluded by further genetic mapping (in the case of RARA) and sequencing of affected
individuals (in the case of EDH17B2) [52]. An attempt to refine the region containing BRCA1
still further was initially hampered by an absence of additional genetic markers in this region. We
have now constructed a very dense genetic map in this region which contains 33 ordered
polymorphic markers (12 genes and 21 anonymous DNA segments) lying between D17S250 and
D17S588. This comprises a region of approximately 8.3cM [1] with a polymorphic marker every
250kb on average.
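
The average marker spacing quoted above follows from the size of the interval and the number of markers. A minimal sketch of the arithmetic, assuming the genome-wide average of 1,000kb per cM given earlier in this review (the true ratio in this region may differ):

```python
KB_PER_CM_AVERAGE = 1_000  # genome-wide average; an assumption for this region


def mean_marker_spacing_kb(n_markers: int, interval_cm: float) -> float:
    """Average physical spacing between markers spread over an interval."""
    interval_kb = interval_cm * KB_PER_CM_AVERAGE
    return interval_kb / n_markers


if __name__ == "__main__":
    spacing = mean_marker_spacing_kb(n_markers=33, interval_cm=8.3)
    print(f"about one marker every {spacing:.0f} kb")
```

Thirty-three markers across roughly 8,300kb give one marker about every 250kb, in agreement with the figure in the text.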
`
`Physical cloning of the region - YAC
`screens with STS
`
Once a region containing a disease gene can be refined no further genetically, and no
translocations or other cytogenetically visible chromosome alterations are detected in the region
that may disrupt the gene in question, the only viable approach at present to obtaining the gene is
to capture the linked region physically, search for all the genes within it, and see which is altered
consistently in linked families and tumors as described above. Until recently, yeast artificial
chromosomes (YACs) were used to clone large segments of DNA. However, YACs are not
without their problems: 40 - 55% of YACs are chimeric and contain genomic sequences from
two or more non-contiguous regions of the genome [9,3]. In addition, the instability of
large cloned pieces of DNA results in deletions and rearrangements within some YACs. These
problems need to be identified to avoid ambiguous interpretations of data. Other clones
containing large inserts such as P1 bacteriophage and bacterial artificial chromosomes may not
have as many associated problems as YACs, although their insert sizes are not as large and they
have not been used as extensively.

We are screening YAC libraries with a PCR-based screening method previously described [20].
We have captured almost the entire region between D17S250 and GIP in YACs with PCR
assays for short, unique sequences, termed STSs or sequence tagged sites [37]. Many
of these were derived from known genes or polymorphic markers previously shown to map to this
region of chromosome 17. The order of the loci in the "YAC contig" was obtained by a
combination of genetic mapping and physical mapping. Physical mapping has been achieved by
"fluorescence in-situ hybridization" or "FISH" [56], "radiation hybrid mapping" [13], and
pulsed-field gel electrophoresis [50].
`
YAC contig construction
`
YAC clones can be ordered on the basis that they share common sequences (e.g. common STSs
detectable with PCR). With this type of analysis, we have linked up and ordered most of the YACs
in the BRCA1 region of human chromosome 17q. Surprisingly, the chimerism rate of YACs in this
region is 80%, which is higher than expected. As with most YAC contigs, there are a few gaps that
still remain in the YAC contig map and some YACs have deletions and rearrangements.
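
The principle of linking YACs through shared STS content can be illustrated with a short sketch. It groups clones into contigs whenever two clones test positive for the same STS; the clone and STS names are hypothetical, and the sketch does not attempt to order clones within a contig or to detect chimeric clones, both of which matter in practice.

```python
from collections import defaultdict


def build_contigs(sts_content):
    """Group YACs into contigs: two YACs are joined if they share an STS.

    sts_content maps a YAC name to the set of STSs it tests positive for.
    Returns a list of contigs, each a sorted list of YAC names.
    """
    # Index the YACs by the STSs they contain.
    yacs_by_sts = defaultdict(set)
    for yac, stss in sts_content.items():
        for sts in stss:
            yacs_by_sts[sts].add(yac)

    # Two YACs are neighbours if at least one STS is common to both.
    neighbours = {yac: set() for yac in sts_content}
    for yacs in yacs_by_sts.values():
        for yac in yacs:
            neighbours[yac] |= yacs - {yac}

    # Each connected component of the overlap graph is one contig.
    contigs, seen = [], set()
    for start in sorted(sts_content):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            yac = stack.pop()
            if yac not in component:
                component.add(yac)
                stack.extend(neighbours[yac] - component)
        seen |= component
        contigs.append(sorted(component))
    return contigs


if __name__ == "__main__":
    # Hypothetical PCR results, for illustration only.
    hits = {
        "YAC_A": {"STS1", "STS2"},
        "YAC_B": {"STS2", "STS3"},
        "YAC_C": {"STS3", "STS4"},
        "YAC_D": {"STS6"},
    }
    print(build_contigs(hits))
```

In this example YAC_D shares no STS with the other clones, so it forms a separate contig and marks a gap in the map. A chimeric YAC carrying an STS from an unrelated region would join contigs falsely, which is one reason chimerism has to be identified before a contig is trusted.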
`
YACs to cosmids
`
In general, it is easier to identify genes with smaller clones than YACs, such as cosmids.
These have inserts of approximately 40kb, and can be used in a variety of strategies to search for
genes. In order to obtain a set of overlapping cosmid clones for the BRCA1 region, the YACs
in the contig have been used to generate probes for hybridization to gridded chromosome
17-specific cosmids. The probes have been generated with "inverse repeated sequence PCR".
This amplifies human DNA sequences between human-specific Alu repeats in the YAC DNA.
Alu sequences are approximately 360bp long, highly conserved, and distributed throughout the
human genome in 300,000 to 500,000 copies [25]. It is estimated that an Alu sequence occurs at
least once every 10kb on average. Commonly called "Alu-PCR" or "IRS-PCR" [28], the
approach described above will generate a variety of small fragments from the YAC inserts.
Although such PCR fragments contain a small amount of Alu sequence, which is repetitive and
would identify cosmids non-specifically, the Alu sequences can be blocked before hybridization by
being pre-annealed with human DNA rich in repetitive sequences. An outline of this approach is
shown in Figure 3. One problem with using IRS-PCR products as probes is that regions poor in
Alu sequences will be under-represented, and this may result in gaps in the resultant cosmid contig.
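
Why Alu-poor regions are under-represented can be seen from a deliberately simplified model: a segment between two neighbouring Alu repeats yields a product only if the repeats lie close enough together for PCR to span them. The sketch below ignores repeat orientation, which also matters in real IRS-PCR; the coordinates and the 2,000bp size limit are hypothetical values chosen for illustration.

```python
def split_inter_alu_segments(alu_positions, max_product_bp=2000):
    """Split the segments between neighbouring Alu repeats into those short
    enough to amplify and those too long to amplify.

    alu_positions: coordinates (bp) of Alu repeats along a YAC insert.
    max_product_bp: assumed upper size limit for a PCR product.
    Returns (amplifiable, not_amplifiable) as lists of (start, end) pairs.
    """
    positions = sorted(alu_positions)
    amplifiable, not_amplifiable = [], []
    for left, right in zip(positions, positions[1:]):
        if right - left <= max_product_bp:
            amplifiable.append((left, right))
        else:
            not_amplifiable.append((left, right))
    return amplifiable, not_amplifiable


if __name__ == "__main__":
    # Hypothetical Alu coordinates on a 40kb stretch of a YAC insert.
    alus = [1000, 2500, 3000, 15000, 16200, 40000]
    products, gaps = split_inter_alu_segments(alus)
    print("probe-yielding segments:", products)
    print("segments yielding no probe:", gaps)
```

Cosmids that fall entirely within one of the long, Alu-poor segments would not be detected by the resulting probe, which corresponds to the gaps in the cosmid contig mentioned above.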
`
`Identification of genes
`
We are screening for BRCA1 in a variety of cDNA libraries. cDNA libraries are constructed
by obtaining RNA from a cell or tissue source, isolating the mRNA, converting the mRNA to
cDNA (or copy DNA) with reverse transcriptase, generating double-stranded cDNA, and cloning
the double-stranded cDNA fragments into an appropriate vector, such as a plasmid or a
bacteriophage.

[Figure 3: diagram, not reproduced. Elements shown: inter-repeated sequence (IRS)-PCR of a YAC between Alu repeats; IRS-PCR products labeled with 32P and denatured; pre-hybridization with Cot1 DNA and sonicated human placental DNA; hybridization to a gridded chromosome-specific cosmid library; cosmids homologous to YAC DNA sequence.]

Figure 3. Conversion of a YAC to cosmids by IRS-PCR of YAC DNA, blocking of the repeats in the IRS-PCR products, and
hybridization to blots of gridded colonies containing chromosome 17-specific cosmids.

It is estimated that 3% of the human genome is coding sequence for functional and structural
proteins [5]. Since the size of a haploid human genome is 3 × 10^9 bp, one can calculate that
there are 100,000 genes. There will therefore be one gene, on average, every 30kb. Each cell type
is estimated to express approximately 10,000 genes. Transcripts of these genes are expressed
at varying levels of abundance ranging from one to 200,000 copies per cell. One third of all genes
are expressed at low levels of approximately 1 - 10 copies per cell [19,39]. When searching for a
gene such as BRCA1, one does not know at the outset in which tissue it is expressed. This
means that the choice of cDNA libraries is critical. Some libraries may be better candidates
than others, however. For example, 30% of all genes are expressed in brain [32]. Similarly,
placenta is a good source for many genes [38]. Since breast carcinomas are usually derived from
epithelial cells, we are also screening a variety of epithelial cDNA libraries from nasal and tracheal
epithelia, normal ovary, and ovarian carcinomas, besides a variety of libraries derived from human
breast carcinoma cell lines, placenta, fibroblast, and brain.
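
The gene-density estimate given earlier in this section (3% coding sequence, 100,000 genes, one gene every 30kb) can be laid out explicitly. Note that the figure of 100,000 genes follows from the first two numbers only if each gene carries roughly 900bp of coding sequence; that per-gene value is implied by the arithmetic and is not stated in the text. The function name is illustrative.

```python
def gene_density(genome_bp: int, coding_percent: int, gene_count: int) -> dict:
    """Relate genome size, coding fraction and gene number."""
    coding_bp = genome_bp * coding_percent // 100
    return {
        "coding_bp": coding_bp,
        # Implied average coding sequence per gene (not given in the text).
        "coding_bp_per_gene": coding_bp / gene_count,
        # Average spacing between genes along the genome.
        "spacing_kb": genome_bp / gene_count / 1000,
    }


if __name__ == "__main__":
    estimate = gene_density(genome_bp=3_000_000_000, coding_percent=3, gene_count=100_000)
    for name, value in estimate.items():
        print(f"{name}: {value:,.0f}")
```

At one gene per 30kb, a 40kb cosmid would be expected to contain one or two genes on average.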

One other issue in screening libraries is the number of cDNAs to be screened to be fairly
certain that a low abundance cDNA will be identified. For this reason, one usually screens
10^6 cDNAs derived from a specific tissue source. One way of limiting the number of cDNAs to be
screened is to begin with a "normalized" or partially normalized library, where cDNAs are
equally represented in number. However, the construction of normalized libraries is still
problematic, and few such libraries are available. The "direct selection by hybridization" strategy,
described below, combats the problem of normalization to some degree. One final issue in
screening for genes is that if the gene one is searching for is transiently expressed, the stage
at which the RNA is extracted from a particular tissue is critical.
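
The reason for screening on the order of 10^6 clones can be illustrated with the standard sampling formula N = ln(1 - P) / ln(1 - f), where f is the fraction of clones in the library that derive from the transcript of interest and P is the desired probability of recovering at least one such clone. The text does not give this formula or a value for f; the abundance of 1 in 200,000 used below is a hypothetical figure chosen for illustration.

```python
import math


def clones_needed(fraction: float, probability: float = 0.99) -> int:
    """Number of clones to screen so that a cDNA present at the given
    fractional abundance is seen at least once with the given probability."""
    if not (0.0 < fraction < 1.0 and 0.0 < probability < 1.0):
        raise ValueError("fraction and probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - probability) / math.log1p(-fraction))


def detection_probability(fraction: float, clones: int) -> float:
    """Probability of seeing at least one clone of the cDNA among those screened."""
    return 1.0 - math.exp(clones * math.log1p(-fraction))


if __name__ == "__main__":
    f = 1 / 200_000  # hypothetical abundance of a rare transcript in the library
    print(f"clones needed for 99% confidence: {clones_needed(f):,}")
    print(f"chance of detection in 10^6 clones: {detection_probability(f, 10**6):.3f}")
```

Under this assumed abundance a little over 900,000 clones are needed for 99% confidence, which is consistent with the usual practice of screening about 10^6 cDNAs. In a normalized library f is larger for rare transcripts, so fewer clones suffice.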

Figure 4 outlines the different approaches that are currently used in a search for cDNAs. One of
the most straightforward means of identifying clones of interest in cDNA libraries is by direct
hybridization of genomic probes from the region in question. Probes can be cosmids, phage or
YACs. In order to eliminate non-specific detection of cDNAs as a result of hybridization to
human repetitive sequences present in genomic clones (approximately 1/3 of the cDNA clones in
a cDNA library have Alu sequences at their 3' ends), repetitive sequences in the probe are first
"blocked" by pre-hybridization with human repetitive DNA.

[Figure 4: schematic, not reproduced. Elements shown: the 17q12-q21 region; YAC1, YAC2 and YAC3; cosmids; CpG island trapping; zoo blots; exon trapping (cells, splicing); direct selection with PCR-amplified cDNA; RNA size by Northern blots; cDNA library; cDNA.]

Figure 4. Strategies for the identification of genes with cloned genomic DNA from a defined region of the genome.

Cosmids are the preferred reagent in the "direct selection by hybridization" strategy. This
approach, pioneered by Lovett et al [29] and Parimoo et al [42], utilizes a genomic substrate
such as cosmids, phage, YACs, or even whole chromosomal DNA. After blocking the human
repeats in such a substrate, it is hybridized in solution to PCR-amplified cDNA. The
template/cDNA complex is then captured on magnetic beads. The complex is washed to eliminate
unbound material and the bound cDNA i