`
`Nucleic Acids Research
`
`Sequence analysis of cloned cDNA encoding part of an immunoglobulin heavy chain
`
`John Rogers, Patrick Clarke and Winston Salser
`
`Molecular Biology Institute and Department of Biology, University of California, Los Angeles,
`CA 90024, USA
`
`Received 8 March 1979
`
`ABSTRACT
`
The recombinant plasmid pH21-1 consists of mouse-derived complementary DNA (cDNA) in the E. coli plasmid pMB9. The mouse insertion has been completely sequenced, and encodes the CH3 domain and half the CH2 domain of the immunoglobulin γ1 heavy chain. The predicted amino acid sequence differs at several positions from that previously published for this protein. The pattern of codon usage resembles that in some other eukaryotic messenger RNAs. A computer program has been used to predict the optimum secondary structure for the mRNA encoding the CH3 domain and the inter-domain junction.
`
`INTRODUCTION
`
With the development of elegant techniques (1-4) for cloning DNA complementary to eukaryotic messengers (cDNA), it has become possible to prepare large quantities of many such cDNAs for use both in sequence analysis, and in locating the same sequences in cellular DNA and RNA. The immunoglobulin heavy chain genes are of particular interest, since not only do they undergo the generation of diversity and joining of variable and constant regions which so far appear to be peculiar to immunoglobulins, but also they are members of a developmentally regulated multigene family, and the ancestral heavy chain gene apparently arose by tandem duplication of a still smaller genetic unit, the immunoglobulin domain.

A recombinant DNA plasmid, pH21-1, has been constructed containing sequences from the heavy chain messenger of the IgG1-producing mouse myeloma MOPC21 (R. Wall, K. Toth, G. Paddock, R. Higuchi, and W. Salser, unpublished). Here we report the complete restriction map and DNA sequence of the mouse-derived insert cloned in this plasmid. It contains 459 nucleotides encoding the C-terminal 1½ domains of the γ1 constant region. Some characteristics of the coding sequence are discussed.
`
© Information Retrieval Limited, 1 Falconberg Court, London W1V 5FG, England
`
`Genzyme Ex. 1033, pg 871
`
`MATERIALS AND METHODS
`
`Construction of pH21-1
`
Construction of a series of cDNA clones containing κ light chain mRNA sequences has been reported (5). The mRNA was from solid tumors of the IgG1-producing mouse myeloma MOPC21 (6). In these same experiments, cDNA clones containing heavy chain mRNA sequences were constructed by the same methods using the 16-17S fraction of the mRNA, in which the principal species is the heavy chain messenger (7). The cDNA was inserted into the EcoRI site of plasmid pMB9, by means of poly-dA and poly-dT "tails" on the respective 3' ends of insert and plasmid, and the recombinant plasmids were cloned in E. coli (R. Wall, K. Toth, G. Paddock, R. Higuchi, and W. Salser, unpublished). One clone gave a distinct peak of hybridization with 16-17S MOPC21 mRNA (R. Wall and D. DeBorde, personal communication) and was designated pH21-1.
`
`Restriction Analysis
`
Plasmid DNA was prepared as in refs. (8) and (9). EndoR.TaqI was prepared by S. Hendrich using an unpublished technique of M. Komaromy. It was used at 65°C in 10 mM HEPES pH 8.4, 6 mM MgCl2, 6 mM β-mercaptoethanol, 25 mM (NH4)2SO4, 100 µg ml-1 gelatin. Enzymes HaeIII and HincII were purchased from New England Biolabs, and AluI, HhaI and HinfI from Bethesda Research Laboratories; they were used as suggested by the suppliers.

Polyacrylamide gel electrophoresis of restriction fragments for preparative or analytical purposes was carried out in 20 x 40 cm gels made with 6% acrylamide, 0.2% methylene bisacrylamide, 12% glycerol, in running buffer. Running buffer was 50 mM Tris borate, 1 mM EDTA (TBE). For strand separation the gel consisted of 4% acrylamide plus 0.14% methylene bisacrylamide in the running buffer, which was 36 mM Tris base, 30 mM NaH2PO4, 1 mM EDTA. Samples for 6% gels were loaded in the restriction buffer, diluted if necessary, plus ¼ volume of dye solution (0.03% bromphenol blue plus xylene cyanol in 20% glycerol). Samples for strand separation were prepared in 90 µl of the same dye solution made up to 300 µl 0.3 M NaOH, and heated at 37°C for 3 minutes immediately before loading. After electrophoresis, DNA fragments were visualized by ethidium bromide staining and UV fluorescence, or, if end-labelled, by autoradiography. Elution of DNA fragments was carried out as in ref. (10), except that incubation of the crushed gel in elution solution was for 2 days at 42°C.
`
`DNA Sequence Analysis
`
Restriction fragments purified from digests of 150-300 µg of the 6-kb plasmid pH21-1 were treated with bacterial alkaline phosphatase (Worthington Biochemical, grade f) in 10 mM Tris-HCl pH 8.0, for 30 minutes at 37°C. This mixture was phenol-extracted thrice, ether-extracted twice, ethanol-precipitated, redissolved in 5 mM Tris pH 9.5, 0.1 mM spermidine, and 0.01 mM EDTA.Na3, and then denatured by boiling. 5' end labelling with 32P was carried out as in ref. (10), using Tris-HCl rather than sodium glycine as buffer. T4 polynucleotide kinase was purchased from PL Biochemicals, and γ-32P-ATP was made from 32Pi (ICN Pharmaceuticals) (10). After ethanol precipitation, the labelled ends of the DNA were separated by strand separation, or by another restriction cleavage and electrophoretic separation of the fragments.

For sequencing we used four of the base-specific cleavage reactions of Maxam and Gilbert (10), entitled G>A, A>C, T+C, and C. A fifth reaction cleaving at A+G was performed as follows (A. Maxam, personal communication). End-labelled DNA and 1 µg carrier DNA were made up to 30 µl in 17 mM sodium citrate pH 4.0 and heated for 10 minutes at 90°C. 2 µl of 1 M NaOH was added and the mixture sealed in a capillary and heated for a further 30 minutes at 90°C. 20 µl of urea-dye mixture (10) was then added and the sample was ready for loading on a ladder gel.

Ladder gels (20% acrylamide, 0.7% methylene bisacrylamide, 7 M urea in TBE) were made, loaded and run as in ref. (10). In our later runs we used thin gels (11), of thickness 0.32 mm instead of the regular 1.6 mm, and found considerably improved resolution of bands. Ladder gels were autoradiographed at -70°C on Cronex 4 X-ray film with Dupont Hi-plus intensification screens.
`
`Secondary Structure Prediction
`
The most stable secondary structure for the RNA represented by the sequence was predicted using the computer program of Studnicka et al. (12). This program will examine a large number of possible regions of base pairing to find that combination of regions which forms the most stable structure. The program begins by cataloguing all possible regions of 2 or more consecutive base pairs. There were 5231 regions in this "primary region catalogue" for the sequence considered here. It would be prohibitively expensive to consider all of these regions in a single computation cycle. Therefore we rank the regions and carry out the computation in several cycles. The ranking uses a weighting function which is the sum of the energy of the region itself, divided by the square root of its length, and the energy of the best "local structure" which can be obtained by combining the region with all neighboring regions which are separated from it by less than 10 nucleotides on either strand (W. Salser and L. Nagy, in preparation). Where two primary regions would overlap, a "branch migration" procedure is used to determine the most stable non-overlapping combination of parts of the two primary regions.

The 150 regions with the most favorable weighting factors were chosen for the first cycle and the 100 most stable combinations of these regions were computed, the energies being calculated using the rules given in ref. 13. All these structures shared certain features which permitted us to break up the computation into three smaller jobs for the second cycle. In this second cycle all regions down to a weighting factor of 39 were considered (the equivalent of the top 900 of the original 5231 regions). The alternative structures for the second cycle were in turn examined for common features; these allowed us to subdivide the sequence into eight jobs for the final cycle, in which all regions of two or more base pairs were considered.

In theory it would be possible to improve the structure slightly by considering single G-C pairs. For example, according to our base-pairing rules (13), the structure

    5' CUUC-GU
    3' GA-GUCA

is more stable than the computed structure

    5' CUUCGU
    3' GAGUCA

by 2 kcal. This additional refinement of the structure was not performed.
`
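The cataloguing step described above can be illustrated with a minimal sketch. It is not the program of Studnicka et al. (12): the function names, the inclusion of G-U pairs, the minimum hairpin loop of 3 bases and the exact form of `weighting` are assumptions made for illustration only, and no energy rules are implemented.

```python
# Hypothetical sketch: build a "primary region catalogue" for a short RNA.
# Pairing rules, min_loop and the weighting formula are assumptions,
# not details taken from the published program.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True for Watson-Crick or G-U pairs."""
    return (a, b) in PAIRS


def catalogue_regions(seq, min_pairs=2, min_loop=3):
    """List maximal runs of consecutive base pairs as (i, j, length).

    Base i pairs with j, i+1 with j-1, and so on (0-based indices).
    min_loop is the assumed minimum number of bases enclosed by a pair.
    """
    n = len(seq)
    regions = []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            if not can_pair(seq[i], seq[j]):
                continue
            # Skip pairs that only continue a run starting further out.
            if i > 0 and j < n - 1 and can_pair(seq[i - 1], seq[j + 1]):
                continue
            length = 0
            while (j - length) - (i + length) - 1 >= min_loop and can_pair(
                seq[i + length], seq[j - length]
            ):
                length += 1
            if length >= min_pairs:
                regions.append((i, j, length))
    return regions


def weighting(region_energy, length, local_energy):
    """One reading of the ranking function: E(region)/sqrt(length) + E(local)."""
    return region_energy / length ** 0.5 + local_energy


regions = catalogue_regions("GGGAAAUCCC")
```

For the toy hairpin `GGGAAAUCCC` the catalogue contains the three-pair stem `(0, 9, 3)` together with shorter, shifted alternatives; ranking such candidates is what keeps the later cycles affordable.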
`Biosafety Precautions
`
P3 physical containment was used throughout for growth of transformed bacteria. The initial isolation of pH21-1 had been carried out in E. coli χ1849, an EK1 host, in compliance with the Asilomar Guidelines in effect at that time. When the NIH Guidelines (14) were issued, pH21-1 was transferred to E. coli χ1776, an EK2 host, and all subsequent experiments were conducted in accordance with those Guidelines.
`
`RESULTS
`
`Restriction Analysis
`
Various restriction enzymes were tested in parallel digests of pH21-1 and pMB9 DNA. Since the mouse sequence was inserted at the single EcoRI site in pMB9, it was expected that comparison of the digests would show one band which was unique to pMB9 replaced by one or more bands unique to pH21-1. This was found to be the case, and all the enzymes tested indicated an inserted segment of about 560 bp in length.

Mapping was helped by the observation that each pH21-1 digest exhibited one pair of submolar bands not shown by pMB9. Consistent values for the size of the insert could only be deduced if the doublet was considered to represent a single fragment for each restriction enzyme. It probably resulted from an approx. 44-bp deletion at a single site in a minor population of the plasmid DNA, since on preparative gels the larger band was homogeneous, while the smaller band was seen to be only one of up to seven equally spaced bands, the others being faint (Figure 1). Since no such heterogeneity is seen in pMB9, deletions of various sizes probably occurred in the mouse insert or the A.T joints. There is no repetitive sequence in the mouse insert (see below) which could account for this, and the restriction fragment affected always contained the left-hand A.T joint, which therefore could be the site of the deletions.

Figure 1. Preparative gel of TaqI-digested pH21-1 (300 µg), showing heterogeneity in one band. The uppermost and brightest member of the set is approximately 372 bp in length. The prominent doublets above and below the set are fragments from the pMB9 parts of the plasmid. The side lane contains HaeIII-digested pMB9, with fragment sizes marked in basepairs (625, 520, 434, 398, 260).

It was possible to locate some of the restriction sites in the insert without further digests using the positions given by Maniatis et al. (2) and I. Cummings (unpublished) for the restriction sites in pMB9 nearest the EcoRI site. Thus of the four pH21-1-specific fragments seen in MboII digests, the 1100-bp fragment must cover the right-hand insert-pMB9 junction, the 460-bp fragment with its 420-bp minor band must cover the left-hand junction, and the 116-bp and 75-bp fragments must be internal. Knowing the positions of some sites and the general location of the deletion-prone region, it was then possible to locate most of the TaqI, HaeIII and AluI sites from the single-enzyme digests. The map was refined by isolating the two pH21-1-specific TaqI fragments and digesting them with HaeIII, AluI, and HaeIII plus MboII. The resulting map was confirmed by subsequent sequencing except for the discovery of an extra MboII site near the middle. The final map is given in Figure 2.
`
[Figure 2: restriction map, not reproduced. Labels recoverable from the drawing: "585-nucleotide insertion"; "AT joint" at each end of the insert.]

Figure 2. Restriction map of the insert and surrounding region in pH21-1. Top line: Restriction map of the pMB9 sequences around the EcoRI site, as deduced from analysis of the insert-containing HhaI fragment of pH21-1. All sites for the enzymes indicated are shown. Distances between the sites are in nucleotides. Middle line: Restriction map of the sequences inserted into the EcoRI site of pMB9. All sites for the enzymes indicated are shown. Distances between sites are in nucleotides. The numbers below the line denote the codons in or after which the cuts are made. Arrows indicate where sequences were obtained. We did not sequence continuously across all restriction sites, but the continuity of the coding sequence implies that no nucleotides were omitted. Bottom line: Additional sites inferred from the nucleotide sequence, with the numbers of the codons at which the cuts are expected to be made. MnlI and HphI, like MboII, make cuts offset from the recognition sequences by 5-10 nucleotides. There are no sites for EcoRI, BamHI, PstI, HindIII, HhaI, HinfI, HpaII, or HaeII in the insert.
`
While preparing the HhaI fragment containing the insert for sequencing, we did further restriction analysis in its outer parts, so as to extend the known restriction map for pMB9 (Figure 2). Since parallel digests of whole pH21-1 and pMB9 never showed any differences in bands not covering the EcoRI site, this map is believed to represent the "wild-type" pMB9 structure.
`
`DNA Sequence Analysis
`
The restriction sites used for Maxam-Gilbert sequence analysis (10) are indicated in Figure 2. Representative 'ladders' are shown in Figure 3, and the complete DNA sequence is in Figure 4. It covers 459 nucleotides between the A.T joints, and encodes amino acids 287 to 439 in the sequence of Adetugbo (15,16). This includes half of the CH2 domain (amino acids 228-334) and all of the CH3 domain (amino acids 335-440) except the C-terminal glycine. The first 30 nucleotides of the sequence, and particularly the first ten, are not entirely certain; they were far from any useful restriction sites and could only be read knowing the amino acid sequence.

The individual nucleotides in the sequences of the A.T joints were only partially resolved, but measurements of the complete poly-A blocks on ladders indicated lengths of 76 ±15 bp on the left and 50 ±15 bp on the right. The left-hand poly-A tail contains at least two T residues. Similar A -> T transversions in A.T joints have been observed previously (2).

Figure 3. "Ladders" showing sequences from pH21-1. Sequences are read from bottom to top. Codons are numbered. Asterisks indicate codons differing from the reported amino acid sequence (16). (a) Complementary strand with 5' label at HaeIII site at codon 353, 3' cut with TaqI; thin gel. (b) Coding strand covering the same region, with 5' label at TaqI site at codon 326, 3' cut with MboII; regular gel. (c) Coding strand with 5' label at MboII site at codon 373, 3' cut with AluI; thin gel.

5'-(T)n-

287  Glu Glu Gln Phe Asn Ser Thr Phe Arg Ser
     GAG GAG CAG TTC AAC AGC ACT TTC CGC TCA
             *
297  Val Ser Glu Leu Pro Ile Met His Gln Asp
     GTC AGT GAA CTT CCC ATC ATG CAC CAA GAC

307  Trp Leu Asn Gly Lys Glu Phe Lys Cys Arg
     TGG CTC AAT GGC AAG GAG TTC AAA TGC AGG

317  Val Asn Ser Ala Ala Phe Pro Ala Pro Ile
     GTC AAC AGT GCA GCT TTC CCT GCC CCC ATC
                                         *
327  Glu Lys Thr Ile Ser Lys Thr Lys Gly Arg
     GAG AAA ACC ATC TCC AAA ACC AAA GGC AGA
         *
337  Pro Lys Ala Pro Gln Val Tyr Thr Ile Pro
     CCG AAG GCT CCA CAG GTG TAC ACC ATT CCA

347  Pro Pro Lys Glu Gln Met Ala Lys Asp Lys
     CCT CCC AAG GAG CAG ATG GCC AAG GAT AAA

357  Val Ser Leu Thr Cys Met Ile Thr Asp Phe
     GTC AGT CTG ACC TGC ATG ATA ACA GAC TTC

367  Phe Pro Glu Asp Ile Thr Val Glu Trp Gln
     TTC CCT GAA GAC ATT ACT GTG GAG TGG CAG
     *   *           *   *
377  Trp Asn Gly Gln Pro Ala Glu Asn Tyr Lys
     TGG AAT GGG CAG CCA GCG GAG AAC TAC AAG

387  Asn Thr Gln Pro Ile Met Asp Thr Asp Gly
     AAC ACT CAG CCC ATC ATG GAC ACA GAT GGC

397  Ser Tyr Phe Val Tyr Ser Lys Leu Asn Val
     TCT TAC TTC GTC TAC AGC AAG CTC AAT GTG

407  Gln Lys Ser Asn Trp Glu Ala Gly Asn Thr
     CAG AAG AGC AAC TGG GAG GCA GGA AAT ACT

417  Phe Thr Cys Ser Val Leu His Glu Gly Leu
     TTC ACC TGC TCT GTG TTA CAT GAG GGC CTG

427  His Asn His His Thr Glu Lys Ser Leu Ser
     CAC AAC CAC CAT ACT GAG AAG AGC CTC TCC

437  His Ser Pro
     CAC TCT CCT

-(A)n-3'

Restriction sites labelled in the figure: HincII (codons 317-318); AluI (320-321, 403-404); TaqI (326-327); HaeIII (352-353, 425-426); MboII (366-367, 369-370, 407-409, 432-434).

Figure 4. Nucleotide sequence of the cDNA insert in pH21-1. Only the coding strand is shown, with the encoded amino acid sequence above. Positions where this differs from the reported amino acid sequence (16) are marked by asterisks. Restriction sites are underlined. Additional underlining indicates less certain parts of the sequence.
`
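The restriction sites marked on the sequence can be cross-checked by scanning the coding strand for the recognition sequences. The sketch below is an illustration and not part of the original analysis: `SEQ` is our transcription of Figure 4 (codon 390 read as CCC, in agreement with the Pro in the translation), the function names are hypothetical, and only palindromic recognition sequences are included so that one strand suffices.

```python
import re

# Coding strand transcribed from Figure 4 (codons 287-439), ten codons per row.
SEQ = "".join("""
GAG GAG CAG TTC AAC AGC ACT TTC CGC TCA
GTC AGT GAA CTT CCC ATC ATG CAC CAA GAC
TGG CTC AAT GGC AAG GAG TTC AAA TGC AGG
GTC AAC AGT GCA GCT TTC CCT GCC CCC ATC
GAG AAA ACC ATC TCC AAA ACC AAA GGC AGA
CCG AAG GCT CCA CAG GTG TAC ACC ATT CCA
CCT CCC AAG GAG CAG ATG GCC AAG GAT AAA
GTC AGT CTG ACC TGC ATG ATA ACA GAC TTC
TTC CCT GAA GAC ATT ACT GTG GAG TGG CAG
TGG AAT GGG CAG CCA GCG GAG AAC TAC AAG
AAC ACT CAG CCC ATC ATG GAC ACA GAT GGC
TCT TAC TTC GTC TAC AGC AAG CTC AAT GTG
CAG AAG AGC AAC TGG GAG GCA GGA AAT ACT
TTC ACC TGC TCT GTG TTA CAT GAG GGC CTG
CAC AAC CAC CAT ACT GAG AAG AGC CTC TCC
CAC TCT CCT
""".split())

# Recognition sequences written as regular expressions (Y = [CT], R = [AG]).
ENZYMES = {
    "TaqI": "TCGA", "HaeIII": "GGCC", "AluI": "AGCT", "HincII": "GT[CT][AG]AC",
    "EcoRI": "GAATTC", "BamHI": "GGATCC", "PstI": "CTGCAG",
    "HindIII": "AAGCTT", "HhaI": "GCGC", "HpaII": "CCGG",
}


def find_sites(seq, pattern):
    """Return 1-based start positions of every (possibly overlapping) match."""
    return [m.start() + 1 for m in re.finditer("(?=" + pattern + ")", seq)]


def codon_number(position, first_codon=287):
    """Map a 1-based nucleotide position to the codon numbering of Figure 4."""
    return first_codon + (position - 1) // 3


sites = {name: find_sites(SEQ, pattern) for name, pattern in ENZYMES.items()}
```

With this transcription the scan finds single TaqI and HincII sites and two sites each for HaeIII and AluI, and none for EcoRI, BamHI, PstI, HindIII, HhaI or HpaII.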
`DISCUSSION
`
`Amino Acid Substitutions
`
The DNA sequence implies several substitutions in the published amino acid sequence (15,16), as listed in Table I. In all but the first, the DNA sequence appears certain. There are three possible sources for these discrepancies.

(1) In vivo variation. It is conceivable that since the separation of the MOPC21 tumor line used to make the cDNA from the MOPC21 (P3K) line used for protein sequencing in Cambridge, one or both lines had either accumulated 'somatic' mutations or switched to expression of a different γ chain gene. Somatic mutations have been documented in subclones of the Cambridge line (17-20), and switches in γ2 gene expression have been induced in another myeloma line in vitro (21); serological evidence has been presented for a second γ1 gene in mice (22). However, these events are detected only in single cloned cells, not in the cell population as a whole. Moreover, the decreased homology of pH21-1 with other γ chains (see below) makes such an explanation unlikely.

(2) DNA cloning errors. This is suggested by the fact that 5 of the 8 substitutions would abolish amino acid identities with other γ chains (Table I). In particular, valine-296 is conserved in every γ, α, and ε chain known, although not in a µ chain (15, 23-28). Our DNA sequence is uncertain at that point but appears not to encode valine.

(3) Protein sequencing errors. 4 of the 8 substitutions involve the interchange of pairs of nearby amino acids; one involves an acid-to-amide change; and one (codon 377) would replace a serine which was not definitely present in the peptide sequenced (15) with a tryptophan which could have been present in greater molarity than estimated (15).

Since the pH21-1 sequence was completed, the corresponding chromosomal γ1 gene has been cloned and partially sequenced (29). The exchange between positions 336 and 338 is confirmed, and additional exchanges are found in the CH1 domain (29). Both the nature of the "substitutions" in pH21-1, and the presence of the same and similar substitutions in the chromosomal gene, imply that most of them represent errors in protein sequencing. In the analyses that follow, the DNA sequence is taken to be correct unless otherwise stated.

TABLE I. Substitutions in γ1 amino acid sequence

Position   DNA sequence      AA sequence       Other AA sequences
           Codon     AA      Codon     AA      Mouse γ2a   Human γ1   Rabbit γ   G-pig γ2
296        *(TCA)    (S)     GTN       V       V           V          V          V
299        GAA       E       GCN       A       A           V          T          V
336        AGA       R       AAR       K       S           Q          E          A
338        AAG       K       AGR       R       R           R          L          R
377        TGG       W       TCN/AGY   S       N           S          K          S
378        AAT       N       GAY       D       N           N          D          N
381        CCA       P       GCN       A       T           E          A          P
382        GCG       A       CCN       P       E           P          E          V+-S+-E

Numbering and mouse γ1 amino acid sequence are from ref. (16). Other amino acid sequences are from refs. (15, 25). AA = amino acid; + = insertion; * = DNA sequence uncertain.
`
`Comparison with Previous Sequence Data
`
Some sequence data on the MOPC21 γ1 mRNA has already been published in the form of a ribonuclease T1 oligonucleotide catalogue (30). In that work, the T1 oligonucleotides were not sequenced, but secondary digestion products were aligned so that where possible they matched the amino acid sequence. Of the ten oligonucleotides listed in the region we have sequenced, all but three are confirmed. Two of the three (h6 and h29) were located in the right places but the true nucleotide sequence is a permutation of the suggested one; our sequence is equally consistent with the secondary digestion products tabulated by Cowan et al. (30). Our data show that the third oligonucleotide, h18, does not derive from the place that was suggested by Cowan et al.

Adetugbo and Milstein (20) have also inferred the mRNA sequence from codon 341 through 355 from the amino acid sequence of the MOPC21 frameshift mutant IF3. Their predicted sequence is confirmed by our analysis. They also suggested (17-19) that the premature termination mutant IF1 contained a nonsense mutation at serine-358. Since we find this codon to be AGU, however, the mutation cannot be a single base substitution. It could perhaps be an insertion of U before codon 358, or a 4-base deletion including it, either of which would create a termination codon in the appropriate position.
`
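The argument about serine-358 can be checked by enumeration. The sketch below is illustrative only: the helper names are hypothetical, the context is codons 356-360 of Figure 4 written as RNA, and the particular four-base deletion shown (the last base of codon 357 plus codon 358) is an assumption, since the text does not say which bases would be lost.

```python
# Can one base change turn AGU (Ser-358) into a stop codon?
STOP_CODONS = {"UAA", "UAG", "UGA"}
BASES = "ACGU"


def single_substitutions(codon):
    """Return every codon differing from `codon` at exactly one position."""
    variants = []
    for pos, original in enumerate(codon):
        for base in BASES:
            if base != original:
                variants.append(codon[:pos] + base + codon[pos + 1:])
    return variants


def first_stop(rna, frame_start=0):
    """Return the 0-based index of the first in-frame stop codon, or None."""
    for k in range(frame_start, len(rna) - 2, 3):
        if rna[k:k + 3] in STOP_CODONS:
            return (k - frame_start) // 3
    return None


# Codons 356-360: Lys Val Ser(358) Leu Thr.
context = "AAAGUCAGUCUGACC"

stops_by_substitution = [c for c in single_substitutions("AGU") if c in STOP_CODONS]

# Insert one U immediately before codon 358 (after the first six bases).
with_insertion = context[:6] + "U" + context[6:]

# Assumed example of a four-base deletion: remove "CAGU" (bases 6-9).
with_deletion = context[:5] + context[9:]
```

None of the nine single substitutions of AGU is a stop codon, whereas the insertion reads AAA GUC UAG and the assumed deletion reads AAA GUC UGA, each placing a terminator at the position of codon 358.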
`Codon Usage
`
As in other eukaryotic messengers, the pattern of codon usage is clearly nonrandom, as shown in Tables II and III. C is preferred in redundant positions, and G is the next most abundant. Codons for glutamine and glutamic acid show particular preferences for G over A.

This distribution can be compared with that in the mouse immunoglobulin Cκ domain (31) (Table III), which also shows a preference for C although the other bases are used equally. The coding sequences for hemoglobin α and β in the rabbit, which like immunoglobulin Cγ and Cκ are believed to have diverged near to the time of origin of the vertebrates (41,42), also share some but not all anomalies in individual codon usage (Table III and (13)), suggesting that such anomalies may be generally conserved over long spans of evolutionary time.

Table III also shows that the most general features of codon usage, preferences for or against given bases in the third position of codons, are shared by large groups of animal genes. The genes for immunoglobulin C regions, hemoglobins, peptide hormones, and histones all fall into Group I, which has high frequency of C, moderate to high G, moderate to low U, and low A. Genes for an immunoglobulin V-region and for ovalbumin fall into Group II, with uniform usage except for a mild deficiency of G. The genes of SV40 are the only known representatives of Group III; they all have high U and low C. The significance of these patterns is unknown, although analysis of hemoglobin β usage (43) suggested a correlation with the relative abundances of tRNAs.
`
TABLE II. Codon usage.

Phe   UUU   0     Ser   UCU   3     Tyr   UAU   0     Cys   UGU   0
      UUC   8           UCC   2           UAC   4           UGC   3
Leu   UUA   1           UCA   1     STOP  UAA   0     STOP  UGA   0
      UUG   0           UCG   0           UAG   0     Trp   UGG   4
Leu   CUU   1     Pro   CCU   4     His   CAU   2     Arg   CGU   0
      CUC   3           CCC   4           CAC   4           CGC   1
      CUA   0           CCA   3     Gln   CAA   1           CGA   0
      CUG   2           CCG   1           CAG   7           CGG   0
Ile   AUU   2     Thr   ACU   5     Asn   AAU   4     Ser   AGU   3
      AUC   4           ACC   5           AAC   6           AGC   4
      AUA   1           ACA   2     Lys   AAA   5     Arg   AGA   1
Met   AUG   4           ACG   0           AAG   8           AGG   1
Val   GUU   0     Ala   GCU   2     Asp   GAU   2     Gly   GGU   0
      GUC   4           GCC   2           GAC   4           GGC   4
      GUA   0           GCA   2     Glu   GAA   2           GGA   1
      GUG   4           GCG   1           GAG   10          GGG   1
`
TABLE III. Frequencies of bases at third-base positions.

(a) Mouse immunoglobulin γ1 (pH21-1)

                             U       C       A       G       total
Observed                     28      62      20      43      153
Expected for uniform usage   38.09   38.09   35.59   41.26   153.03
Codon usage index            0.74    1.63    0.56    1.04

(b) Other animal genes

                          U      C      A      G      total   ref.
Group I
Mouse Ig Cγ1 (P)          0.74   1.63   0.56   1.04   153     -
Mouse Ig Cκ               0.81   1.44   0.82   0.86   107     (31)
Rabbit Hb α               0.37   1.99   0.16   1.41   141     (32)
Rabbit Hb β               1.12   1.12   0.25   1.46   148     (33)
Rat insulin (P)           0.91   1.43   0.39   1.30   98      (34)
Rat GH (S)                0.77   1.68   0.34   1.21   216     (35)
Human CS (P)              0.39   1.70   0.45   1.41   168     (36)
Sea urchin histones (P)   0.84   1.67   0.65   0.96   534     (37)
Group II
Mouse Ig Vλ (S)           1.23   1.11   1.10   0.58   128     (38)
Chicken Ov                1.09   1.06   1.10   0.78   386     (39)
Group III
SV40 all genes            1.52   0.54   1.18   0.78   1514    (40)

The codon usage index is the frequency at which a base appears at the third position in codons, divided by the frequency at which it would appear if all the possible codons for each amino acid were used uniformly. Stop codons are not included. (P) = partial sequence, (S) = signal ("pre") sequence included. Ig = immunoglobulin, Hb = hemoglobin, GH = growth hormone, CS = chorionic somatomammotropin, Ov = ovalbumin.
`
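The codon usage index defined in the footnote to Table III can be recomputed from the sequence. The sketch below is an illustration, not the authors' procedure: `SEQ` is our transcription of Figure 4 (codon 390 read as CCC), the standard genetic code is assumed, DNA T stands for U, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, built in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Coding strand transcribed from Figure 4 (codons 287-439), ten codons per row.
SEQ = "".join("""
GAG GAG CAG TTC AAC AGC ACT TTC CGC TCA
GTC AGT GAA CTT CCC ATC ATG CAC CAA GAC
TGG CTC AAT GGC AAG GAG TTC AAA TGC AGG
GTC AAC AGT GCA GCT TTC CCT GCC CCC ATC
GAG AAA ACC ATC TCC AAA ACC AAA GGC AGA
CCG AAG GCT CCA CAG GTG TAC ACC ATT CCA
CCT CCC AAG GAG CAG ATG GCC AAG GAT AAA
GTC AGT CTG ACC TGC ATG ATA ACA GAC TTC
TTC CCT GAA GAC ATT ACT GTG GAG TGG CAG
TGG AAT GGG CAG CCA GCG GAG AAC TAC AAG
AAC ACT CAG CCC ATC ATG GAC ACA GAT GGC
TCT TAC TTC GTC TAC AGC AAG CTC AAT GTG
CAG AAG AGC AAC TGG GAG GCA GGA AAT ACT
TTC ACC TGC TCT GTG TTA CAT GAG GGC CTG
CAC AAC CAC CAT ACT GAG AAG AGC CTC TCC
CAC TCT CCT
""".split())


def third_base_index(seq):
    """Return (observed, expected, index) for third-position bases."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    observed = Counter(codon[2] for codon in codons)
    # Group the sense codons by the amino acid they encode.
    synonyms = defaultdict(list)
    for codon, amino in CODE.items():
        if amino != "*":
            synonyms[amino].append(codon)
    # Under uniform usage each occurrence is shared equally among its synonyms.
    expected = Counter()
    for codon in codons:
        family = synonyms[CODE[codon]]
        for synonym in family:
            expected[synonym[2]] += 1 / len(family)
    index = {base: observed[base] / expected[base] for base in BASES}
    return observed, expected, index


observed, expected, index = third_base_index(SEQ)
```

Rounded to two decimals, the indices for U, C, A and G agree with the first row of Table III; the expected counts agree with the tabulated values to within rounding.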
`Base Composition and Dinucleotide Frequency
`
The base composition and dinucleotide frequencies in the coding strand are shown in Table IV. There is a severe underabundance of the dinucleotide CG, comparable to that in total eukaryotic DNAs (44,45) and hemoglobin β genes (33,43,46). Demonstration that CG is not deficient in some other coding sequences, such as hemoglobin α genes (32,45), has ruled out earlier models in which it was proposed that ribosomes are unable to translate CG-rich sequences effectively. Subsequently, it has been suggested that CG in eukaryotes is a hotspot for mutation (13). The mechanism may be methylation of the C followed by deamination to yield T (32,45). Coulondre et al. (47) have shown that methylated Cs in E. coli are indeed hotspots for mutation, and suggested the same deamination mechanism.

Therefore, and since no amino acid is required to have CG in its codon, we asked whether the remaining CGs might be maintained by selective pressure on the nucleotide sequence. The five amino acids encoded by these Cs (arg-295, ile-326, pro-337, ala-382, phe-399) are conserved slightly more than average in other γ chains (15,25). One possible pressure for conserving these five CGs might be selection for an RNA secondary structure; however, CGs do not fall preferentially into regions of strong base pairing in our secondary structure prediction (below).

TABLE IV. Frequencies of nucleotides and dinucleotides.

A       134 (29%)
T       88 (19%)
G       105 (23%)
C       132 (29%)
total   459 (100%)

AA   35     AT   20     AC   35     AG   44
TA   10     TT   16     TC   32     TG   29
GA   32     GT   17     GC   30     GG   26
CA   57     CT   35     CC   35     CG   5
average 28.625
`
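The counts behind Table IV, and the extent of the CG deficiency, can be recomputed directly. This is an illustrative sketch: `SEQ` is our transcription of Figure 4 (codon 390 read as CCC), the function name is hypothetical, and the "expected" CG count uses a simple independence assumption that is not taken from the paper.

```python
from collections import Counter

# Coding strand transcribed from Figure 4 (codons 287-439), ten codons per row.
SEQ = "".join("""
GAG GAG CAG TTC AAC AGC ACT TTC CGC TCA
GTC AGT GAA CTT CCC ATC ATG CAC CAA GAC
TGG CTC AAT GGC AAG GAG TTC AAA TGC AGG
GTC AAC AGT GCA GCT TTC CCT GCC CCC ATC
GAG AAA ACC ATC TCC AAA ACC AAA GGC AGA
CCG AAG GCT CCA CAG GTG TAC ACC ATT CCA
CCT CCC AAG GAG CAG ATG GCC AAG GAT AAA
GTC AGT CTG ACC TGC ATG ATA ACA GAC TTC
TTC CCT GAA GAC ATT ACT GTG GAG TGG CAG
TGG AAT GGG CAG CCA GCG GAG AAC TAC AAG
AAC ACT CAG CCC ATC ATG GAC ACA GAT GGC
TCT TAC TTC GTC TAC AGC AAG CTC AAT GTG
CAG AAG AGC AAC TGG GAG GCA GGA AAT ACT
TTC ACC TGC TCT GTG TTA CAT GAG GGC CTG
CAC AAC CAC CAT ACT GAG AAG AGC CTC TCC
CAC TCT CCT
""".split())


def base_and_dinucleotide_counts(seq):
    """Count single bases and overlapping dinucleotides on one strand."""
    bases = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    return bases, pairs


bases, pairs = base_and_dinucleotide_counts(SEQ)

# Mean count over the 16 possible dinucleotides.
average = sum(pairs.values()) / 16

# Assumed null model: C and G occur independently at adjacent positions.
expected_cg = bases["C"] * bases["G"] / len(SEQ)
```

Under this independence assumption roughly 30 CG dinucleotides would be expected, against the 5 observed, which is the underabundance discussed above.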
Homologies between CH2 and CH3 domains
`
The DNA sequence covers homologous parts of two immunoglobulin domains, CH2 and CH3. This is the first nucleotide sequence to cover regions which are believed to have evolved by tandem duplication of an ancestral gene, so we have examined the sequence for possible nucleotide homology between the domains. Comparison of all known heavy chain sequences (15, 23-28) indicates that the most probable alignment is between codons 287-290 and 392-395, codons 292-324 and 396-428, and codons 325-334 and 430-439, although the positions of the two deletions cannot be defined unequivocally. Within these regions, excluding codons opposite deletions and codons which differ from the sequence of Adetugbo (16), there are 45 pairs of codons of which 9 are identical in the two domains. With such low amino acid homology, reflecting the very ancient divergence of the CH2 and CH3 domains, little nucleotide homology would be expected, particularly since the conserved amino acids are probably selected for function. (This is implied by the fact that 15 of the 18 amino acids in conserved positions are also conserved in at least 3 of the 4 heavy chain classes now sequenced (γ, α, ε, µ), compared to only 24% of the other amino acids in the regions compared. In addition, most of those conserved in both CH2 and CH3 are also conserved in the CH1 domain.)

Indeed, little nucleotide sequence homology can be found between CH2 and CH3. In the 36 pairs of codons for nonidentical residues, 36/108 nucleotides are identical; the value expected by chance is 27/108. In the 9 pairs of codons for identical residues, omitting nucleotides uniquely specified by the coding requirement, 6/11 nucleotides are identical; the value expected by chance is 5/11. The slight excess over chance is attributable to selection, for conservation of chemical (and thus coding) similarities in amino acid replacements, and for C in third-base positions.
`
`Secondary structure
`
There is usually little point in attempting to predict the secondary structure of a fragment of an RNA, because long-range interactions may make the most stable folding of the complete RNA very different from that of the fragment (13). However, it seemed of interest to predict the secondary structure of the pH21-1 sequence, in order to find out whether the domain structure of the protein might be reflected in the structure of the RNA. One might anticipate either that the CH3 domain would fold up into a structure independent of CH2, or that there might be prominent local structure around the junction between them.

The computed secondary structure is presented in Figure 5. The overall stability is 0.345 kcal/nucleotide. This is more than the 0.331 kcal/nucleotide computed for the complete rabbit β globin mRNA using a less powerful computer program (13) and less than the 0.407 kcal/nucleotide found for the rabbit α globin mRNA using some of the same computer programs used here (32). Examination of each part of the structure separately did not turn up any regions of local low stability of the sort which might indicate that the local sequence paired with other portions of this mRNA. The most striking feature of the structure is the prominent stem formed from nucleotides 46-65 and 210-230. The CH2-CH3 junction is approximately in the center of the loop formed by this stem. The junction itself has now been defined by an RNA splice point in codon 335 (29) as shown in Figure 5. If this portion of the structure is correct, the fact that the splice point is in a region of little secondary structure bounded by the very strong stem would limit the possible roles of secondary structure in directing splicing.

Figure 5. Computed secondary structure of the γ1 messenger RNA fragment, codons 287-439. The inset shows an equally stable structure for the beginning and end of the sequence. The arrow marks the position of the RNA splice between the CH2 and CH3 domains (29). The total energy of the structure (13) is -158.5 kcal.

That secondary structure does have an important role is suggested by the fact that the loosely conserved primary sequence common to all splice points (48, 49) is too small to give the specificity needed. One can imagine that secondary structure sequesters some potential splicing sequences so that they are unavailable for splicing, and brings others into reasonably close proximity in a way that facilitates the correct splicing. Since we have not examined the intervening sequence itself, it also remains possible that the secondary structure of the actual precursor molecule contributes in a more definite way to the splicing specificity. Clearly more work will be required to determine the exact role of secondary structure in RNA splicing.
`
`ACKNOWLEDGEMENTS
`
We are grateful to R. Wall for communicating data on the identification of pH21-1 by hybridization. D.B. DeBorde kindly gave us pH21-1 DNA, and R. Hammen prepared additional supplies. We particularly thank L. Nagy for running the secondary structure cycles on the computer. J.R. acknowledges a Regents' Fellowship of the University of California and an assistantship at the Molecular Biology Institute. The project was supported by the American
`t