`
`MOLECULAR BIOLOGY OF
`
`THE CELL
`
`THIRD EDITION
`
`Bruce Alberts • Dennis Bray
`Julian Lewis • Martin Raff • Keith Roberts
`James D. Watson
`
`Garland Publishing, Inc.
`New York & London
`
`GeneDX 1025, pg. 1
`
`
`
`
`Text Editor: Miranda Robertson
`
`Managing Editor: Ruth Adams
`Illustrator: Nigel Orme
`Molecular Model Drawings: Kate Hesketh-Moore
`Director of Electronic Publishing: John M. Roblin
`Computer Specialist: Chuck Bartelt
`Disk Preparation: Carol Winter
`Copy Editor: Shirley M. Cohen
`Production Editor: Douglas Goertzen
`Production Coordinator: Perry Bessas
`Indexer: Maija Hinkle
`
`Bruce Alberts received his Ph.D. from Harvard University and is
`currently President of the National Academy of Sciences and Professor
`of Biochemistry and Biophysics at the University of California, San
`Francisco. Dennis Bray received his Ph.D. from the Massachusetts
`Institute of Technology and is currently a Medical Research Council
`Fellow in the Department of Zoology, University of Cambridge.
`Julian Lewis received his D.Phil. from the University of Oxford and is
`currently a Senior Scientist in the Imperial Cancer Research Fund
`Developmental Biology Unit, University of Oxford. Martin Raff received
`his M.D. from McGill University and is currently a Professor in the MRC
`Laboratory for Molecular Cell Biology and the Biology Department,
`University College London. Keith Roberts received his Ph.D. from the
`University of Cambridge and is currently Head of the Department of Cell
`Biology, the John Innes Institute, Norwich. James D. Watson received his
`Ph.D. from Indiana University and is currently Director of the Cold Spring
`Harbor Laboratory. He is the author of Molecular Biology of the Gene and,
`with Francis Crick and Maurice Wilkins, won the Nobel Prize in Medicine
`and Physiology in 1962.
`
`© 1983, 1989, 1994 by Bruce Alberts, Dennis Bray, Julian Lewis,
`Martin Raff, Keith Roberts, and James D. Watson.
`
`All rights reserved. No part of this book covered by the copyright hereon
`may be reproduced or used in any form or by any means—graphic,
`electronic, or mechanical, including photocopying, recording, taping, or
`information storage and retrieval systems—without permission of the
`publisher.
`
`Library of Congress Cataloging-in-Publication Data
`Molecular biology of the cell / Bruce Alberts ... [et al.]. -- 3rd ed.
`p. cm.
`Includes bibliographical references and index.
`ISBN 0-8153-1619-4 (hard cover) -- ISBN 0-8153-1620-8 (pbk.)
`1. Cytology. 2. Molecular biology. I. Alberts, Bruce.
`[DNLM: 1. Cells. 2. Molecular Biology. QH 581.2 M718 1994]
`QH581.2.M64 1994
`574.87--dc20
`DNLM/DLC
`for Library of Congress
`
`Published by Garland Publishing, Inc.
`717 Fifth Avenue, New York, NY 10022
`
`Printed in the United States of America
`15 14 13 12 10 9 8 7 6 5 4
`
`Front cover: The photograph shows a rat nerve cell
`in culture. It is labeled (yellow) with a fluorescent
`antibody that stains its cell body and dendritic
`processes. Nerve terminals (green) from other
`neurons (not visible), which have made synapses on
`the cell, are labeled with a different antibody.
`(Courtesy of Olaf Mundigl and Pietro De Camilli.)
`Dedication page: Gavin Borden, late president
`of Garland Publishing, weathered in during his
`mid-1980s climb near Mount McKinley with
`MBoC author Bruce Alberts and famous mountaineer
`guide Mugs Stump (1949-1992).
`Back cover: The authors, in alphabetical order,
`crossing Abbey Road in London on their way to lunch.
`Much of this third edition was written in a house just
`around the corner. (Photograph by Richard Olivier.)
`
`93-45907
`CIP
`
`
`
`
`
`
`
`
`Summary
`
`The sequence of subunits in a macromolecule contains information that determines
`the three-dimensional contours of its surface. These contours in turn govern the
`recognition between one molecule and another, or between different parts of the same
`molecule, by means of weak, noncovalent bonds. The attractive forces are of four
`types: ionic bonds, van der Waals attractions, hydrogen bonds, and an interaction
`between nonpolar groups caused by their hydrophobic expulsion from water. Two
`molecules will recognize each other by a process in which they meet by random
`diffusion, stick together for a while, and then dissociate. The strength of this interaction
`is generally expressed in terms of an equilibrium constant. Since the only way to make
`recognition infallible is to make the energy of binding infinitely large, living cells
`constantly make errors; those that are intolerable are corrected by specific repair
`processes.
`
`
`
`Genes Are Made of DNA 9
`
`It has been obvious for as long as humans have sown crops or raised animals that
`each seed or fertilized egg must contain a hidden plan, or design, for the devel-
`opment of the organism. In modern times the science of genetics grew up around
`the premise of invisible information-containing elements, called genes, that are
`distributed to each daughter cell when a cell divides. Therefore, before dividing,
`a cell has to make a copy of its genes in order to give a complete set to each
`daughter cell. The genes in the sperm and egg cells carry the hereditary
`information from one generation to the next.
`The inheritance of biological characteristics must involve patterns of atoms
`that follow the laws of physics and chemistry: in other words, genes must be
`formed from molecules. At first the nature of these molecules was hard to imag-
`ine. What kind of molecule could be stored in a cell and direct the activities of
`a developing organism and also be capable of accurate and almost unlimited
`replication?
`
`By the end of the nineteenth century biologists had recognized that the
`carriers of inherited information were the chromosomes that become visible in the
`nucleus as a cell begins to divide. But the evidence that the deoxyribonucleic acid
`(DNA) in these chromosomes is the substance of which genes are made came
`only much later, from studies on bacteria. In 1944 it was shown that adding
`purified DNA from one strain of bacteria to a second, slightly different bacterial
`strain conferred heritable properties characteristic of the first strain upon the
`second. Because it had been commonly believed that only proteins have enough
`conformational complexity to carry genetic information, this discovery came as
`a surprise, and it was not generally accepted until the early 1950s. Today the idea
`that DNA carries genetic information in its long chain of nucleotides is so
`fundamental to biological thought that it is sometimes difficult to realize the
`enormous intellectual gap that it filled.
`
`DNA Molecules Consist of Two Long Chains Held Together
`by Complementary Base Pairs 10
`
`The difficulty that geneticists had in accepting DNA as the substance of genes is
`understandable, considering the simplicity of its chemistry. A DNA chain is a
`long, unbranched polymer composed of only four types of subunits. These are
`the deoxyribonucleotides containing the bases adenine (A), cytosine (C), guanine
`(G), and thymine (T). The nucleotides are linked together by covalent
`phosphodiester bonds that join the 5’ carbon of one deoxyribose group to the 3’ carbon
`of the next (see Panel 2-6, pp. 58-59). The four kinds of bases are attached to this
`repetitive sugar-phosphate chain almost like four kinds of beads strung on a
`necklace.
`How can a long chain of nucleotides encode the instructions for an
`organism or even a cell? And how can these messages be copied from one generation
`of cells to the next? The answers lie in the structure of the DNA molecule.
`Early in the 1950s x-ray diffraction analyses of specimens of DNA pulled into
`fibers suggested that the DNA molecule is a helical polymer composed of two
`strands. The helical structure of DNA was not surprising since, as we have seen,
`a helix will often form if each of the neighboring subunits in a polymer is
`regularly oriented. But the finding that DNA is two-stranded was of crucial
`significance. It provided the clue that led, in 1953, to the construction of a model that
`fitted the observed X-ray diffraction pattern and thereby solved the puzzle of DNA
`structure and function.
`
`Chapter 3 : Macromolecules: Structure, Shape, and Information
`
`[Figure: DNA double helix, panels (A) and (B); labels: 5’ chain end, 3’ chain end, double-helix axis, phosphodiester bond, hydrogen bond]
`
`An essential feature of the model was that all of the bases of the DNA
`molecule are on the inside of the double helix, with the sugar phosphates on the
`outside. This demands that the bases on one strand be extremely close to those
`on the other, and the fit proposed required specific base-pairing between a large
`purine base (A or G, each of which has a double ring) on one chain and a smaller
`pyrimidine base (T or C, each of which has a single ring) on the other chain
`(Figure 3-10).
`
`Figure 3-10 The DNA double helix.
`(A) A short section of the helix viewed
`from its side. Four complementary
`base pairs are shown. The bases are
`shown in green, while the deoxyribose
`[...]
`strands run in opposite directions and
`that each base pair is held together by
`either two or three hydrogen bonds
`(see also Panel 3-2, pp. 100-101).
`
`Both evidence from earlier biochemical experiments and conclusions derived
`from model building suggested that complementary base pairs (also called
`Watson-Crick base pairs) form between A and T and between G and C.
`Biochemical analyses of DNA preparations from different species had shown that, although
`the nucleotide composition of DNA varies a great deal (for example, from 13%
`A residues to 36% A residues in the DNA of different types of bacteria), there is
`a general rule that quantitatively [G] = [C] and [A] = [T]. Model building revealed
`that the numbers of effective hydrogen bonds that could be formed between G
`and C or between A and T were greater than for any other combinations (see
`Panel 3-2, pp. 100-101). The double-helical model for DNA thus neatly explained
`the quantitative biochemistry.
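
The link between base pairing and the composition rule can be made concrete with a small sketch. It assumes only the A-T and G-C pairing described above and the opposite orientation of the two strands; the sequence and the function names are hypothetical and chosen for illustration.

```python
from collections import Counter

# Watson-Crick pairing rules from the text: A with T, G with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complementary_strand(strand: str) -> str:
    """Return the partner strand, written 5'-to-3'.

    The two strands run in opposite directions, so the partner is the
    base-by-base complement read in reverse.
    """
    return "".join(PAIR[base] for base in reversed(strand))

def duplex_composition(strand: str) -> Counter:
    """Count bases over both strands of the double helix."""
    return Counter(strand) + Counter(complementary_strand(strand))

if __name__ == "__main__":
    top = "ATGCGGATTACA"  # hypothetical sequence, for illustration only
    counts = duplex_composition(top)
    print(complementary_strand(top))
    print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

Whatever single strand is supplied, the counts over the duplex satisfy [A] = [T] and [G] = [C], while the overall proportion of A residues remains free to vary from one sequence to another.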
`
`The Structure of DNA Provides an Explanation
`for Heredity 11
`
`A gene carries biological information in a form that must be precisely copied and
`transmitted from each cell to all of its progeny. The implications of the discov—
`
`Nucleic Acids
`
`
`The Nucleotide Sequence of a Gene Determines the Amino
`Acid Sequence of a Protein 13
`DNA is relatively inert chemically. The information it contains is expressed
`indirectly via other molecules: DNA directs the synthesis of specific RNA and
`protein molecules, which in turn determine the cell’s chemical and physical
`properties.
`At about the time that biophysicists were analyzing the three-dimensional
`structure of DNA by X-ray diffraction, biochemists were intensively studying the
`chemical structure of proteins. It was already known that proteins are chains of
`amino acids joined together by sequential peptide linkages; but it was only in the
`early 1950s, when the small protein insulin was sequenced (Figure 3-14), that it
`was discovered that each type of protein consists of a unique sequence of amino
`acids. Just as solving the structure of DNA was seminal in understanding the
`molecular basis of genetics and heredity, so sequencing insulin provided a key
`to understanding the structure and function of proteins. If insulin had a definite,
`genetically determined sequence, then presumably so did every other protein.
`It seemed reasonable to suppose, moreover, that the properties of a protein
`would depend on the precise order in which its constituent amino acids are
`arranged.
`
`Both DNA and protein are composed of a linear sequence of subunits;
`eventually, the analysis of the proteins made by mutant genes demonstrated that the
`two sequences are co-linear, that is, the nucleotides in DNA are arranged in an
`order corresponding to the order of the amino acids in the protein they specify.
`It became evident that the DNA sequence contains a coded specification of the
`protein sequence. The central question in molecular biology then became how
`a cell translates a nucleotide sequence in DNA into an amino acid sequence in
`a protein.
`
`Portions of DNA Sequence Are Copied into RNA Molecules
`That Guide Protein Synthesis 14
`
`The synthesis of proteins involves copying specific regions of DNA (the genes)
`into polynucleotides of a chemically and functionally different type known as
`ribonucleic acid, or RNA. RNA, like DNA, is composed of a linear sequence of
`nucleotides, but it has two small chemical differences: (1) the sugar-phosphate
`backbone of RNA contains ribose instead of a deoxyribose sugar and (2) the base
`thymine (T) is replaced by uracil (U), a very closely related base that likewise pairs
`with A (see Panel 3-2, pp. 100-101).
`RNA retains all of the information of the DNA sequence from which it was
`copied, as well as the base-pairing properties of DNA. Molecules of RNA are
`synthesized by a process known as DNA transcription, which is similar to DNA
`replication in that one of the two strands of DNA acts as a template on which the
`base-pairing abilities of incoming nucleotides are tested. When a good match is
`achieved with the DNA template, a ribonucleotide is incorporated as a covalently
`bonded unit. In this way the growing RNA chain is elongated one nucleotide at
`a time.
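
The template-copying step can be sketched in a few lines. This is a minimal illustration that ignores the enzyme and all regulation. The function name is hypothetical, and the convention that the template is read 3'-to-5' while the RNA grows 5'-to-3' is standard biochemistry supplied here as an assumption, since the passage above does not state it.

```python
# Pairing between a DNA template base and the incoming ribonucleotide.
# Uracil (U) takes the place of thymine (T) in RNA and pairs with A.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Build an RNA chain one nucleotide at a time along a DNA template.

    The template is given 3'-to-5', so the RNA comes out 5'-to-3'.
    """
    rna = []
    for base in template_3_to_5:
        rna.append(TEMPLATE_TO_RNA[base])  # add the matching ribonucleotide
    return "".join(rna)

print(transcribe("TACCAC"))  # hypothetical six-base template
```

Because each position is filled by base-pairing alone, the RNA carries the same information as the DNA strand it was copied from, with U written wherever the other DNA strand has T.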
`
`DNA transcription differs from DNA replication in a number of ways. The
`RNA product, for example, does not remain as a strand annealed to DNA. Just
`behind the region where the ribonucleotides are being added, the original DNA
`helix re-forms and releases the RNA chain. Thus RNA molecules are
`single-stranded. Moreover, RNA molecules are relatively short compared to DNA
`molecules since they are copied from a limited region of the DNA, enough to make
`one or a few proteins (Figure 3-15). RNA transcripts that direct the synthesis of
`protein molecules are called messenger RNA (mRNA) molecules, while other
`RNA transcripts serve as transfer RNAs (tRNAs) or form the RNA components of
`ribosomes (rRNA) or smaller ribonucleoprotein particles.
`
`Figure 3-14 The amino acid
`sequence of bovine insulin. Insulin is
`a very small protein that consists of
`two polypeptide chains, one 21 and
`the other 30 amino acid residues long.
`Each chain has a unique, genetically
`determined sequence of amino acids.
`The one-letter symbols used to
`specify amino acids are those listed
`in Panel 2-5, pp. 56-57; the
`S-S bonds shown in red are disulfide
`bonds between cysteine residues. The
`protein is made initially as a single
`long polypeptide chain (encoded by a
`single gene) that is subsequently
`cleaved to give the two chains.
`
`[Figure 3-15 diagram. EUCARYOTES: DNA with introns and exons, TRANSCRIPTION, RNA SPLICING, mRNA, TRANSLATION, protein. PROCARYOTES: DNA, TRANSCRIPTION, mRNA, TRANSLATION, protein.]
`
`Figure 3-15 The transfer of
`information from DNA to protein.
`The transfer proceeds by means of an
`RNA intermediate called messenger
`RNA (mRNA). In procaryotic cells the
`process is simpler than in eucaryotic
`cells. In eucaryotes the coding regions
`of the DNA (in the exons, shown in
`color) are separated by noncoding
`regions (the introns). As indicated,
`these introns must be removed by an
`enzymatically catalyzed RNA-splicing
`reaction to form the mRNA.
`
`The amount of RNA made from a particular region of DNA is controlled by
`gene regulatory proteins that bind to specific sites on DNA close to the coding
`sequences of a gene. In any cell at any given time, some genes are used to make
`RNA in very large quantities while other genes are not transcribed at all. For an
`active gene thousands of RNA transcripts can be made from the same DNA
`segment in each cell generation. Because each mRNA molecule can be translated
`into many thousands of copies of a polypeptide chain, the information contained
`in a small region of DNA can direct the synthesis of millions of copies of a
`specific protein. The protein fibroin, for example, is the major component of silk. In
`each silk gland cell a single fibroin gene makes 10^4 copies of mRNA, each of which
`directs the synthesis of 10^5 molecules of fibroin, producing a total of 10^9
`molecules of fibroin in just 4 days.
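
The fibroin figures amount to a two-stage multiplication, which the following sketch works through. The per-second average is derived here for illustration and is not a number given in the text; the variable names are hypothetical.

```python
# Two-stage amplification from one gene to many protein molecules,
# using the fibroin figures quoted in the text.
mrna_per_gene = 10**4        # mRNA copies made from the single fibroin gene
protein_per_mrna = 10**5     # fibroin molecules translated from each mRNA
days = 4

total_protein = mrna_per_gene * protein_per_mrna
per_second = total_protein / (days * 24 * 60 * 60)  # derived average rate

print(f"{total_protein:.0e} fibroin molecules per cell")
print(f"about {per_second:.0f} molecules per second on average")
```

The product of the two stages gives the 10^9 molecules cited, which is why control at the level of transcription has such a large effect on the final amount of protein.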
`
`Eucaryotic RNA Molecules Are Spliced to Remove
`Intron Sequences 15
`
`In bacterial cells most proteins are encoded by a single uninterrupted stretch of
`DNA sequence that is copied without alteration to produce an mRNA molecule.
`In 1977 molecular biologists were astonished by the discovery that most
`eucaryotic genes have their coding sequences (called exons) interrupted by noncoding
`sequences (called introns). To produce a protein, the entire length of the gene,
`including both its introns and its exons, is first transcribed into a very large RNA
`molecule, the primary transcript. Before this RNA molecule leaves the nucleus,
`a complex of RNA-processing enzymes removes all of the intron sequences,
`thereby producing a much shorter RNA molecule. After this RNA-processing step,
`called RNA splicing, has been completed, the RNA molecule moves to the
`cytoplasm as an mRNA molecule that directs the synthesis of a particular protein (see
`Figure 3-15).
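
As a rough sketch of what splicing accomplishes at the level of sequence, the example below joins exon segments and discards what lies between them. The transcript, the exon coordinates, and the function name are hypothetical, and the sketch says nothing about how the cell recognizes intron boundaries.

```python
def splice(primary_transcript: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments in order, discarding the introns between them.

    `exons` holds (start, end) index pairs, end-exclusive, in 5'-to-3' order.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)

# Hypothetical transcript: exons in upper case, introns in lower case.
primary = "AUGGCC" + "guaaguuuucag" + "GAGUUC" + "guaugacccuag" + "UGGUAA"
exon_coordinates = [(0, 6), (18, 24), (36, 42)]

mrna = splice(primary, exon_coordinates)
print(len(primary), len(mrna))
```

Supplying a different list of exon coordinates yields a different, shorter message from the same primary transcript.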
`This seemingly wasteful mode of information transfer in eucaryotes is
`presumed to have evolved because it makes protein synthesis much more versatile.
`The primary RNA transcripts of some genes, for example, can be spliced in
`various ways to produce different mRNAs, depending on the cell type or stage of
`development. This allows different proteins to be produced from the same gene.
`Moreover, because the presence of numerous introns facilitates genetic
`recombination events between exons, this type of gene arrangement is likely to have
`been profoundly important in the early evolutionary history of genes, speeding
`up the process whereby organisms evolve new proteins from parts of
`preexisting ones instead of evolving totally new amino acid sequences.
`
`Figure 3-16 The genetic code. Sets of
`three nucleotides (codons) in an
`mRNA molecule are translated into
`amino acids in the course of protein
`synthesis according to the rules
`shown. The codons GUG and GAG,
`for example, are translated into valine
`and glutamic acid, respectively. Note
`that those codons with U or C as the
`second nucleotide tend to specify the
`more hydrophobic amino acids
`(compare with Panel 2-5, pp. 56-57).
`
`1st position    2nd position                    3rd position
`(5' end)        U       C       A       G       (3' end)
`
`U               Phe     Ser     Tyr     Cys     U
`                Phe     Ser     Tyr     Cys     C
`                Leu     Ser     STOP    STOP    A
`                Leu     Ser     STOP    Trp     G
`
`C               Leu     Pro     His     Arg     U
`                Leu     Pro     His     Arg     C
`                Leu     Pro     Gln     Arg     A
`                Leu     Pro     Gln     Arg     G
`
`A               Ile     Thr     Asn     Ser     U
`                Ile     Thr     Asn     Ser     C
`                Ile     Thr     Lys     Arg     A
`                Met     Thr     Lys     Arg     G
`
`G               Val     Ala     Asp     Gly     U
`                Val     Ala     Asp     Gly     C
`                Val     Ala     Glu     Gly     A
`                Val     Ala     Glu     Gly     G
`
`Figure 3-17 The three possible
`reading frames in protein synthesis.
`In the process of translating a
`nucleotide sequence (blue) into an
`amino acid sequence (green), the
`sequence of nucleotides in an mRNA
`molecule is read from the 5’ to the 3’
`end in sequential sets of three
`nucleotides. In principle, therefore,
`the same RNA sequence can specify
`three completely different amino acid
`sequences, depending on the
`“reading frame.”
`
`
`Sequences of Nucleotides in mRNA Are “Read” in Sets
`of Three and Translated into Amino Acids 16
`
`The rules by which the nucleotide sequence of a gene is translated into the amino
`acid sequence of a protein, the so-called genetic code, were deciphered in the
`early 1960s. The sequence of nucleotides in the mRNA molecule that acts as
`an intermediate was found to be read in serial order in groups of three. Each
`triplet of nucleotides, called a codon, specifies one amino acid. Since RNA is
`a linear polymer of four different nucleotides, there are 4^3 = 64 possible codon
`triplets (remember that it is the sequence of nucleotides in the triplet that is
`important). However, only 20 different amino acids are commonly found in
`proteins, so that most amino acids are specified by several codons: that is, the
`genetic code is degenerate. The code (shown in Figure 3-16) has been highly
`conserved during evolution: with a few minor exceptions, it is the same in
`organisms as diverse as bacteria, plants, and humans.
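
The counting argument can be checked directly. The sketch below enumerates all triplets and attaches the standard genetic code, written with one-letter amino acid symbols in U, C, A, G order. The compact string encoding is a common convention assumed here, not something taken from the text.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order
# (first, then second, then third position); "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons specify each amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))   # number of possible triplets
print(len(degeneracy))    # number of amino acids encoded
print(CODON_TABLE["GUG"], CODON_TABLE["GAG"])
```

The `degeneracy` counter shows the unevenness of the code: leucine, for instance, has six codons, whereas methionine and tryptophan have one each.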
`In principle, each RNA sequence can be translated in any one of three
`different reading frames depending on where the decoding process begins (Figure
`3-17). In almost every case only one of these reading frames will produce a
`functional protein. Since there are no punctuation signals except at the beginning and
`end of the RNA message, the reading frame is set at the initiation of the
`translation process and is maintained thereafter.
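
A short sketch shows how the choice of starting position changes every codon that follows. The sequence and the function name are hypothetical, and incomplete triplets at the 3' end are simply dropped.

```python
def reading_frames(rna: str) -> list[list[str]]:
    """Split an RNA sequence into codons for each of the three frames."""
    frames = []
    for offset in range(3):
        # Step through the sequence three bases at a time from this offset,
        # keeping only complete triplets.
        codons = [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]
        frames.append(codons)
    return frames

for offset, codons in enumerate(reading_frames("CUCAGCGUUACCAU")):
    print(offset, " ".join(codons))
```

Shifting the start by a single nucleotide leaves no codon in common between frames, which is why the frame must be fixed at initiation and held thereafter.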
`
`tRNA Molecules Match Amino Acids
`to Groups of Nucleotides 17
`
`The codons in an mRNA molecule do not directly recognize the amino acids they
`specify in the way that an enzyme recognizes a substrate. The translation of
`mRNA into protein depends on "adaptor" molecules that recognize both an
`
`
`
`
`Summary
`
`Before the synthesis of a particular protein can begin, the corresponding mRNA
`molecule must be produced by DNA transcription. Then a small ribosomal subunit
`binds to the mRNA molecule at a start codon (AUG) that is recognized by a unique
`initiator tRNA molecule. A large ribosomal subunit binds to complete the ribosome
`and initiate the elongation phase of protein synthesis. During this phase aminoacyl
`tRNAs, each bearing a specific amino acid, sequentially bind to the appropriate codon
`in mRNA by forming complementary base pairs with the tRNA anticodon. Each
`amino acid is added to the carboxyl-terminal end of the growing polypeptide by
`means of a cycle of three sequential steps: aminoacyl-tRNA binding, followed by
`peptide bond formation, followed by ribosome translocation. The ribosome progresses
`from codon to codon in the 5’-to-3’ direction along the mRNA molecule until one of
`three stop codons is reached. A release factor then binds to the stop codon,
`terminating translation and releasing the completed polypeptide from the ribosome.
`Eucaryotic and procaryotic ribosomes are highly homologous, despite substantial
`differences in the number and size of their rRNA and protein components. The
`predominant role of rRNA in ribosome structure and function is likely to reflect the
`ancient origin of protein synthesis, which is thought to have evolved in an
`environment dominated by RNA-mediated catalysis.
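
The sequence of events summarized above (initiation at AUG, codon-by-codon elongation, release at a stop codon) can be imitated with a simplified sketch. It assumes that translation begins at the first AUG in the message, which is a simplification of real initiation, and it uses the standard genetic code in a compact string form. The function and variable names are hypothetical.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks the three stop codons.
CODE = dict(zip(
    ("".join(c) for c in product("UCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")          # initiation at the start codon
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):   # move codon to codon, 5' to 3'
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":          # stop codon: release the polypeptide
            break
        peptide.append(amino_acid)     # add to the carboxyl-terminal end
    return "".join(peptide)

print(translate("GGAUGGUGGAGUAAUU"))  # hypothetical message
```

The loop models only the decoding logic; the ribosome, the tRNA adaptors, and the release factor that carry out these steps in the cell are not represented.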
`
`DNA Repair 16
`
`
`The long-term survival of a species may be enhanced by genetic changes, but the
`survival of the individual demands genetic stability. Maintaining genetic stability
`requires not only an extremely accurate mechanism for replicating the DNA
`before a cell divides, but also mechanisms for repairing the many accidental
`lesions that occur continually in DNA. Most such spontaneous changes in DNA are
`temporary because they are immediately corrected by processes collectively
`called DNA repair. Only rarely do the cell’s DNA maintenance processes fail and
`allow a permanent change in the DNA. Such a change is called a mutation, and
`it can destroy an organism if the change occurs in a vital position in the DNA
`sequence.
`
`Before examining the mechanisms of DNA repair, we briefly discuss the
`maintenance of DNA sequences from one generation to the next.
`
`DNA Sequences Are Maintained with Very High Fidelity 17
`
`The rate at which stable changes occur in DNA sequences (the mutation rate) can
`be estimated only indirectly. One way is to compare the amino acid sequence of
`the same protein in several species. The fraction of the amino acids that are
`different can then be compared with the estimated number of years since each pair
`of species diverged from a common ancestor, as determined from the fossil
`record. In this way one can calculate the number of years that elapse, on
`average, before an inherited change in the amino acid sequence of a protein becomes
`fixed in the species. Because each such change will commonly reflect the
`alteration of a single nucleotide in the DNA sequence of the gene encoding that
`protein, this value can be used to estimate the average number of years required to
`produce a single, stable mutation in the gene.
`Such calculations always will substantially underestimate the actual
`mutation rate because most mutations will spoil the function of the protein and vanish
`from the population through natural selection. But there is one family of proteins
`whose sequence does not seem to matter, and so the genes that encode them can
`accumulate mutations without being selected against. These proteins are the
`fibrinopeptides, 20-residue-long fragments that are discarded from the protein
`fibrinogen when it is activated to form fibrin during blood clotting. Since the
`function of fibrinopeptides apparently does not depend on their amino acid
`sequence, they can tolerate almost any amino acid change. Sequence analysis of
`the fibrinopeptides indicates that an average-sized protein 400 amino acids long
`would be randomly altered by an amino acid change roughly once every 200,000
`years.
`More recently, DNA sequencing technology has made it possible to compare
`corresponding nucleotide sequences in regions of the genome that do not
`code for protein. Comparisons of such sequences in several mammalian species
`produce estimates of the mutation rate during evolution that are in excellent
`agreement with those obtained from the fibrinopeptide studies.
`
`Chapter 6 : Basic Genetic Mechanisms
`
`The Observed Mutation Rates in Proliferating Cells
`Are Consistent with Evolutionary Estimates 18
`
`The mutation rate can be estimated more directly by observing the rate at which
`spontaneous genetic changes arise in a large population of cells followed over a
`relatively short period of time. This can be done either by estimating the
`frequency with which new mutants arise in very large animal populations (in a
`colony of fruit flies or mice, for example) or by screening for changes in specific
`proteins in cells growing in culture. Although they are only approximate, the
`numbers obtained in both cases are consistent with an error frequency of 1
`base-pair change in roughly 10^9 base pairs for each cell generation. Consequently, a
`single gene that encodes an average-sized protein (containing about 10^3 coding
`base pairs) would suffer a mutation once in about 10^6 cell generations. This
`number is at least roughly consistent with the evolutionary estimate described above,
`in which one mutation appears in an average gene in the germ line every 200,000
`years.
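
The per-gene figure follows from the per-base-pair error frequency and the gene length by simple division, as the following sketch shows. It uses only the two round numbers quoted above; the variable names are hypothetical.

```python
bp_per_error = 10**9          # one base-pair change per this many base pairs, per cell generation
coding_bp_per_gene = 10**3    # coding base pairs in an average-sized gene

# A gene presents 10^3 targets, so an error lands in it 10^3 times as often
# as at any single base pair.
generations_per_mutation = bp_per_error // coding_bp_per_gene

print(f"one mutation per gene every {generations_per_mutation:,} cell generations")
```

Both inputs are order-of-magnitude estimates, so the result should be read the same way.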
`
`Most Mutations in Proteins Are Deleterious
`and Are Eliminated by Natural Selection 19
`
`When the number of amino acid differences in a particular protein is plotted for
`several pairs of species against the time since the species diverged, the result is
`a reasonably straight line. That is, the longer the period since divergence, the
`larger the number of differences. For convenience, the slope of this line can be
`expressed in terms of the “unit evolutionary time" for that protein, which is the
`average time required for 1 amino acid change to appear in a sequence of 100
`amino acid residues. When various proteins are compared, each shows a different
`but characteristic rate of evolution (Figure 6-32). Since all DNA base pairs are
`thought to be subject to roughly the same rate of random mutation, these
`different rates must reflect differences in the probability that an organism with a
`random mutation over the given protein will survive and propagate. Changes in
`amino acid sequence are evidently much more harmful for some proteins than
`for others. From Table 6-2 we can estimate that about 6 of every 7 random amino
`acid changes are harmful over the long term in hemoglobin, about 29 of every
`30 amino acid changes are harmful in cytochrome c, and virtually all amino
`acid changes are harmful in histone H4. We assume that individuals who carried
`such harmful mutations have been eliminated from the population by natural
`selection.
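
The reasoning behind estimates such as "6 of every 7" can be sketched as a ratio of rates. If the fibrinopeptides accept essentially every change, then a protein that changes k times more slowly must be losing about (k - 1)/k of its changes to selection. Table 6-2 is not reproduced in this excerpt, so the unit evolutionary times below are assumed values chosen for illustration, and the function name is hypothetical.

```python
# Unit evolutionary time: millions of years for 1 amino acid change per
# 100 residues to become fixed. These values are ASSUMED for illustration;
# Table 6-2 is not part of this excerpt.
UNIT_EVOLUTIONARY_TIME_MYR = {
    "fibrinopeptides": 0.7,
    "hemoglobin": 5.0,
    "cytochrome c": 21.0,
    "histone H4": 500.0,
}

def fraction_harmful(protein: str, neutral: str = "fibrinopeptides") -> float:
    """Estimate the fraction of random changes removed by selection.

    Treats the neutral reference as accepting every change, so a protein
    that changes k times more slowly must lose (k - 1)/k of its changes.
    """
    times = UNIT_EVOLUTIONARY_TIME_MYR
    return 1.0 - times[neutral] / times[protein]

for name in UNIT_EVOLUTIONARY_TIME_MYR:
    print(f"{name}: {fraction_harmful(name):.3f}")
```

With these assumed inputs the estimates for hemoglobin and cytochrome c come out close to 6/7 and 29/30, the fractions quoted above.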
`
`Low Mutation Rates Are Necessary for Life as We Know It 19
`Since most mutations are deleterious, no species can afford to allow them to
`accumulate at a high rate in its germ cells. We discuss later why the observed
`mutation frequency, low though it is, nevertheless is thought to limit the number
`of essential proteins that any organism can encode in its germ line to about
`60,000. By an extension of the same arguments, a mutation frequency tenfold
`higher would limit an organism to about 6000 essential proteins. In this case
`evolution would probably have stopped at an organism no more complex than
`a fruit fly.
`
`DNA Repair
`
`[Figure 6-32 plot: amino acid changes per 100 amino acids (vertical axis) against millions of years since divergence of species (horizontal axis, ticks from 200 to 1000)]
`
`Figure 6-32 Different proteins
`evolve at very different rates. A
`comparison of the rates of amino acid
`change found in hemoglobin,
`cytochrome c, and the fibrinopeptides.
`Hemoglobin and cytochrome c have
`changed much more slowly during
`evolution than the fibrinopeptides. In
`determining rates of change per year
`(as in Table 6-2), it is important to
`realize that two species that diverged
`from a common ancestor 100 million
`years ago are separated by 200 million
`years of evolutionary time.
`
`
`
`
`4. DNA cloning, whereby a single DNA molecule can be copied to generate
`many billions of identical molecules.
`
`5. DNA engineering, by which DNA sequences are altered to make modified
`versions of genes, which are reinserted back into cells or organisms.
`
`In this chapter we explain how recombinant DNA technology has generated
`the new experimental approaches that have revolutionized cell biology.
`
`
`
`Before the 1970s the goal of isolating a single gene from a large chromosome
`seemed unattainable. Unlike a protein, a gene does not exist as a discrete entity
`in cells, but rather as a small region of a much larger DNA molecule. Although
`the DNA molecules in a cell can be randomly broken into small pieces by
`mechanical force, a fragment containing a single gene in a mammalian genome
`would still be only one among a hundred thousand or more DNA fragments,
`indistinguishable in their average size. How could such a gene be purified? Since
`all DNA molecules consist of an approximately equal mixture of the same four
`nucleotides, they cannot be readily separated, as proteins can, on the basis of
`their different charges and binding properties. Moreover, even if a purification
`scheme could be devised, vast amounts of DNA would be needed to yield enough
`of any particular gene to be useful for further experiments.
`The solution to all of these problems began to emerge with the discovery of
`restriction nucleases. These enzymes, which can be purified from bacteria, cut the
`DNA double helix at specific sites defined by the local nucleotide sequence,
`producing double-stranded DNA fragments of strictly defined sizes. Different
`species of bacteria make restriction nucleases with different sequence specificities,
`and it is relatively simple to find a restriction nuclease that will create a DNA
`fragment that includes a particular gene. The size of the DNA fragment can then
`
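
Why sequence-specific cutting yields fragments of defined size can be seen in a small sketch. It cuts a single strand wherever a recognition site occurs. The enzyme EcoRI and its site GAATTC, cut after the first G, are a standard example supplied here as an assumption, since the passage above names no particular enzyme. The DNA sequence and the function name are hypothetical.

```python
def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Cut one strand of `dna` at every occurrence of `site`.

    `cut_offset` is the position within the site where the backbone is cut.
    """
    fragments, last_cut = [], 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[last_cut:cut])
        last_cut = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[last_cut:])  # remainder after the final cut
    return fragments

# EcoRI recognizes GAATTC and cuts after the first G.
fragments = digest("AAGAATTCGGGGAATTCTT", "GAATTC", 1)
print([len(f) for f in fragments])
```

Because the cut positions depend only on the sequence, every copy of the same DNA molecule gives the same set of fragment lengths.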
`
`
`
`
`
`
`
`Table 7-3. Some Major Steps in the Development of Recombinant DNA Technology
`
`1869  Miescher isolated DNA for the first time.
`1944  Avery provided evidence that DNA, rather than protein, carries the
`      genetic information during bacterial transformation.
`1953  Watson and Crick proposed the double-helix model for DNA structure
`      based on x-ray results of Franklin and Wilkins.
`1957  Kornberg discovered DNA polymerase, the enzyme now used to
`      produce labeled DNA probes.
`1961  Marmur and Doty discovered DNA renaturation, establishing the
`      specificity and feasibility of nucleic acid hybridization reactions.
`1962  Arber provided the first evidence for the existence of DNA restriction
`      nucleases, leading to their later purification and use in DNA sequence
`      characterization by Nathans and H. Smith.
`1966  Nirenberg, Ochoa, and Khorana elucidated the genetic code.
`1967  Gellert discovered DNA ligase, the enzyme used to join DNA fragments
`      together.
`
`1975
`
`1972—1973 DNA cloning techniques were developed by the laboratories of Boyer,
`Cohen, Be