`
`MOLECULAR BIOLOGY OF
`
`THE CELL
`
`THIRD EDITION
`
`Bruce Alberts • Dennis Bray
`Julian Lewis • Martin Raff • Keith Roberts
`James D. Watson
`
`Garland Publishing, Inc.
`New York & London
`
`GeneDX 1025, pg. 1
`
`
`
`
`Text Editor: Miranda Robertson
`
`Managing Editor: Ruth Adams
`Illustrator: Nigel Orme
`Molecular Model Drawings: Kate Hesketh-Moore
`Director of Electronic Publishing: John M. Roblin
`Computer Specialist: Chuck Bartelt
`Disk Preparation: Carol Winter
`Copy Editor: Shirley M. Cobert
`Production Editor: Douglas Goertzen
`Production Coordinator: Perry Bessas
`Indexer: Maija Hinkle
`
`Bruce Alberts received his Ph.D. from Harvard University and is
`currently President of the National Academy of Sciences and Professor
`of Biochemistry and Biophysics at the University of California, San
`Francisco. Dennis Bray received his Ph.D. from the Massachusetts
`Institute of Technology and is currently a Medical Research Council
`Fellow in the Department of Zoology, University of Cambridge.
`Julian Lewis received his D.Phil. from the University of Oxford and is
`currently a Senior Scientist in the Imperial Cancer Research Fund
`Developmental Biology Unit, University of Oxford. Martin Raff received
`his M.D. from McGill University and is currently a Professor in the MRC
`Laboratory for Molecular Cell Biology and the Biology Department,
`University College London. Keith Roberts received his Ph.D. from the
`University of Cambridge and is currently Head of the Department of Cell
`Biology, the John Innes Institute, Norwich. James D. Watson received his
`Ph.D. from Indiana University and is currently Director of the Cold Spring
`Harbor Laboratory. He is the author of Molecular Biology of the Gene and,
`with Francis Crick and Maurice Wilkins, won the Nobel Prize in Medicine
`and Physiology in 1962.
`
`© 1983, 1989, 1994 by Bruce Alberts, Dennis Bray, Julian Lewis,
`Martin Raff, Keith Roberts, and James D. Watson.
`
`All rights reserved. No part of this book covered by the copyright hereon
`may be reproduced or used in any form or by any means—graphic,
`electronic, or mechanical, including photocopying, recording, taping, or
`information storage and retrieval systems—without permission of the
`publisher.
`
`Library of Congress Cataloging-in-Publication Data
`Molecular biology of the cell / Bruce Alberts ... [et al.]. -- 3rd ed.
`p. cm.
`Includes bibliographical references and index.
`ISBN 0-8153-1619-4 (hard cover). -- ISBN 0-8153-1620-8 (pbk.)
`1. Cytology. 2. Molecular biology. I. Alberts, Bruce.
`[DNLM: 1. Cells. 2. Molecular Biology. QH 581.2 M718 1994]
`QH581.2.M64 1994
`574.87-dc20
`DNLM/DLC
`
`for Library of Congress
`
`Published by Garland Publishing, Inc.
`717 Fifth Avenue, New York, NY 10022
`
`Printed in the United States of America
`15 14 13 12 10 9 8 7 6 5 4
`
`Front cover: The photograph shows a rat nerve cell
`in culture. It is labeled (yellow) with a fluorescent
`antibody that stains its cell body and dendritic
`processes. Nerve terminals (green) from other
`neurons (not visible), which have made synapses on
`the cell, are labeled with a different antibody.
`(Courtesy of Olaf Mundigl and Pietro de Camilli.)
`Dedication page: Gavin Borden, late president
`of Garland Publishing, weathered in during his
`mid-1980s climb near Mount McKinley with
`MBoC author Bruce Alberts and famous mountaineer
`guide Mugs Stump (1940-1992).
`Back cover: The authors, in alphabetical order,
`crossing Abbey Road in London on their way to lunch.
`Much of this third edition was written in a house just
`around the corner. (Photograph by Richard Olivier.)
`
`93-45907
`CIP
`
`
`
`
`
`
`
`
`Summary
`
`The sequence of subunits in a macromolecule contains information that determines
`the three-dimensional contours of its surface. These contours in turn govern the
`recognition between one molecule and another, or between different parts of the same
`molecule, by means of weak, noncovalent bonds. The attractive forces are of four
`types: ionic bonds, van der Waals attractions, hydrogen bonds, and an interaction
`between nonpolar groups caused by their hydrophobic expulsion from water. Two
`molecules will recognize each other by a process in which they meet by random
`diffusion, stick together for a while, and then dissociate. The strength of this interaction
`is generally expressed in terms of an equilibrium constant. Since the only way to make
`recognition infallible is to make the energy of binding infinitely large, living cells
`constantly make errors; those that are intolerable are corrected by specific repair
`processes.
`
`Nucleic Acids 8
`
`Genes Are Made of DNA 9
`
`It has been obvious for as long as humans have sown crops or raised animals that
`each seed or fertilized egg must contain a hidden plan, or design, for the
`development of the organism. In modern times the science of genetics grew up around
`the premise of invisible information-containing elements, called genes, that are
`distributed to each daughter cell when a cell divides. Therefore, before dividing,
`a cell has to make a copy of its genes in order to give a complete set to each
`daughter cell. The genes in the sperm and egg cells carry the hereditary
`information from one generation to the next.
`The inheritance of biological characteristics must involve patterns of atoms
`that follow the laws of physics and chemistry: in other words, genes must be
`formed from molecules. At first the nature of these molecules was hard to
`imagine. What kind of molecule could be stored in a cell and direct the activities of
`a developing organism and also be capable of accurate and almost unlimited
`replication?
`
`By the end of the nineteenth century biologists had recognized that the
`carriers of inherited information were the chromosomes that become visible in the
`nucleus as a cell begins to divide. But the evidence that the deoxyribonucleic acid
`(DNA) in these chromosomes is the substance of which genes are made came
`only much later, from studies on bacteria. In 1944 it was shown that adding
`purified DNA from one strain of bacteria to a second, slightly different bacterial
`strain conferred heritable properties characteristic of the first strain upon the
`second. Because it had been commonly believed that only proteins have enough
`conformational complexity to carry genetic information, this discovery came as
`a surprise, and it was not generally accepted until the early 1950s. Today the idea
`that DNA carries genetic information in its long chain of nucleotides is so
`fundamental to biological thought that it is sometimes difficult to realize the
`enormous intellectual gap that it filled.
`
`DNA Molecules Consist of Two Long Chains Held Together
`by Complementary Base Pairs 10
`
`The difficulty that geneticists had in accepting DNA as the substance of genes is
`understandable, considering the simplicity of its chemistry. A DNA chain is a
`long, unbranched polymer composed of only four types of subunits. These are
`the deoxyribonucleotides containing the bases adenine (A), cytosine (C), guanine
`(G), and thymine (T). The nucleotides are linked together by covalent
`phosphodiester bonds that join the 5’ carbon of one deoxyribose group to the 3’ carbon
`of the next (see Panel 2-6, pp. 58-59). The four kinds of bases are attached to this
`repetitive sugar-phosphate chain almost like four kinds of beads strung on a
`necklace.
`
`Chapter 3: Macromolecules: Structure, Shape, and Information
`
`Figure 3-10 The DNA double helix.
`(A) A short section of the helix viewed
`from its side. Four complementary
`base pairs are shown. The bases are
`shown in green, while the deoxyribose
`sugars are blue. (B) The helix viewed
`from an end. Note that the two DNA
`strands run in opposite directions and
`that each base pair is held together by
`either two or three hydrogen bonds
`(see also Panel 3-2, pp. 100-101).
`(Figure labels: 5’ chain end; phosphodiester bond; double-helix axis.)
`
`How can a long chain of nucleotides encode the instructions for an
`organism or even a cell? And how can these messages be copied from one generation
`of cells to the next? The answers lie in the structure of the DNA molecule.
`Early in the 1950s x-ray diffraction analyses of specimens of DNA pulled into
`fibers suggested that the DNA molecule is a helical polymer composed of two
`strands. The helical structure of DNA was not surprising since, as we have seen,
`a helix will often form if each of the neighboring subunits in a polymer is
`regularly oriented. But the finding that DNA is two-stranded was of crucial
`significance. It provided the clue that led, in 1953, to the construction of a model that
`fitted the observed x-ray diffraction pattern and thereby solved the puzzle of DNA
`structure and function.
`An essential feature of the model was that all of the bases of the DNA
`molecule are on the inside of the double helix, with the sugar phosphates on the
`outside. This demands that the bases on one strand be extremely close to those
`on the other, and the fit proposed required specific base-pairing between a large
`purine base (A or G, each of which has a double ring) on one chain and a smaller
`pyrimidine base (T or C, each of which has a single ring) on the other chain
`(Figure 3-10).
`Both evidence from earlier biochemical experiments and conclusions derived
`from model building suggested that complementary base pairs (also called
`Watson-Crick base pairs) form between A and T and between G and C.
`Biochemical analyses of DNA preparations from different species had shown that, although
`the nucleotide composition of DNA varies a great deal (for example, from 13%
`A residues to 36% A residues in the DNA of different types of bacteria), there is
`a general rule that quantitatively [G] = [C] and [A] = [T]. Model building revealed
`that the numbers of effective hydrogen bonds that could be formed between G
`and C or between A and T were greater than for any other combinations (see
`Panel 3-2, pp. 100-101). The double-helical model for DNA thus neatly explained
`the quantitative biochemistry.
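
The link between base pairing and the rule [G] = [C], [A] = [T] can be checked with a short sketch. The Python below is illustrative only and is not part of the original text: the function names and the example sequence are assumptions, and the only biology it encodes is the A-T and G-C pairing and the antiparallel strands described above.

```python
from collections import Counter

# Watson-Crick pairing rules from the text: A pairs with T, G pairs with C.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Return the partner strand, written 5'-to-3'.

    The two strands run in opposite directions, so the partner is the
    base-by-base complement read in reverse order.
    """
    return "".join(PAIR[base] for base in reversed(strand))


def duplex_composition(strand):
    """Count each base over both strands of the double helix."""
    return Counter(strand) + Counter(complementary_strand(strand))


if __name__ == "__main__":
    top = "ATGCGGATTAAC"  # arbitrary example sequence (an assumption)
    counts = duplex_composition(top)
    print(complementary_strand(top))
    # Whatever the sequence, pairing forces equal counts of A and T, and of G and C.
    print(counts["A"] == counts["T"], counts["G"] == counts["C"])
```

Any single strand may have an arbitrary composition; the equalities appear only when both strands of the duplex are counted, which is how the double-helical model accounts for the biochemical measurements.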
`
`The Structure of DNA Provides an Explanation
`for Heredity 11
`
`A gene carries biological information in a form that must be precisely copied and
`transmitted from each cell to all of its progeny. The implications of the discov—
`
`
`
`
`The Nucleotide Sequence of a Gene Determines the Amino
`Acid Sequence of a Protein 13
`
`DNA is relatively inert chemically. The information it contains is expressed
`indirectly via other molecules: DNA directs the synthesis of specific RNA and
`protein molecules, which in turn determine the cell’s chemical and physical
`properties.
`
`At about the time that biophysicists were analyzing the three-dimensional
`structure of DNA by x-ray diffraction, biochemists were intensively studying the
`chemical structure of proteins. It was already known that proteins are chains of
`amino acids joined together by sequential peptide linkages; but it was only in the
`early 1950s, when the small protein insulin was sequenced (Figure 3-14), that it
`was discovered that each type of protein consists of a unique sequence of amino
`acids. Just as solving the structure of DNA was seminal in understanding the
`molecular basis of genetics and heredity, so sequencing insulin provided a key
`to understanding the structure and function of proteins. If insulin had a definite,
`genetically determined sequence, then presumably so did every other protein.
`It seemed reasonable to suppose, moreover, that the properties of a protein
`would depend on the precise order in which its constituent amino acids are
`arranged.
`Both DNA and protein are composed of a linear sequence of subunits;
`eventually, the analysis of the proteins made by mutant genes demonstrated that the
`two sequences are co-linear; that is, the nucleotides in DNA are arranged in an
`order corresponding to the order of the amino acids in the protein they specify.
`It became evident that the DNA sequence contains a coded specification of the
`protein sequence. The central question in molecular biology then became how
`a cell translates a nucleotide sequence in DNA into an amino acid sequence in
`a protein.
`
`Portions of DNA Sequence Are Copied into RNA Molecules
`That Guide Protein Synthesis 14
`
`The synthesis of proteins involves copying specific regions of DNA (the genes)
`into polynucleotides of a chemically and functionally different type known as
`ribonucleic acid, or RNA. RNA, like DNA, is composed of a linear sequence of
`nucleotides, but it has two small chemical differences: (1) the sugar-phosphate
`backbone of RNA contains ribose instead of a deoxyribose sugar and (2) the base
`thymine (T) is replaced by uracil (U), a very closely related base that likewise pairs
`with A (see Panel 3-2, pp. 100-101).
`RNA retains all of the information of the DNA sequence from which it was
`copied, as well as the base-pairing properties of DNA. Molecules of RNA are
`synthesized by a process known as DNA transcription, which is similar to DNA
`replication in that one of the two strands of DNA acts as a template on which the
`base-pairing abilities of incoming nucleotides are tested. When a good match is
`achieved with the DNA template, a ribonucleotide is incorporated as a covalently
`bonded unit. In this way the growing RNA chain is elongated one nucleotide at
`a time.
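
The template-directed copying just described can be sketched in a few lines. This Python is illustrative and not part of the original text: the function name is hypothetical, and it assumes the template strand is supplied in the order in which it is read (3'-to-5'), so that the RNA comes out written 5'-to-3'.

```python
# Pairing used during transcription: each DNA template base selects the
# complementary ribonucleotide, with uracil (U) taking the place of thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build an RNA chain one nucleotide at a time from a DNA template.

    Assumption: the template is given in the order it is read (3'-to-5'),
    so the returned RNA string is written 5'-to-3'.
    """
    rna = []
    for base in template_3_to_5:
        rna.append(DNA_TO_RNA[base])  # add the matching ribonucleotide
    return "".join(rna)


if __name__ == "__main__":
    print(transcribe("TACCAC"))  # arbitrary example template
```

The sketch ignores everything except base pairing; where transcription starts and stops on the DNA is not modeled.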
`
`Figure 3-14 The amino acid
`sequence of bovine insulin. Insulin is
`a very small protein that consists of
`two polypeptide chains, one 21 and
`the other 30 amino acid residues long.
`Each chain has a unique, genetically
`determined sequence of amino acids.
`The one-letter symbols used to
`specify amino acids are those listed
`in Panel 2-5, pages 56-57; the
`S-S bonds shown in red are disulfide
`bonds between cysteine residues. The
`protein is made initially as a single
`long polypeptide chain (encoded by a
`single gene) that is subsequently
`cleaved to give the two chains.
`
`(Figure 3-15 diagram labels: EUCARYOTES and PROCARYOTES panels; DNA, intron, exon, TRANSCRIPTION, RNA SPLICING, mRNA, TRANSLATION, protein.)
`
`DNA transcription differs from DNA replication in a number of ways. The
`RNA product, for example, does not remain as a strand annealed to DNA. Just
`behind the region where the ribonucleotides are being added, the original DNA
`helix re-forms and releases the RNA chain. Thus RNA molecules are
`single-stranded. Moreover, RNA molecules are relatively short compared to DNA
`molecules since they are copied from a limited region of the DNA, enough to make
`one or a few proteins (Figure 3-15). RNA transcripts that direct the synthesis of
`protein molecules are called messenger RNA (mRNA) molecules, while other
`RNA transcripts serve as transfer RNAs (tRNAs) or form the RNA components of
`ribosomes (rRNA) or smaller ribonucleoprotein particles.
`The amount of RNA made from a particular region of DNA is controlled by
`gene regulatory proteins that bind to specific sites on DNA close to the coding
`sequences of a gene. In any cell at any given time, some genes are used to make
`RNA in very large quantities while other genes are not transcribed at all. For an
`active gene thousands of RNA transcripts can be made from the same DNA
`segment in each cell generation. Because each mRNA molecule can be translated
`into many thousands of copies of a polypeptide chain, the information contained
`in a small region of DNA can direct the synthesis of millions of copies of a
`specific protein. The protein fibroin, for example, is the major component of silk. In
`each silk gland cell a single fibroin gene makes 10⁴ copies of mRNA, each of which
`directs the synthesis of 10⁵ molecules of fibroin, producing a total of 10⁹
`molecules of fibroin in just 4 days.
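
The two-stage amplification in the fibroin example is simple multiplication, sketched below. The Python is illustrative and not part of the original text: the function name is hypothetical, and the only inputs are the figures quoted above (10⁴ mRNA copies per gene and 10⁵ protein molecules per mRNA).

```python
def proteins_per_gene(mrna_per_gene, proteins_per_mrna):
    """Total protein molecules made from one gene via its mRNA copies."""
    return mrna_per_gene * proteins_per_mrna


if __name__ == "__main__":
    # Figures quoted in the text for a single fibroin gene in a silk gland cell.
    total = proteins_per_gene(10**4, 10**5)
    print(total)        # total fibroin molecules from one gene
    print(total // 4)   # average per day over the 4 days quoted
```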
`
`Eucaryotic RNA Molecules Are Spliced to Remove
`Intron Sequences 15
`
`In bacterial cells most proteins are encoded by a single uninterrupted stretch of
`DNA sequence that is copied without alteration to produce an mRNA molecule.
`In 1977 molecular biologists were astonished by the discovery that most
`eucaryotic genes have their coding sequences (called exons) interrupted by noncoding
`sequences (called introns). To produce a protein, the entire length of the gene,
`including both its introns and its exons, is first transcribed into a very large RNA
`molecule, the primary transcript. Before this RNA molecule leaves the nucleus,
`a complex of RNA-processing enzymes removes all of the intron sequences,
`thereby producing a much shorter RNA molecule. After this RNA-processing step,
`called RNA splicing, has been completed, the RNA molecule moves to the
`cytoplasm as an mRNA molecule that directs the synthesis of a particular protein (see
`Figure 3-15).
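
As a rough illustration of what splicing does to the sequence, the sketch below joins the exon segments of a primary transcript and discards everything between them. It is not part of the original text: the function name, the example transcript, and the exon coordinates are all hypothetical, and a real cell locates intron boundaries through sequence signals that are not modeled here.

```python
def splice(primary_transcript, exons):
    """Join exon segments of a primary transcript, dropping the introns.

    `exons` is a list of (start, end) positions, end exclusive, given as
    an assumption by the caller; nothing here finds the boundaries.
    """
    return "".join(primary_transcript[start:end] for start, end in sorted(exons))


if __name__ == "__main__":
    # Invented example: uppercase marks exons, lowercase marks one intron.
    transcript = "AUGGCC" + "guaaguuuucag" + "UUUUAA"
    mrna = splice(transcript, [(0, 6), (18, 24)])
    print(mrna)
    print(len(transcript), len(mrna))  # the mRNA is shorter than the transcript
```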
`This seemingly wasteful mode of information transfer in eucaryotes is
`presumed to have evolved because it makes protein synthesis much more versatile.
`The primary RNA transcripts of some genes, for example, can be spliced in
`various ways to produce different mRNAs, depending on the cell type or stage of
`development. This allows different proteins to be produced from the same gene.
`Moreover, because the presence of numerous introns facilitates genetic
`recombination events between exons, this type of gene arrangement is likely to have
`been profoundly important in the early evolutionary history of genes, speeding
`up the process whereby organisms evolve new proteins from parts of
`preexisting ones instead of evolving totally new amino acid sequences.
`
`Figure 3-15 The transfer of
`information from DNA to protein.
`The transfer proceeds by means of an
`RNA intermediate called messenger
`RNA (mRNA). In procaryotic cells the
`process is simpler than in eucaryotic
`cells. In eucaryotes the coding regions
`of the DNA (in the exons, shown in
`color) are separated by noncoding
`regions (the introns). As indicated,
`these introns must be removed by an
`enzymatically catalyzed RNA-splicing
`reaction to form the mRNA.
`
`1st position    2nd position                3rd position
`(5’ end)        U      C      A      G      (3’ end)
`U               Phe    Ser    Tyr    Cys    U
`                Phe    Ser    Tyr    Cys    C
`                Leu    Ser    STOP   STOP   A
`                Leu    Ser    STOP   Trp    G
`C               Leu    Pro    His    Arg    U
`                Leu    Pro    His    Arg    C
`                Leu    Pro    Gln    Arg    A
`                Leu    Pro    Gln    Arg    G
`A               Ile    Thr    Asn    Ser    U
`                Ile    Thr    Asn    Ser    C
`                Ile    Thr    Lys    Arg    A
`                Met    Thr    Lys    Arg    G
`G               Val    Ala    Asp    Gly    U
`                Val    Ala    Asp    Gly    C
`                Val    Ala    Glu    Gly    A
`                Val    Ala    Glu    Gly    G
`
`Figure 3-16 The genetic code. Sets of
`three nucleotides (codons) in an
`mRNA molecule are translated into
`amino acids in the course of protein
`synthesis according to the rules
`shown. The codons GUG and GAG,
`for example, are translated into valine
`and glutamic acid, respectively. Note
`that those codons with U or C as the
`second nucleotide tend to specify the
`more hydrophobic amino acids
`(compare with Panel 2-5, pp. 56-57).
`
`
`Sequences of Nucleotides in mRNA Are “Read” in Sets
`of Three and Translated into Amino Acids 15
`
`The rules by which the nucleotide sequence of a gene is translated into the amino
`acid sequence of a protein, the so-called genetic code, were deciphered in the
`early 1960s. The sequence of nucleotides in the mRNA molecule that acts as
`an intermediate was found to be read in serial order in groups of three. Each
`triplet of nucleotides, called a codon, specifies one amino acid. Since RNA is
`a linear polymer of four different nucleotides, there are 4³ = 64 possible codon
`triplets (remember that it is the sequence of nucleotides in the triplet that is
`important). However, only 20 different amino acids are commonly found in
`proteins, so that most amino acids are specified by several codons; that is, the
`genetic code is degenerate. The code (shown in Figure 3-16) has been highly
`conserved during evolution: with a few minor exceptions, it is the same in
`organisms as diverse as bacteria, plants, and humans.
`In principle, each RNA sequence can be translated in any one of three
`different reading frames depending on where the decoding process begins (Figure
`3-17). In almost every case only one of these reading frames will produce a
`functional protein. Since there are no punctuation signals except at the beginning and
`end of the RNA message, the reading frame is set at the initiation of the
`translation process and is maintained thereafter.
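
The codon count, the degeneracy of the code, and the three reading frames can all be seen in a short sketch. The Python below is illustrative and not part of the original text: the names are hypothetical, the example mRNA is invented, and the codon table is the standard genetic code written in one-letter amino acid symbols with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code, listed with the 1st, 2nd, and 3rd codon positions
# each cycling through U, C, A, G (3rd position fastest). "*" marks STOP.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(mrna, frame):
    """Translate every complete codon, starting `frame` (0, 1, or 2) bases in."""
    codons = (mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


def three_frames(mrna):
    """Return the translation of the same mRNA in each reading frame."""
    return [translate_frame(mrna, frame) for frame in range(3)]


if __name__ == "__main__":
    print(len(CODON_TABLE))                 # number of possible codons
    print(len(set(AMINO_ACIDS) - {"*"}))    # number of amino acids specified
    for frame, peptide in enumerate(three_frames("AUGGUGGAGUAA")):
        print(frame, peptide)
```

The same twelve nucleotides give three unrelated amino acid sequences, which is why the frame fixed at initiation has to be held for the whole message.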
`
`tRNA Molecules Match Amino Acids
`to Groups of Nucleotides 17
`
`The codons in an mRNA molecule do not directly recognize the amino acids they
`specify in the way that an enzyme recognizes a substrate. The translation of
`mRNA into protein depends on “adaptor” molecules that recognize both an
`
`
`Figure 3-17 The three possible
`reading frames in protein synthesis.
`In the process of translating a
`nucleotide sequence (blue) into an
`amino acid sequence (green), the
`sequence of nucleotides in an mRNA
`molecule is read from the 5’ to the 3’
`end in sequential sets of three
`nucleotides. In principle, therefore,
`the same RNA sequence can specify
`three completely different amino acid
`sequences, depending on the
`"reading frame."
`
`
`
`
`Summary
`
`Before the synthesis of a particular protein can begin, the corresponding mRNA
`molecule must be produced by DNA transcription. Then a small ribosomal subunit
`binds to the mRNA molecule at a start codon (AUG) that is recognized by a unique
`initiator tRNA molecule. A large ribosomal subunit binds to complete the ribosome
`and initiate the elongation phase of protein synthesis. During this phase aminoacyl
`tRNAs, each bearing a specific amino acid, sequentially bind to the appropriate codon
`in mRNA by forming complementary base pairs with the tRNA anticodon. Each
`amino acid is added to the carboxyl-terminal end of the growing polypeptide by
`means of a cycle of three sequential steps: aminoacyl-tRNA binding, followed by
`peptide bond formation, followed by ribosome translocation. The ribosome progresses
`from codon to codon in the 5’-to-3’ direction along the mRNA molecule until one of
`three stop codons is reached. A release factor then binds to the stop codon,
`terminating translation and releasing the completed polypeptide from the ribosome.
`Eucaryotic and procaryotic ribosomes are highly homologous, despite substantial
`differences in the number and size of their rRNA and protein components. The
`predominant role of rRNA in ribosome structure and function is likely to reflect the
`ancient origin of protein synthesis, which is thought to have evolved in an
`environment dominated by RNA-mediated catalysis.
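
The start-elongate-stop logic of the summary can be caricatured in code. The Python below is a deliberately simplified sketch and not part of the original text: the function name is hypothetical, it assumes translation begins at the first AUG found in the string, and it models none of the ribosomal subunits, tRNAs, or release factors. The codon table is the standard genetic code, with `*` marking a stop codon.

```python
from itertools import product

# Standard genetic code; codon positions cycle through U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Assumption: the first AUG in the string is the start codon. Returns
    the polypeptide in one-letter symbols, or "" if there is no AUG.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    polypeptide = []
    # Move along the mRNA 5'-to-3', one codon per elongation cycle.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":   # stop codon: release the chain
            break
        polypeptide.append(amino_acid)  # add to the carboxyl-terminal end
    return "".join(polypeptide)


if __name__ == "__main__":
    print(translate("GGAUGGUGGAGUAACC"))  # invented example mRNA
```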
`
`
`DNA Repair 16
`
`The long-term survival of a species may be enhanced by genetic changes, but the
`survival of the individual demands genetic stability. Maintaining genetic stability
`requires not only an extremely accurate mechanism for replicating the DNA
`before a cell divides, but also mechanisms for repairing the many accidental
`lesions that occur continually in DNA. Most such spontaneous changes in DNA are
`temporary because they are immediately corrected by processes collectively
`called DNA repair. Only rarely do the cell’s DNA maintenance processes fail and
`allow a permanent change in the DNA. Such a change is called a mutation, and
`it can destroy an organism if the change occurs in a vital position in the DNA
`sequence.
`
`Before examining the mechanisms of DNA repair, we briefly discuss the
`maintenance of DNA sequences from one generation to the next.
`
`DNA Sequences Are Maintained with Very High Fidelity 17
`
`The rate at which stable changes occur in DNA sequences (the mutation rate) can
`be estimated only indirectly. One way is to compare the amino acid sequence of
`the same protein in several species. The fraction of the amino acids that are
`different can then be compared with the estimated number of years since each pair
`of species diverged from a common ancestor, as determined from the fossil
`record. In this way one can calculate the number of years that elapse, on
`average, before an inherited change in the amino acid sequence of a protein becomes
`fixed in the species. Because each such change will commonly reflect the
`alteration of a single nucleotide in the DNA sequence of the gene encoding that
`protein, this value can be used to estimate the average number of years required to
`produce a single, stable mutation in the gene.
`Such calculations will always substantially underestimate the actual
`mutation rate because most mutations will spoil the function of the protein and vanish
`from the population through natural selection. But there is one family of proteins
`whose sequence does not seem to matter, and so the genes that encode them can
`accumulate mutations without being selected against. These proteins are the
`fibrinopeptides, 20-residue-long fragments that are discarded from the protein
`fibrinogen when it is activated to form fibrin during blood clotting. Since the
`function of fibrinopeptides apparently does not depend on their amino acid
`sequence, they can tolerate almost any amino acid change. Sequence analysis of
`the fibrinopeptides indicates that an average-sized protein 400 amino acids long
`would be randomly altered by an amino acid change roughly once every 200,000
`years.
`
`Chapter 6: Basic Genetic Mechanisms
`
`More recently, DNA sequencing technology has made it possible to
`compare corresponding nucleotide sequences in regions of the genome that do not
`code for protein. Comparisons of such sequences in several mammalian species
`have produced estimates of the mutation rate during evolution that are in excellent
`agreement with those obtained from the fibrinopeptide studies.
`
`The Observed Mutation Rates in Proliferating Cells
`Are Consistent with Evolutionary Estimates
`
`The mutation rate can be estimated more directly by observing the rate at which
`spontaneous genetic changes arise in a large population of cells followed over a
`relatively short period of time. This can be done either by estimating the
`frequency with which new mutants arise in very large animal populations (in a
`colony of fruit flies or mice, for example) or by screening for changes in specific
`proteins in cells growing in culture. Although they are only approximate, the
`numbers obtained in both cases are consistent with an error frequency of 1
`base-pair change in roughly 10⁹ base pairs for each cell generation. Consequently, a
`single gene that encodes an average-sized protein (containing about 10³ coding
`base pairs) would suffer a mutation once in about 10⁶ cell generations. This
`number is at least roughly consistent with the evolutionary estimate described above,
`in which one mutation appears in an average gene in the germ line every 200,000
`years.
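
The step from a per-base-pair error frequency to a per-gene mutation interval is a single division, sketched below. The Python is illustrative and not part of the original text: the function name is hypothetical, and the inputs are the approximate figures quoted above (1 change per 10⁹ base pairs per cell generation, and about 10³ coding base pairs per gene).

```python
from fractions import Fraction


def generations_per_mutation(error_rate_per_bp, gene_length_bp):
    """Average number of cell generations between mutations in one gene.

    The per-gene mutation probability per generation is the per-base-pair
    error rate multiplied by the number of base pairs in the gene.
    """
    per_gene_rate = error_rate_per_bp * gene_length_bp
    return 1 / per_gene_rate


if __name__ == "__main__":
    rate = Fraction(1, 10**9)   # 1 base-pair change per 10^9 base pairs
    print(generations_per_mutation(rate, 10**3))
```

Exact fractions are used only to avoid floating-point rounding; the inputs themselves are rough estimates.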
`
`
`
`Most Mutations in Proteins Are Deleterious
`and Are Eliminated by Natural Selection 19
`
`
`
`When the number of amino acid differences in a particular protein is plotted for
`several pairs of species against the time since the species diverged, the result is
`a reasonably straight line. That is, the longer the period since divergence, the
`larger the number of differences. For convenience, the slope of this line can be
`expressed in terms of the "unit evolutionary time" for that protein, which is the
`average time required for 1 amino acid change to appear in a sequence of 100
`amino acid residues. When various proteins are compared, each shows a different
`but characteristic rate of evolution (Figure 6-32). Since all DNA base pairs are
`thought to be subject to roughly the same rate of random mutation, these
`different rates must reflect differences in the probability that an organism with a
`random mutation in the given protein will survive and propagate. Changes in
`amino acid sequence are evidently much more harmful for some proteins than
`for others. From Table 6-2 we can estimate that about 6 of every 7 random amino
`acid changes are harmful over the long term in hemoglobin, about 29 of every
`30 amino acid changes are harmful in cytochrome c, and virtually all amino
`acid changes are harmful in histone H4. We assume that individuals who carried
`such harmful mutations have been eliminated from the population by natural
`selection.
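
The fractions quoted above translate directly into relative rates of evolution. The sketch below is illustrative and not part of the original text: the function name is hypothetical, and it assumes that changes which are not harmful accumulate at the same rate as in a protein whose sequence does not matter (the fibrinopeptides, as described earlier). No values from Table 6-2 are used, only the fractions stated in the paragraph.

```python
from fractions import Fraction


def slowdown_factor(harmful_fraction):
    """How many times longer a protein takes to accumulate one accepted change.

    Assumption: only the non-harmful fraction of random changes survives,
    so the unit evolutionary time is longer by 1 / (1 - harmful_fraction)
    relative to a protein that tolerates every change.
    """
    return 1 / (1 - harmful_fraction)


if __name__ == "__main__":
    print(slowdown_factor(Fraction(6, 7)))     # hemoglobin
    print(slowdown_factor(Fraction(29, 30)))   # cytochrome c
```

On this reading, hemoglobin would evolve about 7 times more slowly, and cytochrome c about 30 times more slowly, than a protein under no selective constraint.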
`
`Low Mutation Rates Are Necessary for Life as We Know It 19
`Since most mutations are deleterious, no species can afford to allow them to
`accumulate at a high rate in its germ cells. We discuss later why the observed
`mutation frequency, low though it is, nevertheless is thought to limit the number
`of essential proteins that any organism can encode in its germ line to about
`60,000. By an extension of the same arguments, a mutation frequency tenfold
`higher would limit an organism to about 6000 essential proteins. In this case
`evolution would probably have stopped at an organism no more complex than
`a fruit fly.
`
`Figure 6-32 Different proteins
`evolve at very different rates. A
`comparison of the rates of amino acid
`change found in hemoglobin,
`cytochrome c, and the fibrinopeptides.
`Hemoglobin and cytochrome c have
`changed much more slowly during
`evolution than the fibrinopeptides. In
`determining rates of change per year
`(as in Table 6-2), it is important to
`realize that two species that diverged
`from a common ancestor 100 million
`years ago are separated by 200 million
`years of evolutionary time.
`(Plot axes: amino acid changes per 100 amino acids versus millions of years since divergence of species.)
`
`
`
`
`4. DNA cloning, whereby a single DNA molecule can be copied to generate
`many billions of identical molecules.
`
`5. DNA engineering, by which DNA sequences are altered to make modified
`versions of genes, which are reinserted back into cells or organisms.
`
`In this chapter we explain how recombinant DNA technology has generated
`the new experimental approaches that have revolutionized cell biology.
`
`
`
`
`
`
`
`Before the 1970s the goal of isolating a single gene from a large chromosome
`seemed unattainable. Unlike a protein, a gene does not exist as a discrete entity
`in cells, but rather as a small region of a much larger DNA molecule. Although
`the DNA molecules in a cell can be randomly broken into small pieces by
`mechanical force, a fragment containing a single gene in a mammalian genome
`would still be only one among a hundred thousand or more DNA fragments,
`indistinguishable in their average size. How could such a gene be purified? Since
`all DNA molecules consist of an approximately equal mixture of the same four
`nucleotides, they cannot be readily separated, as proteins can, on the basis of
`their different charges and binding properties. Moreover, even if a purification
`scheme could be devised, vast amounts of DNA would be needed to yield enough
`of any particular gene to be useful for further experiments.
`The solution to all of these problems began to emerge with the discovery of
`restriction nucleases. These enzymes, which can be purified from bacteria, cut the
`DNA double helix at specific sites defined by the local nucleotide sequence,
`producing double-stranded DNA fragments of strictly defined sizes. Different
`species of bacteria make restriction nucleases with different sequence specificities,
`and it is relatively simple to find a restriction nuclease that will create a DNA
`fragment that includes a particular gene. The size of the DNA fragment can then
`
`Table 7-1 Some Major Steps in the Development of Recombinant DNA Technology
`
`1869       Miescher isolated DNA for the first time.
`1944       Avery provided evidence that DNA, rather than protein, carries the
`           genetic information during bacterial transformation.
`1953       Watson and Crick proposed the double-helix model for DNA structure
`           based on x-ray results of Franklin and Wilkins.
`1957       Kornberg discovered DNA polymerase, the enzyme now used to
`           produce labeled DNA probes.
`1961       Marmur and Doty discovered DNA renaturation, establishing the
`           specificity and feasibility of nucleic acid hybridization reactions.
`1962       Arber provided the first evidence for the existence of DNA restriction
`           nucleases, leading to their later purification and use in DNA sequence
`           characterization by Nathans and H. Smith.
`1966       Nirenberg, Ochoa, and Khorana elucidated the genetic code.
`1967       Gellert discovered DNA ligase, the enzyme used to join DNA fragments
`           together.
`1972-1973  DNA cloning techniques were developed by the laboratories of Boyer,
`           Cohen, Berg, and their colleagues at Stanford University and the
`           University of California at San Francisco.
`1975       Southern developed gel-transfer hybridization for the detection of
`           specific DNA sequences.
`1975-1977  Sanger and Barrell and Maxam and Gilbert developed rapid
`           DNA-sequencing methods.
`1981-1982  Palmiter and Brin