`
`
`BIOCHEMISTRY
`
`• FIFTH EDITION •
`
`Jeremy M. Berg
`Johns Hopkins University School of Medicine
`
`John L. Tymoczko
`Carleton College
`
`Lubert Stryer
`Stanford University
`
`Web content by
`Neil D. Clarke
`Johns Hopkins University School of Medicine
`
`W. H. Freeman and Company
`New York
`
`Novo Nordisk Ex. 2076, P. 1
`Mylan Institutional v. Novo Nordisk
`IPR2020-00324
`
`
`
`General Library System
`University of Wisconsin - Madison
`728 State Street
`Madison, WI 53706-1494
`U.S.A.
`
`Steenbock Memorial Library
`University of Wisconsin - Madison
`550 Babcock Drive
`Madison, WI 53706-1293
`
`PUBLISHER: Michelle Julet
`
`DEVELOPMENT EDITOR: Susan Moran
`
`NEW MEDIA DEVELOPMENT EDITOR: Sonia Divittorio
`
`NEW MEDIA AND SUPPLEMENTS EDITOR: Mark Santee
`
`MEDIA DEVELOPERS: CADRE design; molvisions.com-
`3D molecular visualization
`
`MARKETING MANAGER: Carol Coffey
`
`PROJECT EDITOR: Georgia Lee Hadler
`
`MANUSCRIPT EDITOR: Patricia Zimmerman
`
`COVER AND TEXT DESIGN: Victoria Tomaselli
`
`COVER ILLUSTRATION: Tomo Narashima
`
`ILLUSTRATION COORDINATOR: Cecilia Varas
`
`ILLUSTRATIONS: Jeremy Berg with Network Graphics
`
`PHOTO EDITOR: Vikii Wong
`
`PHOTO RESEARCHER: Dena Betz
`
`PRODUCTION COORDINATOR: Julia DeRosa
`
`COMPOSITION: TechBooks
`
`MANUFACTURING: RR Donnelley & Sons Company
`
`Library of Congress Cataloguing-in-Publication Data
`
`Berg, Jeremy Mark
`Biochemistry/ Jeremy Berg, John Tymoczko, Lubert Stryer.- 5th ed.
`p.
`cm.
`Fourth ed. by Lubert Stryer.
`Includes bibliographical references and index.
`ISBN 0-7167-4954-8 (CH1-31 only) -- ISBN 0-7167-3051-0 (CH1-34)
`ISBN 0-7167-4955-6 (CH32-34 only) -- ISBN 0-7167-4684-0 (CH1-34,
`International edition)
`1. Biochemistry. I. Tymoczko, John L. II. Stryer, Lubert. III. Title.
`
`QP514.2 .S66 2001
`572--dc21
`
`2001051259
`
`© 2002 by W. H. Freeman and Company; © 1975, 1981, 1988,
`1995 by Lubert Stryer. All rights reserved.
`
`No part of this book may be reproduced by any mechanical, photographic, or
`electronic process, or in the form of a phonographic recording, nor may it be stored in
`a retrieval system, transmitted, or otherwise copied for public or private use, without
`written permission from the publisher.
`
`Printed in the United States of America
`
`First printing 2001
`
`W. H. Freeman and Company
`41 Madison Avenue, New York, New York 10010
`Houndmills, Basingstoke RG21 6XS, England
`
`
`
`
`
`Protein Structure and Function
`
`Crystals of human insulin. Insulin is a protein hormone, crucial for
`maintaining blood sugar at appropriate levels. (Below) Chains of amino
`acids in a specific sequence (the primary structure) define a protein
`like insulin. These chains fold into well-defined structures (the tertiary
`structure), in this case a single insulin molecule. Such structures
`assemble with other chains to form arrays such as the complex of six
`insulin molecules shown at the far right (the quaternary structure).
`These arrays can often be induced to form well-defined crystals (photo
`at left), which allows determination of these structures in detail.
`[(Left) Alfred Pasieka/Peter Arnold.]
`
`[Figure labels: primary structure, secondary structure, tertiary structure, quaternary structure.]
`
`CHAPTER 3 • Protein Structure and Function
`
`OUTLINE
`
`3.1 Proteins Are Built from a Repertoire
`of 20 Amino Acids
`
`3.2 Primary Structure: Amino Acids Are
`Linked by Peptide Bonds to Form
`Polypeptide Chains
`
`3.3 Secondary Structure: Polypeptide
`Chains Can Fold into Regular Structures
`Such as the Alpha Helix, the Beta Sheet,
`and Turns and Loops
`
`3.4 Tertiary Structure: Water-Soluble
`Proteins Fold into Compact Structures
`with Nonpolar Cores
`
`3.5 Quaternary Structure: Polypeptide
`Chains Can Assemble into Multisubunit
`Structures
`
`3.6 The Amino Acid Sequence
`of a Protein Determines Its
`Three-Dimensional Structure
`
`FIGURE 3.1 Structure dictates function. A protein component of the DNA
`replication machinery surrounds a section of DNA double helix. The structure of the protein
`allows large segments of DNA to be copied without the replication machinery dissociating
`from the DNA.
`
`FIGURE 3.2 A complex protein
`assembly. An electron micrograph of
`insect flight tissue in cross section shows
`a hexagonal array of two kinds of protein
`filaments. [Courtesy of Dr. Michael Reedy.]
`
`FIGURE 3.3 Flexibility and
`function. Upon binding iron, the protein
`lactoferrin undergoes conformational
`changes that allow other molecules to
`distinguish between the iron-free and the
`iron-bound forms.
`
`Proteins are the most versatile macromolecules in living systems and serve
`crucial functions in essentially all biological processes. They function as
`catalysts, they transport and store other molecules such as oxygen, they
`provide mechanical support and immune protection, they generate movement,
`they transmit nerve impulses, and they control growth and differentiation.
`Indeed, much of this text will focus on understanding what proteins do and
`how they perform these functions.
`Several key properties enable proteins to participate in
`such a wide range of functions.
`
`1. Proteins are linear polymers built of monomer units called
`amino acids. The construction of a vast array of macromolecules
`from a limited number of monomer building blocks
`is a recurring theme in biochemistry. Does protein function
`depend on the linear sequence of amino acids? The
`function of a protein is directly dependent on its three-dimensional
`structure (Figure 3.1). Remarkably, proteins
`spontaneously fold up into three-dimensional structures
`that are determined by the sequence of amino acids in the
`protein polymer. Thus, proteins are the embodiment of the
`transition from the one-dimensional world of sequences to the
`three-dimensional world of molecules capable of diverse
`activities.
`
`2. Proteins contain a wide range of functional groups. These
`functional groups include alcohols, thiols, thioethers, carboxylic
`acids, carboxamides, and a variety of basic groups. When combined in
`various sequences, this array of functional groups accounts for the broad
`spectrum of protein function. For instance, the chemical reactivity associated with
`these groups is essential to the function of enzymes, the proteins that catalyze
`specific chemical reactions in biological systems (see Chapters 8-10).
`
`3. Proteins can interact with one another and with other biological
`macromolecules to form complex assemblies. The proteins within these assemblies
`can act synergistically to generate capabilities not afforded by the
`individual component proteins (Figure 3.2). These assemblies include
`macromolecular machines that carry out the accurate replication of DNA, the
`transmission of signals within cells, and many other essential processes.
`
`4. Some proteins are quite rigid, whereas others display limited flexibility.
`Rigid units can function as structural elements in the cytoskeleton (the
`internal scaffolding within cells) or in connective tissue. Parts of proteins with
`limited flexibility may act as hinges, springs, and levers that are crucial to
`protein function, to the assembly of proteins with one another and with
`other molecules into complex units, and to the transmission of information
`within and between cells (Figure 3.3).
`
`
`
`
`3.1 PROTEINS ARE BUILT FROM A REPERTOIRE
`OF 20 AMINO ACIDS
`
`Amino acids are the building blocks of proteins. An α-amino acid consists
`of a central carbon atom, called the α carbon, linked to an amino group, a
`carboxylic acid group, a hydrogen atom, and a distinctive R group. The R
`group is often referred to as the side chain. With four different groups
`connected to the tetrahedral α-carbon atom, α-amino acids are chiral; the two
`mirror-image forms are called the L isomer and the D isomer (Figure 3.4).
`
`
`A Repertoire of 20 Amino Acids
`
`Notation for distinguishing stereoisomers:
`The four different substituents of an
`asymmetric carbon atom are assigned
`a priority according to atomic number.
`The lowest-priority substituent, often
`hydrogen, is pointed away from the
`viewer. The configuration about the
`carbon is called S, from the Latin sinister
`for "left," if the progression from
`the highest to the lowest priority is
`counterclockwise. The configuration is
`called R, from the Latin rectus for
`"right," if the progression is clockwise.
`
`L isomer
`
`D isomer
`
`FIGURE 3.4 The L and D isomers of amino acids. R refers to the side chain.
`The L and D isomers are mirror images of each other.
`
`Only L amino acids are constituents of proteins. For almost all amino acids,
`the L isomer has S (rather than R) absolute configuration (Figure 3.5).
`Although considerable effort has gone into understanding why amino acids in
`proteins have this absolute configuration, no satisfactory explanation has
`been arrived at. It seems plausible that the selection of L over D was
`arbitrary but, once made, was fixed early in evolutionary history.
`Amino acids in solution at neutral pH exist predominantly as dipolar
`ions (also called zwitterions). In the dipolar form, the amino group is
`protonated (-NH3+) and the carboxyl group is deprotonated (-COO-). The
`ionization state of an amino acid varies with pH (Figure 3.6). In acid solution
`(e.g., pH 1), the amino group is protonated (-NH3+) and the carboxyl group
`is not dissociated (-COOH). As the pH is raised, the carboxylic acid is the
`first group to give up a proton, inasmuch as its pKa is near 2. The dipolar
`form persists until the pH approaches 9, when the protonated amino group
`
`FIGURE 3.5 Only L amino acids are
`found in proteins. Almost all L amino
`acids have an S absolute configuration
`(from the Latin sinister meaning "left").
`The counterclockwise direction of the
`arrow from highest- to lowest-priority
`substituents indicates that the chiral
`center is of the S configuration.
`
`[Figure 3.6 plot: concentration versus pH (0 to 14), with regions labeled
`"Both groups protonated," "Zwitterionic form," and "Both groups deprotonated."]
`
`FIGURE 3.6 Ionization state as a
`function of pH. The ionization state of
`amino acids is altered by a change in pH.
`The zwitterionic form predominates near
`physiological pH.
`
`
`
`[Figure 3.7 structures: glycine (Gly, G) and alanine (Ala, A).]
`
`FIGURE 3.7 Structures of glycine and
`alanine. (Top) Ball-and-stick models
`show the arrangement of atoms and
`bonds in space. (Middle) Stereochemically
`realistic formulas show the geometrical
`arrangement of bonds around atoms (see
`Chapter 1 Appendix). (Bottom) Fischer
`projections show all bonds as being
`perpendicular for a simplified
`representation (see Chapter 1 Appendix).
`
`[Figure 3.8 structures: valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), and methionine (Met, M).]
`
`FIGURE 3.8 Amino acids with aliphatic side chains. The additional chiral center
`of isoleucine is indicated by an asterisk.
`
`
`
`
`loses a proton. For a review of acid-base concepts and pH, see the appendix
`to this chapter.
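As a rough illustration of how the ionization state shifts with pH, the following sketch treats a simple amino acid (one with no ionizable side chain) as a diprotic acid and computes the fraction present in each of the three forms. The pKa values of 2 for the carboxyl group and 9 for the amino group are approximations read from the discussion above, and the function name is hypothetical; actual values differ somewhat from one amino acid to another.

```python
def species_fractions(ph, pka_carboxyl=2.0, pka_amino=9.0):
    """Return (cation, zwitterion, anion) fractions at the given pH.

    cation     : both groups protonated (-NH3+ and -COOH)
    zwitterion : -NH3+ and -COO-
    anion      : both groups deprotonated (-NH2 and -COO-)
    The default pKa values are assumed, approximate figures.
    """
    h = 10.0 ** (-ph)             # hydrogen ion concentration
    k1 = 10.0 ** (-pka_carboxyl)  # dissociation constant of the carboxyl group
    k2 = 10.0 ** (-pka_amino)     # dissociation constant of the amino group

    # Relative populations of the three forms of a diprotic acid.
    cation = h * h
    zwitterion = k1 * h
    anion = k1 * k2
    total = cation + zwitterion + anion
    return cation / total, zwitterion / total, anion / total


for ph in (1, 2, 7, 9, 13):
    cation, zwitterion, anion = species_fractions(ph)
    print(f"pH {ph:>2}: cation {cation:.3f}, zwitterion {zwitterion:.3f}, anion {anion:.3f}")
```

With these assumed values, the zwitterion dominates across the broad middle of the pH range, and each transition is centered on the corresponding pKa.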
`Twenty kinds of side chains varying in size, shape, charge, hydrogen-bonding
`capacity, hydrophobic character, and chemical reactivity are commonly
`found in proteins. Indeed, all proteins in all species (bacterial,
`archaeal, and eukaryotic) are constructed from the same set of 20 amino
`acids. This fundamental alphabet of proteins is several billion years old. The
`remarkable range of functions mediated by proteins results from the diversity
`and versatility of these 20 building blocks. Understanding how this alphabet
`is used to create the intricate three-dimensional structures that enable
`proteins to carry out so many biological processes is an exciting area of
`biochemistry and one that we will return to in Section 3.6.
`Let us look at this set of amino acids. The simplest one is glycine, which
`has just a hydrogen atom as its side chain. With two hydrogen atoms
`bonded to the α-carbon atom, glycine is unique in being achiral. Alanine,
`the next simplest amino acid, has a methyl group (-CH3) as its side chain
`(Figure 3.7).
`Larger hydrocarbon side chains are found in valine, leucine, and
`isoleucine (Figure 3.8). Methionine contains a largely aliphatic side chain that
`includes a thioether (-S-) group. The side chain of isoleucine includes an
`additional chiral center; only the isomer shown in Figure 3.8 is found in
`proteins. The larger aliphatic side chains are hydrophobic; that is, they tend
`to cluster together rather than contact water. The three-dimensional
`structures of water-soluble proteins are stabilized by this tendency of
`hydrophobic groups to come together, called the hydrophobic effect (see
`Section 1.3.4). The different sizes and shapes of these hydrocarbon side chains
`enable them to pack together to form compact structures with few holes.
`Proline also has an aliphatic side chain, but it differs from other members
`of the set of 20 in that its side chain is bonded to both the nitrogen and the
`α-carbon atoms (Figure 3.9). Proline markedly influences protein architecture
`because its ring structure makes it more conformationally restricted
`than the other amino acids.
`
`[Figure 3.9 structure: proline (Pro, P).]
`
`FIGURE 3.9 Cyclic structure of proline.
`The side chain is joined to both the α
`carbon and the amino group.
`
`Three amino acids with relatively simple aromatic side chains are part of
`the fundamental repertoire (Figure 3.10). Phenylalanine, as its name
`indicates, contains a phenyl ring attached in place of one of the hydrogens of
`alanine. The aromatic ring of tyrosine contains a hydroxyl group. This
`hydroxyl group is reactive, in contrast with the rather inert side chains of the
`other amino acids discussed thus far. Tryptophan has an indole ring joined
`to a methylene (-CH2-) group; the indole group comprises two fused rings
`and an NH group. Phenylalanine is purely hydrophobic, whereas tyrosine
`and tryptophan are less so because of their hydroxyl and NH groups. The
`aromatic rings of tryptophan and tyrosine contain delocalized π electrons
`that strongly absorb ultraviolet light (Figure 3.11).
`A compound's extinction coefficient indicates its ability to absorb light.
`Beer's law gives the absorbance (A) of light at a given wavelength:
`
`
`
`
`[Figure 3.10 structures: phenylalanine (Phe, F), tyrosine (Tyr, Y), and tryptophan (Trp, W).]
`
`FIGURE 3.10 Amino acids with
`aromatic side chains. Phenylalanine,
`tyrosine, and tryptophan have hydrophobic
`character. Tyrosine and tryptophan also
`have hydrophilic properties because of
`their -OH and -NH- groups, respectively.
`
`A = εcl
`
`Beer's law
`
`where ε is the extinction coefficient [in units that are the
`reciprocals of molarity and distance in centimeters (M⁻¹
`cm⁻¹)], c is the concentration of the absorbing species (in
`units of molarity, M), and l is the length through which
`the light passes (in units of centimeters). For tryptophan,
`absorption is maximum at 280 nm and the extinction
`coefficient is 3400 M⁻¹ cm⁻¹ whereas, for tyrosine,
`absorption is maximum at 276 nm and the extinction
`coefficient is a less-intense 1400 M⁻¹ cm⁻¹. Phenylalanine
`absorbs light less strongly and at shorter wavelengths.
`The absorption of light at 280 nm can be used to estimate
`
`[Figure 3.11 plot: extinction coefficient (M⁻¹ cm⁻¹) versus wavelength (220 to 320 nm), with the tryptophan curve labeled Trp.]
`
`FIGURE 3.11 Absorption spectra of the aromatic amino acids
`tryptophan (red) and tyrosine (blue). Only these amino acids
`absorb strongly near 280 nm. [Courtesy of Greg Gatto.]
`
`
`
`[Figure 3.12 structures: serine (Ser, S) and threonine (Thr, T).]
`
`FIGURE 3.12 Amino acids containing
`aliphatic hydroxyl groups. Serine and
`threonine contain hydroxyl groups that
`render them hydrophilic. The additional
`chiral center in threonine is indicated by
`an asterisk.
`
`the concentration of a protein in solution if the number of tryptophan and
`tyrosine residues in the protein is known.
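The estimate described above can be sketched in a few lines. The example below applies Beer's law directly and then, as a simplifying assumption not stated in the text, treats the extinction coefficient of a protein at 280 nm as the sum of the contributions of its tryptophan and tyrosine residues, using the two coefficients quoted above. The function names are hypothetical, and the tyrosine value is quoted for its 276 nm maximum, so the result is only a rough estimate.

```python
EPSILON_TRP = 3400.0  # M^-1 cm^-1, tryptophan at 280 nm (value quoted in the text)
EPSILON_TYR = 1400.0  # M^-1 cm^-1, tyrosine at its 276 nm maximum (value quoted in the text)


def absorbance(epsilon, concentration_molar, path_cm=1.0):
    """Beer's law: A = epsilon * c * l."""
    return epsilon * concentration_molar * path_cm


def protein_concentration(a280, n_trp, n_tyr, path_cm=1.0):
    """Estimate molar protein concentration from absorbance at 280 nm.

    Assumes the residues absorb independently, so that the protein's
    extinction coefficient is the sum of the residue contributions.
    """
    epsilon = n_trp * EPSILON_TRP + n_tyr * EPSILON_TYR
    if epsilon == 0:
        raise ValueError("protein has no tryptophan or tyrosine residues")
    return a280 / (epsilon * path_cm)


# A hypothetical protein with 1 tryptophan and 2 tyrosine residues,
# measured in a 1 cm cuvette.
concentration = protein_concentration(0.62, n_trp=1, n_tyr=2)
print(f"Estimated concentration: {concentration * 1e6:.1f} micromolar")
```

The estimate is only as good as the residue counts and the additivity assumption; the local environment of a residue in a folded protein can shift its absorption.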
`Two amino acids, serine and threonine, contain aliphatic hydroxyl groups
`(Figure 3.12). Serine can be thought of as a hydroxylated version of alanine,
`whereas threonine resembles valine with a hydroxyl group in place of one
`of the valine methyl groups. The hydroxyl groups on serine and threonine
`make them much more hydrophilic (water loving) and reactive than alanine
`and valine. Threonine, like isoleucine, contains an additional asymmetric
`center; again only one isomer is present in proteins.
`Cysteine is structurally similar to serine but contains a sulfhydryl, or thiol
`(-SH), group in place of the hydroxyl (-OH) group (Figure 3.13). The
`sulfhydryl group is much more reactive. Pairs of sulfhydryl groups may
`come together to form disulfide bonds, which are particularly important in
`stabilizing some proteins, as will be discussed shortly.
`
`[Figure 3.13 structure: cysteine (Cys, C).]
`
`FIGURE 3.13 Structure of cysteine.
`
`We turn now to amino acids with very polar side chains that render them
`highly hydrophilic. Lysine and arginine have relatively long side chains that
`terminate with groups that are positively charged at neutral pH. Lysine is
`capped by a primary amino group and arginine by a guanidinium group.
`Histidine contains an imidazole group, an aromatic ring that also can be
`positively charged (Figure 3.14).
`
`
`
`
`[Figure 3.14 structures: lysine (Lys, K), arginine (Arg, R), and histidine (His, H).]
`
`FIGURE 3.14 The basic amino acids
`lysine, arginine, and histidine.
`
`With a pKa value near 6, the imidazole group can be uncharged or
`positively charged near neutral pH, depending on its local environment (Figure
`3.15). Indeed, histidine is often found in the active sites of enzymes, where
`the imidazole ring can bind and release protons in the course of enzymatic
`reactions.
`The set of amino acids also contains two with acidic side chains: aspartic
`acid and glutamic acid (Figure 3.16). These amino acids are often called
`aspartate and glutamate to emphasize that their side chains are usually
`negatively charged at physiological pH. Nonetheless, in some proteins these
`side chains do accept protons, and this ability is often functionally
`important. In addition, the set includes uncharged derivatives of aspartate and
`
`FIGURE 3.15 Histidine ionization.
`Histidine can bind or release protons near
`physiological pH.
`
`
`
`
`[Figure 3.16 structures: aspartate (Asp, D), glutamate (Glu, E), asparagine (Asn, N), and glutamine (Gln, Q).]
`
`FIGURE 3.16 Amino acids with side-chain carboxylates and carboxamides.
`
`glutamate (asparagine and glutamine), each of which contains a terminal
`carboxamide in place of a carboxylic acid (Figure 3.16).
`Seven of the 20 amino acids have readily ionizable side chains. These
`7 amino acids are able to donate or accept protons to facilitate reactions
`as well as to form ionic bonds. Table 3.1 gives equilibria and typical pKa
`values for ionization of the side chains of tyrosine, cysteine, arginine,
`lysine, histidine, and aspartic and glutamic acids in proteins. Two other
`groups in proteins, the terminal α-amino group and the terminal α-carboxyl
`group, can be ionized, and typical pKa values are also included
`in Table 3.1.
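To see how these ionizable groups combine, the sketch below estimates the net charge of a peptide at a given pH from the typical pKa values listed in Table 3.1. It assumes that every group titrates independently and that the typical values apply unchanged, which Table 3.1 itself cautions is only approximate because pKa values depend on the microenvironment. The function names and the use of one-letter symbols as dictionary keys are illustrative choices.

```python
# Typical pKa values from Table 3.1.
# Acidic groups are neutral when protonated and carry -1 when deprotonated.
PKA_ACIDIC = {"C_TERM": 3.1, "D": 4.1, "E": 4.1, "C": 8.3, "Y": 10.9}
# Basic groups carry +1 when protonated and are neutral when deprotonated.
PKA_BASIC = {"N_TERM": 8.0, "H": 6.0, "K": 10.8, "R": 12.5}


def fraction_deprotonated(ph, pka):
    """Fraction of a group that has given up its proton at this pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def net_charge(sequence, ph):
    """Approximate net charge of a peptide written in one-letter symbols."""
    # The two termini are always present.
    charge = 1.0 - fraction_deprotonated(ph, PKA_BASIC["N_TERM"])
    charge -= fraction_deprotonated(ph, PKA_ACIDIC["C_TERM"])
    # Add the contribution of each ionizable side chain.
    for residue in sequence:
        if residue in PKA_BASIC:
            charge += 1.0 - fraction_deprotonated(ph, PKA_BASIC[residue])
        elif residue in PKA_ACIDIC:
            charge -= fraction_deprotonated(ph, PKA_ACIDIC[residue])
    return charge


for ph in (1.0, 7.0, 13.0):
    print(f"pH {ph:>4}: net charge of YGGFL is {net_charge('YGGFL', ph):+.2f}")
```

For a peptide with no charged side chains near neutral pH, the positive amino terminus and the negative carboxyl terminus nearly cancel, so the estimated net charge is close to zero.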
`Amino acids are often designated by either a three-letter abbreviation or
`a one-letter symbol (Table 3.2). The abbreviations for amino acids are the
`first three letters of their names, except for asparagine (Asn), glutamine
`(Gln), isoleucine (Ile), and tryptophan (Trp). The symbols for many amino
`acids are the first letters of their names (e.g., G for glycine and L for leucine);
`the other symbols have been agreed on by convention. These abbreviations
`and symbols are an integral part of the vocabulary of biochemists.
`
`How did this particular set of amino acids become the building blocks
`of proteins? First, as a set, they are diverse; their structural and
`chemical properties span a wide range, endowing proteins with the versatility to
`assume many functional roles. Second, as noted in Section 2.1.1, many of
`these amino acids were probably available from prebiotic reactions. Finally,
`excessive intrinsic reactivity may have eliminated other possible amino
`
`
`
`
`TABLE 3.1 Typical pKa values of ionizable groups in proteins
`
`Group                            Typical pKa*
`Terminal α-carboxyl group        3.1
`Aspartic acid, glutamic acid     4.1
`Histidine                        6.0
`Terminal α-amino group           8.0
`Cysteine                         8.3
`Tyrosine                         10.9
`Lysine                           10.8
`Arginine                         12.5
`
`[Acid and Base columns: structural formulas of the protonated and deprotonated forms.]
`
`* pKa values depend on temperature, ionic strength, and the microenvironment of the
`ionizable group.
`TABLE 3.2 Abbreviations for amino acids
`
`Amino acid                     Three-letter     One-letter
`                               abbreviation     abbreviation
`Alanine                        Ala              A
`Arginine                       Arg              R
`Asparagine                     Asn              N
`Aspartic acid                  Asp              D
`Cysteine                       Cys              C
`Glutamine                      Gln              Q
`Glutamic acid                  Glu              E
`Glycine                        Gly              G
`Histidine                      His              H
`Isoleucine                     Ile              I
`Leucine                        Leu              L
`Lysine                         Lys              K
`Methionine                     Met              M
`Phenylalanine                  Phe              F
`Proline                        Pro              P
`Serine                         Ser              S
`Threonine                      Thr              T
`Tryptophan                     Trp              W
`Tyrosine                       Tyr              Y
`Valine                         Val              V
`Asparagine or aspartic acid    Asx              B
`Glutamine or glutamic acid     Glx              Z
`
`
`
`acids. For example, amino acids such as homoserine and homocysteine tend
`to form five-membered cyclic forms that limit their use in proteins; the
`alternative amino acids that are found in proteins, serine and cysteine, do
`not readily cyclize, because the rings in their cyclic forms are too small
`(Figure 3.17).
`
`[Figure 3.17 scheme labels: homoserine, serine, + HX.]
`
`FIGURE 3.17 Undesirable reactivity in
`amino acids. Some amino acids are
`unsuitable for proteins because of
`undesirable cyclization. Homoserine can
`cyclize to form a stable, five-membered
`ring, potentially resulting in peptide-bond
`cleavage. Cyclization of serine would form
`a strained, four-membered ring and thus
`is unfavored. X can be an amino group
`from a neighboring amino acid or another
`potential leaving group.
`
`3.2 PRIMARY STRUCTURE: AMINO ACIDS ARE LINKED
`BY PEPTIDE BONDS TO FORM POLYPEPTIDE CHAINS
`
`Proteins are linear polymers formed by linking the α-carboxyl group of one
`amino acid to the α-amino group of another amino acid with a peptide bond
`(also called an amide bond). The formation of a dipeptide from two amino acids
`is accompanied by the loss of a water molecule (Figure 3.18). The equilibrium
`of this reaction lies on the side of hydrolysis rather than synthesis. Hence,
`the biosynthesis of peptide bonds requires an input of free energy.
`Nonetheless, peptide bonds are quite stable kinetically; the lifetime of a peptide bond
`in aqueous solution in the absence of a catalyst approaches 1000 years.
`
`Peptide bond
`
`FIGURE 3.18 Peptide-bond formation.
`The linking of two amino acids is
`accompanied by the loss of a molecule
`of water.
`
`A series of amino acids joined by peptide bonds forms a polypeptide chain,
`and each amino acid unit in a polypeptide is called a residue. A polypeptide
`chain has polarity because its ends are different, with an α-amino group at
`one end and an α-carboxyl group at the other. By convention, the amino end
`is taken to be the beginning of a polypeptide chain, and so the sequence of
`amino acids in a polypeptide chain is written starting with the
`amino-terminal residue. Thus, in the pentapeptide Tyr-Gly-Gly-Phe-Leu (YGGFL),
`tyrosine is the amino-terminal (N-terminal) residue and leucine is the
`carboxyl-terminal (C-terminal) residue (Figure 3.19). Leu-Phe-Gly-Gly-Tyr
`(LFGGY) is a different pentapeptide, with different chemical properties.
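The writing convention can be made concrete with a short sketch that converts a peptide written in three-letter abbreviations (Table 3.2) into one-letter symbols and shows that reversing the order of residues yields a different sequence. The function names are hypothetical, and the hyphen-separated input format is simply the notation used in the text.

```python
# Three-letter abbreviations and one-letter symbols from Table 3.2.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide):
    """Convert 'Tyr-Gly-Gly-Phe-Leu' style notation to one-letter symbols."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


def reverse_peptide(peptide):
    """Return the peptide with its residues written in the opposite order."""
    return "-".join(reversed(peptide.split("-")))


enkephalin = "Tyr-Gly-Gly-Phe-Leu"
reverse = reverse_peptide(enkephalin)

# The first residue listed is the amino-terminal residue.
print(enkephalin, "->", to_one_letter(enkephalin))
print(reverse, "->", to_one_letter(reverse))
print("Same sequence?", to_one_letter(enkephalin) == to_one_letter(reverse))
```

Because a sequence is always read from the amino terminus, the two strings describe chemically distinct molecules even though they contain the same residues.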
`A polypeptide chain consists of a regularly repeating part, called the main
`chain or backbone, and a variable part, comprising the distinctive side chains
`(Figure 3.20). The polypeptide backbone is rich in hydrogen-bonding
`potential. Each residue contains a carbonyl group, which is a good
`hydrogen-bond acceptor and, with the exception of proline, an NH group, which is a
`
`
`
`
`
`FIGURE 3.19 Amino acid sequences
`have direction. This illustration of the
`pentapeptide Tyr-Gly-Gly-Phe-Leu (YGGFL)
`shows the sequence from the amino
`terminus to the carboxyl terminus. This
`pentapeptide, Leu-enkephalin, is an opioid
`peptide that modulates the perception of
`pain. The reverse pentapeptide,
`Leu-Phe-Gly-Gly-Tyr (LFGGY), is a different molecule
`and shows no such effects.
`
`FIGURE 3.20 Components of a
`polypeptide chain. A polypeptide chain
`consists of a constant backbone (shown in
`black) and variable side chains (shown in
`green).
`
`Dalton:
`A unit of mass very nearly equal to that
`of a hydrogen atom. Named after John
`Dalton (1766-1844), who developed
`the atomic theory of matter.
`
`Kilodalton (kd):
`A unit of mass equal to 1000 daltons.
`
`[Figure 3.19 labels: Tyr (amino-terminal residue), Gly, Gly, Phe, Leu (carboxyl-terminal residue).]
`
`good hydrogen-bond donor. These groups interact with each other and with
`functional groups from side chains to stabilize particular structures, as will
`be discussed in detail.
`Most natural polypeptide chains contain between 50 and 2000 amino
`acid residues and are commonly referred to as proteins. Peptides made of
`small numbers of amino acids are called oligopeptides or simply peptides. The
`mean molecular weight of an amino acid residue is about 110, and so the
`molecular weights of most proteins are between 5500 and 220,000. We can
`also refer to the mass of a protein, which is expressed in units of daltons;
`one dalton is equal to one atomic mass unit. A protein with a molecular
`weight of 50,000 has a mass of 50,000 daltons, or 50 kd (kilodaltons).
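The arithmetic behind these figures is simple enough to sketch. The example below multiplies the number of residues by the mean residue weight of about 110 given above; the function names are hypothetical, and the result is only an estimate because the true mass depends on the actual amino acid composition.

```python
MEAN_RESIDUE_MASS_DALTONS = 110  # approximate mean given in the text


def estimated_mass_daltons(n_residues):
    """Rough protein mass in daltons from the number of residues."""
    return n_residues * MEAN_RESIDUE_MASS_DALTONS


def estimated_mass_kd(n_residues):
    """Rough protein mass in kilodaltons (1 kd = 1000 daltons)."""
    return estimated_mass_daltons(n_residues) / 1000


for n in (50, 500, 2000):
    print(f"{n} residues: about {estimated_mass_daltons(n)} daltons "
          f"({estimated_mass_kd(n)} kd)")
```

The two ends of the range quoted above, 50 and 2000 residues, reproduce the limits of 5500 and 220,000 for the molecular weights of most proteins.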
`In some proteins, the linear polypeptide chain is cross-linked. The most
`common cross-links are disulfide bonds, formed by the oxidation of a pair of
`cysteine residues (Figure 3.21). The resulting unit of linked cysteines is
`
`FIGURE 3.21 Cross-links. The formation
`of a disulfide bond from two cysteine
`residues is an oxidation reaction.
`
`[Figure 3.21 scheme labels: cysteine, cysteine, oxidation, reduction, cystine.]
`
`
`
`called cystine. Extracellular proteins often have several disulfide bonds,
`whereas intracellular proteins usually lack them. Rarely, nondisulfide
`cross-links derived from other side chains are present in some proteins. For
`example, collagen fibers in connective tissue are strengthened in this way, as
`are fibrin blood clots.
`
`3.2.1 Proteins Have Unique Amino Acid Sequences
`That Are Specified by Genes
`In 1953, Frederick Sanger determined the amino acid sequence of insulin,
`a protein hormone (Figure 3.22). This work is a landmark in biochemistry
`because it showed for the first time that a protein has a precisely defined amino
`acid sequence. Moreover, it demonstrated that insulin consists only of L
`amino acids linked by peptide bonds between α-amino and α-carboxyl
`groups. This accomplishment stimulated other scientists to carry out
`sequence studies of a wide variety of proteins. Indeed, the complete amino
`acid sequences of more than 100,000 proteins are now known. The striking
`fact is that each protein has a unique, precisely defined amino acid sequence.
`The amino acid sequence of a protein is often referred to as its primary
`structure.
`A series of incisive studies in the late 1950s and early 1960s revealed that
`the amino acid sequences of proteins are genetically determined. The
`sequence of nucleotides in DNA, the molecule of heredity, specifies a
`complementary sequence of nucleotides in RNA, which in turn specifies the
`amino acid sequence of a protein. In particular, each of the 20 amino acids
`of the repertoire is encoded by one or more specific sequences of three
`nucleotides (Section 5.5).
`Knowing amino acid sequences is important for several reasons. First,
`knowledge of the sequence of a protein is usually essential to elucidating its
`mechanism of action (e.g., the catalytic mechanism of an enzyme).
`Moreover, proteins with novel properties can be generated by varying the
`sequence of known proteins. Second, amino acid sequences determine the
`three-dimensional structures of proteins. Amino acid sequence is the link
`between the genetic message in DNA and the three-dimensional structure
`that performs a protein's biological function. Analyses of relations between
`amino acid sequences and three-dimensional structures of proteins are
`uncovering the rules that govern the folding of polypeptide chains. Third,
`sequence determination is a component of molecular pathology, a rapidly
`growing area of medicine. Alterations in amino acid sequence can produce
`abnormal function and disease. Severe and sometimes fatal diseases, such
`as sickle-cell anemia and cystic fibrosis, can result from a change in a
`single amino acid within a protein. Fourth, the sequence of a protein reveals
`much about its evolutionary history (see Chapter 7). Proteins resemble one
`another in amino acid sequence only if they have a common ancestor.
`Consequently, molecular events in evolution can be traced from amino acid
`sequences; molecular paleontology is a flourishing area of research.
`
`A chain (residues 1-21):
`Gly-Ile-Val-Glu-Gln-Cys-Cys-Ala-Ser-Val-Cys-Ser-Leu-Tyr-Gln-Leu-Glu-Asn-Tyr-Cys-Asn
`
`B chain (residues 1-30):
`Phe-Val-Asn-Gln-His-Leu-Cys-Gly-Ser-His-Leu-Val-Glu-Ala-Leu-Tyr-Leu-Val-Cys-Gly-Glu-Arg-Gly-Phe-Phe-Tyr-Thr-Pro-Lys-Ala
`
`Disulfide bonds (S-S): A6-A11 within the A chain; A7-B7 and A20-B19 between the chains.
`
`FIGURE 3.22 Amino acid sequence
`of bovine insulin.
`
`
`
`3.2.2 Polypeptide Chains Are Flexible
`Yet Conformationally Restricted
`
`Examination of the geometry of the protein backbone
`reveals several important features. First, the pe