J Mol Evol (1983) 19:171-175

Journal of Molecular Evolution
© Springer-Verlag 1983

What is a Conservative Substitution?

Simon French¹ and Barry Robson²

¹ Department of Decision Theory, University of Manchester, Manchester, M13 9PL, Great Britain
² Department of Biochemistry, University of Manchester, Manchester, M13 9PL, Great Britain

Summary. It is commonly recognised that many evolutionary changes of amino acid sequence in proteins are conservative: a substitution of one amino acid residue for another has a far greater chance of being accepted if the two residues are similar in properties. Here we investigate what properties are most important in determining the similarity of two amino acids, from the evolutionary point of view. Our results confirm earlier observations that the hydrophobicity and the molecular bulk of the side chain tend to be conserved. More importantly they also show that evolutionary pressures favour the conservation of secondary structure, and that all these properties can be arranged in a two dimensional diagram in which distances well preserve the observed substitution frequencies between amino acids. These results were obtained by a multi-dimensional scaling technique, and are independent of any a priori opinions about conserved properties. Thus, it is demonstrated that all relations of importance to single amino acid substitutions can be represented by a single figure which is much more comprehensible and useful than the usual tabular representation of substitution frequencies. Such a figure conveniently portrays the "biochemical code" for conservative substitution.
`
Key words: Amino acid substitution - Protein evolution - Conservation of secondary structure - Hydrophobicity - Bulk - Multidimensional scaling
`
Offprint requests to: B. Robson

Introduction

Dayhoff et al. (1972) have collated much data concerning the amino acid sequences of proteins. From their work and that of others it is apparent that natural selection has favoured changes in protein sequence in which certain physical and chemical properties of residues are conserved ('conservative substitution'). The question that concerned us was whether the relevant properties have been properly and completely identified. We have been brought to the conclusion that interesting details have not been appreciated, except perhaps by inspection of similar sequences, which does not allow all the significant properties to be considered together in a quantitative, objective, and useful manner. We have sought a more objective, quantitative approach, finally using a data-analytic method not widely exploited by evolutionary molecular biologists; this study therefore serves also to bring this technique to their attention.
The topic of conservative substitutions is of interest for three reasons. First, there is the obvious need for such information in order to consider the similarity and relatedness of sequences. Second, it has often been hypothesised that natural selection pressures mainly favour the conservation of 3-dimensional structure while allowing for extensive substitution (cf. 'neutralist theories'). If this is so, the few chemical and physical properties conserved would presumably be those that most generally determine the 3-dimensional structure of a protein. Third, such knowledge is a pointer to which properties of residues must be modelled in computer simulation of protein folding. In speaking of 3-dimensional structure, we include the secondary structure on which the gross 3-dimensional structure depends.
Our analysis starts from that of Dayhoff et al. (1972) (their table 9.10) which constitutes a 'relatedness odds matrix'. The elements of this matrix give the ratio of two probabilities: the probability that two residues at the same locus in two proteins are the consequence of common ancestry, and the probability that the relation occurred only by chance. The data were derived from comparing sequences within the cytochrome c, haemoglobin, myoglobin, virus coat, chymotrypsinogen, glyceraldehyde 3-phosphate dehydrogenase, clupeine, insulin, and ferredoxin families of proteins. By combining several quite different families, they have obtained an account of the selective pressures on proteins in general rather than in specific instances. The measure of Dayhoff et al. thus provides a matrix $(\sigma_{ij})$, where the elements have the property that $\sigma_{ij} > \sigma_{kl}$ if amino acids $i$ and $j$ appear more similar to each other from the viewpoint of evolutionary pressure than amino acids $k$ and $l$.

MSN Exhibit 1026 - Page 1 of 5
MSN v. Bausch - IPR2023-00016

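Written out, the ratio described above can be sketched as follows. The notation is introduced here for illustration only and is not taken from Dayhoff et al.: $\sigma_{ij}$ stands for the matrix element for residues $i$ and $j$, and the two conditioning events are labelled in words.

```latex
\sigma_{ij} \;=\;
\frac{P(\text{residues } i \text{ and } j \text{ at the same locus} \mid \text{common ancestry})}
     {P(\text{residues } i \text{ and } j \text{ at the same locus} \mid \text{chance alone})}
```

On this reading, a value above 1 marks a pair found together more often than chance alone would predict, and a value below 1 marks a pair found together less often.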
Dayhoff et al. have noted that these similarity data point naturally to a classification of amino acids into 5 groups:

Hydrophilic: Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr
Sulphydryl: Cys
Aliphatic: Val, Ile, Leu, Met
Basic: Lys, Arg, His
Aromatic: Phe, Tyr, Trp.
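
The five groups listed above translate directly into a lookup table. The sketch below is only an illustration: the names `DAYHOFF_GROUPS`, `GROUP_OF` and `same_group` are hypothetical, and the aliphatic group is assumed to contain isoleucine (Ile).

```python
# Five groups noted by Dayhoff et al., as listed in the text.
# Assumption: the aliphatic group contains isoleucine (Ile).
DAYHOFF_GROUPS = {
    "Hydrophilic": ["Ala", "Pro", "Gly", "Glu", "Asp", "Gln", "Asn", "Ser", "Thr"],
    "Sulphydryl": ["Cys"],
    "Aliphatic": ["Val", "Ile", "Leu", "Met"],
    "Basic": ["Lys", "Arg", "His"],
    "Aromatic": ["Phe", "Tyr", "Trp"],
}

# Invert the table so each residue maps to the name of its group.
GROUP_OF = {
    residue: name
    for name, members in DAYHOFF_GROUPS.items()
    for residue in members
}


def same_group(a, b):
    """True if residues a and b fall in the same group of the table above."""
    return GROUP_OF[a] == GROUP_OF[b]


print(same_group("Lys", "Arg"), same_group("Cys", "Ser"))
```

A grouping of this kind gives only a yes/no answer for each pair of residues, whereas the relatedness odds matrix assigns every pair a graded value.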
`
Here we extend Dayhoff et al.'s analysis through the statistical technique of multidimensional scaling. We refine their grouping and show that this new grouping corresponds to a very high degree with one deduced by Robson and Suzuki (1976). This latter classification grouped amino acid residues according to their tendencies to be involved in different forms of secondary structure. This correspondence between the two classifications is the first objective evidence from substitution probabilities for the reasonable conjecture that natural selection strongly favours the maintenance of the intrinsic stability of secondary structure features.
`
Method

Dayhoff et al. (1972) gave a similarity matrix, an ordering of elements reflecting the ordering of pairwise similarities between objects, here amino acid residues. The occurrence of such data is commonplace in psychology and sociology. Within those disciplines a family of statistical techniques, known collectively as multidimensional scaling, has been developed to explore and analyse similarity matrices. Surveys of these methods may be found in Shepard (1974), Shepard et al. (1972) and Sibson (1972). Briefly, the similarity matrix of Dayhoff et al. is analysed as follows. Using iterative optimisation techniques described in Kruskal (1964) and Guttman (1968), a set of 20 points (one for each amino acid residue) is found in m dimensions such that the nearer two points are, the more similar are the corresponding amino acids with respect to evolutionary pressures. Essentially, this is comparable to finding the geographical distribution of towns from only an ordering of the (approximately determined) intertown distances (Kendall (1971)) and, furthermore, without knowing that the solution is two dimensional. More precisely, the optimisation is a best fit (in a particular least squares sense, Kruskal (1964)) of the interpoint distances to the negatives of the measures of Dayhoff et al., asking only that

$$d_{ij} > d_{kl} \iff \sigma_{ij} < \sigma_{kl},$$

where the $d_{ij}$ are the interpoint distances corresponding to the $\sigma_{ij}$. Because this demands only that the $\sigma_{ij}$ are reasonably ordered and does not assume any functional relationship between the $d_{ij}$ and $\sigma_{ij}$, this method is known to be very robust.
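
How far a given configuration departs from this ordinal condition is measured by the stress of Kruskal (1964). The following is a minimal sketch of that calculation (stress formula 1, with the monotone fit done by pool-adjacent-violators), not the program used in this study. The function names and the toy data are hypothetical, and ties among similarities are simply left in the order in which the sort returns them.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair of labelled points."""
    return {
        (a, b): math.dist(points[a], points[b])
        for a, b in combinations(sorted(points), 2)
    }


def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted


def kruskal_stress(points, similarity):
    """Stress-1 of a configuration; similarity is keyed like pairwise_distances
    and larger values mean the two objects are more alike."""
    dist = pairwise_distances(points)
    # From most to least similar, the distances should never decrease.
    pairs = sorted(dist, key=lambda p: -similarity[p])
    d = [dist[p] for p in pairs]
    d_hat = monotone_fit(d)
    num = sum((x - y) ** 2 for x, y in zip(d, d_hat))
    den = sum(x ** 2 for x in d)
    return math.sqrt(num / den)


# Toy data (made up): four objects on a line, similarity = minus the distance.
points = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0), "D": (7.0, 0.0)}
similarity = {pair: -d for pair, d in pairwise_distances(points).items()}
print(kruskal_stress(points, similarity))  # ordering fully respected
```

An iterative scaling program moves the points so as to reduce this quantity; only the rank order of the similarities enters, which is the sense in which no functional relationship is assumed.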
`
Results

A representation was readily obtained in two dimensions without any evidence that the use of a higher dimension would display any further information (Fig. 1) (the obtained stress (Kruskal 1964) was 9% and the Monte Carlo test procedure of Spence and Graef (1974) suggested clearly that a 2-dimensional representation was adequate).
Since the optimisation technique underlying multidimensional scaling is iterative, it requires an initial configuration. To avoid any possible bias we started with ten distinct random configurations. In each case the result converged to one with no significant difference from the one shown in Figure 1.
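
Because the axes of a scaling solution carry no meaning, two runs can differ by a rotation, reflection, shift or change of scale and still represent the same result. One simple way to check agreement between runs, sketched here under that assumption, is to compare their interpoint distances after scaling. This is not the comparison procedure used in this study; the function names, the tolerance and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def normalised_distances(points):
    """Interpoint distances in a fixed pair order, scaled to unit sum of squares."""
    labels = sorted(points)
    d = [math.dist(points[a], points[b]) for a, b in combinations(labels, 2)]
    scale = math.sqrt(sum(x * x for x in d))
    return [x / scale for x in d]


def configurations_agree(p, q, tol=1e-6):
    """True if the scaled interpoint distances of p and q match within tol,
    i.e. the two differ only by translation, rotation, reflection or scale."""
    dp, dq = normalised_distances(p), normalised_distances(q)
    return max(abs(x - y) for x, y in zip(dp, dq)) <= tol


# Toy check with made-up coordinates:
# q is p rotated by 90 degrees, doubled in size and shifted.
p = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0)}
q = {k: (-2 * y + 5, 2 * x - 3) for k, (x, y) in p.items()}
print(configurations_agree(p, q))
```

In practice the tolerance would be chosen to reflect what counts as a significant difference between solutions.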
As expected (Dayhoff et al. (1972), Dickerson and Geis (1969)) conservation of the hydrophobic nature of the residue is the most visually apparent feature. All points lie fairly close to a curve, the distance along which (from charged sidechains such as lysine, arginine, histidine to nonpolar aromatic residues, tyrosine and tryptophan) correlates well visually with increasing hydrophobicity. However the "horse shoe" shape of the curve also suggests a property of secondary importance, namely bulk, which increases towards the right of the diagram. These two properties, hydrophobicity and bulk, are the only two amino acid properties that can be clearly seen to vary systematically along a trajectory, linear or otherwise, on the diagram. An automatic search for other properties of importance was also undertaken by analysis of the variation of many properties (Jungck (1978)) using a method developed by Carroll (1972) and a program in the MDS(X) suite for multidimensional scaling. However this failed to discover any further systematic variation. Thus it appears that the representation can answer the very general question: what amino acid properties tend to be conserved in evolution? Hydrophobicity and molecular bulk are the ones that we observed.
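
One common way to search a two-dimensional plot for a systematic trend in a property is to regress the property on the coordinates and read off the direction of steepest increase. The sketch below illustrates that idea only; it is a swapped-in example and is not claimed to be the method of Carroll (1972) or the MDS(X) program used here. The function name and the data are hypothetical, and the fit is linear, so it would miss a trend that follows a curve.

```python
import math


def fit_property_direction(coords, values):
    """Least-squares fit of values ~ a*x + b*y + c over labelled 2-D points.

    Returns the unit vector along (a, b) and the fraction of variance
    explained (R squared). Assumes the points are not all collinear.
    """
    labels = sorted(coords)
    n = len(labels)
    mx = sum(coords[k][0] for k in labels) / n
    my = sum(coords[k][1] for k in labels) / n
    mv = sum(values[k] for k in labels) / n
    # Centred sums of squares and cross products.
    sxx = sum((coords[k][0] - mx) ** 2 for k in labels)
    syy = sum((coords[k][1] - my) ** 2 for k in labels)
    sxy = sum((coords[k][0] - mx) * (coords[k][1] - my) for k in labels)
    sxv = sum((coords[k][0] - mx) * (values[k] - mv) for k in labels)
    syv = sum((coords[k][1] - my) * (values[k] - mv) for k in labels)
    svv = sum((values[k] - mv) ** 2 for k in labels)
    # Solve the 2x2 normal equations for the slopes a and b.
    det = sxx * syy - sxy ** 2
    a = (sxv * syy - syv * sxy) / det
    b = (syv * sxx - sxv * sxy) / det
    r_squared = (a * sxv + b * syv) / svv
    norm = math.hypot(a, b)
    return (a / norm, b / norm), r_squared


# Made-up points and a property that is exactly 2*x + 3*y + 1.
coords = {"p": (0.0, 0.0), "q": (1.0, 0.0), "r": (0.0, 1.0), "s": (2.0, 3.0)}
values = {k: 2 * x + 3 * y + 1 for k, (x, y) in coords.items()}
direction, r_squared = fit_property_direction(coords, values)
print(direction, r_squared)
```

A property with a high R squared has a clear direction of increase on the diagram; a property with a low one shows no linear trend.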
However the innovation in the diagram is that it can answer more specific questions than that general one. Namely, what amino acid properties are conserved in evolutionary change starting from a specific amino acid? Closer inspection of Figure 1 reveals features that at first glance seem curious. For example, the proximity of glycine and proline and of alanine and glutamic acid at the left side of the diagram is quite inconsistent with their bulk or degree of hydrophobicity. However, these apparent anomalies are in the nature of groupings which are strikingly similar to those obtained by Robson and Suzuki (1976) who undertook a clustering analysis of amino acid residues in a space whose dimensions correspond to the helix, extended chain, and coil forming power of a residue. The work took no account of evolutionary relationships between proteins or residues, but only between sequence and conformation. Their figures are reproduced for comparison in Fig. 2. The similar groupings obtained in Fig. 1 demonstrate for the first time that the preferences for different types of backbone conformational (secondary) structure are also a property of considerable importance for evolutionary pressures on amino acid substitution. As discussed by Robson and Suzuki, this effect arises as a result of sidechain-backbone interactions in a way largely determined by the nature or absence of hydrogen bonding groups in the sidechain. Glycine, proline, alanine and glutamic acid were treated as special cases on the grounds of special stereochemical effects and this is strongly supported by the present study.

Fig. 1. Two-dimensional scaling plot (see text) of the odds-relatedness matrix of Dayhoff et al. (1972). The symbols correspond to those of Robson and Suzuki (1976) and distinguish: hydrophobic residues; hydrophobic residues which have ability to form hydrogen bonds; residues which may receive or donate hydrogen bonds; residues which may receive and donate hydrogen bonds; Gly; Pro; His (see text). The numbers associated with each amino acid are their hydrophobicities as given by Levitt (1976). The two indicated directions give general trends in the diagram of increasing molecular weight and volume. Note that the axes have no special significance in this technique, much as would also arise if a map of Britain were constructed from tables of distances between towns and villages. That is to say, such a distance table contains no information about North-South, East-West axes (though the direction of South might be deduced a posteriori on the grounds that a warmer climate encouraged habitation; in a similar way the properties of importance are deduced above, bearing in mind that important trends need not be linear or even lie on a curve).

There are, however, interesting differences. These authors classified residues according to whether sidechains were non-hydrogen bonding (filled circles in Fig. 1), could receive and donate a hydrogen bond (crossed circles), or could receive or donate a hydrogen bond (open circles). Histidine is 10% protonated at neutral pH and with reservations was assigned to the group of residues whose sidechains can both receive and donate a hydrogen bond. From the point of view of evolutionary pressures, Fig. 1 places it firmly alongside lysine and arginine, which are close to fully charged at neutral pH. Indeed, it reveals that evolutionary pressure places much greater emphasis on whether a sidechain is negatively charged (glutamate and aspartate) or positively charged (lysine, arginine, histidine) than did the tentative assignments of Robson and Suzuki based on clustering analysis and the sidechain-backbone hydrogen bonding interactions. Cysteine also deviates from the largely nonpolar but weakly hydrogen bonding group to which it was assigned by the cluster analysis of Robson and Suzuki, but this may be expected from the point of view of evolutionary pressure because of its special role in forming covalent disulphide bridges in some cases. Use of tables of substitution distances obtained independently for intracellular and extracellular proteins might well clarify this point, though this would depart from the idea of seeking "gross" global determinants of substitution frequencies independent of any kind of family grouping, and independent of any specific interactions peculiar to a protein class. On the whole, however, the agreement is remarkable and this illustrates the value of multidimensional scaling in revealing patterns which may be meaningful to the observer.

Fig. 2. The groupings of Robson and Suzuki based on conformational tendencies and physicochemical properties alone, i.e. without reference to comparison of homologous sequences. Symbols as Fig. 1. Axes: helix information, pleated sheet information, reverse turn information and coil information (decinats).

The similarity between alanine and glutamate (AE) and proline and glycine (PG) in terms of substitution distances may seem surprising in view of the fact that the former are strong helix formers, the latter strong helix breakers. A preliminary view might be that molecular bulk dominates here, perhaps along with physical properties of less general importance. However, other types of secondary structure tendency must be considered, and it may be that the ability to disrupt extended (primarily pleated sheet) structure, to introduce local bends in it, and to demarcate its boundaries, is of prime evolutionary importance. These aspects are now under investigation.
`
Conclusions

Evolution of proteins in general has tended to conserve (1) the degree of hydrophobicity of a residue, (2) the conformational preferences of its backbone and (3) its bulk. All these are continuous properties and the extent to which a substitution is conservative is correspondingly a matter of degree. Since the maximum distance in Fig. 1 is between glycine and tryptophan, changes between residues at less than one third this distance might be conveniently classified as "good" conservative substitutions. Because most substitutions which would conserve bulk involve greater distances in Fig. 1 and indeed constitute "bad" conservative substitution, the order of importance seems to be the order given above, with bulk playing a subservient if significant role.
This work emphasizes the value of multidimensional scaling in reaching conclusions without any initial a priori assumptions. Jorre and Curnow (1975) have applied the technique to a similar problem but their analysis had a strong theoretical input which modelled their prior beliefs about relationships. Moreover it was confined to a single well-defined protein family, the cytochrome c group, and therefore considered only evolutionary pressures relating to the structure, stability and function of cytochrome c. Hence they arrived at different conclusions and answered a different question. The advantage of the present work is, again, that it applies to the conservation of substitutions in proteins in general, using extensive data from which effects peculiar to conformations of specific families have presumably been almost entirely averaged out.
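
The one-third rule suggested above is easy to apply once coordinates are available. The sketch below assumes a dictionary of two-dimensional coordinates and takes the largest interpoint distance in that dictionary as the reference (glycine to tryptophan in Fig. 1). The function name, the label "other" for pairs beyond the threshold, and above all the coordinates are hypothetical: the numbers are made up for illustration and are not read from Fig. 1.

```python
import math
from itertools import combinations


def classify_substitutions(coords, fraction=1 / 3):
    """Label each residue pair 'good' if its distance is below
    fraction * (largest interpoint distance), otherwise 'other'."""
    dist = {
        (a, b): math.dist(coords[a], coords[b])
        for a, b in combinations(sorted(coords), 2)
    }
    threshold = fraction * max(dist.values())
    return {
        pair: ("good" if d < threshold else "other")
        for pair, d in dist.items()
    }


# Made-up coordinates, NOT taken from Fig. 1.
coords = {
    "Gly": (0.0, 0.0),
    "Trp": (9.0, 0.0),
    "Lys": (3.0, 4.0),
    "Arg": (4.0, 4.0),
}
for pair, label in classify_substitutions(coords).items():
    print(pair, label)
```

Since conservation is a matter of degree, the distances themselves carry more information than any such two-way label; the threshold is a convenience.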
Acknowledgements. The programs in our analysis were from the MDS(X) package developed by A.P.M. Coxon and funded by the Social Science Research Council. Computing facilities were provided by the University of Manchester Regional Computer Centre. One of us (SF) is most grateful to Dr. C.C.F. Blake for encouraging him to work in this area. The other (BR) is grateful for S.R.C. funding relevant to the discovery of properties relating to protein folding simulations.
After preparation of this manuscript Dr. W. Taylor has drawn our attention to his very similar results and conclusions independently obtained (Taylor 1982). We are grateful to him for useful discussions.
`
References

Carroll JD (1972) Individual differences and multidimensional scaling. In: Shepard RN, Romney AK, Nerlove SB, Multidimensional scaling: Theory and applications in the behavioural sciences. Seminar Press, London, pp 105-155
Dayhoff MO, Eck RV, Park CM (1972) A model of evolutionary change in proteins. In: Dayhoff MO (ed) Atlas of protein sequence and structure. National Biomedical Research Foundation, Georgetown University, Washington DC, pp 89-99
Dickerson RE, Geis I (1969) The structure and action of proteins. Harper and Row, New York
Guttman L (1968) A general non-metric technique for finding the smallest co-ordinate space for a configuration of points. Psychometrika 33:469-506
Jorre RP, Curnow RN (1975) A model for the evolution of proteins. Biochimie 57:1147-1156
Jungck JR (1978) The genetic code as a periodic table. J Mol Evol 11:211-224
Kendall DG (1971) Construction of maps from odd bits of information. Nature 231:158-159
Kruskal JB (1964) Non-metric multidimensional scaling. Psychometrika 29:1-27
Levitt M (1976) A simplified representation of protein conformations for rapid simulation of protein folding. J Mol Biol 104:59-107
Robson B, Suzuki E (1976) Conformational properties of amino acid residues in globular proteins. J Mol Biol 107:327-356
Shepard RN (1974) Representation of structure in similarity data: problems and prospects. Psychometrika 39:373-421
Shepard RN, Romney AK, Nerlove SB (1972) Multidimensional scaling: Theory and applications in the behavioural sciences. Vols I and II. Seminar Press, London
Sibson R (1972) Order invariant methods for data analysis (with discussion). J Roy Statist Soc B 34:311-349
Spence I, Graef J (1974) The determination of the underlying dimensionality of an empirically obtained matrix of proximities. Multivariate Behavioural Research 9:331-342
Taylor W (1982) Private communication
`
Received July 20/Accepted November 1, 1982
`