MYLAN EXHIBIT - 1026
Mylan Pharmaceuticals, Inc. v. Bausch Health Ireland, Ltd. - IPR2022-00722

J Mol Evol (1983) 19:171-175
Journal of Molecular Evolution
© Springer-Verlag 1983

What is a Conservative Substitution?

Simon French 1 and Barry Robson 2

1 Department of Decision Theory, University of Manchester, Manchester, M13 9PL, Great Britain
2 Department of Biochemistry, University of Manchester, Manchester, M13 9PL, Great Britain

Offprint requests to: B. Robson

Summary. It is commonly recognised that many evolutionary changes of amino acid sequence in proteins are conservative: a substitution of one amino acid residue for another has a far greater chance of being accepted if the two residues are similar in properties. Here we investigate what properties are most important in determining the similarity of two amino acids, from the evolutionary point of view. Our results confirm earlier observations that the hydrophobicity and the molecular bulk of the side chain tend to be conserved. More importantly they also show that evolutionary pressures favour the conservation of secondary structure, and that all these properties can be arranged in a two dimensional diagram in which distances well preserve the observed substitution frequencies between amino acids. These results were obtained by a multidimensional scaling technique, and are independent of any prior opinions about conserved properties. Thus, it is demonstrated that all relations of importance to single amino acid substitutions can be represented by a single figure, which is much more comprehensible and useful than the usual tabular representation of substitution frequencies. Such a figure conveniently portrays the stereochemical code for conservative substitution.

Key words: Amino acid substitution - Protein evolution - Conservation of secondary structure - Hydrophobicity - Bulk - Multidimensional scaling

Introduction

Dayhoff et al. (1972) have collated much data concerning the amino acid sequences of proteins. From their work and that of others it is apparent that natural selection has favoured changes in protein sequence in which certain physical and chemical properties of residues are conserved ('conservative substitution'). The question that concerned us was whether the relevant properties have been properly and completely identified. We have been brought to the conclusion that interesting details have not been appreciated, except perhaps by inspection of similar sequences, which does not allow all the significant properties to be considered together in a quantitative, objective, and useful manner. We have sought a more objective, quantitative approach, finally using a data-analytic method not widely exploited by evolutionary molecular biologists; this study therefore serves also to bring this technique to their attention.

The topic of conservative substitutions is of interest for three reasons. First, there is the obvious need for such information in order to consider the similarity and relatedness of sequences. Second, it has often been hypothesised that natural selection pressures mainly favour the conservation of 3-dimensional structure while allowing for extensive substitution (cf. 'neutralist theories'). If this is so, the few chemical and physical properties conserved would presumably be those that most generally determine the 3-dimensional structure of a protein. Third, such knowledge is a pointer to which properties of residues must be modelled in computer simulation of protein folding. In speaking of 3-dimensional structure, we include the secondary structure on which the gross 3-dimensional structure depends.

Our analysis starts from that of Dayhoff et al. (1972) (their table 9.10), which constitutes a 'relatedness odds matrix'. The elements of this matrix give the ratio of two probabilities: the probability that two residues at the same locus in two proteins are the consequence of common ancestry, and the probability that the relation occurred only by chance. The data were derived from comparing sequences within the cytochrome c, haemoglobin, myoglobin, virus coat, chymotrypsinogen, glyceraldehyde 3-phosphate dehydrogenase, clupeine, insulin, and ferredoxin families of proteins. By combining several quite different families, they have obtained an account of the selective pressures on proteins in general rather than in specific instances. The measure of Dayhoff et al. thus provides a matrix $(o_{ij})$, where the elements have the property that $o_{ij} > o_{kl}$ if amino acids i and j appear more similar to each other from the viewpoint of evolutionary pressure than amino acids k and l.

Dayhoff et al. have noted that these similarity data point naturally to a classification of amino acids into 5 groups:

Hydrophilic: Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr
Sulphydryl: Cys
Aliphatic: Val, Ile, Leu, Met
Basic: Lys, Arg, His
Aromatic: Phe, Tyr, Trp

Here we extend Dayhoff et al.'s analysis through the statistical technique of multidimensional scaling. We refine their grouping and show that this new grouping corresponds to a very high degree with one deduced by Robson and Suzuki (1976). This latter classification grouped amino acid residues according to their tendencies to be involved in different forms of secondary structure. This correspondence between the two classifications is the first objective evidence from substitution probabilities for the reasonable conjecture that natural selection strongly favours the maintenance of the intrinsic stability of secondary structure features.

Method

Dayhoff et al. (1972) gave a similarity matrix, an ordering of elements reflecting the ordering of pairwise similarities between objects, here amino acid residues. The occurrence of such data is commonplace in psychology and sociology. Within those disciplines a family of statistical techniques, known collectively as multidimensional scaling, has been developed to explore and analyse similarity matrices. Surveys of these methods may be found in Shepard (1974), Shepard et al. (1972) and Sibson (1972).

Briefly, the similarity matrix of Dayhoff et al. is analysed as follows. Using iterative optimisation techniques described in Kruskal (1964) and Guttman (1968), a set of 20 points (one for each amino acid residue) is found in m dimensions such that the nearer two points are, the more similar are the corresponding amino acids with respect to evolutionary pressures. Essentially, this is comparable to finding the geographical distribution of towns from only an ordering of the (approximately determined) intertown distances (Kendall 1971) and, furthermore, without knowing that the solution is two dimensional. More precisely, the optimisation is a best fit (in a particular least squares sense; Kruskal 1964) of the interpoint distances to the negatives of the measures of Dayhoff et al., asking only that

$d_{ij} > d_{kl} \iff o_{ij} < o_{kl}$

where the $d_{ij}$ are the interpoint distances corresponding to the $o_{ij}$. Because this demands only that the $o_{ij}$ are reasonably ordered and does not assume any functional relationship between the $d_{ij}$ and $o_{ij}$, this method is known to be very robust.

Results

A representation was readily obtained in two dimensions without any evidence that the use of a higher dimension would display any further information (Fig. 1): the obtained stress (Kruskal 1964) was 9%, and the Monte Carlo test procedure of Spence and Graef (1974) suggested clearly that a 2-dimensional representation was adequate.

Since the optimisation technique underlying multidimensional scaling is iterative, it requires an initial configuration. To avoid any possible bias we started with ten distinct random configurations. In each case the result converged to one with no significant difference from the one shown in Figure 1.

As expected (Dayhoff et al. 1972; Dickerson and Geis 1969), conservation of the hydrophobic nature of the residue is the most visually apparent feature.
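The ordinal fitting criterion described in the Method can be illustrated with a short numerical sketch. It is not the MDS(X) program used in the paper: all function names are hypothetical, the data are synthetic points rather than the Dayhoff matrix, and the update step is a SMACOF-style majorisation (Guttman transform) swapped in for the gradient procedure of Kruskal (1964). Only the rank order of the similarities enters the fit, through a monotone regression, and the quantity reported is Kruskal's stress.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def monotone_regression(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block holds [mean, count]
    for value in y:
        blocks.append([float(value), 1])
        # merge backwards while the non-decreasing order is violated
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])

def disparities(D, sim):
    """Values closest to the distances D that never decrease as similarity falls."""
    iu = np.triu_indices_from(D, k=1)
    order = np.argsort(-sim[iu], kind="stable")  # most similar pair first
    fitted = np.empty(len(order))
    fitted[order] = monotone_regression(D[iu][order])
    Dhat = np.zeros_like(D)
    Dhat[iu] = fitted
    return Dhat + Dhat.T

def stress(D, Dhat):
    """Kruskal's stress-1; 0 means the rank order is reproduced exactly."""
    iu = np.triu_indices_from(D, k=1)
    return np.sqrt(((D[iu] - Dhat[iu]) ** 2).sum() / (D[iu] ** 2).sum())

def nonmetric_mds(sim, dim=2, n_iter=300, seed=0):
    """One run from a random start; returns (configuration, stress)."""
    rng = np.random.default_rng(seed)
    n = sim.shape[0]
    X = rng.normal(size=(n, dim))
    for _ in range(n_iter):
        D = pairwise_distances(X)
        Dhat = disparities(D, sim)
        # fix the arbitrary overall scale of the disparities
        Dhat *= np.sqrt(n * (n - 1) / 2 / (np.triu(Dhat, 1) ** 2).sum())
        ratio = np.divide(Dhat, D, out=np.zeros_like(D), where=D > 0)
        B = -ratio
        B[np.diag_indices(n)] = ratio.sum(axis=1)
        X = B @ X / n  # Guttman transform (SMACOF-style update)
    D = pairwise_distances(X)
    return X, stress(D, disparities(D, sim))

# Toy check on synthetic points (an assumption for illustration only).
rng = np.random.default_rng(1)
truth = rng.uniform(size=(8, 2))
D_true = pairwise_distances(truth)
sim = -D_true ** 2  # any decreasing function of distance gives the same ordering
stress_of_truth = stress(D_true, disparities(D_true, sim))

# Several random starts, keeping the run with the lowest stress.
runs = [nonmetric_mds(sim, seed=s) for s in range(5)]
best_X, best_stress = min(runs, key=lambda r: r[1])
```

Because the similarities are used only through their ordering, replacing `sim` by any other decreasing function of the same distances leaves the fit unchanged. Repeating the run from several seeds mirrors the use of distinct random starting configurations, although this sketch makes no claim to reproduce the published configuration or its stress value.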
All points lie fairly close to a curve, the distance along which (from charged sidechains such as lysine, arginine, histidine to nonpolar aromatic residues, tyrosine and tryptophan) correlates well visually with increasing hydrophobicity. However the "horse shoe" shape of the curve also suggests a property of secondary importance, namely bulk, which increases towards the right of the diagram. These two properties, hydrophobicity and bulk, are the only two amino acid properties that can be clearly seen to vary systematically along a trajectory, linear or otherwise, on the diagram. An automatic search for other properties of importance was also undertaken by analysis of the variation of many properties (Jungck 1978) using a method developed by Carroll (1972), with a program in the MDS(X) suite for multidimensional scaling. However this failed to discover any further systematic variation.

Thus it appears that the representation can answer the very general question: what amino acid properties tend to be conserved in evolution? Hydrophobicity and molecular bulk are the ones that we observed. However the innovation in the diagram is that it can answer more specific questions than that general one. Namely, what amino acid properties are conserved in evolutionary change starting from a specific amino acid?

[Figure 1: plot not reproduced; an annotation on the plot marks the direction of increasing molecular volume.]

Fig. 1. Two-dimensional scaling plot (see text) of the odds-relatedness matrix of Dayhoff et al. (1972). The symbols correspond to those of Robson and Suzuki (1976) and distinguish: hydrophobic residues (filled circles); hydrophobic residues which have the ability to form hydrogen bonds; residues which may receive or donate hydrogen bonds (open circles); residues which may receive and donate hydrogen bonds (crossed circles); and Gly, Pro, Ala, Glu and His, each marked separately (see text). The numbers associated with each amino acid are their hydrophobicities as given by Levitt (1976). The indicated directions give general trends in the diagram of increasing molecular weight and volume. Note that the axes have no a priori significance in this technique, much as would also arise if a map of Britain were constructed from tables of distances between towns and villages. That is to say, such a distance table contains no information about North-South, East-West axes (though the direction of South might be deduced a posteriori on the grounds that a warmer climate encouraged habitation; in a similar way the properties of importance are deduced above, bearing in mind that important trends need not be linear or even lie on a curve).

Closer inspection of Figure 1 reveals features that at first glance seem curious. For example, the proximity of glycine and proline and of alanine and glutamic acid at the left side of the diagram is quite inconsistent with their bulk or degree of hydrophobicity. However, these apparent anomalies are in the nature of groupings which are strikingly similar to those obtained by Robson and Suzuki (1976), who undertook a clustering analysis of the 20 amino acid residues in a space whose dimensions correspond to the helix, extended chain, and coil forming power of a residue. The work took no account whatsoever of evolutionary relationships between proteins or residues, but only between sequence and conformation. Their figures are reproduced for comparison in Fig. 2. The similar groupings obtained in Fig. 1 demonstrate for the first time that the preferences for different types of backbone conformational (secondary) structure are also a property of considerable importance for evolutionary pressures on amino acid mutation. As discussed by Robson and Suzuki, this effect arises as a result of sidechain-backbone interactions in a way largely determined by the nature or absence of hydrogen bonding groups in the sidechain. Glycine, proline, alanine and glutamic acid were treated as special cases on the grounds of special stereochemical effects and this is strongly supported by the present study.

[Figure 2: plots not reproduced; the axes are helix information, pleated sheet information and coil information (decinats).]

Fig. 2. The groupings of Robson and Suzuki based on conformational tendencies and physicochemical properties alone, i.e. without reference to comparison of homologous sequences. Symbols as Fig. 1.

There are, however, interesting differences. These authors classified residues according to whether sidechains were non-hydrogen bonding (filled circles in Fig. 1), could receive and donate a hydrogen bond (crossed circles), or could receive or donate a hydrogen bond (open circles). Histidine is 10% protonated at neutral pH and with reservations was assigned to the group of residues whose sidechains can both receive and donate a hydrogen bond. From the point of view of evolutionary pressures, Fig. 1 places it firmly alongside lysine and arginine, which are close to fully charged at neutral pH. Indeed, it reveals that evolutionary pressure places much greater emphasis on whether a sidechain is negatively charged (glutamate and aspartate) or positively charged (lysine, arginine, histidine) than did the tentative assignments of Robson and Suzuki based on clustering analysis and the sidechain-backbone hydrogen bonding interactions. Cysteine also deviates from the largely nonpolar but weakly hydrogen bonding group to which it was assigned by the cluster analysis of Robson and Suzuki, but this may be expected from the point of view of evolutionary pressure because of its special role in forming covalent disulphide bridges in some cases. Use of tables of substitution distances obtained independently for intracellular and extracellular proteins might well clarify this point, though this would depart from the idea of seeking "gross" global determinants of substitution frequencies independent of any kind of family grouping, and independent of any specific interactions peculiar to a protein class. On the whole, however, the agreement is remarkable and this illustrates the value of multidimensional scaling in revealing patterns which may be meaningful to the observer.
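The remark in the caption of Fig. 1 that the axes carry no a priori significance can be checked numerically. The sketch below uses placeholder coordinates (an assumption; they are not the published configuration) and hypothetical helper names. It shows that rotating, reflecting and translating a configuration leaves every interpoint distance unchanged, so a fit based on distances alone cannot determine the orientation of the diagram.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def rotate(X, angle):
    """Rotate a 2-D configuration about the origin by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return X @ R.T

rng = np.random.default_rng(0)
config = rng.normal(size=(20, 2))  # placeholder points, NOT the coordinates of Fig. 1

# Rotate, mirror the second axis, then shift the whole configuration.
moved = rotate(config, 1.0) * np.array([1.0, -1.0]) + np.array([3.0, -2.0])

# The coordinates differ, yet all interpoint distances agree.
same_distances = np.allclose(pairwise_distances(config), pairwise_distances(moved))
```

Any interpretation of directions in such a diagram, such as increasing hydrophobicity or bulk, therefore has to come from external knowledge about the residues rather than from the coordinates themselves.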
The similarity between alanine and glutamate (AE) and proline and glycine (PG) in terms of substitution distances may seem surprising in view of the fact that the former are strong helix formers, the latter strong helix breakers. A preliminary view might be that molecular bulk dominates here, perhaps along with a few physical properties of less general importance. However, other types of secondary structure tendency must be considered, and it may be that the ability to disrupt extended (primarily pleated sheet) structure, to introduce local bends in it, and to demarcate its boundaries, is of prime evolutionary importance. These aspects are now under investigation.

Conclusions

Evolution of proteins in general has tended to conserve (1) the degree of hydrophobicity of a residue, (2) the conformational preferences of its backbone and (3) its bulk. All these are continuous properties and the extent to which a substitution is conservative is correspondingly a matter of degree. Since the maximum distance in Figure 1 is between glycine and tryptophan, changes between residues at less than one third this distance might be conveniently classified as "good" conservative substitutions. Because most substitutions which would conserve bulk involve greater distances in Fig. 1 and indeed constitute "bad" conservative substitutions, the dominance of importance seems to be in the order given above, with bulk playing a subservient if significant role.

This work emphasizes the value of multidimensional scaling in reaching conclusions without any initial a priori assumptions. Jorré and Curnow (1975) have applied the technique to a similar problem but their analysis had a strong theoretical input which modelled their prior beliefs about relationships. Moreover it was restricted to a single well-defined protein family, the cytochrome c group, and therefore considered only evolutionary pressures relating to the structure, stability and function of cytochrome c. Hence they arrived at somewhat different conclusions and answered a different question. The advantage of the present work is, again, that it applies to the conservation of substitutions of proteins in general, using extensive data from which effects peculiar to conformations of specific families have presumably been almost entirely averaged out.

Acknowledgements. The programs in our analysis were from the MDS(X) package developed by A.P.M. Coxon and funded by the Social Science Research Council. Computing facilities were provided by the University of Manchester Regional Computer Centre. One of us (SF) is most grateful to Dr. C.C.F. Blake for encouraging him to work in this area. The other (BR) is grateful for S.R.C. funding relevant to the discovery of properties relating to protein folding simulations. After preparation of this manuscript Dr. W. Taylor drew our attention to his very similar results and conclusions, independently obtained (Taylor 1982). We are grateful to him for useful discussions.

References

Carroll JD (1972) Individual differences and multidimensional scaling. In: Shepard RN, Romney AK, Nerlove SB, Multidimensional scaling: Theory and applications in the behavioural sciences. Seminar Press, London, pp 105-155
Dayhoff MO, Eck RV, Park CM (1972) A model of evolutionary change in proteins. In: Dayhoff MO (ed) Atlas of protein sequence and structure. National Biomedical Research Foundation, Georgetown University, Washington DC, pp 89-99
Dickerson RE, Geis I (1969) The structure and action of proteins. Harper and Row, New York
Guttman L (1968) A general non-metric technique for finding the smallest co-ordinate space for a configuration of points. Psychometrika 33:469-506
Jorré RP, Curnow RN (1975) A model for the evolution of proteins. Biochimie 57:1147-1156
Jungck JR (1978) The genetic code as a periodic table. J Mol Evol 11:211-224
Kendall DG (1971) Construction of maps from odd bits of information. Nature 231:158-159
Kruskal JB (1964) Non-metric multidimensional scaling. Psychometrika 29:1-27
Levitt M (1976) A simplified representation of protein conformations for rapid simulation of protein folding. J Mol Biol 104:59-107
Robson B, Suzuki E (1976) Conformational properties of amino acid residues in globular proteins. J Mol Biol 107:327-356
Shepard RN (1974) Representation of structure in similarity data: problems and prospects. Psychometrika 39:373-421
Shepard RN, Romney AK, Nerlove SB (1972) Multidimensional scaling: Theory and applications in the behavioural sciences. Vols I and II. Seminar Press, London
Sibson R (1972) Order invariant methods for data analysis (with discussion). J Roy Statist Soc B34:311-349
Spence I, Graef J (1974) The determination of the underlying dimensionality of an empirically obtained matrix of proximities. Multivariate Behavioural Research 9:331-342
Taylor W (1982) Private communication

Received July 20 / Accepted November 1, 1982