MYLAN EXHIBIT - 1026
Mylan Pharmaceuticals, Inc. v. Bausch Health Ireland, Ltd. - IPR2022-00722

globin, myoglobin, virus coat, chymotrypsinogen, glyceraldehyde 3-phosphate dehydrogenase, clupeine, insulin, and ferredoxin families of proteins. By combining several quite different families, they have obtained an account of the selective pressures on proteins in general rather than in specific instances. The measure of Dayhoff et al. thus provides a matrix $(\sigma_{ij})$, where the elements have the property that $\sigma_{ij} > \sigma_{kl}$ if amino acids i and j appear more similar to each other from the viewpoint of evolutionary pressure than amino acids k and l. Dayhoff et al. have noted that this similarity data points naturally to a classification of amino acids into 5 groups:

  Hydrophilic: Ala, Pro, Gly, Glu, Asp, Gln, Asn, Ser, Thr
  Sulphydryl: Cys
  Aliphatic: Val, Ile, Leu, Met
  Basic: Lys, Arg, His
  Aromatic: Phe, Tyr, Trp

Here we extend Dayhoff et al.'s analysis through the statistical technique of multidimensional scaling. We refine their grouping and show that this new grouping corresponds to a very high degree with one deduced by Robson and Suzuki (1976). This latter classification grouped amino acid residues according to their tendencies to be involved in different forms of secondary structure. This correspondence between the two classifications is the first objective evidence from substitution probabilities for the reasonable conjecture that natural selection strongly favours the maintenance of the intrinsic stability of secondary structure features.

Method

Dayhoff et al. (1972) gave a similarity matrix, an ordering of elements reflecting the ordering of pairwise similarities between objects, here amino acid residues. The occurrence of such data is commonplace in psychology and sociology. Within those disciplines a family of statistical techniques, known collectively as multidimensional scaling, has been developed to explore and analyse similarity matrices. Surveys of these methods may be found in Shepard (1974), Shepard et al. (1972) and Sibson (1972).

Briefly, the similarity matrix of Dayhoff et al. is analysed as follows. Using iterative optimisation techniques described in Kruskal (1964) and Guttman (1968), a set of 20 points (one for each amino acid residue) is found in m dimensions such that the nearer two points are, the more similar are the corresponding amino acids with respect to evolutionary pressures. Essentially, this is comparable to finding the geographical distribution of towns from only an ordering of the (approximately determined) intertown distances (Kendall 1971) and, furthermore, without knowing that the solution is two-dimensional. More precisely, the optimisation is a best fit (in a particular least squares sense; Kruskal 1964) of the interpoint distances to the negatives of the measures of Dayhoff et al., asking only that

$$\sigma_{ij} > \sigma_{kl} \iff d_{ij} < d_{kl}$$

where the $d_{ij}$ are the interpoint distances corresponding to the $\sigma_{ij}$. Because this demands only that the $\sigma_{ij}$ are reasonably ordered and does not assume any functional relationship between the $d_{ij}$ and $\sigma_{ij}$, this method is known to be very robust.

Results

A representation was readily obtained in two dimensions without any evidence that the use of a higher dimension would display any further information (Fig. 1); the obtained stress (Kruskal 1964) was 9% and the Monte Carlo test procedure of Spence and Graef (1974) suggested clearly that a 2-dimensional representation was adequate.

Since the optimisation technique underlying multidimensional scaling is iterative, it requires an initial configuration. To avoid any possible bias we started with ten distinct random configurations. In each case the result converged to one with no significant difference from the one shown in Figure 1. As expected (Dayhoff et al. 1972; Dickerson and Geis 1969), conservation of the hydrophobic nature of the residue is the most visually apparent feature.
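The stress figure quoted above measures how far a configuration departs from the required ordering of distances. The following is a minimal sketch, assuming Kruskal's stress-1 formula with a pool-adjacent-violators monotone fit. The function names and the four-point toy data are illustrative assumptions and are not taken from the MDS(X) programs or from the Dayhoff matrix.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distance for every unordered pair (i, j) with i < j."""
    return {(i, j): math.dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)}


def monotone_fit(values):
    """Least-squares non-decreasing fit (pool adjacent violators)."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block mean exceeds the mean that follows it.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted


def kruskal_stress(points, similarity):
    """Stress-1 of a configuration against a similarity matrix.

    Pairs are ranked from most to least similar; an ideal configuration
    has distances that never decrease along that ranking.
    """
    d = pairwise_distances(points)
    ranked = sorted(d, key=lambda p: -similarity[p[0]][p[1]])
    dist = [d[p] for p in ranked]
    fitted = monotone_fit(dist)
    num = sum((a - b) ** 2 for a, b in zip(dist, fitted))
    den = sum(a ** 2 for a in dist)
    return math.sqrt(num / den)


# Toy data (hypothetical): four objects whose similarities are the
# negatives of their separations along a line.
true_points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (6.0, 0.0)]
similarity = [[-math.dist(a, b) for b in true_points] for a in true_points]

stress_good = kruskal_stress(true_points, similarity)  # ordering preserved
scrambled = [(0.0, 0.0), (6.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
stress_bad = kruskal_stress(scrambled, similarity)     # ordering violated

print(stress_good, stress_bad)
```

Only the rank order of the similarities enters the calculation, which is why no functional relationship between similarity and distance has to be assumed. A full non-metric scaling program would in addition move the points iteratively to reduce this stress, starting from an initial configuration.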
All points lie fairly close to a curve, the distance along which (from charged sidechains such as lysine, arginine and histidine to nonpolar aromatic residues, tyrosine and tryptophan) correlates well visually with increasing hydrophobicity. However, the "horseshoe" shape of the curve also suggests a property of secondary importance, namely bulk, which increases towards the right of the diagram. These two properties, hydrophobicity and bulk, are the only two amino acid properties that can be clearly seen to vary systematically along a trajectory, linear or otherwise, on the diagram. An automatic search for other properties of importance was also undertaken by analysis of the variation of many properties (Jungck 1978), using a method developed by Carroll (1972) and a program in the MDS(X) suite for multidimensional scaling. However, this failed to discover any further systematic variation.

Thus it appears that the representation can answer the very general question: what amino acid properties tend to be conserved in evolution? Hydrophobicity and molecular bulk are the ones that we observed. However, the innovation in the diagram is that it can answer more specific questions than that general one. Namely, what amino acid properties are conserved in evolutionary change starting from a specific amino acid?

[Figure 1. Plot annotated with the direction of increasing molecular volume]

Fig. 1. Two-dimensional scaling plot (see text) of the odds-relatedness matrix of Dayhoff et al. (1972). The symbols correspond to those of Robson and Suzuki (1976) and distinguish: hydrophobic residues; hydrophobic residues which have the ability to form hydrogen bonds; residues which may receive or donate hydrogen bonds; residues which may receive and donate hydrogen bonds; Gly; Pro; Ala; Glu; His (see text). The numbers associated with each amino acid are their hydrophobicities as given by Levitt (1976). The indicated directions give general trends in the diagram of increasing molecular weight and volume. Note that the axes have no a priori significance in this technique, much as would also arise if a map of Britain were constructed from tables of distances between towns and villages. That is to say, such a distance table contains no information about North-South, East-West axes (though the direction of South might be deduced a posteriori on the grounds that a warmer climate encouraged habitation; in a similar way the properties of importance are deduced above, bearing in mind that important trends need not be linear or even lie on a curve)

Closer inspection of Figure 1 reveals features that at first glance seem curious. For example, the proximity of glycine and proline and of alanine and glutamic acid at the left side of the diagram is quite inconsistent with their bulk or degree of hydrophobicity. However, these apparent anomalies are in the nature of groupings which are strikingly similar to those obtained by Robson and Suzuki (1976), who undertook a clustering analysis of the 20 amino acid residues in a space whose dimensions correspond to the helix, extended chain, and coil forming power of a residue. The work took no account whatsoever of evolutionary relationships between proteins or residues, but only between sequence and conformation. Their figures are reproduced for comparison in Fig. 2. The similar groupings obtained in Fig. 1 demonstrate for the first time that the preferences for different types of backbone conformational (secondary) structure are also a property of considerable importance for evolutionary pressures on amino acid mutation. As discussed by Robson and Suzuki, this effect arises as a result of sidechain-backbone interactions in a way largely determined by the nature or absence of hydrogen bonding groups in the sidechain. Glycine, proline, alanine and glutamic acid were treated as special cases on the grounds of special stereochemical effects and this is strongly supported by the present study.

[Figure 2. Axes: helix information, pleated sheet information, coil information (decinats)]

Fig. 2. The groupings of Robson and Suzuki based on conformational tendencies and physicochemical properties alone, i.e. without reference to comparison of homologous sequences. Symbols as Fig. 1

There are, however, interesting differences. These authors classified residues according to whether sidechains were non-hydrogen bonding (filled circles in Fig. 1), could receive and donate a hydrogen bond (crossed circles), or could receive or donate a hydrogen bond (open circles). Histidine is 10% protonated at neutral pH and with reservations was assigned to the group of residues whose sidechains can both receive and donate hydrogen bonds. From the point of view of evolutionary pressures, Fig. 1 places it firmly alongside lysine and arginine, which are close to fully charged at neutral pH. Indeed, it reveals that evolutionary pressure places much greater emphasis on whether a sidechain is negatively charged (glutamate and aspartate) or positively charged (lysine, arginine, histidine) than did the tentative assignments of Robson and Suzuki based on clustering analysis and the sidechain-backbone hydrogen bonding interactions. Cysteine also deviates from the largely nonpolar but weakly hydrogen bonding group to which it was assigned by the cluster analysis of Robson and Suzuki, but this may be expected from the point of view of evolutionary pressure because of its special role in forming covalent disulphide bridges in some cases. Use of tables of substitution distances obtained independently for intracellular and extracellular proteins might well clarify this point, though this would depart from the idea of seeking "gross" global determinants of substitution frequencies independent of any kind of family grouping, and independent of any specific interactions peculiar to a protein class. On the whole, however, the agreement is remarkable and this illustrates the value of multidimensional scaling in revealing patterns which may be meaningful to the observer.

The similarity between alanine and glutamate (AE) and proline and glycine (PG) in terms of substitution distances may seem surprising in view of the fact that the former are strong helix formers, the latter strong helix breakers. A preliminary view might be that molecular bulk dominates here, perhaps along with a few physical properties of less general importance. However, other types of secondary structure tendency must be considered, and it may be that the ability to disrupt extended (primarily pleated sheet) structure, to introduce local bends in it, and to demarcate its boundaries, is of prime evolutionary importance. These aspects are now under investigation.

Conclusions

Evolution of proteins in general has tended to conserve (1) the degree of hydrophobicity of a residue, (2) the conformational preferences of its backbone and (3) its bulk. All these are continuous properties and the extent to which a substitution is conservative is correspondingly a matter of degree. Since the maximum distance in Figure 1 is between glycine and tryptophan, changes between residues at less than one third this distance might be conveniently classified as "good" conservative substitutions. Because most substitutions which would conserve bulk involve greater distances in Fig. 1 and indeed constitute "bad" conservative substitutions, the dominance of importance seems to be in the order given above, with bulk playing a subservient if significant role.

This work emphasizes the value of multidimensional scaling in reaching conclusions without any initial a priori assumptions. Jorré and Curnow (1975) have applied the technique to a similar problem but their analysis had a strong theoretical input which modelled their prior beliefs about relationships. Moreover it was restricted to a single well-defined protein family, the cytochrome c group, and therefore considered only evolutionary pressures relating to the structure, stability and function of cytochrome c. Hence they arrived at somewhat different conclusions and answered a different question. The advantage of the present work is, again, that it applies to the conservation of substitutions of proteins in general, using extensive data from which effects peculiar to conformations of specific families have presumably been almost entirely averaged out.

Acknowledgements. The programs in our analysis were from the MDS(X) package developed by A.P.M. Coxon and funded by the Social Science Research Council. Computing facilities were provided by the University of Manchester Regional Computer Centre. One of us (SF) is most grateful to Dr. C.C.F. Blake for encouraging him to work in this area. The other (BR) is grateful for S.R.C. funding relevant to the discovery of properties relating to protein folding simulations. After preparation of this manuscript Dr. W. Taylor drew our attention to his very similar results and conclusions, independently obtained (Taylor 1982). We are grateful to him for useful discussions.

References

Carroll JD (1972) Individual differences and multidimensional scaling. In: Shepard RN, Romney AK, Nerlove SB, Multidimensional scaling: Theory and applications in the behavioural sciences. Seminar Press, London, pp 105-155
Dayhoff MO, Eck RV, Park CM (1972) A model of evolutionary change in proteins. In: Dayhoff MO (ed) Atlas of protein sequence and structure. National Biomedical Research Foundation, Georgetown University, Washington DC, pp 89-99
Dickerson RE, Geis I (1969) The structure and action of proteins. Harper and Row, New York
Guttman L (1968) A general non-metric technique for finding the smallest co-ordinate space for a configuration of points. Psychometrika 33:469-506
Jorré RP, Curnow RN (1975) A model for the evolution of proteins. Biochimie 57:1147-1156
Jungck JR (1978) The genetic code as a periodic table. J Mol Evol 11:211-224
Kendall DG (1971) Construction of maps from odd bits of information. Nature 231:158-159
Kruskal JB (1964) Non-metric multidimensional scaling. Psychometrika 29:1-27
Levitt M (1976) A simplified representation of protein conformations for rapid simulation of protein folding. J Mol Biol 104:59-107
Robson B, Suzuki E (1976) Conformational properties of amino acid residues in globular proteins. J Mol Biol 107:327-356
Shepard RN (1974) Representation of structure in similarity data: problems and prospects. Psychometrika 39:373-421
Shepard RN, Romney AK, Nerlove SB (1972) Multidimensional scaling: Theory and applications in the behavioural sciences. Vols I and II. Seminar Press, London
Sibson R (1972) Order invariant methods for data analysis (with discussion). J Roy Statist Soc B34:311-349
Spence I, Graef J (1974) The determination of the underlying dimensionality of an empirically obtained matrix of proximities. Multivariate Behavioural Research 9:331-342
Taylor W (1982) Private communication

Received July 20/Accepted November 1, 1982