With 5 text-figures
`Printed in Great Britain
`
`An analysis of variance test for normality
(complete samples)†
`
`BY S. S. SHAPIRO AND M. B. WILK
`
General Electric Co. and Bell Telephone Laboratories, Inc.
`
1. INTRODUCTION
`
The main intent of this paper is to introduce a new statistical procedure for testing a
complete sample for normality. The test statistic is obtained by dividing the square of an
appropriate linear combination of the sample order statistics by the usual symmetric
estimate of variance. This ratio is both scale and origin invariant and hence the statistic
is appropriate for a test of the composite hypothesis of normality.
Testing for distributional assumptions in general and for normality in particular has been
a major area of continuing statistical research, both theoretically and practically. A
possible cause of such sustained interest is that many statistical procedures have been
derived based on particular distributional assumptions, especially that of normality.
Although in many cases the techniques are more robust than the assumptions underlying
them, still a knowledge that the underlying assumption is incorrect may temper the use
and application of the methods. Moreover, the study of a body of data with the stimulus
of a distributional test may encourage consideration of, for example, normalizing
transformations and the use of alternate methods such as distribution-free techniques, as well as
detection of gross peculiarities such as outliers or errors.
The test procedure developed in this paper is defined and some of its analytical properties
described in §2. Operational information and tables useful in employing the test are detailed
in §3 (which may be read independently of the rest of the paper). Some examples are given
in §4. Section 5 consists of an extract from an empirical sampling study of the comparison of
the effectiveness of various alternative tests. Discussion and concluding remarks are given
in §6.
`
`2. THE W TEST FOR NORMALITY (COMPLETE SAMPLES)
`
2.1. Motivation and early work
`
This study was initiated, in part, in an attempt to summarize formally certain indications
of probability plots. In particular, could one condense departures from statistical linearity
of probability plots into one or a few 'degrees of freedom' in the manner of the application
of analysis of variance in regression analysis?
In a probability plot, one can consider the regression of the ordered observations on the
expected values of the order statistics from a standardized version of the hypothesized
distribution, the plot tending to be linear if the hypothesis is true. Hence a possible method
of testing the distributional assumption is by means of an analysis of variance type procedure.
Using generalized least squares (the ordered variates are correlated) linear and higher-order
models can be fitted and an F-type ratio used to evaluate the adequacy of the linear fit.
`
† Part of this research was supported by the Office of Naval Research while both authors were at
Rutgers University.
`
`This content downloaded from 128.255.6.125 on Mon, 29 May 2017 20:00:22 UTC
`All use subject to http://about.jstor.org/terms
`
`AstraZeneca Exhibit 2171 p. 1
`InnOPharma Licensing LLC v. AstraZeneca AB
`IPR2017-00904
`
`Fresenius-Kabi USA LLC v. AstraZeneca AB IPR2017-01910
`
`
`
This approach was investigated in preliminary work. While some promising results
were obtained, the procedure is subject to the serious shortcoming that the selection of the
higher-order model is, practically speaking, arbitrary. However, research is continuing
along these lines.
Another analysis of variance viewpoint which has been investigated by the present
authors is to compare the squared slope of the probability plot regression line, which under
the normality hypothesis is an estimate of the population variance multiplied by a constant,
with the residual mean square about the regression line, which is another estimate of the
variance. This procedure can be used with incomplete samples and has been described
elsewhere (Shapiro & Wilk, 1965b).
As an alternative to the above, for complete samples, the squared slope may be compared
with the usual symmetric sample sum of squares about the mean which is independent
of the ordering and easily computable. It is this last statistic that is discussed in the
remainder of this paper.
`
2.2. Derivation of the W statistic

Let $m' = (m_1, m_2, \ldots, m_n)$ denote the vector of expected values of standard normal
order statistics, and let $V = (v_{ij})$ be the corresponding $n \times n$ covariance matrix. That is, if
$x_1 \le x_2 \le \ldots \le x_n$ denotes an ordered random sample of size $n$ from a normal distribution with
mean 0 and variance 1, then

$$E(x_i) = m_i \quad (i = 1, 2, \ldots, n),$$

and

$$\operatorname{cov}(x_i, x_j) = v_{ij} \quad (i, j = 1, 2, \ldots, n).$$

Let $y' = (y_1, \ldots, y_n)$ denote a vector of ordered random observations. The objective is
to derive a test for the hypothesis that this is a sample from a normal distribution with
unknown mean $\mu$ and unknown variance $\sigma^2$.
Clearly, if the $\{y_i\}$ are a normal sample then $y_i$ may be expressed as

$$y_i = \mu + \sigma x_i \quad (i = 1, 2, \ldots, n).$$

It follows from the generalized least-squares theorem (Aitken, 1935; Lloyd, 1952) that the
best linear unbiased estimates of $\mu$ and $\sigma$ are those quantities that minimize the quadratic
form $(y - \mu 1 - \sigma m)' V^{-1} (y - \mu 1 - \sigma m)$, where $1' = (1, 1, \ldots, 1)$. These estimates are,
respectively,

$$\hat\mu = \frac{m' V^{-1} (m 1' - 1 m') V^{-1} y}{1' V^{-1} 1 \; m' V^{-1} m - (1' V^{-1} m)^2}$$

and

$$\hat\sigma = \frac{1' V^{-1} (1 m' - m 1') V^{-1} y}{1' V^{-1} 1 \; m' V^{-1} m - (1' V^{-1} m)^2}.$$

For symmetric distributions, $1' V^{-1} m = 0$, and hence

$$\hat\mu = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar y \quad \text{and} \quad \hat\sigma = \frac{m' V^{-1} y}{m' V^{-1} m}.$$

Let

$$S^2 = \sum_{i=1}^{n} (y_i - \bar y)^2$$

denote the usual symmetric unbiased estimate of $(n - 1)\sigma^2$.
The W test statistic for normality is defined by

$$W = \frac{R^4 \hat\sigma^2}{C^2 S^2} = \frac{b^2}{S^2} = \frac{(a'y)^2}{S^2} = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2},$$
`
`
where

$$R^2 = m' V^{-1} m, \qquad C^2 = m' V^{-1} V^{-1} m,$$

$$a' = (a_1, \ldots, a_n) = \frac{m' V^{-1}}{(m' V^{-1} V^{-1} m)^{1/2}},$$

and

$$b = R^2 \hat\sigma / C.$$
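The definitions above can be checked numerically for the smallest non-trivial case. The sketch below is only an illustration, not the authors' computation: the helper names (`solve`, `w_coefficients`, `w_statistic`) and the sample values are hypothetical, and the entries of $m$ and $V$ for $n = 3$ are the standard closed-form moments of normal order statistics, taken here as an assumption.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def w_coefficients(m, V):
    """Return (a, R2, C2): a = V^{-1}m / C, R2 = m'V^{-1}m, C2 = m'V^{-1}V^{-1}m."""
    a_star = solve(V, m)                      # a* = V^{-1} m (V is symmetric)
    R2 = sum(mi * ai for mi, ai in zip(m, a_star))
    C2 = sum(ai * ai for ai in a_star)
    C = math.sqrt(C2)
    return [ai / C for ai in a_star], R2, C2

def w_statistic(y, a):
    """W = (sum a_i y_(i))^2 / sum (y_i - ybar)^2 on the ordered sample."""
    y = sorted(y)
    ybar = sum(y) / len(y)
    b = sum(ai * yi for ai, yi in zip(a, y))
    S2 = sum((yi - ybar) ** 2 for yi in y)
    return b * b / S2

# Assumed closed forms for n = 3: m_3 = 3 / (2 sqrt(pi)), r = sqrt(3) / pi.
m3 = 3.0 / (2.0 * math.sqrt(math.pi))
r = math.sqrt(3.0) / math.pi
m = [-m3, 0.0, m3]
V = [[1 + r / 2 - m3 ** 2, r / 2, m3 ** 2 - r],
     [r / 2, 1 - r, r / 2],
     [m3 ** 2 - r, r / 2, 1 + r / 2 - m3 ** 2]]

a, R2, C2 = w_coefficients(m, V)
w = w_statistic([1.0, 2.0, 4.0], a)           # hypothetical sample
```

Under these assumptions the normalized coefficients come out as $(-0.7071, 0, 0.7071)$, the values tabulated for $n = 3$, and shifting or rescaling the sample leaves `w` unchanged.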
`
Thus, $b$ is, up to the normalizing constant $C$, the best linear unbiased estimate of the slope
of a linear regression of the ordered observations, $y_i$, on the expected values, $m_i$, of the
standard normal order statistics. The constant $C$ is so defined that the linear coefficients are
normalized.

It may be noted that if one is indeed sampling from a normal population then the
numerator, $b^2$, and denominator, $S^2$, of $W$ are both, up to a constant, estimating the same quantity,
namely $\sigma^2$. For non-normal populations, these quantities would not in general be estimating
the same thing. Heuristic considerations augmented by some fairly extensive empirical
sampling results (Shapiro & Wilk, 1964a) using populations with a wide range of $\sqrt{\beta_1}$ and
$\beta_2$ values, suggest that the mean values of $W$ for non-null distributions tend to shift
to the left of that for the null case. Further it appears that the variance of the null
distribution of $W$ tends to be smaller than that of the non-null distribution. It is likely
that this is due to the positive correlation between the numerator and denominator for a
normal population being greater than that for non-normal populations.
Note that the coefficients $\{a_i\}$ are just the normalized 'best linear unbiased' coefficients
tabulated in Sarhan & Greenberg (1956).
`
2.3. Some analytical properties of W

LEMMA 1. $W$ is scale and origin invariant.

Proof. This follows from the fact that for normal (more generally symmetric) distributions,

$$-a_i = a_{n-i+1}.$$

COROLLARY 1. $W$ has a distribution which depends only on the sample size $n$, for samples
from a normal distribution.

COROLLARY 2. $W$ is statistically independent of $S^2$ and of $\bar y$, for samples from a normal
distribution.

Proof. This follows from the fact that $\bar y$ and $S^2$ are sufficient for $\mu$ and $\sigma^2$ (Hogg & Craig,
1956).

COROLLARY 3. $E W^r = E b^{2r} / E S^{2r}$, for any $r$.

LEMMA 2. The maximum value of $W$ is 1.

Proof. Assume $\bar y = 0$ since $W$ is origin invariant by Lemma 1. Hence

$$W = \left[\sum_i a_i y_i\right]^2 \Big/ \sum_i y_i^2.$$

Since

$$\left(\sum_i a_i y_i\right)^2 \le \sum_i y_i^2 \sum_i a_i^2 = \sum_i y_i^2,$$

because $\sum_i a_i^2 = a'a = 1$, by definition, then $W$ is bounded by 1. This maximum is in fact
achieved when $y_i = \eta a_i$, for arbitrary $\eta$.

LEMMA 3. The minimum value of $W$ is $n a_1^2 / (n - 1)$.
`
`
Proof.† (Due to C. L. Mallows.) Since $W$ is scale and origin invariant, it suffices to
consider the maximization of $\sum_{i=1}^{n} y_i^2$ subject to the constraints $\sum y_i = 0$, $\sum a_i y_i = 1$. Since this
is a convex region and $\sum y_i^2$ is a convex function, the maximum of the latter must occur at
one of the $(n - 1)$ vertices of the region. These are

$$\left(\frac{n-1}{n a_1},\ \frac{-1}{n a_1},\ \ldots,\ \frac{-1}{n a_1}\right),$$

$$\left(\frac{n-2}{n(a_1 + a_2)},\ \frac{n-2}{n(a_1 + a_2)},\ \frac{-2}{n(a_1 + a_2)},\ \ldots,\ \frac{-2}{n(a_1 + a_2)}\right),$$

$$\vdots$$

$$\left(\frac{1}{n(a_1 + \ldots + a_{n-1})},\ \frac{1}{n(a_1 + \ldots + a_{n-1})},\ \ldots,\ \frac{-(n-1)}{n(a_1 + \ldots + a_{n-1})}\right).$$

It can now be checked numerically, for the values of the specific coefficients $\{a_i\}$, that the
maximum of $\sum_{i=1}^{n} y_i^2$ occurs at the first of these points and the corresponding minimum value
of $W$ is as given in the Lemma.
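The numerical check mentioned in the proof can be sketched for one sample size. The code below is an illustration only: it takes the $n = 5$ coefficients (0.6646, 0.2413, 0) as given to four decimals, builds the $n - 1$ vertices in the form listed above, and evaluates $W$ at each. The function names `vertex` and `w_statistic` are hypothetical.

```python
def w_statistic(y, a):
    """W = (sum a_i y_(i))^2 / sum (y_i - ybar)^2 on the ordered sample."""
    y = sorted(y)
    ybar = sum(y) / len(y)
    b = sum(ai * yi for ai, yi in zip(a, y))
    S2 = sum((yi - ybar) ** 2 for yi in y)
    return b * b / S2

def vertex(k, a):
    """k-th vertex: first k coordinates (n-k)/(n s_k), the rest -k/(n s_k),
    with s_k = a_1 + ... + a_k (negative, so the vector is already ordered)."""
    n = len(a)
    s = sum(a[:k])
    return [(n - k) / (n * s)] * k + [-k / (n * s)] * (n - k)

# Coefficients for n = 5, rounded to four decimals (a_i = -a_{n-i+1}).
a = [-0.6646, -0.2413, 0.0, 0.2413, 0.6646]
n = len(a)

ws = [w_statistic(vertex(k, a), a) for k in range(1, n)]
lower = n * a[0] ** 2 / (n - 1)      # Lemma 3: n a_1^2 / (n - 1)
upper = w_statistic(a, a)            # Lemma 2: y proportional to a gives the maximum
```

With these rounded coefficients the smallest vertex value coincides with `lower`, and `upper` differs from 1 only through the four-decimal rounding of the $a_i$.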
`
LEMMA 4. The half and first moments of $W$ are given by

$$E W^{1/2} = \frac{R^2\, \Gamma\{\tfrac12 (n-1)\}}{C\, \Gamma(\tfrac12 n)\, \sqrt{2}}$$

and

$$E W = \frac{R^2 (R^2 + 1)}{C^2 (n - 1)},$$

where $R^2 = m' V^{-1} m$, and $C^2 = m' V^{-1} V^{-1} m$.

Proof. Using Corollary 3 of Lemma 1,

$$E W^{1/2} = E b / E S \quad \text{and} \quad E W = E b^2 / E S^2.$$

Now,

$$E S = \sigma \sqrt{2}\; \Gamma(\tfrac12 n) \big/ \Gamma\{\tfrac12 (n-1)\} \quad \text{and} \quad E S^2 = (n - 1)\sigma^2.$$

From the general least squares theorem (see e.g. Kendall & Stuart, vol. II (1961)),

$$E b = \frac{R^2}{C} E\hat\sigma = \frac{R^2}{C}\sigma$$

and

$$E b^2 = \frac{R^4}{C^2} E\hat\sigma^2 = \frac{R^4}{C^2}\{\operatorname{var}(\hat\sigma) + (E\hat\sigma)^2\} = \sigma^2 R^2 (R^2 + 1)/C^2,$$

since $\operatorname{var}(\hat\sigma) = \sigma^2 / m' V^{-1} m = \sigma^2 / R^2$, and hence the results of the lemma follow.
Values of these moments are shown in Fig. 1 for sample sizes n = 3(1)20.
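As a hedged illustration of Lemma 4, the two moments can be evaluated for $n = 3$, where $R^2$ and $C^2$ follow from closed forms. The sketch assumes the standard exact moments of normal order statistics for $n = 3$ and uses the antisymmetry $V^{-1} m = c\,(-1, 0, 1)'$ with $c = m_3 / (v_{11} - v_{13})$; the function name `half_and_first_moments` is hypothetical.

```python
import math

def half_and_first_moments(R2, C2, n):
    """E W^(1/2) and E W from Lemma 4, given R^2, C^2 and the sample size n."""
    C = math.sqrt(C2)
    half = R2 * math.gamma((n - 1) / 2) / (C * math.gamma(n / 2) * math.sqrt(2))
    first = R2 * (R2 + 1) / (C2 * (n - 1))
    return half, first

# Assumed closed forms for n = 3.
m3 = 3.0 / (2.0 * math.sqrt(math.pi))     # expected value of the largest observation
r = math.sqrt(3.0) / math.pi
v11 = 1 + r / 2 - m3 ** 2                 # var(x_(1))
v13 = m3 ** 2 - r                         # cov(x_(1), x_(3))

c = m3 / (v11 - v13)                      # V^{-1} m = c * (-1, 0, 1)'
R2 = 2 * c * m3                           # m' V^{-1} m
C2 = 2 * c * c                            # m' V^{-1} V^{-1} m

half, first = half_and_first_moments(R2, C2, 3)
```

Under these assumptions `half` equals $3/\pi \approx 0.9549$ and `first` is approximately 0.9135, the theoretical values listed for $n = 3$ in Table 4.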
`
LEMMA 5. A joint distribution involving $W$ is defined by

$$h(W, \theta_2, \ldots, \theta_{n-2}) = K\, W^{-1/2} (1 - W)^{\frac12 (n-4)} \cos^{n-4}\theta_2 \cdots \cos\theta_{n-3},$$

over a region $T$ on which the $\theta_i$'s and $W$ are not independent, and where $K$ is a constant.

† Lemma 3 was conjectured intuitively and verified by certain numerical studies. Subsequently
the above proof was given by C. L. Mallows.
`
`
Proof. Consider an orthogonal transformation $B$ such that $y = Bu$, where

$$u_1 = \sum_{i=1}^{n} y_i / \sqrt{n} \quad \text{and} \quad u_2 = \sum_{i=1}^{n} a_i y_i = b.$$

The ordered $y_i$'s are distributed as

$$n! \left(\frac{1}{2\pi\sigma^2}\right)^{\frac12 n} \exp\left\{-\frac12 \sum_{i=1}^{n} \left(\frac{y_i - \mu}{\sigma}\right)^2\right\} \quad (-\infty < y_1 < \ldots < y_n < \infty).$$

After integrating out $u_1$, the joint density for $u_2, \ldots, u_n$ is

$$K^* \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=2}^{n} u_i^2\right\}$$

over the appropriate region $T^*$. Changing to polar co-ordinates such that

$$u_2 = \rho \sin\theta_1, \text{ etc.},$$

and then integrating over $\rho$, yields the joint density of $\theta_1, \ldots, \theta_{n-2}$ as

$$K^{**} \cos^{n-3}\theta_1 \cos^{n-4}\theta_2 \cdots \cos\theta_{n-3},$$

over some region $T^{**}$.
From these various transformations

$$W = \frac{b^2}{S^2} = \frac{u_2^2}{\sum_{i=2}^{n} u_i^2} = \frac{\rho^2 \sin^2\theta_1}{\rho^2} = \sin^2\theta_1,$$

from which the lemma follows. The $\theta_i$'s and $W$ are not independent; they are restricted
in the sample space $T$.
`

Fig. 1. Moments of $W$, $E(W^r)$, n = 3(1)20, $r = \tfrac12$, 1. (Horizontal axis: sample size, $n$.)

COROLLARY 4. For $n = 3$, the density of $W$ is

$$\frac{3}{\pi} (1 - W)^{-1/2}\, W^{-1/2}, \quad \tfrac34 \le W \le 1.$$
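
As a short worked check of the $n = 3$ density (added here as an illustration, not part of the original derivation), the normalizing constant and the half moment can be verified directly:

$$\int_{3/4}^{1} W^{-1/2} (1 - W)^{-1/2}\, dW = \Big[\, 2 \arcsin\sqrt{W}\, \Big]_{3/4}^{1} = \pi - \frac{2\pi}{3} = \frac{\pi}{3},$$

so the constant is $3/\pi$ and the distribution function is $F(w) = \frac{6}{\pi}\left(\arcsin\sqrt{w} - \frac{\pi}{3}\right)$ for $\tfrac34 \le w \le 1$. Likewise

$$E W^{1/2} = \frac{3}{\pi} \int_{3/4}^{1} (1 - W)^{-1/2}\, dW = \frac{3}{\pi} \cdot 2 \left(\tfrac14\right)^{1/2} = \frac{3}{\pi} \approx 0.9549,$$

which agrees with the half moment given by Lemma 4 for $n = 3$.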
`
`
Note that for $n = 3$, the $W$ statistic is equivalent (up to a constant multiplier) to the
statistic (range/standard deviation) advanced by David, Hartley & Pearson (1954) and
the result of the corollary is essentially given by Pearson & Stephens (1964).
It has not been possible, for general $n$, to integrate out the $\theta_i$'s of Lemma 5 to obtain
an explicit form for the distribution of $W$. However, explicit results have also been given
for $n = 4$, Shapiro (1964).
`
2.4. Approximations associated with the W test

The $\{a_i\}$ used in the $W$ statistic are defined by

$$a_j = \sum_{i=1}^{n} m_i v^{ij} / C \quad (j = 1, 2, \ldots, n),$$

where $m_i$, $v_{ij}$ and $C$ have been defined in §2.2, and $v^{ij}$ denotes the elements of $V^{-1}$. To determine
the $a_i$ directly it appears necessary to know both the vector of means $m$ and the covariance
matrix $V$. However, to date, the elements of $V$ are known only up to samples of size 20
(Sarhan & Greenberg, 1956). Various approximations are presented in the remainder of this
section to enable the use of $W$ for samples larger than 20.
By definition,

$$a' = \frac{m' V^{-1}}{(m' V^{-1} V^{-1} m)^{1/2}} = \frac{m' V^{-1}}{C}$$

is such that $a'a = 1$. Let $a^{*\prime} = m' V^{-1}$, then $C^2 = a^{*\prime} a^*$. Suggested approximations are

$$\hat a_i^* = 2 m_i \quad (i = 2, 3, \ldots, n-1),$$

and

$$\hat a_1^2 = \hat a_n^2 = \begin{cases} \dfrac{\Gamma(\tfrac12 n)}{\sqrt{2}\, \Gamma\{\tfrac12 (n+1)\}} & (n \le 20), \\[2ex] \dfrac{\Gamma\{\tfrac12 (n+1)\}}{\sqrt{2}\, \Gamma(\tfrac12 n + 1)} & (n > 20). \end{cases}$$

A comparison of $a_i^*$ (the exact values) and $\hat a_i^*$ for various values of $i \ne 1$ and $n$ = 5, 10,
15, 20 is given in Table 1. (Note $a_i = -a_{n-i+1}$.) It will be seen that the approximation is
generally in error by less than 1%, particularly as $n$ increases. This encourages one to trust
the use of this approximation for $n > 20$. Necessary values of the $m_i$ for this approximation
are available in Harter (1961).
`
Table 1. Comparison of $|a_i^*|$ and $|\hat a_i^*| = |2 m_i|$, for selected values of
$i$ ($\ne 1$) and $n$

  n             i = 2      3        4        5        8        10
  5   Exact     1.014    0.0       -        -        -        -
      Approx.   0.990    0.0       -        -        -        -
 10   Exact     2.035    1.324    0.757    0.247     -        -
      Approx.   2.003    1.312    0.752    0.245     -        -
 15   Exact     2.530    1.909    1.437    1.036    0.0       -
      Approx.   2.496    1.895    1.430    1.031    0.0       -
 20   Exact     2.849    2.277    1.850    1.496    0.631    0.124
      Approx.   2.815    2.262    1.842    1.491    0.630    0.124
`
`
A comparison of $a_1^2$ and $\hat a_1^2$ for n = 6(1)20 is given in Table 2. While the errors of this
approximation are quite small for $n < 20$, the approximation and true values appear to
cross over at $n = 19$. Further comparisons with other approximations, discussed below,
suggested the changed formulation of $\hat a_1^2$ for $n > 20$ given above.
`
Table 2. Comparison of $a_1^2$ and $\hat a_1^2$

  n    Exact    Approximate
  6    0.414    0.426
  7    0.388    0.392
  8    0.366    0.365
  9    0.347    0.343
 10    0.329    0.324
 11    0.314    0.308
 12    0.300    0.295
 13    0.287    0.283
 14    0.276    0.272
 15    0.265    0.261
 16    0.256    0.254
 17    0.247    0.245
 18    0.239    0.237
 19    0.231    0.231
 20    0.224    0.226

Fig. 2. Plot of $C^2 = m' V^{-1} V^{-1} m$ and $R^2 = m' V^{-1} m$ as functions of the sample size $n$.
(Horizontal axis: sample size, $n$; the fitted line for $R^2$ is marked $R^2 = -2.41 + 1.98n$.)
`
What is required for the $W$ test are the normalized coefficients $\{a_i\}$. Thus $\hat a_1^2$ is directly
usable but the $\hat a_i^*$ ($i = 2, \ldots, n-1$), must be normalized by division by $C = (m' V^{-1} V^{-1} m)^{1/2}$.
A plot of the values of $C^2$ and of $R^2 = m' V^{-1} m$ as a function of $n$ is given in Fig. 2. The
linearity of these may be summarized by the following least-squares equations:

$$C^2 = -2.722 + 4.083n,$$

which gave a regression mean square of 7331.6 and a residual mean square of 0.0186, and

$$R^2 = -2.411 + 1.981n,$$

with a regression mean square of 1725.7 and a residual mean square of 0.0016.
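
The two fitted lines are simple to evaluate. The sketch below only restates the least-squares equations quoted above as code; the function names are hypothetical.

```python
def c2_extrapolated(n):
    """Fitted line for C^2 = m'V^{-1}V^{-1}m as a function of sample size n."""
    return -2.722 + 4.083 * n

def r2_extrapolated(n):
    """Fitted line for R^2 = m'V^{-1}m as a function of sample size n."""
    return -2.411 + 1.981 * n

c2_30 = c2_extrapolated(30)   # extrapolation beyond the tabulated range
r2_20 = r2_extrapolated(20)   # last sample size with exact values
```

Rounded to two decimals, `c2_30` is 119.77 and `r2_20` is 37.21.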
`
These results encourage the use of the extrapolated equations to estimate $C^2$ and $R^2$
for higher values of $n$.

A comparison can now be made between values of $C^2$ from the extrapolation equation
and from $\sum_{i=1}^{n} \hat a_i^{*2}$, using

$$\hat a_1^{*2} = \hat a_n^{*2} = \frac{\hat a_1^2 \sum_{i=2}^{n-1} \hat a_i^{*2}}{1 - 2\hat a_1^2}.$$

For the case $n = 30$, these give values of 119.77 and 120.47, respectively. This concordance
of the independent approximations increases faith in both.
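
The approximation scheme of this section can be sketched in a few lines. This is an illustration under a stated assumption, not the computation used for the published tables: the expected order statistics $m_i$ are replaced by Blom's approximation $\Phi^{-1}\{(i - 0.375)/(n + 0.25)\}$ instead of the tabulated values of Harter (1961), so the results agree with the tabulated coefficients only approximately. The function name `approx_coefficients` is hypothetical.

```python
import math
from statistics import NormalDist

def approx_coefficients(n):
    """Approximate normalized coefficients a_1..a_n for n > 20, plus C^2."""
    nd = NormalDist()
    # ASSUMPTION: Blom's formula stands in for the tabulated m_i.
    m = [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    # a_hat_1^2 = a_hat_n^2 = Gamma((n+1)/2) / (sqrt(2) Gamma(n/2 + 1)), n > 20.
    a1_sq = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2 + 1)) / math.sqrt(2)
    inner = [2.0 * mi for mi in m[1:-1]]            # a*_i = 2 m_i, i = 2..n-1
    s = sum(v * v for v in inner)
    a_star1_sq = a1_sq * s / (1.0 - 2.0 * a1_sq)    # a*_1^2 = a*_n^2
    C2 = 2.0 * a_star1_sq + s                       # C^2 = sum of all a*_i^2
    C = math.sqrt(C2)
    a1 = math.sqrt(a1_sq)
    return [-a1] + [v / C for v in inner] + [a1], C2

a30, C2_30 = approx_coefficients(30)
```

By construction the coefficients satisfy $\sum a_i^2 = 1$. For $n = 30$ the two largest values come out close to the tabulated 0.4254 and 0.2944, with small differences attributable to the substituted $m_i$.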
Plackett (1958) has suggested approximations for the elements of the vector $a^*$ and $R^2$.
While his approximations are valid for a wide range of distributions and can be used with
censored samples, they are more complex, for the normal case, than those suggested above.
For the normal case his approximations are

$$\tilde a_j^* = n\, m_j\, [F(m_{j+1}) - F(m_{j-1})] \quad (j = 2, 3, \ldots, n-1),$$

$$\tilde a_j^* = n \left\{ \frac{m_j f^2(m_j)}{F(m_j)} + m_j^2 f(m_j) - f(m_j) + m_j [F(m_{j+1}) - F(m_j)] \right\} \quad (j = 1),$$

and

$$\tilde a_n^* = -\tilde a_1^*,$$

where

$F(m_j)$ = cumulative distribution evaluated at $m_j$,

$f(m_j)$ = density function evaluated at $m_j$.

Plackett's approximation to $R^2$ is

$$R^2 = 2n \left\{ \frac{m_1^2 f^2(m_1)}{F(m_1)} + m_1^3 f(m_1) + m_1 f(m_1) - 2F(m_1) + 1 \right\}.$$
`
Plackett's $\tilde a_i^*$ approximations and the present $\hat a_i^*$ approximations are compared with the
exact values, for sample size 20, in Table 3. In addition a consistency comparison of the
two approximations is given for sample size 30. Plackett's result for $a_1$ ($n = 20$) was the
only case where his approximation was closer to the true value than the simpler
approximations suggested above. The differences in the two approximations for $a_1$ were negligible,
being less than 0.5%. Both methods give good approximations, being off no more than
three units in the second decimal place. The comparison of the two methods for $n = 30$
shows good agreement, most of the differences being in the third decimal place. The largest
discrepancy occurred for $i = 2$; the estimates differed by six units in the second decimal
place, an error of less than 2%.
The two methods of approximating $R^2$ were compared for $n = 20$. Plackett's method
gave a value of 36.09, the method suggested above gave a value of 37.21 and the true
value was 37.26.

The good practical agreement of these two approximations encourages the belief that
there is little risk in reasonable extrapolations for $n > 20$. The values of constants, for
$n > 20$, given in §3 below, were estimated from the simple approximations and
extrapolations described above.

As a further internal check the values of $a_n$, $a_{n-1}$ and $a_{n-4}$ were plotted as a function of
$n$ for n = 3(1)50. The plots are shown in Fig. 3 which is seen to be quite smooth for each
of the three curves at the value $n = 20$. Since values for $n \le 20$ are 'exact' the smooth
transition lends credence to the approximations for $n > 20$.
`
`
Table 3. Comparison of approximate values of $a^{*\prime} = m' V^{-1}$

  n     i     Present approx.    Exact      Plackett
 20     1        -4.223         -4.2013     -4.215
        2        -2.815         -2.8494     -2.764
        3        -2.262         -2.2765     -2.237
        4        -1.842         -1.8502     -1.820
        5        -1.491         -1.4960     -1.476
        6        -1.181         -1.1841     -1.169
        7        -0.897         -0.8990     -0.887
        8        -0.630         -0.6314     -0.622
        9        -0.374         -0.3784     -0.370
       10        -0.124         -0.1243     -0.123

 30     1        -4.655            -        -4.671
        2        -3.231            -        -3.170
        3        -2.730            -        -2.768
        4        -2.357            -        -2.369
        5        -2.052            -        -2.013
        6        -1.789            -        -1.760
        7        -1.553            -        -1.528
        8        -1.338            -        -1.334
        9        -1.137            -        -1.132
       10        -0.947            -        -0.941
       11        -0.765            -        -0.759
       12        -0.589            -        -0.582
       13        -0.418            -        -0.413
       14        -0.249            -        -0.249
       15        -0.083            -        -0.082
`
Fig. 3. $a_i$ plotted as a function of sample size, n = 2(1)50, for
$i = n, n-1, n-4$ ($n > 8$). (Horizontal axis: sample size, $n$.)
`
`
Fig. 5. Selected empirical percentage points of $W$, n = 3(1)50. (Horizontal axis: sample size, $n$.)
`
`
Table 4. Some theoretical moments ($\mu_i$) and Monte Carlo moments ($\hat\mu_i$) of $W$

  n    mu_1/2   mu^_1/2   mu_1     mu^_1    mu^_2      mu^_3/mu^_2^(3/2)   mu^_4/mu^_2^2
  3    0.9549   0.9547   0.9135   0.9130   0.005698       -0.5930             2.3748
  4    0.9486   0.9489   0.9012   0.9019   0.005166       -0.8944             3.7231
  5    0.9494   0.9491   0.9026   0.9021   0.004491       -0.8176             7.8126
  6    0.9521   0.9525   0.9072   0.9082   0.003390       -1.1790             5.4295
  7    0.9547   0.9545   0.9123   0.9120   0.002995       -1.3229             6.4104
  8    0.9574   0.9575   0.9174   0.9175   0.002470       -1.3841             7.1092
  9    0.9600   0.9596   0.9221   0.9215   0.002293       -1.5987             8.4482
 10    0.9622   0.9620   0.9264   0.9260   0.001972       -1.6655             9.2812
 11    0.9643   0.9639   0.9303   0.9295   0.001717       -1.7494            11.0547
 12    0.9661   0.9661   0.9337   0.9338   0.001483       -1.7744            11.9185
 13    0.9678   0.9678   0.9369   0.9369   0.001316       -1.7581            13.0769
 14    0.9692   0.9693   0.9398   0.9399   0.001168       -1.9025            14.0568
 15    0.9706   0.9705   0.9424   0.9422   0.001023       -1.8876            16.7383
 16    0.9718   0.9717   0.9447   0.9445   0.000964       -1.7968            17.6669
 17    0.9730   0.9730   0.9470   0.9470   0.000823       -1.9468            22.1972
 18    0.9741   0.9741   0.9491   0.9492   0.000810       -2.1391            24.7776
 19    0.9750   0.9750   0.9508   0.9509   0.000711       -2.1305            29.7333
 20    0.9757   0.9760   0.9523   0.9527   0.000651       -2.2761            32.5906
 21      -      0.9771     -      0.9549   0.000594       -2.2827            36.0382
 22      -      0.9776     -      0.9558   0.000568       -2.3984            44.5617
 23      -      0.9782     -      0.9570   0.000504       -2.1862            40.7507
 24      -      0.9787     -      0.9579   0.000504       -2.3517            43.4926
 25      -      0.9789     -      0.9584   0.000458       -2.3448            46.3318
 26      -      0.9796     -      0.9598   0.000421       -2.4978            58.9446
 27      -      0.9801     -      0.9607   0.000404       -2.5903            60.5200
 28      -      0.9805     -      0.9615   0.000382       -2.6964            64.1702
 29      -      0.9810     -      0.9624   0.000369       -2.6090            68.9591
 30      -      0.9811     -      0.9626   0.000344       -2.7288            71.7714
 31      -      0.9816     -      0.9636   0.000336       -2.7997            77.4744
 32      -      0.9819     -      0.9642   0.000326       -2.6900            76.8384
 33      -      0.9823     -      0.9650   0.000308       -3.0181            93.2496
 34      -      0.9825     -      0.9654   0.000293       -3.0166           100.4419
 35      -      0.9827     -      0.9658   0.000268       -2.8574           108.5077
 36      -      0.9829     -      0.9662   0.000264       -2.7965            91.7985
 37      -      0.9833     -      0.9670   0.000253       -3.1566           120.0005
 38      -      0.9837     -      0.9677   0.000235       -3.0679           118.2513
 39      -      0.9837     -      0.9678   0.000239       -3.3283           134.3110
 40      -      0.9839     -      0.9682   0.000229       -3.1719           136.4787
 41      -      0.9840     -      0.9684   0.000227       -3.0740           129.9604
 42      -      0.9844     -      0.9691   0.000212       -3.2885           136.3814
 43      -      0.9846     -      0.9694   0.000196       -3.2646           151.7350
 44      -      0.9846     -      0.9695   0.000193       -3.0803           140.2724
 45      -      0.9849     -      0.9701   0.000192       -3.1645           137.2297
 46      -      0.9850     -      0.9703   0.000184       -3.3742           176.0635
 47      -      0.9854     -      0.9710   0.000170       -3.3353           179.2792
 48      -      0.9853     -      0.9708   0.000179       -3.2972           173.6601
 49      -      0.9855     -      0.9712   0.000165       -3.2810           183.9433
 50      -      0.9855     -      0.9714   0.000154       -3.3240           212.4279
`
`
2.5. Approximation to the distribution of W

The complexity in the domain of the joint distribution of $W$ and the angles $\{\theta_i\}$ in Lemma 5
necessitates consideration of an approximation to the null distribution of $W$. Since only
the first and second moments of normal order statistics are, practically, available, it follows
that only the one-half and first moments of $W$ are known. Hence a technique such as the
Cornish-Fisher expansion cannot be used.
In the circumstance it seemed both appropriate and efficient to employ empirical
sampling to obtain an approximation for the null distribution.
Accordingly, normal random samples were obtained from the Rand Tables (Rand Corp.
(1955)). Repeated values of $W$ were computed for n = 3(1)50 and the empirical percentage
points determined for each value of $n$. The number of samples, $m$, employed was as follows:

for n = 3(1)20,  $m = 5000$,

for n = 21(1)50, $m = \left[\dfrac{100{,}000}{n}\right]$.

Fig. 4 gives the empirical C.D.F.'s for values of $n$ = 5, 10, 15, 20, 35, 50. Fig. 5
gives a plot of the 1, 5, 10, 50, 90, 95, and 99 empirical percentage points of $W$ for
n = 3(1)50.
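
A sampling study of this kind is easy to imitate on a small scale. The sketch below is only an illustration: it uses Python's pseudo-random normal generator in place of the Rand Tables, a fixed seed chosen arbitrarily, and the $n = 10$ coefficients 0.5739, 0.3291, 0.2141, 0.1224, 0.0399 from Table 5. The function names are hypothetical.

```python
import random

# a_n, a_{n-1}, ..., a_{n-k+1} for n = 10 (k = 5), as tabulated in Table 5.
A10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]

def w_statistic(x, half_coeffs):
    """W = b^2 / S^2 with b = sum a_{n-i+1} (y_{n-i+1} - y_i) over the ordered sample."""
    y = sorted(x)
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y)
    b = sum(c * (y[n - 1 - i] - y[i]) for i, c in enumerate(half_coeffs))
    return b * b / s2

def simulate_null(n, half_coeffs, reps, seed=0):
    """Sorted W values from `reps` standard normal samples of size n."""
    rng = random.Random(seed)
    return sorted(
        w_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], half_coeffs)
        for _ in range(reps)
    )

ws = simulate_null(10, A10, reps=5000)
mean_w = sum(ws) / len(ws)            # empirical first moment of W
p05 = ws[int(0.05 * len(ws))]         # empirical 5% point
```

The empirical mean can be set against the theoretical first moment for $n = 10$ in Table 4 (0.9264); agreement to within sampling error is what the check in the text relies on.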
`
A check on the adequacy of the sampling study is given by comparing the empirical
one-half and the first moments of the sample with the corresponding theoretical moments
of $W$ for n = 3(1)20. This comparison is given in Table 4, which provides additional
assurance of the adequacy of the sampling study. Also in Table 4 are given the sample
variance and the standardized third and fourth moments for n = 3(1)50.
After some preliminary investigation, the $S_B$ system of curves suggested by Johnson
(1949) was selected as a basis for smoothing the empirical null $W$ distribution. Details of
this procedure and its results are given in Shapiro & Wilk (1965a). The tables of percentage
points of $W$ given in §3 are based on these smoothed sampling results.
`
`3. SUMMARY OF OPERATIONAL INFORMATION
`
`The objective of this section is to bring together all the tables and descriptions needed
`to execute the W test for normality. This section may be employed independently of
`notational or other information from other sections.
`
The object of the $W$ test is to provide an index or test statistic to evaluate the supposed
normality of a complete sample. The statistic has been shown to be an effective measure
of normality even for small samples ($n < 20$) against a wide spectrum of non-normal
alternatives (see §5 below and Shapiro & Wilk (1964a)).
The $W$ statistic is scale and origin invariant and hence supplies a test of the composite
null hypothesis of normality.
To compute the value of $W$, given a complete random sample of size $n$, $x_1, x_2, \ldots, x_n$,
one proceeds as follows:
(i) Order the observations to obtain an ordered sample $y_1 \le y_2 \le \ldots \le y_n$.
(ii) Compute

$$S^2 = \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (x_i - \bar x)^2.$$

(iii)
(a) If $n$ is even, $n = 2k$, compute

$$b = \sum_{i=1}^{k} a_{n-i+1} (y_{n-i+1} - y_i),$$

where the values of $a_{n-i+1}$ are given in Table 5.
(b) If $n$ is odd, $n = 2k + 1$, the computation is just as in (iii) (a), since $a_{k+1} = 0$ when
$n = 2k + 1$. Thus one finds

$$b = a_n (y_n - y_1) + \ldots + a_{k+2} (y_{k+2} - y_k),$$

where the value of $y_{k+1}$, the sample median, does not enter the computation of $b$.
(iv) Compute $W = b^2 / S^2$.
(v) 1, 2, 5, 10, 50, 90, 95, 98 and 99% points of the distribution of $W$ are given in Table 6.
Small values of $W$ are significant, i.e. indicate non-normality.
(vi) A more precise significance level may be associated with an observed $W$ value by
using the approximation detailed in Shapiro & Wilk (1965a).
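
Steps (i) to (v) above translate directly into code. The following is a minimal sketch, not a reference implementation: the dictionary holds only two columns of Table 5, the data values are invented for illustration, and the names `TABLE5_HALF` and `shapiro_wilk_w` are hypothetical. The 5% point 0.842 for $n = 10$ is the Table 6 entry.

```python
# Upper-half coefficients a_n, a_{n-1}, ... from Table 5 (two columns only).
TABLE5_HALF = {
    5: (0.6646, 0.2413),
    10: (0.5739, 0.3291, 0.2141, 0.1224, 0.0399),
}

def shapiro_wilk_w(x, half_coeffs):
    """Compute W following steps (i)-(iv); works for even and odd n."""
    y = sorted(x)                                   # (i) order the sample
    n = len(y)
    k = n // 2
    if len(half_coeffs) != k:
        raise ValueError("need exactly n // 2 coefficients")
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y)            # (ii) S^2
    # (iii) b; for odd n the median y_{k+1} never enters the sum
    b = sum(half_coeffs[i] * (y[n - 1 - i] - y[i]) for i in range(k))
    return b * b / s2                               # (iv) W = b^2 / S^2

# Invented data, used only to show the call.
sample = [6.1, 7.3, 5.8, 9.4, 6.6, 7.0, 8.2, 6.9, 7.7, 12.5]
w = shapiro_wilk_w(sample, TABLE5_HALF[10])
reject_at_5_percent = w < 0.842                     # (v) small W is significant
```

Passing the coefficients in explicitly keeps the function independent of how much of Table 5 has been transcribed; a fuller version would look them up by `len(x)`.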
`
Table 5. Coefficients $\{a_{n-i+1}\}$ for the W test for normality,
for n = 2(1)50.

 i \ n     2       3       4       5       6       7       8       9       10
   1    0.7071  0.7071  0.6872  0.6646  0.6431  0.6233  0.6052  0.5888  0.5739
   2      -     0.0000  0.1677  0.2413  0.2806  0.3031  0.3164  0.3244  0.3291
   3      -       -       -     0.0000  0.0875  0.1401  0.1743  0.1976  0.2141
   4      -       -       -       -       -     0.0000  0.0561  0.0947  0.1224
   5      -       -       -       -       -       -       -     0.0000  0.0399

 i \ n    11      12      13      14      15      16      17      18      19      20
   1    0.5601  0.5475  0.5359  0.5251  0.5150  0.5056  0.4968  0.4886  0.4808  0.4734
   2    0.3315  0.3325  0.3325  0.3318  0.3306  0.3290  0.3273  0.3253  0.3232  0.3211
   3    0.2260  0.2347  0.2412  0.2460  0.2495  0.2521  0.2540  0.2553  0.2561  0.2565
   4    0.1429  0.1586  0.1707  0.1802  0.1878  0.1939  0.1988  0.2027  0.2059  0.2085
   5    0.0695  0.0922  0.1099  0.1240  0.1353  0.1447  0.1524  0.1587  0.1641  0.1686
   6    0.0000  0.0303  0.0539  0.0727  0.0880  0.1005  0.1109  0.1197  0.1271  0.1334
   7      -       -     0.0000  0.0240  0.0433  0.0593  0.0725  0.0837  0.0932  0.1013
   8      -       -       -       -     0.0000  0.0196  0.0359  0.0496  0.0612  0.0711
   9      -       -       -       -       -       -     0.0000  0.0163  0.0303  0.0422
  10      -       -       -       -       -       -       -       -     0.0000  0.0140

 i \ n    21      22      23      24      25      26      27      28      29      30
   1    0.4643  0.4590  0.4542  0.4493  0.4450  0.4407  0.4366  0.4328  0.4291  0.4254
   2    0.3185  0.3156  0.3126  0.3098  0.3069  0.3043  0.3018  0.2992  0.2968  0.2944
   3    0.2578  0.2571  0.2563  0.2554  0.2543  0.2533  0.2522  0.2510  0.2499  0.2487
   4    0.2119  0.2131  0.2139  0.2145  0.2148  0.2151  0.2152  0.2151  0.2150  0.2148
   5    0.1736  0.1764  0.1787  0.1807  0.1822  0.1836  0.1848  0.1857  0.1864  0.1870
   6    0.1399  0.1443  0.1480  0.1512  0.1539  0.1563  0.1584  0.1601  0.1616  0.1630
   7    0.1092  0.1150  0.1201  0.1245  0.1283  0.1316  0.1346  0.1372  0.1395  0.1415
   8    0.0804  0.0878  0.0941  0.0997  0.1046  0.1089  0.1128  0.1162  0.1192  0.1219
   9    0.0530  0.0618  0.0696  0.0764  0.0823  0.0876  0.0923  0.0965  0.1002  0.1036
  10    0.0263  0.0368  0.0459  0.0539  0.0610  0.0672  0.0728  0.0778  0.0822  0.0862
  11    0.0000  0.0122  0.0228  0.0321  0.0403  0.0476  0.0540  0.0598  0.0650  0.0697
  12      -       -     0.0000  0.0107  0.0200  0.0284  0.0358  0.0424  0.0483  0.0537
  13      -       -       -       -     0.0000  0.0094  0.0178  0.0253  0.0320  0.0381
  14      -       -       -       -       -       -     0.0000  0.0084  0.0159  0.0227
  15      -       -       -       -       -       -       -       -     0.0000  0.0076
`
`
Table 5. Coefficients $\{a_{n-i+1}\}$ for the W test for normality,
for n = 2(1)50 (cont.)

 i \ n    31      32      33      34      35      36      37      38      39      40
   1    0.4220  0.4188  0.4156  0.4127  0.4096  0.4068  0.4040  0.4015  0.3989  0.3964
   2    0.2921  0.2898  0.2876  0.2854  0.2834  0.2813  0.2794  0.2774  0.2755  0.2737
   3    0.2475  0.2463  0.2451  0.2439  0.2427  0.2415  0.2403  0.2391  0.2380  0.2368
   4    0.2145  0.2141  0.2137  0.2132  0.2127  0.2121  0.2116  0.2110  0.2104  0.2098
   5    0.1874  0.1878  0.1880  0.1882  0.1883  0.1883  0.1883  0.1881  0.1880  0.1878
   6    0.1641  0.1651  0.1660  0.1667  0.1673  0.1678  0.1683  0.1686  0.1689  0.1691
   7    0.1433  0.1449  0.1463  0.1475  0.1487  0.1496  0.1505  0.1513  0.1520  0.1526
   8    0.1243  0.1265  0.1284  0.1301  0.1317  0.1331  0.1344  0.1356  0.1366  0.1376
   9    0.1066  0.1093  0.1118  0.1140  0.1160  0.1179  0.1196  0.1211  0.1225  0.1237
  10    0.0899  0.0931  0.0961  0.0988  0.1013  0.1036  0.1056  0.1075  0.1092  0.1108
  11    0.0739  0.0777  0.0812  0.0844  0.0873  0.0900  0.0924  0.0947  0.0967  0.0986
  12    0.0585  0.0629  0.0669  0.0706  0.0739  0.0770  0.0798  0.0824  0.0848  0.0870
  13    0.0435  0.0485  0.0530  0.0572  0.0610  0.0645  0.0677  0.0706  0.0733  0.0759
  14    0.0289  0.0344  0.0395  0.0441  0.0484  0.0523  0.0559  0.0592  0.0622  0.0651
  15    0.0144  0.0206  0.0262  0.0314  0.0361  0.0404  0.0444  0.0481  0.0515  0.0546
  16    0.0000  0.0068  0.0131  0.0187  0.0239  0.0287  0.0331  0.0372  0.0409  0.0444
  17      -       -     0.0000  0.0062  0.0119  0.0172  0.0220  0.0264  0.0305  0.0343
  18      -       -       -       -     0.0000  0.0057  0.0110  0.0158  0.0203  0.0244
  19      -       -       -       -       -       -     0.0000  0.0053  0.0101  0.0146
  20      -       -       -       -       -       -       -       -     0.0000  0.0049

 i \ n    41      42      43      44      45      46      47      48      49      50
   1    0.3940  0.3917  0.3894  0.3872  0.3850  0.3830  0.3808  0.3789  0.3770  0.3751
   2    0.2719  0.2701  0.2684  0.2667  0.2651  0.2635  0.2620  0.2604  0.2589  0.2574
   3    0.2357  0.2345  0.2334  0.2323  0.2313  0.2302  0.2291  0.2281  0.2271  0.2260
   4    0.2091  0.2085  0.2078  0.2072  0.2065  0.2058  0.2052  0.2045  0.2038  0.2032
   5    0.1876  0.1874  0.1871  0.1868  0.1865  0.1862  0.1859  0.1855  0.1851  0.1847
   6    0.1693  0.1694  0.1695  0.1695  0.1695  0.1695  0.1695  0.1693  0.1692  0.1691
   7    0.1531  0.1535  0.1539  0.1542  0.1545  0.1548  0.1550  0.1551  0.1553  0.1554
   8    0.1384  0.1392  0.1398  0.1405  0.1410  0.1415  0.1420  0.1423  0.1427  0.1430
   9    0.1249  0.1259  0.1269  0.1278  0.1286  0.1293  0.1300  0.1306  0.1312  0.1317
  10    0.1123  0.1136  0.1149  0.1160  0.1170  0.1180  0.1189  0.1197  0.1205  0.1212
  11    0.1004  0.1020  0.1035  0.1049  0.1062  0.1073  0.1085  0.1095  0.1105  0.1113
  12    0.0891  0.0909  0.0927  0.0943  0.0959  0.0972  0.0986  0.0998  0.1010  0.1020
  13    0.0782  0.0804  0.0824  0.0842  0.0860  0.0876  0.0892  0.0906  0.0919  0.0932
  14    0.0677  0.0701  0.0724  0.0745  0.0765  0.0783  0.0801  0.0817  0.0832  0.0846
  15    0.0575  0.0602  0.0628  0.0651  0.0673  0.0694  0.0713  0.0731  0.0748  0.0764
  16    0.0476  0.0506  0.0534  0.0560  0.0584  0.0607  0.0628  0.0648  0.0667  0.0685
  17    0.0379  0.0411  0.0442  0.0471  0.0497  0.0522  0.0546  0.0568  0.0588  0.0608
  18    0.0283  0.0318  0.0352  0.0383  0.0412  0.0439  0.0465  0.0489  0.0511  0.0532
  19    0.0188  0.0227  0.0263  0.0296  0.0328  0.0357  0.0385  0.0411  0.0436  0.0459
  20    0.0094  0.0136  0.0175  0.0211  0.0245  0.0277  0.0307  0.0335  0.0361  0.0386
  21    0.0000  0.0045  0.0087  0.0126  0.0163  0.0197  0.0229  0.0259  0.0288  0.0314
  22      -       -     0.0000  0.0042  0.0081  0.0118  0.0153  0.0185  0.0215  0.0244
  23      -       -       -       -     0.0000  0.0039  0.0076  0.0111  0.0143  0.0174
  24      -       -       -       -       -       -     0.0000  0.0037  0.0071  0.0104
  25      -       -       -       -       -       -       -       -     0.0000  0.0035
`
`
Table 6. Percentage points of the W test* for n = 3(1)50

                                   Level
  n    0.01    0.02    0.05    0.10    0.50    0.90    0.95    0.98    0.99
  3    0.753   0.756   0.767   0.789   0.959   0.998   0.999   1.000   1.000
  4    0.687   0.707   0.748   0.792   0.935   0.987   0.992   0.996   0.997
  5    0.686   0.715   0.762   0.806   0.927   0.979   0.986   0.991   0.993
  6    0.713   0.743   0.788   0.826   0.927   0.974   0.981   0.986   0.989
  7    0.730   0.760   0.803   0.838   0.928   0.972   0.979   0.985   0.988
  8    0.749   0.778   0.818   0.851   0.932   0.972   0.978   0.984   0.987
  9    0.764   0.791   0.829   0.859   0.935   0.972   0.978   0.984   0.986
 10    0.781   0.806   0.842   0.869   0.938   0.972   0.978   0.983   0.986
 11    0.792   0.817   0.850   0.876   0.940   0.973   0.979   0.984   0.986
 12    0.805   0.828   0.859   0.883   0.943   0.973   0.979   0.984   0.986
 13    0.814   0.837   0.866   0.889   0.945   0.974   0.979   0.984   0.986
 14    0.825   0.846   0.874   0.895   0.947   0.975   0.980   0.984   0.986
 15    0.835   0.855   0.881   0.901   0.950   0.975   0.980   0.984   0.987
 16    0.844   0.863   0.887   0.906   0.952   0.976   0.981   0.985   0.987
 17    0.851   0.869   0.892   0.910   0.954   0.977   0.981   0.985   0.987
 18    0.858   0.874   0.897   0.914   0.956   0.978   0.982   0.986   0.988
 19    0.863   0.879   0.901   0.917   0.957   0.978   0.982   0.986   0.988
 20    0.868   0.884   0.905   0.920   0.959   0.979   0.983   0.986   0.988

Levels 0.10 to 0.99, n = 21(1)25:

  n    0.10    0.50    0.90    0.95    0.98    0.99
 21    0.923   0.960   0.980   0.983   0.987   0.989
 22    0.926   0.961   0.980   0.984   0.987   0.989
 23    0.928   0.962   0.981   0.984   0.987   0.989
 24    0.930   0.963   0.981   0.984   0.987   0.989
 25    0.931   0.964   0.981   0.985   0.988   0.989

`21
`22
`23
`24
`25
`
`26
`27
`28
`29
`30
`
`31
`32
`33
`34
`35
`
`36
`37
`38
`39
`40
`
`41
`42
`43
`44
`45
`
`46
`47
`48
`49
`50
`
`0-873
`-878
`-881
`884
`-888
`
`0-891
`-894
`-896
`-898
`-900
`
`0-902
`-904
`-906
`-908
`-910
`
`0-912
`-914
`-916
`-917
`-919
`
`0-920
`-922
`-923
`-924
`-926
`
`0-927
`-928
`-929
`-929
`-930
`
`0-888
`-892
`-895
`-898
`-901
`
`0-904
`-906
`-908
`-910
`-