With 5 text-figures
`Printed in Great Britain
`
`591
`
An analysis of variance test for normality
(complete samples)†
`
`BY S. S. SHAPIRO AND M. B. WILK
`
General Electric Co. and Bell Telephone Laboratories, Inc.
`
1. INTRODUCTION
`
`The main intent of this paper is to introduce a new statistical procedure for testing a
`complete sample for normality. The test statistic is obtained by dividing the square of an
`appropriate linear combination of the sample order statistics by the usual symmetric
`estimate of variance. This ratio is both scale and origin invariant and hence the statistic
`is appropriate for a test of the composite hypothesis of normality.
Testing for distributional assumptions in general and for normality in particular has been a major area of continuing statistical research, both theoretically and practically. A possible cause of such sustained interest is that many statistical procedures have been derived based on particular distributional assumptions, especially that of normality. Although in many cases the techniques are more robust than the assumptions underlying them, still a knowledge that the underlying assumption is incorrect may temper the use and application of the methods. Moreover, the study of a body of data with the stimulus of a distributional test may encourage consideration of, for example, normalizing transformations and the use of alternative methods such as distribution-free techniques, as well as detection of gross peculiarities such as outliers or errors.
The test procedure developed in this paper is defined, and some of its analytical properties are described, in §2. Operational information and tables useful in employing the test are detailed in §3 (which may be read independently of the rest of the paper). Some examples are given in §4. Section 5 consists of an extract from an empirical sampling study comparing the effectiveness of various alternative tests. Discussion and concluding remarks are given in §6.
`
2. THE W TEST FOR NORMALITY (COMPLETE SAMPLES)
`
2.1. Motivation and early work
`
This study was initiated, in part, in an attempt to summarize formally certain indications of probability plots. In particular, could one condense departures from statistical linearity of probability plots into one or a few ‘degrees of freedom’ in the manner of the application of analysis of variance in regression analysis?
In a probability plot, one can consider the regression of the ordered observations on the expected values of the order statistics from a standardized version of the hypothesized distribution, the plot tending to be linear if the hypothesis is true. Hence a possible method of testing the distributional assumption is by means of an analysis of variance type procedure. Using generalized least squares (the ordered variates are correlated), linear and higher-order models can be fitted and an F-type ratio used to evaluate the adequacy of the linear fit.
`
† Part of this research was supported by the Office of Naval Research while both authors were at Rutgers University.
`
`This content downloaded from 128.255.6.125 on Mon, 29 May 2017 20:00:22 UTC
`All use subject to http://about.jstor.org/terms
`
`AstraZeneca Exhibit 2171 p. 1
InnoPharma Licensing LLC v. AstraZeneca AB
IPR2017-00905
`
`
`
`
This approach was investigated in preliminary work. While some promising results were obtained, the procedure is subject to the serious shortcoming that the selection of the higher-order model is, practically speaking, arbitrary. However, research is continuing along these lines.
Another analysis of variance viewpoint which has been investigated by the present authors is to compare the squared slope of the probability plot regression line, which under the normality hypothesis is an estimate of the population variance multiplied by a constant, with the residual mean square about the regression line, which is another estimate of the variance. This procedure can be used with incomplete samples and has been described elsewhere (Shapiro & Wilk, 1965b).
As an alternative to the above, for complete samples, the squared slope may be compared with the usual symmetric sample sum of squares about the mean, which is independent of the ordering and easily computable. It is this last statistic that is discussed in the remainder of this paper.
`
2.2. Derivation of the W statistic
`
Let $m' = (m_1, m_2, \ldots, m_n)$ denote the vector of expected values of standard normal order statistics, and let $V = (v_{ij})$ be the corresponding $n \times n$ covariance matrix. That is, if $x_1 \le x_2 \le \cdots \le x_n$ denotes an ordered random sample of size n from a normal distribution with mean 0 and variance 1, then

$$E(x_i) = m_i \quad (i = 1, 2, \ldots, n),$$

and

$$\operatorname{cov}(x_i, x_j) = v_{ij} \quad (i, j = 1, 2, \ldots, n).$$
`
Let $y' = (y_1, \ldots, y_n)$ denote a vector of ordered random observations. The objective is to derive a test for the hypothesis that this is a sample from a normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.
Clearly, if the $\{y_i\}$ are a normal sample then $y_i$ may be expressed as

$$y_i = \mu + \sigma x_i \quad (i = 1, 2, \ldots, n).$$
`
It follows from the generalized least-squares theorem (Aitken, 1935; Lloyd, 1952) that the best linear unbiased estimates of $\mu$ and $\sigma$ are those quantities that minimize the quadratic form $(y - \mu 1 - \sigma m)' V^{-1} (y - \mu 1 - \sigma m)$, where $1' = (1, 1, \ldots, 1)$. These estimates are, respectively,

$$\hat\mu = \frac{m' V^{-1} (m 1' - 1 m') V^{-1} y}{1' V^{-1} 1 \; m' V^{-1} m - (1' V^{-1} m)^2}$$

and

$$\hat\sigma = \frac{1' V^{-1} (1 m' - m 1') V^{-1} y}{1' V^{-1} 1 \; m' V^{-1} m - (1' V^{-1} m)^2}.$$

For symmetric distributions, $1' V^{-1} m = 0$, and hence

$$\hat\mu = \frac{1}{n} \sum_{i=1}^{n} y_i = \bar y \quad\text{and}\quad \hat\sigma = \frac{m' V^{-1} y}{m' V^{-1} m}.$$
`
Let

$$S^2 = \sum_{i=1}^{n} (y_i - \bar y)^2$$

denote the usual symmetric unbiased estimate of $(n - 1)\sigma^2$.
The W test statistic for normality is defined by

$$W = \frac{R^4 \hat\sigma^2}{C^2 S^2} = \frac{b^2}{S^2} = \frac{(a'y)^2}{S^2} = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2},$$
`
`
where

$$R^2 = m' V^{-1} m, \qquad C^2 = m' V^{-1} V^{-1} m,$$

$$a' = (a_1, \ldots, a_n) = \frac{m' V^{-1}}{(m' V^{-1} V^{-1} m)^{1/2}},$$

$$b = R^2 \hat\sigma / C.$$

Thus, b is, up to the normalizing constant C, the best linear unbiased estimate of the slope of a linear regression of the ordered observations, $y_i$, on the expected values, $m_i$, of the standard normal order statistics. The constant C is so defined that the linear coefficients are normalized.
`
It may be noted that if one is indeed sampling from a normal population then the numerator, $b^2$, and denominator, $S^2$, of W are both, up to a constant, estimating the same quantity, namely $\sigma^2$. For non-normal populations, these quantities would not in general be estimating the same thing. Heuristic considerations augmented by some fairly extensive empirical sampling results (Shapiro & Wilk, 1964a) using populations with a wide range of $\sqrt{\beta_1}$ and $\beta_2$ values suggest that the mean values of W for non-null distributions tend to shift to the left of that for the null case. Further, it appears that the variance of the null distribution of W tends to be smaller than that of the non-null distribution. It is likely that this is due to the positive correlation between the numerator and denominator for a normal population being greater than that for non-normal populations.
Note that the coefficients $\{a_i\}$ are just the normalized ‘best linear unbiased’ coefficients tabulated in Sarhan & Greenberg (1956).
`
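2.3. Some analytical properties of W

LEMMA 1. W is scale and origin invariant.

Proof. This follows from the fact that for normal (more generally symmetric) distributions,

$$-a_i = a_{n-i+1},$$

so that $\sum a_i = 0$. Replacing each $y_i$ by $c y_i + d$ $(c > 0)$ therefore multiplies both $b^2$ and $S^2$ by $c^2$ and leaves W unchanged.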
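A quick numerical illustration of Lemma 1, offered as a sketch rather than part of the original analysis: the code computes W from the n = 5 coefficients of Table 5 ($a_5 = 0.6646$, $a_4 = 0.2413$, $a_3 = 0$, with $a_i = -a_{n-i+1}$) and checks that a change of scale and origin leaves W unchanged. The sample values and the helper name `w_statistic` are hypothetical.

```python
import math

# a_1, ..., a_5 for n = 5 from Table 5; antisymmetry a_i = -a_{n-i+1} gives sum(a) = 0.
A_N5 = [-0.6646, -0.2413, 0.0, 0.2413, 0.6646]


def w_statistic(sample, a):
    """W = (sum a_i y_(i))^2 / sum (y_i - ybar)^2, where y_(i) is the ordered sample."""
    y = sorted(sample)
    ybar = sum(y) / len(y)
    s2 = sum((v - ybar) ** 2 for v in y)
    b = sum(ai * yi for ai, yi in zip(a, y))
    return b * b / s2


sample = [1.0, 2.0, 3.0, 4.0, 10.0]            # hypothetical observations
transformed = [2.0 * v - 7.0 for v in sample]  # new scale (x2) and new origin (-7)

w = w_statistic(sample, A_N5)
w_transformed = w_statistic(transformed, A_N5)
print(w, w_transformed)
```

Because $\sum a_i = 0$, the shift cancels in b, and the factor $c^2$ cancels between $b^2$ and $S^2$; the two printed values agree to rounding error.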
`
`COROLLARY 1. W has a distribution which depends only on the sample size n, for samples
`from a normal distribution.
`
COROLLARY 2. W is statistically independent of $S^2$ and of $\bar y$, for samples from a normal distribution.

Proof. This follows from the fact that $\bar y$ and $S^2$ are sufficient for $\mu$ and $\sigma^2$ (Hogg & Craig, 1956).

COROLLARY 3. $E W^r = E b^{2r} / E S^{2r}$, for any r.
`
LEMMA 2. The maximum value of W is 1.

Proof. Assume $\bar y = 0$ since W is origin invariant by Lemma 1. Hence

$$W = \frac{\left(\sum a_i y_i\right)^2}{\sum y_i^2}.$$

Since, by the Cauchy-Schwarz inequality,

$$\left(\sum a_i y_i\right)^2 \le \sum y_i^2 \sum a_i^2 = \sum y_i^2,$$

because $\sum a_i^2 = a'a = 1$ by definition, W is bounded by 1. This maximum is in fact achieved when $y_i = \eta a_i$, for arbitrary $\eta$.
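The bound of Lemma 2 can also be checked numerically. The sketch below assumes the n = 10 coefficients of Table 5 and first rescales them so that $a'a = 1$ exactly (the printed four-decimal values give $a'a \approx 1.0002$ rather than 1). It then confirms that W = 1 for a sample proportional to a (plus an arbitrary shift) and that W does not exceed 1 for simulated normal samples. The helper names and the random seed are illustrative choices.

```python
import math
import random

# Table 5, n = 10: a_10, a_9, ..., a_6 as printed (upper half of the coefficients).
UPPER_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]


def antisymmetric(upper, n):
    """Build a_1..a_n from [a_n, a_{n-1}, ...] using a_i = -a_{n-i+1}.

    For odd n the printed list ends with the middle coefficient 0.0000, which is
    re-inserted here as a single middle entry.
    """
    k = n // 2
    top = upper[:k]                      # a_n, a_{n-1}, ..., a_{n-k+1}
    middle = [0.0] if n % 2 else []
    return [-c for c in top] + middle + top[::-1]


def normalized(a):
    """Rescale a so that a'a = 1, as required by the definition of the coefficients."""
    norm = math.sqrt(sum(c * c for c in a))
    return [c / norm for c in a]


def w_statistic(sample, a):
    y = sorted(sample)
    ybar = sum(y) / len(y)
    s2 = sum((v - ybar) ** 2 for v in y)
    b = sum(ai * yi for ai, yi in zip(a, y))
    return b * b / s2


a = normalized(antisymmetric(UPPER_N10, 10))

# Lemma 2: the maximum W = 1 is attained when y_i = eta * a_i (here eta = 4, shifted by 1.5).
print(w_statistic([4.0 * c + 1.5 for c in a], a))

# The Cauchy-Schwarz bound W <= 1 holds for every sample.
rng = random.Random(1)
samples = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(1000)]
print(max(w_statistic(s, a) for s in samples))
```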
`
LEMMA 3. The minimum value of W is $n a_1^2/(n - 1)$.
`
`
Proof.† (Due to C. L. Mallows.) Since W is scale and origin invariant, it suffices to consider the maximization of $\sum_{i=1}^{n} y_i^2$ subject to the constraints $\sum y_i = 0$, $\sum a_i y_i = 1$. Since this is a convex region and $\sum y_i^2$ is a convex function, the maximum of the latter must occur at one of the $(n - 1)$ vertices of the region. These are

$$\left(\frac{n-1}{n a_1}, \frac{-1}{n a_1}, \ldots, \frac{-1}{n a_1}\right),$$

$$\left(\frac{n-2}{n(a_1+a_2)}, \frac{n-2}{n(a_1+a_2)}, \frac{-2}{n(a_1+a_2)}, \ldots, \frac{-2}{n(a_1+a_2)}\right),$$

$$\vdots$$

$$\left(\frac{1}{n(a_1+\cdots+a_{n-1})}, \ldots, \frac{1}{n(a_1+\cdots+a_{n-1})}, \frac{-(n-1)}{n(a_1+\cdots+a_{n-1})}\right).$$

It can now be checked numerically, for the values of the specific coefficients $\{a_i\}$, that the maximum of $\sum_{i=1}^{n} y_i^2$ occurs at the first of these points. There $\sum y_i^2 = (n-1)/(n a_1^2)$, so the corresponding minimum value of W is $1/\sum y_i^2 = n a_1^2/(n-1)$, as given in the Lemma.
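The first vertex in the proof corresponds to a sample with one low outlier and the remaining $n - 1$ values tied. The sketch below, which assumes the n = 5 coefficients of Table 5 (the choice of n is arbitrary), evaluates W at every vertex listed in the proof and compares the smallest value with $n a_1^2/(n-1)$. The helper names are hypothetical.

```python
import math

# a_1, ..., a_5 for n = 5 from Table 5 (a_i = -a_{n-i+1}).
A = [-0.6646, -0.2413, 0.0, 0.2413, 0.6646]
N = len(A)


def w_statistic(sample, a):
    """W = (sum a_i y_(i))^2 / sum (y_i - ybar)^2 for the ordered sample y_(i)."""
    y = sorted(sample)
    ybar = sum(y) / len(y)
    s2 = sum((v - ybar) ** 2 for v in y)
    b = sum(ai * yi for ai, yi in zip(a, y))
    return b * b / s2


def vertex(k, a):
    """Vertex k (k = 1, ..., n-1) from the proof: the first k entries equal
    (n - k) / (n s_k) and the remaining n - k equal -k / (n s_k), with s_k = a_1 + ... + a_k."""
    n = len(a)
    s_k = sum(a[:k])  # negative for 1 <= k <= n - 1
    return [(n - k) / (n * s_k)] * k + [-k / (n * s_k)] * (n - k)


w_at_vertices = [w_statistic(vertex(k, A), A) for k in range(1, N)]
lemma3_min = N * A[0] ** 2 / (N - 1)
print([round(w, 4) for w in w_at_vertices], round(lemma3_min, 4))
```

For these coefficients the first vertex (and, by symmetry, the last) gives the smallest W, matching the Lemma.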
`
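LEMMA 4. The half and first moments of W are given by

$$E W^{1/2} = \frac{R^2\, \Gamma\{\tfrac12 (n-1)\}}{\sqrt{2}\, C\, \Gamma(\tfrac12 n)}$$

and

$$E W = \frac{R^2 (R^2 + 1)}{C^2 (n-1)},$$

where $R^2 = m' V^{-1} m$ and $C^2 = m' V^{-1} V^{-1} m$.

Proof. Using Corollary 3 of Lemma 1,

$$E W^{1/2} = E b / E S \quad\text{and}\quad E W = E b^2 / E S^2.$$

Now,

$$E S = \frac{\sqrt{2}\, \sigma\, \Gamma(\tfrac12 n)}{\Gamma\{\tfrac12 (n-1)\}} \quad\text{and}\quad E S^2 = (n-1)\sigma^2.$$

From the general least squares theorem (see e.g. Kendall & Stuart, vol. II (1961)),

$$E b = \frac{R^2}{C} E\hat\sigma = \frac{R^2}{C} \sigma$$

and

$$E b^2 = \frac{R^4}{C^2} E\hat\sigma^2 = \frac{R^4}{C^2} \{\operatorname{var}(\hat\sigma) + (E\hat\sigma)^2\} = \sigma^2 R^2 (R^2 + 1)/C^2,$$

since $\operatorname{var}(\hat\sigma) = \sigma^2 / m' V^{-1} m = \sigma^2 / R^2$, and hence the results of the lemma follow.
Values of these moments are shown in Fig. 1 for sample sizes n = 3(1)20.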
`
LEMMA 5. A joint distribution involving W is defined by

$$h(W, \theta_2, \ldots, \theta_{n-2}) = K\, W^{-1/2} (1 - W)^{\frac12 (n-4)} \cos^{n-4}\theta_2 \cos^{n-5}\theta_3 \cdots \cos\theta_{n-3},$$

over a region T on which the $\theta_i$'s and W are not independent, and where K is a constant.
`
† Lemma 3 was conjectured intuitively and verified by certain numerical studies. Subsequently
`the above proof was given by C. L. Mallows.
`
Proof. Consider an orthogonal transformation B such that $y = Bu$, where

$$u_1 = \sum_{i=1}^{n} y_i / \sqrt{n} \quad\text{and}\quad u_2 = \sum_{i=1}^{n} a_i y_i = b.$$

The ordered $y_i$'s are distributed as

$$\frac{n!}{(2\pi\sigma^2)^{n/2}} \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \mu)^2\right\} \quad (-\infty < y_1 < \cdots < y_n < \infty).$$

After integrating out $u_1$, the joint density for $u_2, \ldots, u_n$ is

$$K^* \exp\left\{-\frac{1}{2\sigma^2} \sum_{i=2}^{n} u_i^2\right\}$$

over the appropriate region $S^*$. Changing to polar co-ordinates such that $u_2 = \rho \sin\theta_1$, etc., and then integrating over $\rho$, yields the joint density of $\theta_1, \ldots, \theta_{n-2}$ as

$$K^{**} \cos^{n-3}\theta_1 \cos^{n-4}\theta_2 \cdots \cos\theta_{n-3},$$

over some region $T^{**}$. From these various transformations

$$S^2 = \sum_{i=2}^{n} u_i^2 = \rho^2, \qquad u_2 = \rho \sin\theta_1, \qquad W = \frac{b^2}{S^2} = \frac{u_2^2}{\rho^2} = \sin^2\theta_1,$$

from which the lemma follows. The $\theta_i$'s and W are not independent; they are restricted in the sample space T.
Fig. 1. Moments of W, $E(W^r)$, n = 3(1)20, r = ½, 1.
[Figure not reproduced; horizontal axis: sample size, n.]
`
COROLLARY 4. For n = 3, the density of W is

$$\frac{3}{\pi} (1 - W)^{-1/2} W^{-1/2}, \qquad \tfrac34 \le W \le 1.$$
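Corollary 4 can be checked against Table 4. Integrating the n = 3 density directly gives $E W^{1/2} = 3/\pi$ and $E W = \tfrac12 + 3\sqrt3/(4\pi)$, which round to the tabulated theoretical moments 0.9549 and 0.9135. The sketch below confirms these closed forms by numerical integration after the substitution $W = \sin^2\theta$; the helper name `expect_w3` and the step count are illustrative choices.

```python
import math


def expect_w3(g, steps=20000):
    """E g(W) for n = 3 under the density (3/pi) W^(-1/2) (1 - W)^(-1/2), 3/4 <= W <= 1.

    With W = sin^2(theta), dW = 2 sin(theta) cos(theta) d(theta), so the density
    becomes the constant 6/pi on pi/3 <= theta <= pi/2. This removes the endpoint
    singularity, and a midpoint rule is then accurate.
    """
    lo, hi = math.pi / 3, math.pi / 2
    h = (hi - lo) / steps
    total = sum(g(math.sin(lo + (j + 0.5) * h) ** 2) for j in range(steps))
    return 6 / math.pi * total * h


total_mass = expect_w3(lambda w: 1.0)
half_moment = expect_w3(math.sqrt)
first_moment = expect_w3(lambda w: w)
print(total_mass, half_moment, first_moment)

# Closed forms obtained by integrating the density exactly.
print(3 / math.pi, 0.5 + 3 * math.sqrt(3) / (4 * math.pi))
```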
`
`
Note that for n = 3, the W statistic is equivalent (up to a constant multiplier) to the statistic (range/standard deviation) advanced by David, Hartley & Pearson (1954), and the result of the corollary is essentially given by Pearson & Stephens (1964).
It has not been possible, for general n, to integrate out the $\theta_i$'s of Lemma 5 to obtain an explicit form for the distribution of W. However, explicit results have also been given for n = 4 (Shapiro, 1964).
`
2.4. Approximations associated with the W test
`
The $\{a_i\}$ used in the W statistic are defined by

$$a_j = \sum_{i=1}^{n} m_i v^{ij} / C \quad (j = 1, 2, \ldots, n),$$

where $v^{ij}$ denotes the $(i, j)$th element of $V^{-1}$, and $m_i$ and C have been defined in §2.2. To determine the $a_i$ directly it appears necessary to know both the vector of means m and the covariance matrix V. However, to date, the elements of V are known only up to samples of size 20 (Sarhan & Greenberg, 1956). Various approximations are presented in the remainder of this section to enable the use of W for samples larger than 20.
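By definition,

$$a' = \frac{m' V^{-1}}{(m' V^{-1} V^{-1} m)^{1/2}} = \frac{m' V^{-1}}{C}$$

is such that $a'a = 1$. Let $a^{*\prime} = m' V^{-1}$; then $C^2 = a^{*\prime} a^*$. Suggested approximations are

$$\hat a_i^* = 2 m_i \quad (i = 2, 3, \ldots, n-1)$$

and

$$\hat a_1^2 = \hat a_n^2 = \begin{cases} \dfrac{\Gamma(\tfrac12 n)}{\sqrt2\, \Gamma(\tfrac12 n + \tfrac12)} & (n \le 20), \\[2ex] \dfrac{\Gamma\{\tfrac12 (n+1)\}}{\sqrt2\, \Gamma(\tfrac12 n + 1)} & (n > 20). \end{cases}$$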
`
A comparison of $a_i^*$ (the exact values) and $\hat a_i^*$ for various values of $i \ne 1$ and n = 5, 10, 15, 20 is given in Table 1. (Note $a_i = -a_{n-i+1}$.) It will be seen that the approximation is generally in error by less than 1%, particularly as n increases. This encourages one to trust the use of this approximation for n > 20. Necessary values of the $m_i$ for this approximation are available in Harter (1961).
`
Table 1. Comparison of $[a_i^*]$ and $[\hat a_i^*] = [2 m_i]$, for selected values of $i\,(\ne 1)$ and n

        n = 5            n = 10           n = 15           n = 20
 i    Exact Approx.    Exact Approx.    Exact Approx.    Exact Approx.
 2    1.014   0.990    2.035   2.003    2.530   2.496    2.849   2.815
 3      0.0     0.0    1.324   1.312    1.909   1.895    2.277   2.262
 4                     0.757   0.752    1.437   1.430    1.850   1.842
 5                     0.247   0.245    1.036   1.031    1.496   1.491
 8                                        0.0     0.0    0.631   0.630
10                                                       0.124   0.124
`
`
A comparison of $a_1^2$ and $\hat a_1^2$ for n = 6(1)20 is given in Table 2. While the errors of this approximation are quite small for n ≤ 20, the approximation and true values appear to cross over at n = 19. Further comparisons with other approximations, discussed below, suggested the changed formulation of $\hat a_1^2$ for n > 20 given above.
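The two forms of the approximation to $a_1^2$ given earlier in this section can be evaluated with the log-gamma function, which avoids overflow for larger n. The sketch below (function name hypothetical) reproduces the 'Approximate' entries of Table 2 for n = 6, 13 and 20, and for n = 30 gives $\hat a_1$ close to the tabulated $a_{30} = 0.4254$ of Table 5. It is a check on the formulas, not the authors' own computation.

```python
import math


def a1_squared_approx(n):
    """Approximation to a_1^2 = a_n^2 from Section 2.4, computed via log-gamma.

    n <= 20: Gamma(n/2) / (sqrt(2) * Gamma(n/2 + 1/2))
    n >  20: Gamma((n + 1)/2) / (sqrt(2) * Gamma(n/2 + 1))
    """
    if n <= 20:
        log_ratio = math.lgamma(n / 2) - math.lgamma(n / 2 + 0.5)
    else:
        log_ratio = math.lgamma((n + 1) / 2) - math.lgamma(n / 2 + 1)
    return math.exp(log_ratio) / math.sqrt(2)


for n in (6, 13, 20, 30):
    approx = a1_squared_approx(n)
    print(n, round(approx, 3), round(math.sqrt(approx), 4))
```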
`
Table 2. Comparison of $a_1^2$ and $\hat a_1^2$

 n    Exact  Approximate      n    Exact  Approximate
 6    0.414        0.426     13    0.287        0.283
 7    0.388        0.392     14    0.276        0.272
 8    0.366        0.365     15    0.265        0.261
 9    0.347        0.343     16    0.256        0.254
10    0.329        0.324     17    0.247        0.245
11    0.314        0.308     18    0.239        0.237
12    0.300        0.295     19    0.231        0.231
                             20    0.224        0.226
`
Fig. 2. Plot of $C^2 = m'V^{-1}V^{-1}m$ and $R^2 = m'V^{-1}m$ as functions of the sample size n.
[Figure not reproduced; horizontal axis: sample size, n.]
`
What is required for the W test are the normalized coefficients $\{a_i\}$. Thus $\hat a_1^2$ is directly usable, but the $\hat a_i^*$ $(i = 2, \ldots, n-1)$ must be normalized by division by $C = (m'V^{-1}V^{-1}m)^{1/2}$. A plot of the values of $C^2$ and of $R^2 = m'V^{-1}m$ as a function of n is given in Fig. 2. The linearity of these may be summarized by the following least-squares equations:

$$C^2 = -2.722 + 4.083 n,$$

which gave a regression mean square of 7331.6 and a residual mean square of 0.0186, and

$$R^2 = -2.411 + 1.981 n,$$

with a regression mean square of 1725.7 and a residual mean square of 0.0016.
`38
`
`Biom. 52
`
`
These results encourage the use of the extrapolated equations to estimate $C^2$ and $R^2$ for higher values of n.
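Putting the pieces of this section together gives an approximate coefficient vector. The sketch below does this for n = 20, where the exact coefficients of Table 5 are available for comparison. Assumptions: the expected normal order statistics $m_{11}, \ldots, m_{19}$ are taken to four decimals from standard tables (e.g. Harter, 1961), and C is taken from the fitted line $C^2 = -2.722 + 4.083n$. With these inputs the approximate coefficients agree with Table 5 to within about 0.005, and $a'a$ comes out close to, but not exactly, 1.

```python
import math

N = 20
# Upper expected normal order statistics m_19, m_18, ..., m_11 for n = 20 (assumed values
# from standard tables such as Harter, 1961; twice these values reproduce the
# "Present approx." column of Table 3).
M_UPPER = [1.4076, 1.1309, 0.9210, 0.7454, 0.5903, 0.4483, 0.3149, 0.1870, 0.0620]

# Table 5, n = 20: a_20, a_19, ..., a_11, used here only for comparison.
A_TABLE5 = [0.4734, 0.3211, 0.2565, 0.2085, 0.1686, 0.1334, 0.1013, 0.0711, 0.0422, 0.0140]


def approx_upper_coefficients(n, m_upper):
    """Approximate [a_n, a_{n-1}, ...] from a*_i = 2 m_i, C^2 = -2.722 + 4.083 n and
    a_n^2 = Gamma(n/2) / (sqrt(2) Gamma(n/2 + 1/2)) (the n <= 20 form)."""
    c = math.sqrt(-2.722 + 4.083 * n)
    a_n = math.sqrt(math.exp(math.lgamma(n / 2) - math.lgamma(n / 2 + 0.5)) / math.sqrt(2))
    return [a_n] + [2.0 * m / c for m in m_upper]


approx = approx_upper_coefficients(N, M_UPPER)
for exact, est in zip(A_TABLE5, approx):
    print(f"{exact:.4f}  {est:.4f}")

# a'a over all n coefficients; the lower half mirrors the upper half with opposite sign.
print(2 * sum(c * c for c in approx))
```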
`
A comparison can now be made between values of $C^2$ from the extrapolation equation and from $\sum_{i=1}^{n} \hat a_i^{*2}$, using (since $a_i = a_i^*/C$ and $\sum a_i^2 = 1$)

$$\sum_{i=1}^{n} \hat a_i^{*2} = \frac{\sum_{i=2}^{n-1} \hat a_i^{*2}}{1 - 2 \hat a_1^2}.$$

For the case n = 30, these give values of 119.77 and 120.47, respectively. This concordance of the independent approximations increases faith in both.
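The n = 30 comparison can be re-derived, approximately, from numbers printed elsewhere in the paper. In the sketch below the $\hat a_i^*$ for i = 2, ..., 15 are taken from the 'Present approx.' column of Table 3 for n = 30 (these entries are $2m_i$), the remaining $\hat a_i^*$ follow by symmetry, and $\hat a_1^2$ uses the n > 20 form given earlier in this section. Small differences from the quoted 120.47 are to be expected because Table 3 is rounded to three decimals.

```python
import math

N = 30
# "Present approx." magnitudes from Table 3 for n = 30, i = 2, ..., 15 (i.e. 2 m_i);
# by symmetry, i = 16, ..., 29 repeat these magnitudes.
A_STAR_HALF = [3.231, 2.730, 2.357, 2.052, 1.789, 1.553, 1.338,
               1.137, 0.947, 0.765, 0.589, 0.418, 0.249, 0.083]

# C^2 from the least-squares line, extrapolated beyond n = 20.
c2_line = -2.722 + 4.083 * N

# a_1^2 from the n > 20 form Gamma((n + 1)/2) / (sqrt(2) Gamma(n/2 + 1)).
a1_sq = math.exp(math.lgamma((N + 1) / 2) - math.lgamma(N / 2 + 1)) / math.sqrt(2)

# C^2 = sum_{i=2}^{n-1} a*_i^2 / (1 - 2 a_1^2).
inner_sum = 2 * sum(v * v for v in A_STAR_HALF)
c2_sum = inner_sum / (1 - 2 * a1_sq)

print(round(c2_line, 2), round(c2_sum, 2))
```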
Plackett (1958) has suggested approximations for the elements of the vector $a^*$ and for $R^2$. While his approximations are valid for a wide range of distributions and can be used with censored samples, they are more complex, for the normal case, than those suggested above. For the normal case his approximations are

$$\tilde a_j^* = n\, m_j\, [F(m_{j+1}) - F(m_{j-1})] \quad (j = 2, 3, \ldots, n-1),$$

together with a separate end-point expression for $\tilde a_1^*$ involving $f(m_1)$, $m_1$ and $F(m_2)$, and $\tilde a_n^* = -\tilde a_1^*$, where

F(m_j) = cumulative distribution function evaluated at m_j,
f(m_j) = density function evaluated at m_j.

Plackett's approximation to $R^2$ is

$$\tilde R^2 = 2n \left\{ \frac{m_1^2 f^2(m_1)}{F(m_1)} + m_1^3 f(m_1) + m_1 f(m_1) - 2F(m_1) + 1 \right\}.$$
`
Plackett's $\tilde a_i^*$ approximations and the present $\hat a_i^*$ approximations are compared with the exact values, for sample size 20, in Table 3. In addition, a consistency comparison of the two approximations is given for sample size 30. Plackett's result for $a_1$ (n = 20) was the only case where his approximation was closer to the true value than the simpler approximations suggested above. The differences in the two approximations for $a_1$ were negligible, being less than 0.5%. Both methods give good approximations, being off by no more than three units in the second decimal place. The comparison of the two methods for n = 30 shows good agreement, most of the differences being in the third decimal place. The largest discrepancy occurred for i = 2; the estimates differed by six units in the second decimal place, an error of less than 2%.
The two methods of approximating $R^2$ were compared for n = 20. Plackett's method gave a value of 36.09, the method suggested above gave a value of 37.21, and the true value was 37.26.
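Two of Plackett's figures quoted above can be reproduced with Python's `statistics.NormalDist`, under stated assumptions: the lower expected normal order statistics for n = 20 are taken as $m_1 = -1.8675$, $m_2 = -1.4076$, $m_3 = -1.1309$ (standard tabulated values, not given in this paper); the interior coefficient is taken as $\tilde a_j^* = n m_j [F(m_{j+1}) - F(m_{j-1})]$; and the $R^2$ approximation is taken in the form $2n\{m_1^2 f^2(m_1)/F(m_1) + m_1^3 f(m_1) + m_1 f(m_1) - 2F(m_1) + 1\}$, which is a reading of a damaged formula in the source and should be treated as an assumption. The results land close to the tabulated −2.764 (Table 3, i = 2) and to 36.09.

```python
from statistics import NormalDist

N = 20
PHI = NormalDist()  # standard normal: cdf() plays the role of F, pdf() of f

# Assumed lower expected normal order statistics m_1, m_2, m_3 for n = 20.
m1, m2, m3 = -1.8675, -1.4076, -1.1309

# Plackett-type interior coefficient a*_j ~ n m_j [F(m_{j+1}) - F(m_{j-1})] at j = 2.
a_star_2 = N * m2 * (PHI.cdf(m3) - PHI.cdf(m1))

# R^2 approximation in the assumed form
# 2n { m_1^2 f(m_1)^2 / F(m_1) + m_1^3 f(m_1) + m_1 f(m_1) - 2 F(m_1) + 1 }.
f1, F1 = PHI.pdf(m1), PHI.cdf(m1)
r2_plackett = 2 * N * (m1**2 * f1**2 / F1 + m1**3 * f1 + m1 * f1 - 2 * F1 + 1)

# The simpler approximations of this paper for comparison: 2 m_2 and the fitted R^2 line.
r2_line = -2.411 + 1.981 * N
print(round(a_star_2, 3), round(2 * m2, 3))
print(round(r2_plackett, 2), round(r2_line, 2))
```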
`
The good practical agreement of these two approximations encourages the belief that there is little risk in reasonable extrapolations for n > 20. The values of constants for n > 20 given in §3 below were estimated from the simple approximations and extrapolations described above.

As a further internal check, the values of $a_n$, $a_{n-1}$ and $a_{n-4}$ were plotted as a function of n for n = 3(1)50. The plots are shown in Fig. 3, which is seen to be quite smooth for each of the three curves at the value n = 20. Since values for n ≤ 20 are ‘exact’, the smooth transition lends credence to the approximations for n > 20.
`
`
Table 3. Comparison of approximate values of $a^{*\prime} = m'V^{-1}$

 n    i  Present approx.     Exact  Plackett
20    1           -4.223   -4.2013    -4.215
      2           -2.815   -2.8494    -2.764
      3           -2.262   -2.2765    -2.237
      4           -1.842   -1.8502    -1.820
      5           -1.491   -1.4960    -1.476
      6           -1.181   -1.1841    -1.169
      7           -0.897   -0.8990    -0.887
      8           -0.630   -0.6314    -0.622
      9           -0.374   -0.3784    -0.370
     10           -0.124   -0.1243    -0.123
30    1           -4.655              -4.671
      2           -3.231              -3.170
      3           -2.730              -2.768
      4           -2.357              -2.369
      5           -2.052              -2.013
      6           -1.789              -1.760
      7           -1.553              -1.528
      8           -1.338              -1.334
      9           -1.137              -1.132
     10           -0.947              -0.941
     11           -0.765              -0.759
     12           -0.589              -0.582
     13           -0.418              -0.413
     14           -0.249              -0.249
     15           -0.083              -0.082
`
Fig. 3. $a_i$ plotted as a function of sample size, n = 2(1)50, for i = n, n − 1, n − 4 (n > 8).
[Figure not reproduced; horizontal axis: sample size, n.]
`
`
Fig. 4. Empirical c.d.f. of W for n = 5, 10, 15, 20, 35, 50.
[Figure not reproduced.]

Fig. 5. Selected empirical percentage points of W, n = 3(1)50.
[Figure not reproduced; horizontal axis: sample size, n.]
`
`
Table 4. Some theoretical moments (M) and Monte Carlo moments (m) of W

(M½, M₁: theoretical E W^½ and E W; m½, m₁: the corresponding Monte Carlo moments; μ̂₂: sample variance; √β̂₁, β̂₂: standardized third and fourth moments.)

  n      M½      m½      M₁      m₁       μ̂₂     √β̂₁       β̂₂
  3  0.9549  0.9547  0.9135  0.9130  0.005698  -0.5930    2.3748
  4  0.9486  0.9489  0.9012  0.9019  0.005166  -0.8944    3.7231
  5  0.9494  0.9491  0.9026  0.9021  0.004491  -0.8176    7.8126
  6  0.9521  0.9525  0.9072  0.9082  0.003390  -1.1790    5.4295
  7  0.9547  0.9545  0.9123  0.9120  0.002995  -1.3229    6.4104
  8  0.9574  0.9575  0.9174  0.9175  0.002470  -1.3841    7.1092
  9  0.9600  0.9596  0.9221  0.9215  0.002293  -1.5987    8.4482
 10  0.9622  0.9620  0.9264  0.9260  0.001972  -1.6655    9.2812
 11  0.9643  0.9639  0.9303  0.9295  0.001717  -1.7494   11.0547
 12  0.9661  0.9661  0.9337  0.9338  0.001483  -1.7744   11.9185
 13  0.9678  0.9678  0.9369  0.9369  0.001316  -1.7581   13.0769
 14  0.9692  0.9693  0.9398  0.9399  0.001168  -1.9025   14.0568
 15  0.9706  0.9705  0.9424  0.9422  0.001023  -1.8876   16.7383
 16  0.9718  0.9717  0.9447  0.9445  0.000964  -1.7968   17.6669
 17  0.9730  0.9730  0.9470  0.9470  0.000823  -1.9468   22.1972
 18  0.9741  0.9741  0.9491  0.9492  0.000810  -2.1391   24.7776
 19  0.9750  0.9750  0.9508  0.9509  0.000711  -2.1305   29.7333
 20  0.9757  0.9760  0.9523  0.9527  0.000651  -2.2761   32.5906
 21          0.9771          0.9549  0.000594  -2.2827   36.0382
 22          0.9776          0.9558  0.000568  -2.3984   44.5617
 23          0.9782          0.9570  0.000504  -2.1862   40.7507
 24          0.9787          0.9579  0.000504  -2.3517   43.4926
 25          0.9789          0.9584  0.000458  -2.3448   46.3318
 26          0.9796          0.9598  0.000421  -2.4978   58.9446
 27          0.9801          0.9607  0.000404  -2.5903   60.5200
 28          0.9805          0.9615  0.000382  -2.6964   64.1702
 29          0.9810          0.9624  0.000369  -2.6090   68.9591
 30          0.9811          0.9626  0.000344  -2.7288   71.7714
 31          0.9816          0.9636  0.000336  -2.7997   77.4744
 32          0.9819          0.9642  0.000326  -2.6900   76.8384
 33          0.9823          0.9650  0.000308  -3.0181   93.2496
 34          0.9825          0.9654  0.000293  -3.0166  100.4419
 35          0.9827          0.9658  0.000268  -2.8574  108.5077
 36          0.9829          0.9662  0.000264  -2.7965   91.7985
 37          0.9833          0.9670  0.000253  -3.1566  120.0005
 38          0.9837          0.9677  0.000235  -3.0679  118.2513
 39          0.9837          0.9678  0.000239  -3.3283  134.3110
 40          0.9839          0.9682  0.000229  -3.1719  136.4787
 41          0.9840          0.9684  0.000227  -3.0740  129.9604
 42          0.9844          0.9691  0.000212  -3.2885  136.3814
 43          0.9846          0.9694  0.000196  -3.2646  151.7350
 44          0.9846          0.9695  0.000193  -3.0803  140.2724
 45          0.9849          0.9701  0.000192  -3.1645  137.2297
 46          0.9850          0.9703  0.000184  -3.3742  176.0635
 47          0.9854          0.9710  0.000170  -3.3353  179.2792
 48          0.9853          0.9708  0.000179  -3.2972  173.6601
 49          0.9855          0.9712  0.000165  -3.2810  183.9433
 50          0.9855          0.9714  0.000154  -3.3240  212.4279
`
`
2.5. Approximation to the distribution of W

The complexity in the domain of the joint distribution of W and the angles in Lemma 5 necessitates consideration of an approximation to the null distribution of W. Since only the first and second moments of normal order statistics are, practically, available, it follows that only the one-half and first moments of W are known. Hence a technique such as the Cornish-Fisher expansion cannot be used.
In the circumstances it seemed both appropriate and efficient to employ empirical sampling to obtain an approximation for the null distribution.
Accordingly, normal random samples were obtained from the Rand Tables (Rand Corporation, 1955). Repeated values of W were computed for n = 3(1)50, and the empirical percentage points were determined for each value of n. The number of samples, m, employed was as follows:

m = 5000 for n = 3(1)20,
m = [100,000/n] for n = 21(1)50.

Fig. 4 gives the empirical c.d.f.'s for values of n = 5, 10, 15, 20, 35, 50. Fig. 5 gives a plot of the 1, 5, 10, 50, 90, 95 and 99% empirical percentage points of W for n = 3(1)50.
`
A check on the adequacy of the sampling study is given by comparing the empirical one-half and first moments of the sample with the corresponding theoretical moments of W for n = 3(1)20. This comparison is given in Table 4, which provides additional assurance of the adequacy of the sampling study. Also given in Table 4 are the sample variance and the standardized third and fourth moments for n = 3(1)50.
After some preliminary investigation, the $S_B$ system of curves suggested by Johnson (1949) was selected as a basis for smoothing the empirical null W distribution. Details of this procedure and its results are given in Shapiro & Wilk (1965a). The tables of percentage points of W given in §3 are based on these smoothed sampling results.
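The sampling approach can be imitated on a small scale with Python's standard `random` module. The sketch below is not the authors' procedure (they used the Rand tables and then smoothed with Johnson's $S_B$ curves); it simply draws 20,000 normal samples of size 10, computes W with the Table 5 coefficients, and compares the simulated mean and 5% point with the theoretical $E W = 0.9264$ of Table 4 and the 5% point 0.842 of Table 6. The seed, the sample count and the helper names are arbitrary choices.

```python
import random
import statistics

# Table 5 coefficients for n = 10: a_10, a_9, ..., a_6 (with a_i = -a_{n-i+1}).
A_UPPER_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]


def w_statistic(sample, a_upper):
    """W computed as in Section 3: b = sum_i a_{n-i+1} (y_{n-i+1} - y_i), W = b^2 / S^2."""
    y = sorted(sample)
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y)
    b = sum(a * (y[n - 1 - i] - y[i]) for i, a in enumerate(a_upper))
    return b * b / s2


def simulate_null_w(n, a_upper, reps, seed=0):
    """Draw `reps` standard normal samples of size n and return their W values, sorted."""
    rng = random.Random(seed)
    return sorted(w_statistic([rng.gauss(0.0, 1.0) for _ in range(n)], a_upper)
                  for _ in range(reps))


ws = simulate_null_w(10, A_UPPER_N10, reps=20000)
print(round(statistics.fmean(ws), 4))       # compare E W = 0.9264 (Table 4)
print(round(ws[int(0.05 * len(ws))], 3))    # compare the 5% point 0.842 (Table 6)
```

With 20,000 replications the standard error of the simulated mean is roughly 0.0003, so agreement with Table 4 to about three decimals is what one would expect.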
`
`3. SUMMARY OF OPERATIONAL INFORMATION
`
`The objective of this section is to bring together all the tables and descriptions needed
`to execute the W test for normality. This section may be employed independently of
`notational or other information from other sections.
`
The object of the W test is to provide an index or test statistic to evaluate the supposed normality of a complete sample. The statistic has been shown to be an effective measure of normality even for small samples (n < 20) against a wide spectrum of non-normal alternatives (see §5 below and Shapiro & Wilk (1964a)).
`The W statistic is scale and origin invariant and hence supplies a test of the composite
`null hypothesis of normality.
To compute the value of W, given a complete random sample of size n, $x_1, x_2, \ldots, x_n$, one proceeds as follows:
(i) Order the observations to obtain an ordered sample $y_1 \le y_2 \le \cdots \le y_n$.
(ii) Compute

$$S^2 = \sum_{i=1}^{n} (y_i - \bar y)^2 = \sum_{i=1}^{n} (x_i - \bar x)^2.$$
`
`
(iii) (a) If n is even, n = 2k, compute

$$b = \sum_{i=1}^{k} a_{n-i+1} (y_{n-i+1} - y_i),$$

where the values of $a_{n-i+1}$ are given in Table 5.
(b) If n is odd, n = 2k + 1, the computation is just as in (iii)(a), since $a_{k+1} = 0$ when n = 2k + 1. Thus one finds

$$b = a_n (y_n - y_1) + \cdots + a_{k+2} (y_{k+2} - y_k),$$

where the value of $y_{k+1}$, the sample median, does not enter the computation of b.
(iv) Compute $W = b^2 / S^2$.
(v) The 1, 2, 5, 10, 50, 90, 95, 98 and 99% points of the distribution of W are given in Table 6. Small values of W are significant, i.e. indicate non-normality.
(vi) A more precise significance level may be associated with an observed W value by using the approximation detailed in Shapiro & Wilk (1965a).
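Steps (i) to (v) translate directly into code. The sketch below is restricted to n = 10 (coefficients from Table 5, lower-tail percentage points from Table 6); the sample values and function names are hypothetical, and extending it to another n only requires the corresponding rows of Tables 5 and 6. Step (vi) is not attempted, since the approximation of Shapiro & Wilk (1965a) is not given here.

```python
# Table 5 coefficients for n = 10 in the order a_10, a_9, ..., a_6. For odd n the printed
# list ends with the middle coefficient 0.0000 and can be passed in unchanged.
A_UPPER_N10 = [0.5739, 0.3291, 0.2141, 0.1224, 0.0399]

# Lower-tail percentage points of W for n = 10 from Table 6 (level -> point).
LOWER_POINTS_N10 = {0.01: 0.781, 0.02: 0.806, 0.05: 0.842, 0.10: 0.869}


def w_test(sample, a_upper):
    """Steps (i)-(iv) of Section 3: order the sample, form S^2 and b, return W = b^2 / S^2."""
    y = sorted(sample)                                   # step (i)
    n = len(y)
    if len(a_upper) != (n + 1) // 2:
        raise ValueError("coefficient list does not match the sample size")
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y)                 # step (ii)
    # step (iii): b = sum_i a_{n-i+1} (y_{n-i+1} - y_i); for odd n the middle term is 0.
    b = sum(a * (y[n - 1 - i] - y[i]) for i, a in enumerate(a_upper))
    return b * b / s2                                    # step (iv)


def smallest_level_rejecting(w, lower_points):
    """Step (v): smallest tabulated level at which W is significant (W below the point), else None."""
    for level, point in sorted(lower_points.items()):
        if w < point:
            return level
    return None


# Hypothetical samples: nine tied values plus one outlier, and ten equally spaced values.
for sample in ([1, 1, 1, 1, 1, 1, 1, 1, 1, 10], list(range(1, 11))):
    w = w_test(sample, A_UPPER_N10)
    print(round(w, 4), smallest_level_rejecting(w, LOWER_POINTS_N10))
```

The first sample is the extreme configuration of Lemma 3 (with the outlier on the high side), so its W equals $n a_n^2/(n-1)$ and falls below the 1% point; the equally spaced sample gives W ≈ 0.970, which is not significant at any tabulated level.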
`
Table 5. Coefficients $\{a_{n-i+1}\}$ for the W test for normality, for n = 2(1)50

i \ n        2       3       4       5       6       7       8       9      10
  1     0.7071  0.7071  0.6872  0.6646  0.6431  0.6233  0.6052  0.5888  0.5739
  2             0.0000  0.1677  0.2413  0.2806  0.3031  0.3164  0.3244  0.3291
  3                             0.0000  0.0875  0.1401  0.1743  0.1976  0.2141
  4                                             0.0000  0.0561  0.0947  0.1224
  5                                                             0.0000  0.0399

i \ n       11      12      13      14      15      16      17      18      19      20
  1     0.5601  0.5475  0.5359  0.5251  0.5150  0.5056  0.4968  0.4886  0.4808  0.4734
  2     0.3315  0.3325  0.3325  0.3318  0.3306  0.3290  0.3273  0.3253  0.3232  0.3211
  3     0.2260  0.2347  0.2412  0.2460  0.2495  0.2521  0.2540  0.2553  0.2561  0.2565
  4     0.1429  0.1586  0.1707  0.1802  0.1878  0.1939  0.1988  0.2027  0.2059  0.2085
  5     0.0695  0.0922  0.1099  0.1240  0.1353  0.1447  0.1524  0.1587  0.1641  0.1686
  6     0.0000  0.0303  0.0539  0.0727  0.0880  0.1005  0.1109  0.1197  0.1271  0.1334
  7                     0.0000  0.0240  0.0433  0.0593  0.0725  0.0837  0.0932  0.1013
  8                                     0.0000  0.0196  0.0359  0.0496  0.0612  0.0711
  9                                                     0.0000  0.0163  0.0303  0.0422
 10                                                                     0.0000  0.0140

i \ n       21      22      23      24      25      26      27      28      29      30
  1     0.4643  0.4590  0.4542  0.4493  0.4450  0.4407  0.4366  0.4328  0.4291  0.4254
  2     0.3185  0.3156  0.3126  0.3098  0.3069  0.3043  0.3018  0.2992  0.2968  0.2944
  3     0.2578  0.2571  0.2563  0.2554  0.2543  0.2533  0.2522  0.2510  0.2499  0.2487
  4     0.2119  0.2131  0.2139  0.2145  0.2148  0.2151  0.2152  0.2151  0.2150  0.2148
  5     0.1736  0.1764  0.1787  0.1807  0.1822  0.1836  0.1848  0.1857  0.1864  0.1870
  6     0.1399  0.1443  0.1480  0.1512  0.1539  0.1563  0.1584  0.1601  0.1616  0.1630
  7     0.1092  0.1150  0.1201  0.1245  0.1283  0.1316  0.1346  0.1372  0.1395  0.1415
  8     0.0804  0.0878  0.0941  0.0997  0.1046  0.1089  0.1128  0.1162  0.1192  0.1219
  9     0.0530  0.0618  0.0696  0.0764  0.0823  0.0876  0.0923  0.0965  0.1002  0.1036
 10     0.0263  0.0368  0.0459  0.0539  0.0610  0.0672  0.0728  0.0778  0.0822  0.0862
 11     0.0000  0.0122  0.0228  0.0321  0.0403  0.0476  0.0540  0.0598  0.0650  0.0697
 12                     0.0000  0.0107  0.0200  0.0284  0.0358  0.0424  0.0483  0.0537
 13                                     0.0000  0.0094  0.0178  0.0253  0.0320  0.0381
 14                                                     0.0000  0.0084  0.0159  0.0227
 15                                                                     0.0000  0.0076
`
`
Table 5. Coefficients $\{a_{n-i+1}\}$ for the W test for normality, for n = 2(1)50 (cont.)

i \ n       31      32      33      34      35      36      37      38      39      40
  1     0.4220  0.4188  0.4156  0.4127  0.4096  0.4068  0.4040  0.4015  0.3989  0.3964
  2     0.2921  0.2898  0.2876  0.2854  0.2834  0.2813  0.2794  0.2774  0.2755  0.2737
  3     0.2475  0.2463  0.2451  0.2439  0.2427  0.2415  0.2403  0.2391  0.2380  0.2368
  4     0.2145  0.2141  0.2137  0.2132  0.2127  0.2121  0.2116  0.2110  0.2104  0.2098
  5     0.1874  0.1878  0.1880  0.1882  0.1883  0.1883  0.1883  0.1881  0.1880  0.1878
  6     0.1641  0.1651  0.1660  0.1667  0.1673  0.1678  0.1683  0.1686  0.1689  0.1691
  7     0.1433  0.1449  0.1463  0.1475  0.1487  0.1496  0.1505  0.1513  0.1520  0.1526
  8     0.1243  0.1265  0.1284  0.1301  0.1317  0.1331  0.1344  0.1356  0.1366  0.1376
  9     0.1066  0.1093  0.1118  0.1140  0.1160  0.1179  0.1196  0.1211  0.1225  0.1237
 10     0.0899  0.0931  0.0961  0.0988  0.1013  0.1036  0.1056  0.1075  0.1092  0.1108
 11     0.0739  0.0777  0.0812  0.0844  0.0873  0.0900  0.0924  0.0947  0.0967  0.0986
 12     0.0585  0.0629  0.0669  0.0706  0.0739  0.0770  0.0798  0.0824  0.0848  0.0870
 13     0.0435  0.0485  0.0530  0.0572  0.0610  0.0645  0.0677  0.0706  0.0733  0.0759
 14     0.0289  0.0344  0.0395  0.0441  0.0484  0.0523  0.0559  0.0592  0.0622  0.0651
 15     0.0144  0.0206  0.0262  0.0314  0.0361  0.0404  0.0444  0.0481  0.0515  0.0546
 16     0.0000  0.0068  0.0131  0.0187  0.0239  0.0287  0.0331  0.0372  0.0409  0.0444
 17                     0.0000  0.0062  0.0119  0.0172  0.0220  0.0264  0.0305  0.0343
 18                                     0.0000  0.0057  0.0110  0.0158  0.0203  0.0244
 19                                                     0.0000  0.0053  0.0101  0.0146
 20                                                                     0.0000  0.0049

i \ n       41      42      43      44      45      46      47      48      49      50
  1     0.3940  0.3917  0.3894  0.3872  0.3850  0.3830  0.3808  0.3789  0.3770  0.3751
  2     0.2719  0.2701  0.2684  0.2667  0.2651  0.2635  0.2620  0.2604  0.2589  0.2574
  3     0.2357  0.2345  0.2334  0.2323  0.2313  0.2302  0.2291  0.2281  0.2271  0.2260
  4     0.2091  0.2085  0.2078  0.2072  0.2065  0.2058  0.2052  0.2045  0.2038  0.2032
  5     0.1876  0.1874  0.1871  0.1868  0.1865  0.1862  0.1859  0.1855  0.1851  0.1847
  6     0.1693  0.1694  0.1695  0.1695  0.1695  0.1695  0.1695  0.1693  0.1692  0.1691
  7     0.1531  0.1535  0.1539  0.1542  0.1545  0.1548  0.1550  0.1551  0.1553  0.1554
  8     0.1384  0.1392  0.1398  0.1405  0.1410  0.1415  0.1420  0.1423  0.1427  0.1430
  9     0.1249  0.1259  0.1269  0.1278  0.1286  0.1293  0.1300  0.1306  0.1312  0.1317
 10     0.1123  0.1136  0.1149  0.1160  0.1170  0.1180  0.1189  0.1197  0.1205  0.1212
 11     0.1004  0.1020  0.1035  0.1049  0.1062  0.1073  0.1085  0.1095  0.1105  0.1113
 12     0.0891  0.0909  0.0927  0.0943  0.0959  0.0972  0.0986  0.0998  0.1010  0.1020
 13     0.0782  0.0804  0.0824  0.0842  0.0860  0.0876  0.0892  0.0906  0.0919  0.0932
 14     0.0677  0.0701  0.0724  0.0745  0.0765  0.0783  0.0801  0.0817  0.0832  0.0846
 15     0.0575  0.0602  0.0628  0.0651  0.0673  0.0694  0.0713  0.0731  0.0748  0.0764
 16     0.0476  0.0506  0.0534  0.0560  0.0584  0.0607  0.0628  0.0648  0.0667  0.0685
 17     0.0379  0.0411  0.0442  0.0471  0.0497  0.0522  0.0546  0.0568  0.0588  0.0608
 18     0.0283  0.0318  0.0352  0.0383  0.0412  0.0439  0.0465  0.0489  0.0511  0.0532
 19     0.0188  0.0227  0.0263  0.0296  0.0328  0.0357  0.0385  0.0411  0.0436  0.0459
 20     0.0094  0.0136  0.0175  0.0211  0.0245  0.0277  0.0307  0.0335  0.0361  0.0386
 21     0.0000  0.0045  0.0087  0.0126  0.0163  0.0197  0.0229  0.0259  0.0288  0.0314
 22                     0.0000  0.0042  0.0081  0.0118  0.0153  0.0185  0.0215  0.0244
 23                                     0.0000  0.0039  0.0076  0.0111  0.0143  0.0174
 24                                                     0.0000  0.0037  0.0071  0.0104
 25                                                                     0.0000  0.0035
`
`
Table 6. Percentage points of the W test* for n = 3(1)50

                                  Level
   n   0.01   0.02   0.05   0.10   0.50   0.90   0.95   0.98   0.99
   3  0.753  0.756  0.767  0.789  0.959  0.998  0.999  1.000  1.000
   4  0.687  0.707  0.748  0.792  0.935  0.987  0.992  0.996  0.997
   5  0.686  0.715  0.762  0.806  0.927  0.979  0.986  0.991  0.993
   6  0.713  0.743  0.788  0.826  0.927  0.974  0.981  0.986  0.989
   7  0.730  0.760  0.803  0.838  0.928  0.972  0.979  0.985  0.988
   8  0.749  0.778  0.818  0.851  0.932  0.972  0.978  0.984  0.987
   9  0.764  0.791  0.829  0.859  0.935  0.972  0.978  0.984  0.986
  10  0.781  0.806  0.842  0.869  0.938  0.972  0.978  0.983  0.986
  11  0.792  0.817  0.850  0.876  0.940  0.973  0.979  0.984  0.986
  12  0.805  0.828  0.859  0.883  0.943  0.973  0.979  0.984  0.986
  13  0.814  0.837  0.866  0.889  0.945  0.974  0.979  0.984  0.986
  14  0.825  0.846  0.874  0.895  0.947  0.975  0.980  0.984  0.986
  15  0.835  0.855  0.881  0.901  0.950  0.975  0.980  0.984  0.987
  16  0.844  0.863  0.887  0.906  0.952  0.976  0.981  0.985  0.987
  17  0.851  0.869  0.892  0.910  0.954  0.977  0.981  0.985  0.987
  18  0.858  0.874  0.897  0.914  0.956  0.978  0.982  0.986  0.988
  19  0.863  0.879  0.901  0.917  0.957  0.978  0.982  0.986  0.988
  20  0.868  0.884  0.905  0.920  0.959  0.979  0.983  0.986  0.988
  21  0.873  0.888  0.908  0.923  0.960  0.980  0.983  0.987  0.989
  22  0.878  0.892  0.911  0.926  0.961  0.980  0.984  0.987  0.989
  23  0.881  0.895  0.914  0.928  0.962  0.981  0.984  0.987  0.989
  24  0.884  0.898  0.916  0.930  0.963  0.981  0.984  0.987  0.989
  25  0.888  0.901  0.918  0.931  0.964  0.981  0.985  0.988  0.989
  26  0.891  0.904  0.920
  27  0.894  0.906  0.923
  28  0.896  0.908  0.924
  29  0.898  0.910  0.926
  30  0.900  0.912  0.927
  31  0.902  0.914  0.929
  32  0.904  0.915  0.930
  33  0.906  0.917  0.931
  34  0.908  0.919  0.933
  35  0.910  0.920  0.934
  36  0.912  0.922  0.935
  37  0.914  0.924  0.936
  38  0.916  0.925  0.938
  39  0.917  0.927  0.939
  40  0.919  0.928  0.940
  41  0.920  0.929  0.941
  42  0.922  0.930  0.942
  43  0.923  0.932  0.943
  44  0.924  0.933  0.944
  45  0.926  0.934  0.945
  46  0.927  0.935  0.945
  47  0.928  0.936  0.946
  48  0.929  0.937  0.947
  49  0.929  0.937
  50  0.930  0.938