BASIC STATISTICS AND PHARMACEUTICAL STATISTICAL APPLICATIONS

James E. De Muth
University of Wisconsin—Madison
Madison, Wisconsin

MARCEL DEKKER, INC.
NEW YORK · BASEL

Library of Congress Cataloging-in-Publication Data

De Muth, James E.
  Basic statistics and pharmaceutical statistical applications / James E. De Muth.
    p. cm.—(Biostatistics; 2)
  Includes bibliographical references and index.
  ISBN 0-8247-1967-0 (alk. paper)
  1. Pharmacy—Statistical methods.  2. Statistics.  I. Title.  II. Series: Biostatistics (New York, N.Y.); 2.
  RS57.D46 1999
  615.1'072—dc21                                                99-30733
                                                                      CIP

This book is printed on acid-free paper.

Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution
Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-261-8482; fax: 41-61-261-8896

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 1999 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit):
10 9 8 7 6 5 4 3 2 1

PRINTED IN THE UNITED STATES OF AMERICA

6

The Normal Distribution and Confidence Intervals

Described as a "bell-shaped" curve, the normal distribution is a symmetrical distribution which is one of the most commonly occurring outcomes in nature, and its presence is assumed in several of the most commonly used statistical tests. Properties of the normal distribution have a very important role in the statistical theory of drawing inferences about population parameters (estimating confidence intervals) based on samples drawn from that population.

The Normal Distribution

The normal distribution is the most important distribution in statistics. This curve is a special frequency distribution that describes the population distribution of many continuously distributed biological traits. The normal distribution is often referred to as the Gaussian distribution, after the mathematician Carl Friedrich Gauss, even though it was first discovered by the French mathematician Abraham DeMoivre (Porter, 1986).

It is critical at this point to realize that we are focusing our initial discussion on the total population, not a sample. As mentioned in the previous chapter, in the population the mean is expressed as μ and the standard deviation as σ.

The characteristics of a normal distribution are as follows. First, the normal distribution is continuous and the curve is symmetrical about the mean. Second, the mode, median, and mean are equal and represent the middle of the distribution. Third, since the mean and median are the same, the 50th percentile is at the mean, with an equal amount of area under the curve above and below the mean. Fourth, the probability of all possible outcomes is equal to 1.0; therefore, the total area under the curve is equal to 1.0. Since the mean is the 50th percentile, the area to the left or right of the mean equals 0.5. Fifth, by definition, the area under the curve between one standard deviation above and one standard deviation below the mean contains approximately 68% of the total area under the curve. At two standard deviations this area is approximately 95%. Sixth, as the distance from the mean (in the positive or negative direction) approaches infinity, the frequency of occurrences approaches zero. This last point illustrates the fact that most observations cluster around the center of the distribution and very few occur at the extremes of the distribution. Also, because the curve is infinite in its bounds, we cannot set absolute external limits on the distribution.

The frequency distribution (curve) for a normal distribution is defined as follows:

f_i = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x_i - \mu)^2 / 2\sigma^2}        Eq. 6.1

where π (pi) = 3.14159 and e = 2.71828 (the base of natural logarithms).

In a normal distribution, the area under the curve between the mean and one standard deviation is approximately 34%. Because of the symmetry of the distribution, 68% of the curve would be divided equally above and below the mean. Why 34%? Why not a nice round number like 35%, 30%, or even better, 25%? The standard deviation is that point of inflection on the normal curve where the frequency distribution stops its descent to the baseline and begins to pull parallel with the x-axis. Areas or proportions of the normal distribution associated with various standard deviations are seen in Figure 6.1. The term "the bell-shaped curve" is a misnomer, since there are many bell-shaped curves, ranging from those which are extremely peaked with very small ranges to those which are much flatter with wide distributions (Figure 6.2). A normal distribution is completely dependent on its parameters μ and σ.

Figure 6.1 Proportions between various standard deviations under a normal distribution.

Figure 6.2 Example of three normal distributions with different means and different standard deviations.

A standardized normal distribution has been created to compare and compute variations in such distributions regardless of their center or spread from the center. In this standard normal distribution the mean equals 0 (Figure 6.3). The spread of the distribution is also standardized by setting one standard deviation equal to +1 or −1, and two standard deviations equal to +2 or −2 (Figure 6.4). As seen previously, the area between +2 and −2 is approximately 95%. Additionally, fractions of a standard deviation are calculated and their equivalent areas presented. If such a distribution can be created (with a mean equal to zero and a standard deviation equal to one), then the equation for the frequency distribution can be simplified to:

Figure 6.3 Example of three normal distributions with the same mean and different standard deviations.

f_i = \frac{1}{2.5066272}\,(2.71828)^{-x_i^2/2}        Eq. 6.2

Table 6.1 is an abbreviation of a Standard Normal Distribution (a more complete distribution is presented in Table B2 in Appendix B, where every hundredth of the z-distribution is defined between 0.01 and 3.09).

Table 6.1 Selected Areas of a Normal Standardized Distribution (Proportion of the Curve Between 0 and z)

An important feature of the standard normal distribution is that the number of standard deviations away from the population mean can be expressed as a given percent or proportion of the area of the curve. The symbol z, by convention, symbolizes the number of standard deviations away from the population mean. The numbers in these tables represent the area of the curve which falls between the mean (z = 0) and that point on the distribution above the mean (i.e., z = +1.5 would be the point 1.5 standard deviations above the mean). Since the mean is the 50th percentile, the area of the curve that falls below the mean (or below zero) is .5000. Because a normal distribution is symmetrical, this table can also represent the various areas below the mean. For example, for z = −1.5 (or 1.5 standard deviations below the mean), z represents the same area from 0 to −1.5 as the area from 0 to +1.5. A z-value tells us how far above or below the mean any given score is, in units of the standard deviation.
Using the information in Table 6.1, the area under the curve that falls below +2 would be the area between +2 and 0, plus the area below 0.

Area (<+2) = Area (between 0 and +2) + Area (below 0)
Area (<+2) = .4772 + .5000 = .9772

These probabilities can be summed because of the addition theorem discussed in Chapter 2.
All possible events would fall within this standard normal distribution (p(x) = 1.00). Since the probability of all events equals 1.00 and the total area under the curve equals 1, the various areas within a normalized standard distribution can also represent probabilities of certain outcomes. In the above example, the area under the curve below two standard deviations (represented as +2) was .9772. This can also be thought of as the probability of an outcome being less than two standard deviations above the mean. Conversely, the probability of being two or more standard deviations above the mean would be 1.0000 − .9772, or .0228.

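As a brief aside (an illustration added here, not part of the original text), these table areas can be reproduced in a few lines of Python using the error function; the helper name normal_cdf is simply a label chosen for this sketch.

import math

def normal_cdf(z):
    # Cumulative area under the standard normal curve below z, i.e., p(Z < z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area between the mean (z = 0) and z = +2, as in Table 6.1
print(round(normal_cdf(2.0) - 0.5, 4))      # 0.4772

# Area below z = +2: .4772 + .5000 = .9772
print(round(normal_cdf(2.0), 4))            # 0.9772

# Probability of being two or more standard deviations above the mean
print(round(1.0 - normal_cdf(2.0), 4))      # 0.0228
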
Between three standard deviations above and below the mean, approximately 99.8% of the observations will occur. Therefore, assuming a normal distribution, a quick method for roughly approximating the standard deviation is to divide the range of the observations by six, since almost all observations will fall within these six intervals. For example, consider the data in Table 4.3. The true standard deviation for this data is 16.8 mcg. The range of 86 mcg, divided by six, would give a rough approximation of 14.3 mcg.

It is possible to calculate the probability of any particular outcome within a normal distribution. The areas within specified portions of our curve represent the probability of the values of interest lying between the vertical lines. To illustrate this, consider a large container of tablets (representing a total population) which is expected to be normally distributed with respect to tablet weight. What is the probability of randomly sampling a tablet that weighs within 1.5 standard deviations of the mean?

Because weight is a continuous variable, we are concerned with p(>−1.5 and <+1.5). From Table 6.1:

p(0 < z < +1.5) = Area between 0 and +1.5 = .4332
p(−1.5 < z < 0) = Area between 0 and −1.5 = .4332
p(−1.5 < z < +1.5) = .8664

There is a probability of .8664 (or roughly an 87% chance) of sampling a tablet within 1.5 standard deviations of the mean. What is the probability of sampling a tablet greater than 2.25 standard deviations above the mean?

First, we know that the total area above the mean is .5000. By reading Table 6.1, the area between 2.25 standard deviations (z = 2.25) and the mean is .4878 (the area between 0 and +2.25). Therefore the probability of sampling a tablet weighing more than 2.25 standard deviations above the mean weight is:

p(z > +2.25) = .5000 − .4878 = .0122

If we wish to know the probability of a tablet being less than 2.25 standard deviations above the mean, we can take the complement of the probability of exceeding a z-value of +2.25:

p(z < +2.25) = 1 − p(z > +2.25) = 1.000 − .0122 = .9878

This can also be calculated as:

p(z < +2.25) = p(z < 0) + p(0 < z < 2.25) = .5000 + .4878 = .9878

If the mean and standard deviation of a population are known, the exact location (z above or below the mean) for any observation can be calculated using the following formula:

z = \frac{x - \mu}{\sigma}        Eq. 6.3

Because values in a normal distribution are on a continuous scale and are handled as continuous variables, we must correct for continuity. Values for x would be as follows:

Likelihood of being:    greater than 185 mg = p(>185.5);
                        less than 200 mg = p(<199.5);
                        200 mg or greater = p(>199.5); and
                        between and including 185 and 200 mg = p(>184.5 and <200.5).

To examine this, consider a sample from a known population with expected population parameters (previous estimates of the population mean and standard deviation, for example based on prior production runs). With an expected population mean assay of 750 mg and a population standard deviation of 60 mg, what is the probability of sampling a capsule with an assay greater than 850 mg?

As a continuous variable, the p(>850 mg) is actually p(>850.5 mg) when corrected for continuity.

z = \frac{x - \mu}{\sigma} = \frac{850.5 - 750}{60} = \frac{100.5}{60} = +1.68

p(z > +1.68) = .5000 − p(0 < z < 1.68) = .5000 − .4535 = .0465

Given the same population as above, what is the probability of randomly sampling a capsule with an assay between 700 and 825 mg?

Once again correcting for continuity, p(<825 mg) is rewritten as p(<825.5 mg) and p(>700 mg) is really p(>699.5 mg).

z = \frac{x - \mu}{\sigma} = \frac{825.5 - 750}{60} = \frac{75.5}{60} = +1.26

z = \frac{x - \mu}{\sigma} = \frac{699.5 - 750}{60} = \frac{-50.5}{60} = -0.84

p(between 699.5 and 825.5) = p(0 < z < +1.26) + p(−0.84 < z < 0) = .3962 + .2995 = .6957

Given the same population, what is the probability of randomly sampling a capsule with an assay less than 600 mg?

As a continuous variable, p(<600 mg) is p(<599.5 mg):

z = \frac{x - \mu}{\sigma} = \frac{599.5 - 750}{60} = \frac{-150.5}{60} = -2.5

p(z < −2.5) = .5000 − p(0 < z < 2.5) = .5000 − .4938 = .0062

Thus, in these examples with the given population mean and standard deviation, the likelihood of randomly sampling a capsule greater than 850 mg is approximately 5%, a capsule less than 600 mg is less than 1%, and a capsule between 700 and 825 mg is almost 70%.

Lastly, the probability of obtaining any one particular value is zero, but we can determine probabilities for specific ranges. Correcting for continuity, the value 750 (the mean) actually represents an infinite number of possible values between 749.5 and 750.5 mg. The area under the curve between the center and the upper limit would be

z = \frac{x - \mu}{\sigma} = \frac{750.5 - 750}{60} = \frac{0.5}{60} = 0.01

p(0 < z < 0.01) = .004

Since there would be an identical area between 749.5 and the mean, the total proportion associated with 750 mg would be

p(750 mg) ≅ .008

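To tie these worked examples together, the following Python sketch (added for illustration, not part of the original text) repeats the capsule calculations with the same continuity corrections; it uses the error function rather than Table 6.1, so the results differ from the hand calculations only in the rounding of z to two decimal places. The names normal_cdf and z_score are labels chosen for this sketch.

import math

MU, SIGMA = 750.0, 60.0      # expected population mean and standard deviation (mg)

def normal_cdf(z):
    # Cumulative area under the standard normal curve below z
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_score(x):
    # Eq. 6.3: number of standard deviations that x lies from the mean
    return (x - MU) / SIGMA

# p(assay > 850 mg), corrected for continuity to p(> 850.5 mg)
print(round(1.0 - normal_cdf(z_score(850.5)), 4))

# p(700 mg < assay < 825 mg), corrected to p(> 699.5 mg and < 825.5 mg)
print(round(normal_cdf(z_score(825.5)) - normal_cdf(z_score(699.5)), 4))

# p(assay < 600 mg), corrected to p(< 599.5 mg)
print(round(normal_cdf(z_score(599.5)), 4))
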
In the previous examples we knew both the population mean (μ) and the population standard deviation (σ). However, in most statistical investigations this information is not available, and formulas must be employed that use estimates of these parameters based on the sample results.
Important z-values related to other areas under the curve for a normal distribution include:

90%    −1.64 < z < +1.64
95%    −1.96 < z < +1.96
99%    −2.57 < z < +2.57

Determining if the Distribution is Normal

The sample selected is our best guess at the composition of the population: the center, the dispersion, and the shape of the distribution. Therefore, the appearance of the sample is our best estimate of whether the population is normally distributed. In the absence of any information that would disprove normality, it is assumed that a normal distribution exists (unless initial sample results look extremely skewed or bimodal).
One quick method to determine if the population is normally distributed is to determine if the sample mean and median are approximately equal. If they are about the same (similar in value), then the population probably has a normal distribution. If the mean is substantially greater than the median, the population is probably positively skewed, and if the mean is substantially less than the median, a negatively skewed population probably exists. Other methods for determining normality would be to: 1) plot the distribution on graph paper; 2) plot histogram cumulative frequencies on probability paper; or 3) do a chi square goodness-of-fit test (described in Chapter 15).

Simple visual methods to determine if sample data is consistent with expectations for a normal distribution are to plot a cumulative frequency curve or to use special graph paper known as probability paper. In a cumulative frequency curve, data is arranged in order of increasing size and plotted on normal graph paper:

% cumulative frequency = (cumulative frequency / n) × 100

If the data came from a normally distributed population, the result will be an S-shaped curve.
Probability paper has a unique non-linear scale (i.e., National #12-083 or Keuffel and Esser #46-8000) on the cumulative frequency axis which will convert the S-shaped curve to a straight line. If a straight line can be drawn through the percent cumulative frequency data points, the estimated population is normally distributed. If a curvilinear relationship exists, the population is skewed.

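As a rough modern stand-in for probability paper (an illustration added here, not from the original text), the short Python sketch below computes the percent cumulative frequency for an ordered data set and also compares the mean with the median; the small data set shown is an arbitrary subset of the values in Table 6.2, used only to demonstrate the calculation.

def percent_cumulative_frequency(data):
    # Arrange the data in order of increasing size and attach the
    # percent cumulative frequency to each observation.
    ordered = sorted(data)
    n = len(ordered)
    return [(x, 100.0 * (i + 1) / n) for i, x in enumerate(ordered)]

def mean_median(data):
    # Quick normality screen: the mean and median should be similar
    # if the underlying population is roughly normal.
    ordered = sorted(data)
    n = len(ordered)
    mean = sum(ordered) / n
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2.0
    return mean, median

sample = [706, 714, 718, 720, 724, 731, 736, 752, 760, 785]
print(mean_median(sample))
for value, pct in percent_cumulative_frequency(sample):
    print(value, round(pct, 1))

Plotting the percent cumulative frequency column against the ordered values on probability paper (or any normal-probability scale) should give an approximately straight line when the data are consistent with a normal distribution.
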
Sampling Distribution

If we have a population and withdraw a random sample of observations from that population, we could calculate a sample mean and a sample standard deviation. As mentioned previously, this information would be our best estimate of the true population parameters:

\bar{X}_{sample} \approx \mu_{population}        S_{sample} \approx \sigma_{population}

The characteristics of dispersion or variability are not unique to samples alone; individual sample means can also vary around the population mean. Just by chance, or luck, we could have sampled from the upper or lower ends of the population distribution and calculated a sample mean that was too high or too low. Through no fault of our own, our estimate of the population mean would be erroneous.

To illustrate this point, let us return to the pharmacokinetic data used in Chapter 4. From this example, we will assume that the data in Table 4.3 represented the entire population of pharmacokinetic studies ever conducted on this drug. Due to budgetary restraints or time, we were only able to analyze five samples from this population. How many possible ways could five samples be randomly selected from this data? Based on the combination formula (Eq. 2.11) there would be

\binom{125}{5} = \frac{125!}{5!\,120!} = 234,531,275

possible ways. Thus, it is possible to sample these 125 values in over 234 million different ways and, because they are sampled at random, each possible combination has an equal likelihood of being selected. Therefore, by chance alone we could sample the smallest five values in our population (Sample A), the largest five (Sample D), or any combination in between these extremes (Table 6.2). Samples B and C were generated using the Random Numbers Table B1 in Appendix B.

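The size of this sampling space is easy to verify; the one-line Python check below (an added illustration, assuming Python 3.8 or later for math.comb) reproduces the combination calculation.

import math

# Number of distinct samples of size 5 that can be drawn from the
# 125 observations in Table 4.3 (combination formula, Eq. 2.11)
print(math.comb(125, 5))    # 234531275
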
The mean is a more efficient estimate of the center because, with repeated samples of the same size from a given population, the mean will show less variation than either the mode or the median. Statisticians have defined this outcome as the central limit theorem, and its derivation is beyond the scope of this book. However, there are three important characteristics that will be utilized in future statistical tests.

1. The mean of all possible sample means is equal to the mean of the original population from which they were sampled.

\mu_{\bar{X}} = \mu        Eq. 6.4

If we averaged all 234,531,275 possible sample means, this grand mean, or mean of the means, would equal the population mean (μ = 752.4 mcg for N = 125) from which they were sampled.

Table 6.2 Possible Samples from Population Presented as Table 4.3

            Sample A    Sample B    Sample C    Sample D
              706         731         724         778
              714         760         752         785
              718         752         762         788
              720         736         734         790
              724         785         775         793
   Mean =    716.4       752.8       749.4       786.8
   S.D. =      6.8        21.5        20.6         5.7

2. The standard deviation for all possible sample means is equal to the population standard deviation divided by the square root of the sample size.

\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}        Eq. 6.5

Similar to the mean of the sample means, the standard deviation for all the possible means would equal the population standard deviation divided by the square root of the sample size. The standard deviation of the means is referred to as the standard error of the mean, or SEM.

3. Regardless of whether the population is normally distributed or skewed, if we plot all the possible sample means, the frequency distribution will approximate that of a normal distribution, based on the central limit theorem. This theorem is critical to many statistical formulas because it justifies the assumption of normality. The sampling distribution will approximate a normal distribution, regardless of the distribution of the original population, when the sample size is relatively large. However, a sample size as small as n = 30 will often result in a near normal sampling distribution (Kachigan, 1991).

If all 234,531,275 possible means were plotted, they would produce a frequency distribution which is normally distributed. Because the sample means are normally distributed, values in the normal standardized distribution (z distribution) will also apply to the distribution of sample means. For example, of all the possible sample means:

68% fall within ±1.00 SEM
90% fall within ±1.64 SEM
95% fall within ±1.96 SEM
99% fall within ±2.57 SEM

The distribution of the means will be a probability distribution, consisting of various values and their associated probabilities, and if we sample from any population, the resultant means will be distributed on a normal bell-shaped curve. Most will be near the center and 5% will fall outside 1.96 standard deviations of the distribution.

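The first two characteristics can be checked empirically with a small Monte Carlo experiment; the Python sketch below (an illustration added here, not part of the original text) uses an arbitrary, deliberately skewed population with a mean near 750 as a stand-in for the Cmax data, draws repeated samples of n = 5, and compares the mean and standard deviation of the resulting sample means with μ and σ/√n.

import random
import statistics

random.seed(1)

# A deliberately skewed (exponential) population with a mean near 750,
# standing in for the pharmacokinetic data.
population = [random.expovariate(1.0 / 750.0) for _ in range(10000)]

n = 5
sample_means = [statistics.mean(random.sample(population, n)) for _ in range(20000)]

mu = statistics.mean(population)
sigma = statistics.pstdev(population)

# Characteristic 1: the mean of the sample means approximates mu
print(round(mu, 1), round(statistics.mean(sample_means), 1))

# Characteristic 2: the SD of the sample means approximates sigma / sqrt(n), the SEM
print(round(sigma / n ** 0.5, 1), round(statistics.stdev(sample_means), 1))
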
Standard Error of the Mean versus the Standard Deviation

As seen in the previous section, in a sampling distribution the overall mean of the means would be equal to the population mean, and the dispersion would depend on the amount of variance in the population. Obviously, the more we know about our population (the larger the sample size), the better our estimate of the population center. The best estimate of the population standard deviation is the sample standard deviation, which can be used to replace the σ in Eq. 6.5 to produce a standard error of the mean based on sample data:

S_{\bar{X}} = \frac{S}{\sqrt{n}} = SEM        Eq. 6.6

The standard deviation (S or SD) describes the variability within a sample; whereas the standard error of the mean (SEM) represents the possible variability of the mean itself. The SEM is sometimes referred to as the standard error (SE); it describes the variation of all possible sample means and equals the SD of the sample data divided by the square root of the sample size. As can be seen from the formula, the distribution of the sample means (the standard error of the mean) will always be smaller than the dispersion of the sample (the standard deviation).

Authors may erroneously present the distribution of sample results by using the SEM to represent information because there appears to be less variability. This may be misleading since the SEM has a different meaning than the SD. The SEM is smaller than the SD, and the intentional presentation of the SEM instead of the larger SD is a manipulation to make data look more precise. The SEM is extremely important in the estimation of a true population mean, based on sample results. However, because it is disproportionately low, it should never be used as a measure of the distribution of sample results. For example, the SEM from our previous example of liquid fill volumes (Table 5.1) is much
smaller (by a factor of almost six) than the calculated standard deviation:

SEM = \frac{S}{\sqrt{n}} = 0.152

By convention, the term standard error refers to the variability of a sampling distribution. However, authors still use the standard error of the mean to present sample distributions, because the SEM is much smaller than the SD and presents a much smaller variation of the results. An even more troublesome occurrence is the failure of authors to indicate in reports or publications whether a result represents a SD or a SEM; for example, the reporting of 456.1 ± 1.3, with no indication of what the term to the right of the ± sign represents. Is this a very tight SD? Is it the SEM? Could it even be the RSD?
The standard error of the mean can be considered as a measure of precision. Obviously, the smaller the SEM, the more confident we can be that our sample mean is close to the true population mean. However, at the same time, large increases in sample size produce relatively small changes in this measure of precision. For example, using a constant sample SD of 21.5 for Sample B, presented above, the measure of SEM changes very little as sample sizes increase past 30 (Figure 6.5). A general rule of thumb is that with samples of 30 or more observations, it is safe to use the sample standard deviation as an estimate of the population standard deviation.

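The diminishing return from larger samples is easy to reproduce; the short Python loop below (added for illustration, not part of the original text) recomputes the SEM for the constant sample SD of 21.5 at the sample sizes plotted in Figure 6.5.

# SEM = SD / sqrt(n) for a fixed sample SD of 21.5 (Sample B),
# at the sample sizes shown in Figure 6.5
SD = 21.5
for n in (3, 5, 10, 15, 20, 25, 30, 50, 100):
    print(n, round(SD / n ** 0.5, 2))
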
Confidence Intervals

As discussed in Chapter 5, using a random sample and independent measures, one can calculate measures of central tendency (X̄ and S). The result represents only one sample that belongs to a distribution of many possible sample means. Because we are dealing with a sample, and in most cases do not know the true population parameters, we often must make a statistical "guess" at these parameters. For example, the previous Samples A through D all have calculated means, any of which could be the true mean for the population from which they were randomly sampled. In order to define the true population mean, we need to allow for a range of possible means based on our estimate:

Population Mean Estimate = Sample Mean ± "Fudge" Factor

Figure 6.5 Variation in the standard error of the mean by sample size (sample SD = 21.5):

    n:      3      5     10     15     20     25     30     50    100
    SEM: 12.40   9.62   6.80   5.55   4.81   4.30   3.93   3.04   2.15

This single estimate of the population mean (based on the sample) can be referred to as a point estimate. The result of adding and subtracting the "fudge" factor is a range of possible outcomes defined as boundary values, or confidence limits. At the same time, we would like to have a certain amount of confidence in our statement that the population mean falls within these boundary values. For example, we may want to be 95% certain that we are correct, or 99% certain. Note again that because it is a sample, not an entire population, we cannot be 100% certain of our prediction. The only way to be 100% certain would be to measure every item in the population, and in most cases that is either impractical or impossible to accomplish. Therefore, in order to have a certain confidence in our decision (i.e., 95% or 99% certain) we need to add to our equation a factor to allow us this confidence:

Population Mean Estimate = Sample Mean ± (Reliability Coefficient × Standard Error)        Eq. 6.7

This reliability coefficient can be obtained from the normal standardized distribution. For example, if we want to be certain 95% of the time, we will allow an error 5% of the time. We could err to the high side or the low side, and if we wanted our error divided equally between the two extremes, we would allow a 2.5% chance of being too high in our estimation and a 2.5% chance of being too low in our estimate of the true population mean. In Table B2 of Appendix B we find that 95% of the area under the curve falls between −1.96 z and +1.96 z. This follows the theory of the normal distribution, where 95% of the values, or in this case sample means, fall within 1.96 standard deviations of the mean. The actual calculation for the 95% confidence interval would be:

\mu = \bar{X} \pm z_{(1-\alpha/2)} \times \frac{\sigma}{\sqrt{n}}

or, in the case of a 95% confidence interval:

\mu = \bar{X} \pm (1.96)\,\frac{\sigma}{\sqrt{n}}

The standard error term, or standard error of the mean term, is calculated based on the population standard deviation and the specific sample size. If the confidence interval were changed to 99% or 90%, the reliability coefficient would change to 2.57 or 1.64, respectively (based on the values in Table B2 where 0.99 and 0.90 of the area fall under the curve). In creating a range of possible outcomes instead of one specific measure, "it is better to be approximately correct, than precisely wrong" (Kachigan, 1991, p. 99).

Many of the following chapters will deal with the area of confidence intervals and the tests involved in this area. But at this point let us assume that we know the population standard deviation (σ), possibly through historical data or previous tests. In the case of the pharmacokinetic data (Table 4.3), the population standard deviation is known to be 16.8, based on the raw data, and was calculated using the formula for a population standard deviation (Eq. 5.5 and 5.6). Using the four samples in Table 6.2 it is possible to estimate the population mean based on the data for each sample. For example, with Sample A:

\mu = 716.4 \pm 1.96\left(\frac{16.8}{\sqrt{5}}\right) = 716.4 \pm 14.7

701.7 < μ < 731.1 mcg

The best estimate of the population mean (for the researcher using Sample A) would be between 701.7 and 731.1 mcg. Note that the "fudge factor" would remain the same for all four samples, since the reliability coefficient remains constant (1.96) and the error term (the population standard deviation divided by the square root of the sample size) does not change. Therefore the results for the other three samples would be:

Figure 6.6 Sample results compared with the population mean (752.4 mcg).

Sample B:    μ = 752.8 ± 14.7        738.1 < μ < 767.5 mcg

Sample C:    μ = 749.4 ± 14.7        734.7 < μ < 764.1 mcg

Sample D:    μ = 786.8 ± 14.7        772.1 < μ < 801.5 mcg

From our previous discussion, the true population mean for these 125 data points is a Cmax of 752.4 mcg. In the case of Samples B and C, the true population mean did fall within the 95% confidence interval and we were correct in our prediction of this mean. However, with the extreme samples (A and D), the population mean falls outside the confidence interval (Figure 6.6). With over 234 million possible samples and using the 95% reliability coefficient, almost 12 million possible samples will give us erroneous results.
Adjusting the confidence interval can increase the likelihood of predicting the correct population mean. One more sample was drawn consisting of five outcomes, and its calculated mean is 768.4. If a 95% confidence interval is calculated, the population mean falls outside the interval.

\mu = 768.4 \pm 1.96\left(\frac{16.8}{\sqrt{5}}\right) = 768.4 \pm 14.7

753.7 < μ < 783.1 mcg

However, if we decrease our confidence to 90%, the true population mean falls even further outside the interval.

\mu = 768.4 \pm 1.64\left(\frac{16.8}{\sqrt{5}}\right) = 768.4 \pm 12.4

756.0 < μ < 780.8 mcg

Similarly, if we increase our confidence to 99%, the true population mean will be found within the predicted limits.

\mu = 768.4 \pm 2.57\left(\frac{16.8}{\sqrt{5}}\right) = 768.4 \pm 19.3

749.1 < μ < 787.7 mcg

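The interval estimates above can be generated with a small helper; the Python sketch below (an added illustration, not part of the original text) applies Eq. 6.7 with the known population standard deviation of 16.8 and n = 5, and its output matches the hand calculations except for small rounding differences in the reliability coefficients.

import math

def confidence_interval(sample_mean, sigma, n, z):
    # Eq. 6.7: sample mean +/- (reliability coefficient x standard error)
    fudge = z * sigma / math.sqrt(n)
    return sample_mean - fudge, sample_mean + fudge

SIGMA, N = 16.8, 5    # population SD and sample size from the example

for label, z in (("90%", 1.64), ("95%", 1.96), ("99%", 2.57)):
    low, high = confidence_interval(768.4, SIGMA, N, z)
    print(label, round(low, 1), round(high, 1))
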
As seen in Figure 6.7, as the percentage of confidence increases, the width of the confidence interval increases. The creation and adjustment of confidence intervals is the basis upon which statistical analysis and hypothesis testing are built.
What we have accomplished is our first inferential statistic: making a statement about a population parameter based on a subset of that population. The z-test is the oldest of the statistical tests and was often called the critical ratio in early statistical literature. The interval estimate is our best guess, with a certain degree of confidence, of where the actual parameter exists. We must allow for a certain amount of error (i.e., 5% or 1%) since we do not know the entire population. As shown in Figure 6.7, as our allowed error decreases, the width of our interval estimate increases. In order to be 100% confident, our estimate of the interval would have to run from −∞ to +∞ (negative to positive infinity). Also, as can be seen in the formula for the confidence interval estimate, with a large sample size the standard error term will decrease and our interval width will decrease. Relating back to terms defined in Chapter 3, the confidence interval relates to precision, and the confidence level is what we establish as our reliability.

Figure 6.7 Sample results with different confidence levels (90%, 95%, and 99% intervals) compared with the true population mean (752.4 mcg).

As we shall see in future chapters, a basic assumption for many statistical tests (i.e., Student t-test, F-test, correlation) is that the populations from which the samples are selected are composed of random outcomes that approximate a normal distribution. If this is true, then we know many characteristics about our population with respect to its mean and standard deviation.
