BASIC STATISTICS AND
PHARMACEUTICAL
STATISTICAL
APPLICATIONS

James E. De Muth

University of Wisconsin–Madison
Madison, Wisconsin

MARCEL DEKKER, INC.

NEW YORK · BASEL
Mylan v. MonoSol
IPR2017-00200
MonoSol Ex. 2030
Page 1
`
`
`
`
Library of Congress Cataloging-in-Publication Data

De Muth, James E.
  Basic statistics and pharmaceutical statistical applications / James E. De Muth.
    p. cm. -- (Biostatistics; 2)
  Includes bibliographical references and index.
  ISBN 0-8247-1967-0 (alk. paper)
  1. Pharmacy--Statistical methods. 2. Statistics. I. Title. II. Series: Biostatistics (New York, N.Y.); 2.
  RS57.D46 1999
  615.1'072--dc21
  99-30733
  CIP
`
`This book is printed on acid-free paper.
`
Headquarters
Marcel Dekker, Inc.
270 Madison Avenue, New York, NY 10016
tel: 212-696-9000; fax: 212-685-4540

Eastern Hemisphere Distribution
Marcel Dekker AG
Hutgasse 4, Postfach 812, CH-4001 Basel, Switzerland
tel: 41-61-261-8482; fax: 41-61-261-8896

World Wide Web
http://www.dekker.com

The publisher offers discounts on this book when ordered in bulk quantities. For more information, write to Special Sales/Professional Marketing at the headquarters address above.

Copyright © 1999 by Marcel Dekker, Inc. All Rights Reserved.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage and retrieval system, without permission in writing from the publisher.

Current printing (last digit):
10 9 8 7 6 5 4 3 2 1
`
`PRINTED IN THE UNITED STATES OF AMERICA
`
`
`
`
`6
`
The Normal Distribution and Confidence Intervals
`
Described as a “bell-shaped” curve, the normal distribution is a symmetrical distribution that is one of the most commonly occurring outcomes in nature, and its presence is assumed in several of the most commonly used statistical tests. Properties of the normal distribution have a very important role in the statistical theory of drawing inferences about population parameters (estimating confidence intervals) based on samples drawn from that population.
`
`The Normal Distribution
`
The normal distribution is the most important distribution in statistics. This curve is a special frequency distribution that describes the population distribution of many continuously distributed biological traits. The normal distribution is often referred to as the Gaussian distribution, after the mathematician Carl Friedrich Gauss, even though it was first discovered by the French mathematician Abraham DeMoivre (Porter, 1986).
`
`
It is critical at this point to realize that we are focusing our initial discussion on the total population, not a sample. As mentioned in the previous chapter, in the population the mean is expressed as μ and the standard deviation as σ.
`
The characteristics of a normal distribution are as follows. First, the normal distribution is continuous and the curve is symmetrical about the mean. Second, the mode, median, and mean are equal and represent the middle of the distribution. Third, since the mean and median are the same, the 50th percentile is at the mean, with an equal amount of area under the curve above and below the mean. Fourth, the probability of all possible outcomes is equal to 1.0; therefore, the total area under the curve is equal to 1.0. Since the mean is the 50th percentile, the area to the left or right of the mean equals 0.5. Fifth, by definition, the area under the curve between one standard deviation above and one standard deviation below the mean contains approximately 68% of the total area under the curve. At two standard deviations this area is approximately 95%. Sixth, as the distance from the mean (in the positive or negative direction) approaches infinity, the frequency of occurrences approaches zero. This last point illustrates the fact that most observations cluster around the center of the distribution and very few occur at the extremes. Also, because the curve is infinite in its bounds, we cannot set absolute external limits on the distribution.
`
The frequency distribution (curve) for a normal distribution is defined as follows:

f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/2\sigma^2}    Eq. 6.1

where π (pi) = 3.14159 and e = 2.71828 (the base of natural logarithms).
`where:
In a normal distribution, the area under the curve between the mean and one standard deviation is approximately 34%. Because of the symmetry of the distribution, 68% of the curve would be divided equally above and below the mean. Why 34%? Why not a nice round number like 35%, 30%, or even better, 25%? The standard deviation is the point of inflection on the normal curve, where the frequency distribution stops its descent to the baseline and begins to pull parallel with the x-axis. Areas or proportions of the normal distribution associated with various standard deviations are seen in Figure 6.1. The term “the bell-shaped curve” is a misnomer, since there are many bell-shaped curves, ranging from those that are extremely peaked with very small ranges to those that are much flatter with wide distributions (Figure 6.2). A normal distribution is completely defined by its parameters μ and σ. A
`
`
`
`
`Figure 6.1 Proportions between various standard deviations under a normal
`distribution.
`
`
`
Figure 6.2 Example of three normal distributions with different means and different standard deviations.
`
standardized normal distribution has been created to compare and compute variations in such distributions regardless of their center or spread from the center. In this standard normal distribution the mean equals 0 (Figure 6.3). The spread of the distribution is also standardized by setting one standard deviation equal to +1 or -1, and two standard deviations equal to +2 or -2 (Figure 6.4). As seen previously, the area between +2 and -2 is approximately 95%. Additionally, fractions of a standard deviation can be calculated and their equivalent areas presented. If such a distribution is created (with a mean equal to zero and a standard deviation equal to one), the equation for the frequency distribution can be simplified to:
`
`
`Figure 6.3 Example of three normal distributions with the same mean and
`different standard deviations.
`
`
`
`
`
`
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} = \frac{1}{2.5066}\,(2.71828)^{-x^2/2}    Eq. 6.2
`
Table 6.1 is an abbreviation of the standard normal distribution (a more complete distribution is presented in Table B2 in Appendix B, where every hundredth of the z-distribution is defined between 0.01 and 3.09). An important
`
`
`Table 6.1 Selected Areas of a Normal Standardized Distribution
`
`(Proportion of the Curve Between 0 and z)
`
`
`
feature of the standard normal distribution is that the number of standard deviations away from the population mean can be expressed as a given percent or proportion of the area of the curve. The symbol z, by convention, symbolizes the number of standard deviations away from the population mean. The numbers in these tables represent the area of the curve that falls between the mean (z = 0) and that point on the distribution above the mean (i.e., z = +1.5 would be the point at 1.5 standard deviations above the mean). Since the mean is the 50th percentile, the area of the curve that falls below the mean (or below zero) is .5000. Because a normal distribution is symmetrical, this table can also represent the various areas below the mean. For example, for z = -1.5 (or 1.5 standard deviations below the mean), z represents the same area from 0 to -1.5 as the area from 0 to +1.5. A z-value tells us how far above or below the mean any given score is, in units of the standard deviation.
Using the information in Table 6.1, the area under the curve that falls below +2 would be the area between +2 and 0, plus the area below 0.
`
`
`Area (<+2) = Area (between 0 and +2) + Area (below 0)
`Area (<+2) = .4772 + .5000 = .9772
`
These probabilities can be summed because of the addition theorem discussed in Chapter 2.
All possible events fall within this standard normal distribution (p(x) = 1.00). Since the probability of all events equals 1.00 and the total area under the curve equals 1, various areas within a standardized normal distribution can also represent probabilities of certain outcomes. In the above example, the area under the curve below two standard deviations above the mean (represented as +2) was .9772. This can also be thought of as the probability of an outcome being less than two standard deviations above the mean. Conversely, the probability of being two or more standard deviations above the mean would be 1.0000 - .9772, or .0228.
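The areas in Table 6.1 can be reproduced from the cumulative distribution function of the standard normal, which Python's standard library supports through the error function. A minimal sketch (the helper name `phi` is my own, not from the text):

```python
import math

def phi(z):
    """Cumulative area of the standard normal distribution below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Area between the mean (z = 0) and z = +2, as read from Table 6.1
area_0_to_2 = phi(2.0) - 0.5       # ~.4772

# Area below z = +2: the table area plus the .5000 below the mean
area_below_2 = area_0_to_2 + 0.5   # ~.9772

# Probability of an outcome two or more standard deviations above the mean
area_above_2 = 1.0 - area_below_2  # ~.0228
```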
`
Between three standard deviations above and below the mean, approximately 99.8% of the observations will occur. Therefore, assuming a normal distribution, a quick method for roughly approximating the standard deviation is to divide the range of the observations by six, since almost all observations will fall within these six intervals. For example, consider the data in Table 4.3. The true standard deviation for these data is 16.8 mcg. The range of 86 mcg, divided by six, gives a rough approximation of 14.3 mcg.
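The range-over-six rule is simple arithmetic; a sketch for the Table 4.3 example:

```python
# Quick approximation of the standard deviation for a roughly normal
# population: almost all observations fall within mean +/- 3 SD, i.e.,
# a spread of about six standard deviations.
data_range = 86.0            # range of the Table 4.3 data, in mcg
approx_sd = data_range / 6   # rough estimate; the true SD is 16.8 mcg

print(round(approx_sd, 1))   # 14.3
```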
`It is possible to calculate the probability of any particular outcome within a
`normal distribution. The areas within specified portions of our curve represent
`the probability of the values of interest lying between the vertical lines. To
`illustrate this, consider a large container of tablets (representing a total
`population) which is expected to be normally distributed with respect to the
`tablet weight. What is the probability of randomly sampling a tablet that weighs
`within 1.5 standard deviations of the mean?
`
`
`
`
Because weight is a continuous variable, we are concerned with p(-1.5 < z < +1.5). From Table 6.1:
`
`Page 8
`Page 8
`
`
`
`The Normal Distribution and Confidence Intervals
`
`121
`
p(z<+1.5) = Area between 0 and +1.5 = .4332
p(z>-1.5) = Area between 0 and -1.5 = .4332
p(-1.5 < z < +1.5) = .8664
`
`There is a probability of .8664 (or 87% chance) of sampling a tablet within 1.5
`standard deviations of the mean. What is the probability of sampling a tablet
`greater than 2.25 standard deviations above the mean?
`
`
First, we know that the total area above the mean is .5000. By reading Table 6.1, the area between 2.25 standard deviations (z = 2.25) and the mean is .4878 (the area between 0 and +2.25). Therefore, the probability of sampling a tablet weighing more than 2.25 standard deviations above the mean weight is:
`
`p(z>+2.25) = .5000 - .4878 = .0122
`
If we wish to know the probability of a tablet being less than 2.25 standard deviations above the mean, we can take the complement of the probability of exceeding a z-value of +2.25:

p(z<+2.25) = 1 - p(z>+2.25) = 1.000 - .0122 = .9878
`
Also calculated as:

p(z<+2.25) = p(z<0) + p(0<z<+2.25) = .5000 + .4878 = .9878
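Both tablet-weight probabilities can be checked numerically; a sketch using the exact normal CDF rather than the rounded table values:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probability of a tablet within 1.5 standard deviations of the mean
p_within = phi(1.5) - phi(-1.5)  # ~.8664

# Probability of a tablet more than 2.25 standard deviations above the mean
p_above = 1.0 - phi(2.25)        # ~.0122

# Complement: less than 2.25 standard deviations above the mean
p_below = phi(2.25)              # ~.9878
```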
`
If the mean and standard deviation of a population are known, the exact location (z above or below the mean) for any observation can be calculated using the following formula:

z = \frac{x - \mu}{\sigma}    Eq. 6.3
`
`
Because values in a normal distribution are on a continuous scale and are handled as continuous variables, we must correct for continuity. Values for x would be as follows:

Likelihood of being: greater than 185 mg = p(>185.5);
less than 200 mg = p(<199.5);
200 mg or greater = p(>199.5); and
between and including 185 and 200 mg = p(>184.5 and <200.5).
`
To examine this, consider a sample from a known population with expected population parameters (previous estimates of the population mean and standard deviation, for example based on prior production runs). With an expected population mean assay of 750 mg and a population standard deviation of 60 mg, what is the probability of sampling a capsule with an assay greater than 850 mg?

As a continuous variable, the p(>850 mg) is actually p(>850.5 mg) when corrected for continuity.
`
z = \frac{x-\mu}{\sigma} = \frac{850.5-750}{60} = \frac{100.5}{60} = +1.68

p(z>+1.68) = .5000 - p(0<z<+1.68)
           = .5000 - .4535 = .0465
`
`Given the same population as above, what is the probability of randomly
`sampling a capsule with an assay between 700 and 825 mg?
`
`
`
`
Once again correcting for continuity, p(<825 mg) is rewritten as p(<825.5 mg) and p(>700 mg) is really p(>699.5 mg).
`
z = \frac{x-\mu}{\sigma} = \frac{825.5-750}{60} = \frac{75.5}{60} = +1.26

z = \frac{x-\mu}{\sigma} = \frac{699.5-750}{60} = \frac{-50.5}{60} = -0.84

p(between 699.5 and 825.5) = p(0<z<+1.26) + p(-0.84<z<0)
                           = .3962 + .2995 = .6957
`
Given the same population, what is the probability of randomly sampling a capsule with an assay less than 600 mg? As a continuous variable, p(<600 mg) is p(<599.5 mg):

z = \frac{x-\mu}{\sigma} = \frac{599.5-750}{60} = \frac{-150.5}{60} = -2.5
`
`
p(z<-2.5) = .5000 - p(-2.5<z<0)
          = .5000 - .4938 = .0062

Thus, in these examples with the given population mean and standard deviation, the likelihood of randomly sampling a capsule greater than 850 mg is approximately 5%, a capsule less than 600 mg is less than 1%, and a capsule between 700 and 825 mg is almost 70%.
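The three capsule calculations can be combined into one sketch; the results differ from the text only in the third decimal, because the table rounds z to two places:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 750.0, 60.0  # expected population mean and SD (mg)

def z(x):
    """Number of standard deviations x lies from the mean (Eq. 6.3)."""
    return (x - mu) / sigma

# p(assay > 850 mg), corrected for continuity to p(> 850.5)
p_gt_850 = 1.0 - phi(z(850.5))             # ~.047

# p(700 mg <= assay <= 825 mg) -> p(699.5 < x < 825.5)
p_700_825 = phi(z(825.5)) - phi(z(699.5))  # ~.696

# p(assay < 600 mg) -> p(< 599.5)
p_lt_600 = phi(z(599.5))                   # ~.006
```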
Lastly, the probability of obtaining any one particular value is zero, but we can determine probabilities for specific ranges. Correcting for continuity, the value 750 (the mean) actually represents an infinite number of possible values between 749.5 and 750.5 mg. The area under the curve between the center and the upper limit would be

z = \frac{x-\mu}{\sigma} = \frac{750.5-750}{60} = \frac{0.5}{60} \approx 0.01

p(0<z<0.01) = .004

Since there would be an identical area between 749.5 and the mean, the total proportion associated with 750 mg would be

p(750 mg) ≈ .008
`
In the previous examples we knew both the population mean (μ) and the population standard deviation (σ). However, in most statistical investigations this information is not available, and formulas must be employed that use estimates of these parameters based on the sample results.
Important z-values related to other areas under the curve for a normal distribution include:

90%   -1.64 < z < +1.64

95%   -1.96 < z < +1.96

99%   -2.57 < z < +2.57
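These z-values can be checked against the normal CDF; because the textbook values are rounded to two decimals, the recovered coverages are close to, not exactly, 90/95/99%:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def central_area(z):
    """Area of the standard normal curve between -z and +z."""
    return 2.0 * phi(z) - 1.0

# Coverage recovered from the tabled reliability coefficients
coverage = {z: round(central_area(z), 3) for z in (1.64, 1.96, 2.57)}
```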
`
`Determining if the Distribution is Normal
`
The sample selected is our best guess at the composition of the population: the center, the dispersion, and the shape of the distribution. Therefore, the appearance of the sample is our best estimate of whether the population is normally distributed. In the absence of any information that would disprove normality, it is assumed that a normal distribution exists (unless initial sample results look extremely skewed or bimodal).
One quick method to determine if the population is normally distributed is to check whether the sample mean and median are approximately equal. If they are about the same (similar in value), then the population probably has a normal distribution. If the mean is substantially greater than the median, the population is probably positively skewed, and if the mean is substantially less than the median, a negatively skewed population probably exists. Other methods for determining normality would be to: 1) plot the distribution on graph paper; 2) plot histogram cumulative frequencies on probability paper; or 3) do a chi square goodness-of-fit test (described in Chapter 15).
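The mean-versus-median screen is easy to apply with the standard library. A sketch; the cutoff for "substantially different" (here, a tenth of a standard deviation) is my own judgment call, not a rule from the text:

```python
import statistics

def skew_hint(data):
    """Crude normality screen: compare the sample mean and median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if abs(mean - median) < 0.1 * statistics.stdev(data):
        return "roughly symmetrical"
    return "positively skewed" if mean > median else "negatively skewed"

symmetric = [10, 12, 14, 16, 18, 20, 22]   # mean = median = 16
right_tail = [10, 11, 12, 13, 14, 15, 60]  # mean pulled above the median
```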
Simple visual methods to determine if sample data are consistent with expectations for a normal distribution are to plot a cumulative frequency curve or to use special graph paper known as probability paper. In a cumulative frequency curve, data are arranged in order of increasing size and plotted on normal graph paper:

\% \text{ cumulative frequency} = \frac{\text{cumulative frequency}}{n} \times 100

If the data came from a normally distributed population, the result will be an S-shaped curve.
Probability paper has a unique non-linear scale (i.e., National #12-083 or Keuffel and Esser #46-8000) on the cumulative frequency axis, which will convert the S-shaped curve to a straight line. If a straight line can be drawn through the percent cumulative frequency data points, the estimated population is normally distributed. If a curvilinear relationship exists, the population is skewed.
`
`Sampling Distribution
`
If we have a population and withdraw a random sample of observations from that population, we could calculate a sample mean and a sample standard deviation. As mentioned previously, this information would be our best estimate of the true population parameters:

\bar{X}_{sample} \approx \mu_{population}

S_{sample} \approx \sigma_{population}

The characteristics of dispersion or variability are not unique to samples alone. Individual samples can also vary around the population mean. Just by chance,
`
`
or luck, we could have sampled from the upper or lower ends of the population distribution and calculated a sample mean that was too high or too low. Through no fault of our own, our estimate of the population mean would be erroneous.
`
To illustrate this point, let us return to the pharmacokinetic data used in Chapter 4. For this example, we will assume that the data in Table 4.3 represent the entire population of pharmacokinetic studies ever conducted on this drug. Due to budgetary or time restraints, we were only able to analyze five samples from this population. How many possible ways could five samples be randomly selected from this data? Based on the combination formula (Eq. 2.11) there would be

\binom{125}{5} = \frac{125!}{5!\,120!} = 234{,}531{,}275

possible ways. Thus, it is possible to sample these 125 values in over 234 million different ways, and because they are sampled at random, each possible combination has an equal likelihood of being selected. Therefore, by chance alone we could sample the smallest five values in our population (Sample A), the largest five (Sample D), or any combination between these extremes (Table 6.2). Samples B and C were generated using the random numbers table (Table B1 in Appendix B).
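The count of possible samples follows directly from the combination formula and can be verified with the standard library:

```python
import math

# Number of ways to choose 5 observations from the 125-value population
n_samples = math.comb(125, 5)
print(n_samples)  # 234531275
```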
The mean is a more efficient estimate of the center because, with repeated samples of the same size from a given population, the mean will show less variation than either the mode or the median. Statisticians have defined this outcome as the central limit theorem; its derivation is beyond the scope of this book. However, there are three important characteristics that will be utilized in future statistical tests.
`
1. The mean of all possible sample means is equal to the mean of the original population from which they were sampled:

\mu_{\bar{X}} = \mu    Eq. 6.4

If we averaged all 234,531,275 possible sample means, this grand mean, or mean of the means, would equal the population mean (μ = 752.4 mcg for N = 125) from which they were sampled.
`
`
Table 6.2 Possible Samples from the Population Presented in Table 4.3

        Sample A   Sample B   Sample C   Sample D
          706        731        724        778
          714        760        752        785
          718        752        762        788
          720        736        734        790
          ...        ...        ...        ...
Mean =  716.4      752.8      749.4      786.8
S.D. =    6.8       21.5       20.6        5.7
`
2. The standard deviation for all possible sample means is equal to the population standard deviation divided by the square root of the sample size:

\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}    Eq. 6.5

Similar to the mean of the sample means, the standard deviation for all the possible means equals the population standard deviation divided by the square root of the sample size. The standard deviation of the means is referred to as the standard error of the mean, or SEM.
`
3. Regardless of whether the population is normally distributed or skewed, if we plot all the possible sample means, the frequency distribution will approximate that of a normal distribution, based on the central limit theorem. This theorem is critical to many statistical formulas because it justifies the assumption of normality. The sampling distribution will approximate a normal distribution, regardless of the distribution of the original population, when the sample size is relatively large. However, a sample size as small as n = 30 will often result in a near-normal sampling distribution (Kachigan, 1991).
`
Thus, if all 234,531,275 possible means were plotted, they would produce a frequency distribution that is normally distributed. Because the sample means are normally distributed, values in the standardized normal distribution (z distribution) will also apply to the distribution of sample means. For example, of all the possible sample means:
`
`
`68% fall within + or - 1.00 SEM
`
`90% fall within + or - 1.64 SEM
`
`95% fall within + or - 1.96 SEM
`
`99% fall within + or - 2.57 SEM
`
The distribution of the mean will be a probability distribution, consisting of various values and their associated probabilities, and if we sample from any population, the resultant means will be distributed on a normal bell-shaped curve. Most will be near the center and 5% will be outside 1.96 standard deviations of the distribution.
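The three properties above can be illustrated by brute force: draw many random samples of n = 5 from a simulated population and compare the sampling distribution of the means with the theory. A sketch; the population values below are simulated, not the actual Table 4.3 data:

```python
import math
import random
import statistics

random.seed(42)  # reproducible demonstration

# Simulated population loosely patterned on the Cmax example
population = [random.gauss(752.4, 16.8) for _ in range(10_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 5
means = [statistics.mean(random.sample(population, n)) for _ in range(20_000)]

grand_mean = statistics.mean(means)      # ~ mu          (property 1)
sem_observed = statistics.pstdev(means)  # ~ sigma/sqrt(n) (property 2)
sem_theory = sigma / math.sqrt(n)
```

A histogram of `means` would also be close to bell-shaped even if the population itself were skewed (property 3).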
`
`Standard Error of the Mean versus the Standard Deviation
`
As seen in the previous section, in a sampling distribution the overall mean of the means would be equal to the population mean, and the dispersion would depend on the amount of variance in the population. Obviously, the more we know about our population (the larger the sample size), the better our estimate of the population center. The best estimate of the population standard deviation is the sample standard deviation, which can be used to replace the σ in Eq. 6.5 to produce a standard error of the mean based on sample data:

S_{\bar{X}} = \frac{S}{\sqrt{n}} = SEM    Eq. 6.6
`
The standard deviation (S or SD) describes the variability within a sample, whereas the standard error of the mean (SEM) represents the possible variability of the mean itself. The SEM is sometimes referred to as the standard error (SE); it describes the variation of all possible sample means and equals the SD of the sample data divided by the square root of the sample size. As can be seen from the formula, the distribution of sample means (the standard error of the mean) will always be smaller than the dispersion of the sample (the standard deviation).
Authors may erroneously present the distribution of sample results using the SEM because there appears to be less variability. This may be misleading, since the SEM has a different meaning than the SD. The SEM is smaller than the SD, and the intentional presentation of the SEM instead of the larger SD is a manipulation to make data look more precise. The SEM is extremely important in the estimation of a true population mean based on sample results. However, because it is disproportionately low, it should never be used as a measure of the distribution of sample results. For example, the SEM from our previous example of liquid fill volumes (Table 5.1) is much
`
`
smaller (by a factor of almost six) than the calculated standard deviation:

SEM = \frac{SD}{\sqrt{30}} = 0.152
`
By convention, the term standard error refers to the variability of a sampling distribution. However, authors still use the standard error of the mean to present sample distributions, because the SEM is much smaller than the SD and presents a much smaller variation of the results. An even more troublesome occurrence is the failure of authors to indicate in reports or publications whether a result represents an SD or an SEM; for example, the reporting of 456.1 ± 1.3, with no indication of what the term to the right of the ± sign represents. Is this a very tight SD? Is it the SEM? Could it even be the RSD?
`
The standard error of the mean can be considered a measure of precision. Obviously, the smaller the SEM, the more confident we can be that our sample mean is close to the true population mean. However, at the same time, large increases in sample size produce relatively small changes in this measure of precision. For example, using a constant sample SD of 21.5 for Sample B, presented above, the SEM changes very little as sample sizes increase past 30 (Figure 6.5). A general rule of thumb is that with samples of 30 or more observations, it is safe to use the sample standard deviation as an estimate of the population standard deviation.
`
`Confidence Intervals
`
As discussed in Chapter 5, using a random sample and independent measures, one can calculate measures of central tendency (X̄ and S). The result represents only one sample that belongs to a distribution of many possible sample means. Because we are dealing with a sample, and in most cases don't know the true population parameters, we often must make a statistical “guess” at these parameters. For example, the previous samples A through D all have calculated means, any of which could be the true mean for the population from which they were randomly sampled. In order to define the true population mean, we need to allow for a range of possible means based on our estimate:

Population Mean Estimate = Sample Mean ± “Fudge” Factor
`
`
n      SEM
3      12.40
5       9.62
10      6.80
15      5.55
20      4.81
25      4.30
30      3.93
50      3.04
100     2.15

Figure 6.5 Variation in standard error of the mean by sample size (SD = 21.5).
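The SEM values in Figure 6.5 follow directly from Eq. 6.6 with SD = 21.5 (the figure's first entry, 12.40, differs from the computed 12.41 only in the final rounding):

```python
import math

sd = 21.5  # sample standard deviation from Sample B

def sem(n):
    """Standard error of the mean for a sample of size n (Eq. 6.6)."""
    return sd / math.sqrt(n)

for n in (3, 5, 10, 15, 20, 25, 30, 50, 100):
    print(n, round(sem(n), 2))
```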
`
This single estimate of the population mean (based on the sample) can be referred to as a point estimate. Allowing a range of possible outcomes around it produces boundary values, or confidence limits. At the same time, we would like to have a certain amount of confidence in our statement that the population mean falls within these boundary values. For example, we may want to be 95% certain that we are correct, or 99% certain. Note again that because it is a sample, not an entire population, we cannot be 100% certain of our prediction. The only way to be 100% certain would be to measure every item in the population, and in most cases that is either impractical or impossible to accomplish. Therefore, in order to have a certain confidence in our decision (i.e., 95% or 99% certain), we need to add to our equation a factor that allows us this confidence:
`
Population Mean Estimate = Sample Mean ± (Reliability Coefficient × Standard Error)    Eq. 6.7
`
This reliability coefficient can be obtained from the standardized normal distribution. For example, if we want to be certain 95% of the time, we will allow an error 5% of the time. We could err to the high side or the low side, and if we wanted our error divided equally between the two extremes, we would allow a 2.5% chance of being too high in our estimation and a 2.5% chance of being too low in our estimate of the true population mean. In Table B2 of Appendix B we find that 95% of the area under the curve falls between z = -1.96 and z = +1.96. This follows the theory of the normal distribution, where 95% of the values, or
`
`
in this case sample means, fall within 1.96 standard deviations of the mean. The actual calculation for the confidence interval would be:

\mu = \bar{X} \pm z_{(1-\alpha/2)} \times \frac{\sigma}{\sqrt{n}}

or, in the case of a 95% confidence interval:

\mu = \bar{X} \pm (1.96) \frac{\sigma}{\sqrt{n}}
`
The standard error term, or standard error of the mean, is calculated based on the population standard deviation and the specific sample size. If the confidence interval were changed to 99% or 90%, the reliability coefficient would change to 2.57 or 1.64, respectively (based on values in Table B2 where 0.99 and 0.90 of the area fall under the curve). In creating a range of possible outcomes instead of one specific measure, “it is better to be approximately correct, than precisely wrong” (Kachigan, 1991, p. 99).
Many of the following chapters will deal with confidence intervals and the tests involved in this area. But at this point let us assume that we know the population standard deviation (σ), possibly through historical data or previous tests. In the case of the pharmacokinetic data (Table 4.3), the population standard deviation is known to be 16.8, based on the raw data, and was calculated using the formula for a population standard deviation (Eq. 5.5 and 5.6). Using the four samples in Table 6.2 it is possible to estimate the population mean based on the data for each sample. For example, with Sample A:

\mu = 716.4 \pm 1.96\,\frac{16.8}{\sqrt{5}} = 716.4 \pm 14.7

701.7 < \mu < 731.1 mcg
`
The best estimate of the population mean (for the researcher using Sample A) would be between 701.7 and 731.1 mcg. Note that the “fudge factor” would remain the same for all four samples, since the reliability coefficient remains constant (1.96) and the error term (the population standard deviation divided by the square root of the sample size) does not change. Therefore the results for the other three samples would be:
`
`
Figure 6.6 Sample results compared with the population mean (752.4 mcg).
`
Sample B:  \mu = 752.8 \pm 14.7
           738.1 < \mu < 767.5 mcg

Sample C:  \mu = 749.4 \pm 14.7
           734.7 < \mu < 764.1 mcg

Sample D:  \mu = 786.8 \pm 14.7
           772.1 < \mu < 801.5 mcg
`
From our previous discussion, the true population mean for these 125 data points is a Cmax of 752.4 mcg. In the case of Samples B and C, the true population mean did fall within the 95% confidence interval, and we were correct in our prediction of this mean. However, with the extreme samples (A and D), the population mean falls outside the confidence interval (Figure 6.6). With over 234 million possible samples and using the 95% reliability coefficient, almost 12 million possible samples will give us erroneous results.
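The four interval estimates can be generated in a few lines, using σ = 16.8 and n = 5 from the example above:

```python
import math

sigma, n, z = 16.8, 5, 1.96
half_width = z * sigma / math.sqrt(n)  # the "fudge factor," ~14.7

samples = {"A": 716.4, "B": 752.8, "C": 749.4, "D": 786.8}
mu = 752.4  # true population mean (mcg)

for name, mean in samples.items():
    low, high = mean - half_width, mean + half_width
    covered = low < mu < high  # True only for Samples B and C
    print(f"Sample {name}: {low:.1f} < mu < {high:.1f}  covers mu: {covered}")
```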
Adjusting the confidence interval can increase the likelihood of predicting the correct population mean. One more sample was drawn, consisting of five outcomes, and the calculated mean is 768.4. If a 95% confidence interval is calculated, the population mean falls outside the interval.
`
`
\mu = 768.4 \pm 1.96\,\frac{16.8}{\sqrt{5}} = 768.4 \pm 14.7

753.7 < \mu < 783.1 mcg
`
`However, if we decrease our confidence to 90%, the true population mean falls
`even further outside the interval.
`
\mu = 768.4 \pm 1.64\,\frac{16.8}{\sqrt{5}} = 768.4 \pm 12.4

756.0 < \mu < 780.8 mcg
`
`Similarly, if we increase our confidence to 99%, the true population mean will
`be found within the predicted limits.
`
\mu = 768.4 \pm 2.57\,\frac{16.8}{\sqrt{5}} = 768.4 \pm 19.3

749.1 < \mu < 787.7 mcg
`
As seen in Figure 6.7, as the percentage of confidence increases, the width of the confidence interval increases. The creation and adjustment of confidence intervals is the basis upon which statistical analysis and hypothesis testing are built.
What we have accomplished is our first inferential statistic: making a statement about a population parameter based on a subset of that population. The z-test is the oldest of the statistical tests and was often called the critical ratio in early statistical literature. The interval estimate is our best guess, with a certain degree of confidence, of where the actual parameter exists. We must allow for a certain amount of error (i.e., 5% or 1%) since we do not know the entire population. As shown in Figure 6.7, as our error decreases, the width of our interval estimate will increase. In order to be 100% confident, our estimate of the interval would be from -∞ to +∞ (negative to positive infinity). Also, as can be seen in the formula for the confidence interval estimate, with a large sample size the standard error term will decrease and our interval width will decrease. Relating back to terms defined in Chapter 3, the confidence interval relates to precision, and the confidence level is what we establish as our reliability.
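The widening of the interval with confidence level, shown in Figure 6.7, can be reproduced for the fifth sample (mean 768.4):

```python
import math

sigma, n, mean, mu = 16.8, 5, 768.4, 752.4
se = sigma / math.sqrt(n)  # standard error of the mean

# Reliability coefficients from the standard normal table
for level, z in ((90, 1.64), (95, 1.96), (99, 2.57)):
    low, high = mean - z * se, mean + z * se
    print(f"{level}%: {low:.1f} < mu < {high:.1f}  covers mu: {low < mu < high}")
```

Only the widest (99%) interval captures the true population mean, matching the worked example in the text.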
`
`
Figure 6.7 Sample results with different confidence levels (90%, 95%, 99%) compared with the true population mean (752.4 mcg).
`
As we shall see in future chapters, a basic assumption for many statistical tests (i.e., Student t-test, F-test, correlation) is that the populations from which the samples are selected are composed of random outcomes that approximate a normal distribution. If this is true, then we know many characteristics about our population with respect to its mean and standard deviation.