`Vol. 67C, No. 2, April-June 1963
`
`Realistic Evaluation of the Precision and Accuracy
of Instrument Calibration Systems¹
`
`Churchill Eisenhart
`
`(November 28, 1962)
`
Calibration of instruments and standards is a refined form of measurement. Measurement of some property of a thing is an operation that yields as an end result a number that indicates how much of the property the thing has. Measurement is ordinarily a repeatable operation, so that it is appropriate to regard measurement as a production process, the “product” being the numbers, i.e., the measurements, that it yields; and to apply to measurement processes in the laboratory the concepts and techniques of statistical process control that have proved so useful in the quality control of industrial production.
Viewed thus it becomes evident that a particular measurement operation cannot be regarded as constituting a measurement process unless statistical stability of the type known as a state of statistical control has been attained. In order to determine whether a particular measurement operation is, or is not, in a state of statistical control it is necessary to be definite on what variations of procedure, apparatus, environmental conditions, observers, operators, etc., are allowable in “repeated applications” of what will be considered to be the same measurement process applied to the measurement of the same quantity under the same conditions. To be realistic, the “allowable variations” must be of sufficient scope to bracket the circumstances likely to be met in practice. Furthermore, any experimental program that aims to determine the standard deviation of a measurement process as an indication of its precision, must be based on appropriate random sampling of this likely range of circumstances.
Ordinarily the accuracy of a measurement process may be characterized by giving (a) the standard deviation of the process and (b) credible bounds to its likely overall systematic error. Determination of credible bounds to the combined effect of recognized potential sources of systematic error always involves some arbitrariness, not only in the placing of reasonable bounds on the systematic error likely to be contributed by each particular assignable cause, but also in the manner in which these individual contributions are combined. Consequently, the “inaccuracy” of end results of measurement cannot be expressed by “confidence limits” corresponding to a definite numerical “confidence level,” except in those rare instances in which the possible overall systematic error of a final result is negligible in comparison with its imprecision.
`
`1. Introduction
`
¹ Presented at the 1962 Standards Laboratory Conference, National Bureau of Standards, Boulder, Colo., August 8-10, 1962.

Calibration of instruments and standards is basically a refined form of measurement. Measurement is the assignment of numbers to material things to represent the relations existing among them with respect to particular properties. One always measures properties of things, not the things themselves. In practice, measurement of some property of a thing ordinarily takes the form of a sequence of steps or operations that yields as an end result a number that indicates how much of this property the thing has, for someone to use for a specific purpose. The end result may be the outcome of a single reading of an instrument. More often it is some kind of average, e.g., the arithmetic mean of a number of independent determinations of the same magnitude, or the final result of a least squares “reduction” of measurements of a number of different quantities that bear known relations to each other in accordance with a definite experimental plan. In general, the purpose for which the answer is needed determines the accuracy required and ordinarily also the method of measurement employed.
Specification of the apparatus and auxiliary equipment to be used, the operations to be performed, the sequence in which they are to be executed, and the conditions under which they are respectively to be carried out: these instructions collectively serve to define a method of measurement. A measurement process is the realization of a method of measurement in terms of particular apparatus and equipment of the prescribed kinds, particular conditions that at best only approximate the conditions prescribed, and particular persons as operators and observers.
It has long been recognized that, in undertaking to apply a particular method of measurement, a degree of consistency among repeated measurements of a single quantity needs to be attained before the method of measurement concerned can be regarded as meaningfully realized, i.e., before a measurement process can be said to have been established that is a realization of the method of measurement concerned.
Indeed, consistency or statistical stability of a very special kind is required: to qualify as a measurement process a measurement operation must have attained what is known in industrial quality control language as a state of statistical control. Until a measurement operation has been “debugged” to the extent that it has attained a state of statistical control it cannot be regarded in any logical sense as measuring anything at all. And when it has attained a state of statistical control there may still remain the question of whether it is faithful to the method of measurement of which it is intended to be a realization.
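As a rough illustration of what checking for a state of statistical control can look like, the following minimal sketch computes limits for a Shewhart individuals chart from a series of repeated readings and lists the readings that fall outside them. The function names are hypothetical, the factor 2.66 is the conventional constant for moving ranges of two consecutive readings, and the check covers only one criterion (points beyond the limits); it is not offered as the procedure the paper has in mind.

```python
from statistics import mean


def individuals_chart_limits(readings):
    """Return (center, lower, upper) for an individuals control chart.

    Limits are center +/- 2.66 * (mean moving range), where 2.66 = 3 / d2
    and d2 = 1.128 is the usual constant for ranges of two readings.
    """
    if len(readings) < 2:
        raise ValueError("need at least two readings")
    center = mean(readings)
    # Absolute differences between consecutive readings.
    moving_ranges = [abs(b - a) for a, b in zip(readings, readings[1:])]
    half_width = 2.66 * mean(moving_ranges)
    return center, center - half_width, center + half_width


def points_outside_limits(readings):
    """Indices of readings beyond the chart limits (one criterion only)."""
    _, lower, upper = individuals_chart_limits(readings)
    return [i for i, x in enumerate(readings) if x < lower or x > upper]


if __name__ == "__main__":
    stable = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 10.0]
    disturbed = stable + [12.5]
    print(points_outside_limits(stable))
    print(points_outside_limits(disturbed))
```

A series that passes this single check is not thereby shown to be in control; run tests and a sufficiently long record would also be needed.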
The systematic error, or bias, of a measurement process refers to its tendency to measure something other than what was intended; and is determined by the magnitude of the difference μ − τ between the process average or limiting mean μ associated with measurement of a particular quantity by the measurement process concerned and the true value τ of the magnitude of this quantity.
On first thought, the “true value” of the magnitude of a particular quantity appears to be a simple straightforward concept. On careful analysis, however, it becomes evident that the “true value” of the magnitude of a quantity is intimately linked to the purposes for which knowledge of the magnitude of this quantity is needed, and cannot, in the final analysis, be meaningfully and usefully defined in isolation from these needs.
The precision of a measurement process refers to, and is determined by, the degree of mutual agreement characteristic of independent measurements of a single quantity yielded by repeated applications of the process under specified conditions; and its accuracy refers to, and is determined by, the degree of agreement of such measurements with the true value of the magnitude of the quantity concerned. In brief, “accuracy” has to do with closeness to the truth; “precision,” only with closeness together.
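The distinction between closeness to the truth and closeness together can be made concrete with a small sketch. It assumes, purely for illustration, that the true value is known (for example, from a reference standard), which the paper stresses is ordinarily not the case; the function name and the data are hypothetical.

```python
from statistics import mean, stdev


def summarize_process(measurements, true_value):
    """Estimate bias and imprecision from repeated measurements.

    bias    : sample average minus the (assumed known) true value
    std_dev : sample standard deviation of the measurements
    """
    return {
        "bias": mean(measurements) - true_value,
        "std_dev": stdev(measurements),
    }


if __name__ == "__main__":
    true_value = 10.0
    # Close together but far from the truth: precise, not accurate.
    precise_but_biased = [10.51, 10.49, 10.50, 10.52, 10.48]
    # Centered on the truth but widely scattered: unbiased, imprecise.
    unbiased_but_imprecise = [9.6, 10.4, 10.1, 9.8, 10.1]
    print(summarize_process(precise_but_biased, true_value))
    print(summarize_process(unbiased_but_imprecise, true_value))
```

Both quantities describe the process that produced the numbers, not any single measurement, and they are meaningful only if the process is in a state of statistical control.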
Systematic error, precision, and accuracy are inherent characteristics of a measurement process and not of a particular measurement yielded by the process. We may also speak of the systematic error, precision, and accuracy of a particular method of measurement that has the capability of statistical control. But these terms are not defined for a measurement operation that is not in a state of statistical control.
The precision, or more correctly, the imprecision of a measurement process is ordinarily summarized by the standard deviation of the process, which expresses the characteristic disagreement of repeated measurements of a single quantity by the process concerned, and thus serves to indicate by how much a particular measurement is likely to differ from other values that the same measurement process might have provided in this instance, or might yield on remeasurement of the same quantity on another occasion. Unfortunately, there does not exist any single comprehensive measure of the accuracy (or inaccuracy) of a measurement process analogous to the standard deviation as a measure of its imprecision.
`
To characterize the accuracy of a measurement process it is necessary, therefore, to indicate (a) its systematic error or bias, (b) its precision (or imprecision), and, strictly speaking, also (c) the form of the distribution of the individual measurements about the process average. Such is the unavoidable situation if one is to concern one's self with individual measurements yielded by any particular measurement process. Fortunately, however, “final results” are ordinarily some kind of average or adjusted value derived from a set of independent measurements, and when four or more independent measurements are involved, such adjusted values tend to be normally distributed to a very good approximation, so that the accuracy of such final results can ordinarily be characterized satisfactorily by indicating (a) their imprecision as expressed by their standard error, and (b) the systematic error of the process by which they were obtained.
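For the simplest kind of final result, an arithmetic mean of n independent measurements, the standard error is commonly estimated as the sample standard deviation divided by the square root of n. The sketch below computes it; the function name is hypothetical, and the calculation assumes independent measurements from a process in statistical control. It says nothing about the systematic error, which must be reported separately.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_standard_error(measurements):
    """Return (mean, estimated standard error of the mean)."""
    n = len(measurements)
    if n < 2:
        raise ValueError("need at least two measurements")
    # Standard error of the mean: s / sqrt(n).
    return mean(measurements), stdev(measurements) / sqrt(n)


if __name__ == "__main__":
    readings = [10.1, 9.9, 10.0, 10.2, 9.8]
    average, standard_error = mean_with_standard_error(readings)
    print(f"mean = {average:.3f}, standard error = {standard_error:.4f}")
```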
The error of any single measurement or adjusted value of a particular quantity is, by definition, the difference between the measurement or adjusted value concerned and the true value of the magnitude of this quantity. The error of any particular measurement or adjusted value is, therefore, a fixed number; and this number will ordinarily be unknown and unknowable, because the true value of the magnitude of the quantity concerned is ordinarily unknown and unknowable. Limits to the error of a single measurement or adjusted value may, however, be inferred from (a) the precision, and (b) bounds on the systematic error of the measurement process by which it was produced, but not without risk of being incorrect, because, quite apart from the inexactness with which bounds are commonly placed on a systematic error of a measurement process, such limits are applicable to the error of the single measurement or adjusted value, not as a unique individual outcome, but only as a typical case of the errors characteristic of such measurements of the same quantity that might have been, or might be, yielded by the same measurement process under the same conditions.
Since the precision of a measurement process is determined by the characteristic “closeness together” of successive independent measurements of a single magnitude generated by repeated application of the process under specified conditions, and its bias or systematic error is determined by the direction and amount by which such measurements tend to differ from the true value of the magnitude of the quantity concerned, it is necessary to be clear on what variations of procedure, apparatus, environmental conditions, observers, etc., are allowable in “repeated applications” of what will be considered to be the same measurement process applied to the measurement of the same quantity under the same conditions. If whatever measures of the precision and bias of a measurement process we may adopt are to provide a realistic indication of the accuracy of this process in practice, then the “allowable variations” must be of sufficient scope to bracket the range of circumstances commonly met in practice. Furthermore, any experimental program that aims to determine the precision, and thence the accuracy of a measurement process, must be based on an appropriate random sampling of this “range of circumstances,” if the usual tools of statistical analysis are to be strictly applicable.
When adequate random sampling of the appropriate “range of circumstances” is not feasible, or even possible, then it is necessary (a) to compute, by extrapolation from available data, a more or less subjective estimate of the precision of the measurement process concerned, to serve as a substitute for a direct experimental measure of this characteristic, and (b) to assign more or less subjective bounds to the systematic error of the measurement process. To the extent that such at least partially subjective computations are involved, the resulting evaluation of the overall accuracy of a measurement process “is based on subject-matter knowledge and skill, general information, and intuition, but not on statistical methodology” [Cochran et al. 1953, p. 693]. Consequently, in such cases the statistically precise concept of a family of “confidence intervals” associated with a definite “confidence level” or “confidence coefficient” is not applicable.
The foregoing points and certain other related matters are discussed in greater detail in the succeeding sections, together with an indication of procedures for the realistic evaluation of precision and accuracy of established procedures for the calibration of instruments and standards that minimize as much as possible the subjective elements of such an evaluation. To the extent that complete elimination of the subjective element is not always possible, the responsibility for an important and sometimes the most difficult part of the evaluation is shifted from the shoulders of the statistician to the shoulders of the subject matter “expert.”

2. Measurement

2.1. Nature and Object

Measurement is the assignment of numbers to material things to represent the relations existing among them with respect to particular properties. The number assigned to some particular property serves to represent the relative amount of this property associated with the object concerned.
Measurement always pertains to properties of things, not to the things themselves. Thus we cannot measure a meter bar, but can and usually do, measure its length; and we could also measure its mass, its density, and perhaps, also its hardness.
The object of measurement is twofold: first, symbolic representation of properties of things as a basis for conceptual analysis; and second, to effect the representation in a form amenable to the powerful tools of mathematical analysis. The decisive feature is symbolic representation of properties, for which end numerals are not the only usable symbols.
In practice the assignment of a numerical magnitude to a particular property of a thing is ordinarily accomplished by comparison with a set of standards, or by comparison either of the quantity itself, or of some transform of it, with a previously calibrated scale. Thus, length measurements are usually made by directly comparing the length concerned with a calibrated bar or tape; and mass measurements, by directly comparing the weight of a given mass with the weight of a set of standard masses, by means of a balance; but force measurements are usually carried out in terms of some transform, such as by reading on a calibrated scale the extension that the force produces in a spring, or the deflection that it produces in a proving ring; and temperature measurements are usually performed in terms of some transform, such as by reading on a calibrated scale the expansion of a column of mercury, or the electrical resistance of a platinum wire.

2.2. Qualitative and Quantitative Aspects

As Walter A. Shewhart, father of statistical control charts, has remarked:

“It is important to realize . . . that there are two aspects of an operation of measurement; one is quantitative and the other qualitative. One consists of numbers or pointer readings such as the observed lengths in n measurements of the length of a line, and the other consists of the physical manipulations of physical things by someone in accord with instructions that we shall assume to be describable in words constituting a text.” [Shewhart 1939, p. 130.]

More specifically, the qualitative factors involved in the measurement of a quantity are: the apparatus and auxiliary equipment (e.g., reagents, batteries or other source of electrical energy, etc.) employed; the operators and observers, if any, involved; the operations performed, together with the sequence in which, and the conditions under which, they are respectively carried out.

2.3. Correction and Adjustment of Observations

The numbers obtained as “readings” on a calibrated scale are ordinarily the end product of everyday measurement in the trades and in the home. In scientific work there are usually two important additional quantitative aspects of measurement: (1) correction of the readings, or their transforms, to compensate for known deviations from ideal execution of the prescribed operations, and for non-negligible effects of variations in uncontrolled variables; and (2) adjustment of “raw” or corrected measurements of particular quantities to obtain values of these quantities that conform to restrictions upon, or interrelations among, the magnitudes of these quantities imposed by the nature of the problem.
Thus, it may not be practicable or economically feasible to take readings at exactly the prescribed temperatures; but quite practicable and feasible to bring and hold the temperature within narrow neighborhoods of the prescribed values and to record the actual temperatures to which the respective readings correspond. In such cases, if the deviations from the prescribed temperatures are not negligible, “temperature corrections” based on appropriate theory are usually applied to the respective readings to bring them to the values that presumably would have been observed if the temperature in each instance had been exactly as prescribed.
In practice, however, the objective just stated is rarely, if ever, actually achieved. Any “temperature corrections” applied could be expected to bring the respective readings “to the values that presumably would have been observed if the temperature in each instance had been exactly as prescribed” if and only if these “temperature corrections” made appropriate allowances for all of the effects of the deviations of the actual temperatures from those prescribed. “Temperature corrections” ordinarily correct only for particular effects of the deviations of the actual temperatures from their prescribed values; not for all of the effects on the readings traceable to deviations of the actual temperatures from those prescribed. Thus Michelson utilized “temperature corrections” in his 1879 investigation of the speed of light; but his results exhibit a dependence on temperature after “temperature correction.” The “temperature corrections” applied corrected only for the effects of thermal expansion due to variations in temperature and not also for changes in the index of refraction of the air due to changes in the humidity of the air, which in June and July at Annapolis is highly correlated with temperature. Corrections applied in practice are usually of more limited scope than the names that they are given appear to indicate.
Adjustment of observations is fundamentally different from their “correction.” When two or more related quantities are measured individually, the resulting measured values usually fail to satisfy the constraints on their magnitudes implied by the given interrelations among the quantities concerned. In such cases these “raw” measured values are mutually contradictory, and require adjustment in order to be usable for the purpose intended. Thus, measured values of the three cyclic differences (A − B), (B − C), and (C − A) between the lengths of three nominally equivalent gage blocks are mutually contradictory, and strictly speaking are not usable as values of these differences, unless they sum to zero.
The primary goal of adjustment is to derive from such inconsistent measurements, if possible, adjusted values for the quantities concerned that do satisfy the constraints on their magnitudes imposed by the nature of the quantities themselves and by the existing interrelations among them. A second objective is to select from all possible sets of adjusted values the set that is the “best” (or, at least, a set that is “good enough” for the intended purpose) in some well-defined sense. Thus, in the above case of the measured differences between the lengths of three gage blocks, an adjustment could be effected by ignoring the measured value of one of the differences entirely, say, the difference (C − A), and taking the negative of the sum of the other two as its adjusted value,

\[
\operatorname{Adj}(C-A) = -[(A-B)+(B-C)].
\]

This will certainly assure that the sum of all three values, (A − B) + (B − C) + Adj(C − A), is zero, as required, and is clearly equivalent to ascribing all of the excess or deficit to the replaced measurement, (C − A). Alternatively, one might prefer to distribute the necessary total adjustment −[(A − B) + (B − C) + (C − A)] equally over the individual measured differences, to obtain the following set of adjusted values:

\[
\begin{aligned}
\operatorname{Adj}(A-B) &= (A-B) - \tfrac{1}{3}\,[(A-B)+(B-C)+(C-A)] \\
                        &= \tfrac{1}{3}\,[2(A-B)-(B-C)-(C-A)], \\
\operatorname{Adj}(B-C) &= \tfrac{1}{3}\,[2(B-C)-(A-B)-(C-A)], \\
\operatorname{Adj}(C-A) &= \tfrac{1}{3}\,[2(C-A)-(A-B)-(B-C)].
\end{aligned}
\]

Clearly, the sum of these three adjusted values must always be zero, as required, regardless of the values of the original individual measured differences. Furthermore, most persons, I believe, would consider this latter adjustment the better; and under certain conditions with respect to the “law of error” governing the original measured differences, it is indeed the “best.”
Note that no adjustment problem existed at the stage when only two of these differences had been measured, whichever they were, for then the third could be obtained by subtraction. As a general principle, when no more observations are taken than are sufficient to provide one value of each of the unknown quantities involved, then the results so obtained are usable at least, though they may not be “best.” On the other hand, when additional observations are taken, leading to “over determination” and consequent contradiction of the fundamental properties of, or the basic relationships among, the quantities concerned, then the respective observations must be regarded as contradicting one another. When this happens the observations themselves, or values derived from them, must be replaced by adjusted values such that all contradiction is removed. “This is a logical necessity, since we cannot accept for truth that which is contradictory or leads to contradictory results.” [Chauvenet 1868, p. 472.]
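The two ways of adjusting the three cyclic gage-block differences described in this section can be sketched in a few lines. The function names and the sample values are hypothetical; the arithmetic follows the text: either the third difference is replaced by the negative of the sum of the other two, or the total closure error is distributed equally over all three.

```python
def adjust_by_replacing_last(d_ab, d_bc, d_ca):
    """Ignore the measured (C - A) and set it to -[(A - B) + (B - C)]."""
    return d_ab, d_bc, -(d_ab + d_bc)


def adjust_by_equal_distribution(d_ab, d_bc, d_ca):
    """Subtract one third of the closure error from each difference."""
    closure = d_ab + d_bc + d_ca  # would be zero for consistent data
    share = closure / 3.0
    return d_ab - share, d_bc - share, d_ca - share


if __name__ == "__main__":
    # Hypothetical measured differences (arbitrary units); they sum to 0.03.
    measured = (0.12, -0.05, -0.04)
    for adjust in (adjust_by_replacing_last, adjust_by_equal_distribution):
        adjusted = adjust(*measured)
        print(adjust.__name__, adjusted, "sum =", round(sum(adjusted), 12))
```

Both adjustments satisfy the zero-sum constraint; they differ only in how the closure error is apportioned among the three measured values.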
`
`2.4. Scheduling the Taking of Measurements
`
Having done what one can to remove extraneous sources of error, and to make the basic measurements as precise and as free from systematic error as possible, it is frequently possible not only to increase the precision of the end results of major interest but also to simultaneously decrease their sensitivity to sources of possible systematic error, by careful scheduling of the measurements required. An instance is provided by the traditional procedure for calibrating liquid-in-glass thermometers [Waidner and Dickinson 1907, p. 702; NPL 1957, pp. 29-30; Swindells 1959, pp. 11-12]: Instead of attempting to hold the temperature of the comparison bath constant, a very difficult objective to achieve, the heat input to the bath is so adjusted that its temperature is slowly increasing at a steady rate, and then readings of, say, four test thermometers and two standards are taken in accordance with the schedule

S₁ T₁ T₂ T₃ T₄ S₂ S₂ T₄ T₃ T₂ T₁ S₁

the readings being spaced uniformly in time so that the arithmetic mean of the two readings of any one thermometer will correspond to the temperature of the comparison bath at the midpoint of the period. Such scheduling of measurement taking operations so that the effects of the specific types of departures from perfect control of conditions and procedure will have an opportunity to balance out is one of the principal aims of the art and science of statistical design of experiments. For additional physical science examples, see, for instance, Youden [1951a; and 1954-1959].

2.5. Measurement as a Production Process

We may summarize our discussion of measurement up to this point, as follows: Measurement of some property of a thing in practice always takes the form of a sequence of steps or operations that yield as an end result a number that serves to represent the amount or quantity of some particular property of a thing, that is, a number that indicates how much of this property the thing has, for someone to use for a specific purpose. The end result may be the outcome of a single reading of an instrument, with or without corrections for departures from prescribed conditions. More often it is some kind of average or adjusted value, e.g., the arithmetic mean of a number of independent determinations of the same magnitude, or the final result of, say, a least squares “reduction” of measurements of a number of different quantities that have known relations to the quantity of interest.
Measurement of some property of a thing is ordinarily a repeatable operation. This is certainly the case for the types of measurement ordinarily met in the calibration of standards and instruments. It is instructive, therefore, to regard measurement as a production process, the “product” being the numbers, that is, the measurements that it yields; and to compare and contrast measurement processes in the laboratory with mass production processes in industry. For the moment it will suffice to note (a) that when successive amounts or units of “raw material” are processed by a particular mass production process, the output is a series of nominally identical items of product of the particular type produced by the mass production operation, i.e., by the method of production concerned; and (b) that when successive objects are measured by a particular measurement process, the individual items of “product” produced consist of the numbers assigned to the respective objects to represent the relative amounts that they possess of the property determined by the method of measurement involved.

2.6. Methods of Measurement and Measurement Processes

Specification of the apparatus and auxiliary equipment to be used, the operations to be performed, the sequence in which they are to be carried out, and the conditions under which they are respectively to be carried out: these instructions collectively serve to define a method of measurement. To the extent that corrections may be required they are an integral part of measurement. The types of corrections that will ordinarily need to be made, and specific procedures for making them, should be included among “the operations to be performed.” Likewise, the essential adjustments required should be noted, and specific procedures for making them incorporated in the specification of a method of measurement.
A measurement process is the realization of a method of measurement in terms of particular apparatus and equipment of the prescribed kinds, particular conditions that at best only approximate the conditions prescribed, and particular persons as operators and observers [ASTM 1961, p. 1755; Murphy 1961, p. 264]. Of course, there will often be a question whether a particular measurement process is loyal to the method of measurement of which it is intended to be a realization; or whether two different measurement processes can be considered to be realizations of the same method of measurement.
To begin with, written specifications of methods of measurement often contain absolutely precise instructions which, however, cannot be carried out (repeatedly) with complete exactitude in practice; for example, “move the two parallel cross hairs of the micrometer of the microscope until the graduation line of the standard is centered between them.” The accuracy with which such instructions can be carried out in practice will always depend upon “the circumstances”; in the case cited, on the skill of the operator, the quality of the graduation line of the standard, the quality of the screw of the micrometer, the parallelism of the cross hairs, etc. To the extent that the written specification of a method of measurement involves absolutely precise instructions that cannot be carried out with complete exactitude in practice there are certain to be discrepancies between a method of measurement and its realization by a particular measurement process.
In addition, the specification of a method of measurement often includes a number of imprecise instructions, such as “raise the temperature slowly,” “stir well before taking a reading,” “make sure that the tubing is clean,” etc. Not only are such instructions inherently vague, but also in any given instance they must be understood in terms of the general level of refinement characteristic of the context in which they occur. Thus, “make sure that the tubing is clean” is not an absolutely definite instruction; to some people this would mean simply that the tubing should be clean enough to drink liquids through; in some laboratory work it might be interpreted to mean mechanically washed and scoured so as to be free from dirt and other ordinary
solid matter (but not cleansed also with chemical solvents to remove more stubborn contaminants); to an advanced experimental physicist it may mean not merely mechanically washed and chemically cleansed, but also “out gassed” by being heated to and held at a high temperature, near the softening point, for an hour or so. All will agree, I believe, that it would be exceedingly difficult to make such instructions absolutely definite with a convenient number of words. To the extent that the specification of a method of measurement includes instructions that are not absolutely definite, there will be room for differences between measurement processes that are intended to be realizations of the very same method of measurement.
Recognition of the difficulty of achieving absolute definiteness in the specification of a method of measurement does not imply that “any old set” of instructions will serve to define a method of measurement. Quite the contrary. To qualify as a specification of a method of measurement, a set of instructions must be sufficiently definite to insure statistical stability of repeated measurements of a single quantity, that is, derived measurement processes must be capable of meeting the criteria of statistical control [Shewhart 1939, p. 131; Murphy 1961, p. 265; ASTM 1961, p. 1758]. To elucidation of the meaning of, and need for, this requirement we now turn.
`
`3. Properties of Measurement Processes
`
`3.1. Requirement of Statistical Control
`
The need for attaining a degree of consistency among repeated measurements of a single quantity before the method of measurement concerned can be regarded as meaningful has certainly been recognized for a long, long time. Thus Galileo, describing his famous experiment on the acceleration of gravity in which he allowed a ball to roll different distances down an inclined plane, wrote:

“. . . si lasciava (come dico) scendere per il detto canale la palla, notando, nel modo che appresso dirò, il tempo che consumava nello scorrerlo tutto, replicando il medesimo atto molte volte per assicurarsi bene della quantità del tempo, nel quale non si trovava mai differenza né anco della decima parte d'una battuta di polso. Fatta e stabilita precisamente tale operazione, facemmo scender la medesima palla solamente per la quarta parte della lunghezza di esso canale . . .” [Galileo 1638, Third Day; Nat'l. ed., p. 213.]
I am grateful to my colleague Ugo Fano for the following literal translation: “. . . We let, as I was saying, the ball descend through said channel, recording, in a manner presently to be described, the time it took in traversing it all, repeating the same action many times to make really sure of the magnitude of time, in which one never found a difference of even a tenth of a pulsebeat. Having done and established precisely such operation, we let the same ball descend only for the fourth part of the length of the same channel . . .”

Something more than mere “consistency” is required, however, as Shewhart points out eloquently in his very important chapter on “The Specification of Accuracy and Precision” [Shewhart 1939, ch. IV]. He begins by noting that the description given by R. A. Millikan [1903, pp. 195-196] of a method for determining the surface tension T of a liquid from measurements of the force of tension F of a film of the liquid contains the following sentence with regard to the basic readings from which measurements of T are derived: “Continue this operation until a number of consistent readings can be obtained.” Shewhart then comments on this as follows:

“. . . the text describing the operation does not say to carry out such and such physical operations and call the result a measurement of T. Instead, it says in ef