Reference Guide on Statistics
David H. Kaye and David A. Freedman
David H. Kaye, M.A., J.D., is Distinguished Professor of Law and Weiss Family Scholar,
The Pennsylvania State University, University Park, and Regents’ Professor Emeritus, Arizona
State University Sandra Day O’Connor College of Law and School of Life Sciences, Tempe.
David A. Freedman, Ph.D., was Professor of Statistics, University of California, Berkeley.
[Editor’s Note: Sadly, Professor Freedman passed away during the production of this manual.]
I. Introduction
A. Admissibility and Weight of Statistical Studies
B. Varieties and Limits of Statistical Expertise
C. Procedures That Enhance Statistical Testimony
1. Maintaining professional autonomy
2. Disclosing other analyses
3. Disclosing data and analytical methods before trial
II. How Have the Data Been Collected?
A. Is the Study Designed to Investigate Causation?
1. Types of studies
2. Randomized controlled experiments
3. Observational studies
4. Can the results be generalized?
B. Descriptive Surveys and Censuses
1. What method is used to select the units?
2. Of the units selected, which are measured?
C. Individual Measurements
1. Is the measurement process reliable?
2. Is the measurement process valid?
3. Are the measurements recorded correctly?
D. What Is Random?
III. How Have the Data Been Presented?
A. Are Rates or Percentages Properly Interpreted?
1. Have appropriate benchmarks been provided?
2. Have the data collection procedures changed?
3. Are the categories appropriate?
4. How big is the base of a percentage?
5. What comparisons are made?
B. Is an Appropriate Measure of Association Used?
Reference Manual on Scientific Evidence: Third Edition
Copyright © National Academy of Sciences. All rights reserved.


C. Does a Graph Portray Data Fairly?
1. How are trends displayed?
2. How are distributions displayed?
D. Is an Appropriate Measure Used for the Center of a Distribution?
E. Is an Appropriate Measure of Variability Used?
IV. What Inferences Can Be Drawn from the Data?
A. Estimation
1. What estimator should be used?
2. What is the standard error? The confidence interval?
3. How big should the sample be?
4. What are the technical difficulties?
B. Significance Levels and Hypothesis Tests
1. What is the p-value?
2. Is a difference statistically significant?
3. Tests or interval estimates?
4. Is the sample statistically significant?
C. Evaluating Hypothesis Tests
1. What is the power of the test?
2. What about small samples?
3. One tail or two?
4. How many tests have been done?
5. What are the rival hypotheses?
D. Posterior Probabilities
V. Correlation and Regression
A. Scatter Diagrams
B. Correlation Coefficients
1. Is the association linear?
2. Do outliers influence the correlation coefficient?
3. Does a confounding variable influence the coefficient?
C. Regression Lines
1. What are the slope and intercept?
2. What is the unit of analysis?
D. Statistical Models
Appendix
A. Frequentists and Bayesians
B. The Spock Jury: Technical Details
C. The Nixon Papers: Technical Details
D. A Social Science Example of Regression: Gender Discrimination in Salaries
1. The regression model
2. Standard errors, t-statistics, and statistical significance
Glossary of Terms
References on Statistics
I. Introduction
Statistical assessments are prominent in many kinds of legal cases, including
antitrust, employment discrimination, toxic torts, and voting rights cases.1 This
reference guide describes the elements of statistical reasoning. We hope the
explanations will help judges and lawyers to understand statistical terminology, to see
the strengths and weaknesses of statistical arguments, and to apply relevant legal
doctrine. The guide is organized as follows:
• Section I provides an overview of the field, discusses the admissibility
of statistical studies, and offers some suggestions about procedures that
encourage the best use of statistical evidence.
• Section II addresses data collection and explains why the design of a study
is the most important determinant of its quality. This section compares
experiments with observational studies and surveys with censuses,
indicating when the various kinds of study are likely to provide useful results.
• Section III discusses the art of summarizing data. This section considers the
mean, median, and standard deviation. These are basic descriptive statistics,
and most statistical analyses use them as building blocks. This section also
discusses patterns in data that are brought out by graphs, percentages, and
tables.
• Section IV describes the logic of statistical inference, emphasizing
foundations and disclosing limitations. This section covers estimation, standard
errors and confidence intervals, p-values, and hypothesis tests.
• Section V shows how associations can be described by scatter diagrams,
correlation coefficients, and regression lines. Regression is often used to
infer causation from association. This section explains the technique,
indicating the circumstances under which it and other statistical models are
likely to succeed—or fail.
• An appendix provides some technical details.
• The glossary defines statistical terms that may be encountered in litigation.
1. See generally Statistical Science in the Courtroom (Joseph L. Gastwirth ed., 2000); Statistics
and the Law (Morris H. DeGroot et al. eds., 1986); National Research Council, The Evolving Role
of Statistical Assessments as Evidence in the Courts (Stephen E. Fienberg ed., 1989) [hereinafter The
Evolving Role of Statistical Assessments as Evidence in the Courts]; Michael O. Finkelstein & Bruce
Levin, Statistics for Lawyers (2d ed. 2001); 1 & 2 Joseph L. Gastwirth, Statistical Reasoning in Law
and Public Policy (1988); Hans Zeisel & David Kaye, Prove It with Figures: Empirical Methods in
Law and Litigation (1997).
A. Admissibility and Weight of Statistical Studies
Statistical studies suitably designed to address a material issue generally will be
admissible under the Federal Rules of Evidence. The hearsay rule rarely is a
serious barrier to the presentation of statistical studies, because such studies may
be offered to explain the basis for an expert’s opinion or may be admissible under
the learned treatise exception to the hearsay rule.2 Because most statistical methods
relied on in court are described in textbooks or journal articles and are capable
of producing useful results when properly applied, these methods generally satisfy
important aspects of the “scientific knowledge” requirement in Daubert v. Merrell
Dow Pharmaceuticals, Inc.3 Of course, a particular study may use a method that is
entirely appropriate but that is so poorly executed that it should be inadmissible
under Federal Rules of Evidence 403 and 702.4 Or, the method may be
inappropriate for the problem at hand and thus lack the “fit” spoken of in Daubert.5 Or
the study might rest on data of the type not reasonably relied on by statisticians or
substantive experts and hence run afoul of Federal Rule of Evidence 703. Often,
however, the battle over statistical evidence concerns weight or sufficiency rather
than admissibility.
B. Varieties and Limits of Statistical Expertise
For convenience, the field of statistics may be divided into three subfields:
probability theory, theoretical statistics, and applied statistics. Probability theory is the
mathematical study of outcomes that are governed, at least in part, by chance.
Theoretical statistics is about the properties of statistical procedures, including
error rates; probability theory plays a key role in this endeavor. Applied statistics
draws on both of these fields to develop techniques for collecting or analyzing
particular types of data.
2. See generally 2 McCormick on Evidence §§ 321, 324.3 (Kenneth S. Broun ed., 6th ed. 2006).
Studies published by government agencies also may be admissible as public records. Id. § 296.
3. 509 U.S. 579, 589–90 (1993).
4. See Kumho Tire Co. v. Carmichael, 526 U.S. 137, 152 (1999) (suggesting that the trial court
should “make certain that an expert, whether basing testimony upon professional studies or personal
experience, employs in the courtroom the same level of intellectual rigor that characterizes the practice
of an expert in the relevant field.”); Malletier v. Dooney & Bourke, Inc., 525 F. Supp. 2d 558, 562–63
(S.D.N.Y. 2007) (“While errors in a survey’s methodology usually go to the weight accorded to the
conclusions rather than its admissibility, . . . ‘there will be occasions when the proffered survey is so
flawed as to be completely unhelpful to the trier of fact.’”) (quoting AHP Subsidiary Holding Co. v.
Stuart Hale Co., 1 F.3d 611, 618 (7th Cir. 1993)).
5. Daubert, 509 U.S. at 591; Anderson v. Westinghouse Savannah River Co., 406 F.3d 248 (4th
Cir. 2005) (motion to exclude statistical analysis that compared black and white employees without
adequately taking into account differences in their job titles or positions was properly granted under
Daubert); Malletier, 525 F. Supp. 2d at 569 (excluding a consumer survey for “a lack of fit between the
survey’s questions and the law of dilution” and errors in the execution of the survey).
Statistical expertise is not confined to those with degrees in statistics. Because
statistical reasoning underlies many kinds of empirical research, scholars in a
variety of fields—including biology, economics, epidemiology, political science,
and psychology—are exposed to statistical ideas, with an emphasis on the methods
most important to the discipline.
Experts who specialize in using statistical methods, and whose professional
careers demonstrate this orientation, are most likely to use appropriate procedures
and correctly interpret the results. By contrast, forensic scientists often lack basic
information about the studies underlying their testimony. State v. Garrison6
illustrates the problem. In this murder prosecution involving bite mark evidence, a
dentist was allowed to testify that “the probability factor of two sets of teeth being
identical in a case similar to this is, approximately, eight in one million,” even
though “he was unaware of the formula utilized to arrive at that figure other than
that it was ‘computerized.’”7
At the same time, the choice of which data to examine, or how best to model
a particular process, could require subject matter expertise that a statistician lacks.
As a result, cases involving statistical evidence frequently are (or should be) “two
expert” cases of interlocking testimony. A labor economist, for example, may
supply a definition of the relevant labor market from which an employer draws
its employees; the statistical expert may then compare the race of new hires to
the racial composition of the labor market. Naturally, the value of the statistical
analysis depends on the substantive knowledge that informs it.8
C. Procedures That Enhance Statistical Testimony
1. Maintaining professional autonomy
Ideally, experts who conduct research in the context of litigation should proceed
with the same objectivity that would be required in other contexts. Thus, experts
who testify (or who supply results used in testimony) should conduct the analysis
required to address in a professionally responsible fashion the issues posed by the
litigation.9 Questions about the freedom of inquiry accorded to testifying experts,
as well as the scope and depth of their investigations, may reveal some of the
limitations to the testimony.
6. 585 P.2d 563 (Ariz. 1978).
7. Id. at 566, 568. For other examples, see David H. Kaye et al., The New Wigmore: A Treatise
on Evidence: Expert Evidence § 12.2 (2d ed. 2011).
8. In Vuyanich v. Republic National Bank, 505 F. Supp. 224, 319 (N.D. Tex. 1980), vacated, 723
F.2d 1195 (5th Cir. 1984), defendant’s statistical expert criticized the plaintiffs’ statistical model for an
implicit, but restrictive, assumption about male and female salaries. The district court trying the case
accepted the model because the plaintiffs’ expert had a “very strong guess” about the assumption, and
her expertise included labor economics as well as statistics. Id. It is doubtful, however, that economic
knowledge sheds much light on the assumption, and it would have been simple to perform a less
restrictive analysis.
9. See The Evolving Role of Statistical Assessments as Evidence in the Courts, supra note 1, at
164 (recommending that the expert be free to consult with colleagues who have not been retained
by any party to the litigation and that the expert receive a letter of engagement providing for these
and other safeguards).
2. Disclosing other analyses
Statisticians analyze data using a variety of methods. There is much to be said for
looking at the data in several ways. To permit a fair evaluation of the analysis that
is eventually settled on, however, the testifying expert can be asked to explain
how that approach was developed. According to some commentators, counsel
who know of analyses that do not support the client’s position should reveal them,
rather than presenting only favorable results.10
3. Disclosing data and analytical methods before trial
The collection of data often is expensive and subject to errors and omissions.
Moreover, careful exploration of the data can be time-consuming. To minimize
debates at trial over the accuracy of data and the choice of analytical techniques,
pretrial discovery procedures should be used, particularly with respect to the
quality of the data and the method of analysis.11
II. How Have the Data Been Collected?
The interpretation of data often depends on understanding “study design”—the
plan for a statistical study and its implementation.12 Different designs are suited to
answering different questions. Also, flaws in the data can undermine any statistical
analysis, and data quality is often determined by study design.
In many cases, statistical studies are used to show causation. Do food additives
cause cancer? Does capital punishment deter crime? Would additional disclosures
in a securities prospectus cause investors to behave differently? The design of
studies to investigate causation is the first topic of this section.13
10. Id. at 167; cf. William W. Schwarzer, In Defense of “Automatic Disclosure in Discovery,” 27
Ga. L. Rev. 655, 658–59 (1993) (“[T]he lawyer owes a duty to the court to make disclosure of core
information.”). The National Research Council also recommends that “if a party gives statistical data
to different experts for competing analyses, that fact be disclosed to the testifying expert, if any.” The
Evolving Role of Statistical Assessments as Evidence in the Courts, supra note 1, at 167.
11. See The Special Comm. on Empirical Data in Legal Decision Making, Recommendations
on Pretrial Proceedings in Cases with Voluminous Data, reprinted in The Evolving Role of Statistical
Assessments as Evidence in the Courts, supra note 1, app. F; see also David H. Kaye, Improving Legal
Statistics, 24 Law & Soc’y Rev. 1255 (1990).
12. For introductory treatments of data collection, see, for example, David Freedman et al.,
Statistics (4th ed. 2007); Darrell Huff, How to Lie with Statistics (1993); David S. Moore & William
I. Notz, Statistics: Concepts and Controversies (6th ed. 2005); Hans Zeisel, Say It with Figures (6th
ed. 1985); Zeisel & Kaye, supra note 1.
Sample data can be used to describe a population. The population is the
whole class of units that are of interest; the sample is the set of units chosen for
detailed study. Inferences from the part to the whole are justified when the sample
is representative. Sampling is the second topic of this section.
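The relationship between a sample and its population can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 units, the 30% share with some trait, the sample size, and the function name sample_proportion are all hypothetical and are not drawn from any study discussed in this guide.

```python
import random

def sample_proportion(population, n, seed=0):
    """Draw a simple random sample of n units (without replacement)
    and return the fraction of sampled units coded 1."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    sample = rng.sample(population, n)
    return sum(sample) / n

# Hypothetical population: 10,000 units, 30% of which have some trait (coded 1).
population = [1] * 3000 + [0] * 7000
population_share = sum(population) / len(population)

# The sample is the part; the population is the whole.
estimate = sample_proportion(population, n=1000)

print(f"population share: {population_share:.3f}")
print(f"sample estimate:  {estimate:.3f}")
```

Because every unit has the same chance of selection, the sample estimate should land close to the population share. A sample chosen in some less representative way would carry no such assurance.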
Finally, the accuracy of the data will be considered. Because making and
recording measurements is an error-prone activity, error rates should be assessed
and the likely impact of errors considered. Data quality is the third topic of this
section.
A. Is the Study Designed to Investigate Causation?
1. Types of studies
When causation is the issue, anecdotal evidence can be brought to bear. So can
observational studies or controlled experiments. Anecdotal reports may be of
value, but they are ordinarily more helpful in generating lines of inquiry than in
proving causation.14 Observational studies can establish that one factor is
associated with another, but work is needed to bridge the gap between association and
causation. Randomized controlled experiments are ideally suited for
demonstrating causation.
13. See also Michael D. Green et al., Reference Guide on Epidemiology, Section V, in this
manual; Joseph Rodricks, Reference Guide on Exposure Science, Section E, in this manual.
14. In medicine, evidence from clinical practice can be the starting point for discovery of
cause-and-effect relationships. For examples, see David A. Freedman, On Types of Scientific Enquiry, in
The Oxford Handbook of Political Methodology 300 (Janet M. Box-Steffensmeier et al. eds., 2008).
Anecdotal evidence is rarely definitive, and some courts have suggested that attempts to infer
causation from anecdotal reports are inadmissible as unsound methodology under Daubert v. Merrell Dow
Pharmaceuticals, Inc., 509 U.S. 579 (1993). See, e.g., McClain v. Metabolife Int’l, Inc., 401 F.3d 1233,
1244 (11th Cir. 2005) (“simply because a person takes drugs and then suffers an injury does not show
causation. Drawing such a conclusion from temporal relationships leads to the blunder of the post hoc
ergo propter hoc fallacy.”); In re Baycol Prods. Litig., 532 F. Supp. 2d 1029, 1039–40 (D. Minn. 2007)
(excluding a meta-analysis based on reports to the Food and Drug Administration of adverse events);
Leblanc v. Chevron USA Inc., 513 F. Supp. 2d 641, 650 (E.D. La. 2007) (excluding plaintiffs’ experts’
opinions that benzene causes myelofibrosis because the causal hypothesis “that has been generated by
case reports . . . has not been confirmed by the vast majority of epidemiologic studies of workers being
exposed to benzene and more generally, petroleum products.”), vacated, 275 Fed. App’x. 319 (5th
Cir. 2008) (remanding for consideration of newer government report on health effects of benzene);
cf. Matrixx Initiatives, Inc. v. Siracusano, 131 S. Ct. 1309, 1321 (2011) (concluding that adverse event
reports combined with other information could be of concern to a reasonable investor and therefore
subject to a requirement of disclosure under SEC Rule 10b-5, but stating that “the mere existence of
reports of adverse events . . . says nothing in and of itself about whether the drug is causing the adverse
events”). Other courts are more open to “differential diagnoses” based primarily on timing. E.g., Best v.
Lowe’s Home Ctrs., Inc., 563 F.3d 171 (6th Cir. 2009) (reversing the exclusion of a physician’s opinion
that exposure to propenyl chloride caused a man to lose his sense of smell because of the timing in this
one case and the physician’s inability to attribute the change to anything else); Kaye et al., supra note
7, §§ 8.7.2 & 12.5.1. See also Matrixx Initiatives, supra, at 1322 (listing “a temporal relationship” in a
single patient as one indication of “a reliable causal link”).
Anecdotal evidence usually amounts to reports that events of one kind are
followed by events of another kind. Typically, the reports are not even sufficient
to show association, because there is no comparison group. For example, some
children who live near power lines develop leukemia. Does exposure to electrical
and magnetic fields cause this disease? The anecdotal evidence is not compelling
because leukemia also occurs among children without exposure.15 It is necessary
to compare disease rates among those who are exposed and those who are not.
If exposure causes the disease, the rate should be higher among the exposed and
lower among the unexposed. That would be association.
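The comparison of rates described above amounts to simple arithmetic, sketched here in Python. The counts are invented for illustration and do not come from any study of power lines or leukemia; the function name disease_rate is likewise hypothetical.

```python
def disease_rate(cases, group_size):
    """Fraction of a group that developed the disease."""
    return cases / group_size

# Hypothetical counts, chosen only to illustrate the arithmetic.
exposed_cases, exposed_total = 30, 10_000
unexposed_cases, unexposed_total = 20, 10_000

rate_exposed = disease_rate(exposed_cases, exposed_total)
rate_unexposed = disease_rate(unexposed_cases, unexposed_total)

# Two common ways to compare the groups.
rate_ratio = rate_exposed / rate_unexposed
rate_difference = rate_exposed - rate_unexposed

print(f"rate among exposed:   {rate_exposed:.4f}")
print(f"rate among unexposed: {rate_unexposed:.4f}")
print(f"rate ratio:           {rate_ratio:.2f}")
print(f"rate difference:      {rate_difference:.4f}")
```

An anecdotal report corresponds to knowing only the 30 exposed cases. Without the group sizes and the unexposed group, neither rate can be computed and no association can be shown.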
The next issue is crucial: Exposed and unexposed people may differ in ways
other than the exposure they have experienced. For example, children who live
near power lines could come from poorer families and be more at risk from other
environmental hazards. Such differences can create the appearance of a
cause-and-effect relationship. Other differences can mask a real relationship. Cause-and-effect
relationships often are quite subtle, and carefully designed studies are needed to
draw valid conclusions.
An epidemiological classic makes the point. At one time, it was thought that
lung cancer was caused by fumes from tarring the roads, because many lung cancer
patients lived near roads that recently had been tarred. This is anecdotal evidence.
But the argument is incomplete. For one thing, most people—whether exposed
to asphalt fumes or unexposed—did not develop lung cancer. A comparison of
rates was needed. The epidemiologists found that exposed persons and unexposed
persons suffered from lung cancer at similar rates: Tar was probably not the causal
agent. Exposure to cigarette smoke, however, turned out to be strongly associated
with lung cancer. This study, in combination with later ones, made a compelling
case that smoking cigarettes is the main cause of lung cancer.16
15. See National Research Council, Committee on the Possible Effects of Electromagnetic Fields
on Biologic Systems (1997); Zeisel & Kaye, supra note 1, at 66–67. There are problems in
measuring exposure to electromagnetic fields, and results are inconsistent from one study to another. For
such reasons, the epidemiological evidence for an effect on health is inconclusive. National Research
Council, supra; Zeisel & Kaye, supra; Edward W. Campion, Power Lines, Cancer, and Fear, 337 New
Eng. J. Med. 44 (1997) (editorial); Martha S. Linet et al., Residential Exposure to Magnetic Fields and Acute
Lymphoblastic Leukemia in Children, 337 New Eng. J. Med. 1 (1997); Gary Taubes, Magnetic Field-Cancer
Link: Will It Rest in Peace?, 277 Science 29 (1997) (quoting various epidemiologists).
16. Richard Doll & A. Bradford Hill, A Study of the Aetiology of Carcinoma of the Lung, 2 Brit.
Med. J. 1271 (1952). This was a matched case-control study. Cohort studies soon followed. See
Green et al., supra note 13. For a review of the evidence on causation, see 38 International Agency
for Research on Cancer (IARC), World Health Org., IARC Monographs on the Evaluation of the
Carcinogenic Risk of Chemicals to Humans: Tobacco Smoking (1986).
A good study design compares outcomes for subjects who are exposed to
some factor (the treatment group) with outcomes for other subjects who are
not exposed (the control group). Now there is another important distinction to
be made—that between controlled experiments and observational studies. In a
controlled experiment, the investigators decide which subjects will be exposed
and which subjects will go into the control group. In observational studies, by
contrast, the subjects themselves choose their exposures. Because of self-selection,
the treatment and control groups are likely to differ with respect to influential
factors other than the one of primary interest. (These other factors are called
lurking variables or confounding variables.)17 With the health effects of power lines,
family background is a possible confounder; so is exposure to other hazards. Many
confounders have been proposed to explain the association between smoking and
lung cancer, but careful epidemiological studies have ruled them out, one after
the other.
Confounding remains a problem to reckon with, even for the best
observational research. For example, women with herpes are more likely to develop
cervical cancer than other women. Some investigators concluded that herpes caused
cancer: In other words, they thought the association was causal. Later research
showed that the primary cause of cervical cancer was human papilloma virus
(HPV). Herpes was a marker of sexual activity. Women who had multiple sexual
partners were more likely to be exposed not only to herpes but also to HPV.
The association between herpes and cervical cancer was due to other variables.18
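The way a confounder can manufacture an association may be easier to see in a small simulation. The sketch below is illustrative only: the probabilities are invented, the variables z (confounder), x (marker), and y (outcome) are hypothetical stand-ins, and nothing here models actual data on herpes, HPV, or cervical cancer. By construction, the outcome depends on the confounder alone, never on the marker.

```python
import random

def simulate(n=200_000, seed=1):
    """Simulate n units. Returns counts[(z, x)] = [units, units with outcome]."""
    rng = random.Random(seed)
    counts = {(z, x): [0, 0] for z in (0, 1) for x in (0, 1)}
    for _ in range(n):
        z = 1 if rng.random() < 0.30 else 0                    # confounder present?
        x = 1 if rng.random() < (0.60 if z else 0.10) else 0   # marker, more common when z = 1
        y = 1 if rng.random() < (0.20 if z else 0.02) else 0   # outcome depends on z only
        counts[(z, x)][0] += 1
        counts[(z, x)][1] += y
    return counts

def rate(counts, keys):
    """Outcome rate pooled over the listed (z, x) cells."""
    units = sum(counts[k][0] for k in keys)
    events = sum(counts[k][1] for k in keys)
    return events / units

counts = simulate()

# Crude comparison: ignore the confounder entirely.
crude_with_marker = rate(counts, [(0, 1), (1, 1)])
crude_without_marker = rate(counts, [(0, 0), (1, 0)])
print(f"crude rate, marker present: {crude_with_marker:.3f}")
print(f"crude rate, marker absent:  {crude_without_marker:.3f}")

# Stratified comparison: hold the confounder fixed.
for z in (0, 1):
    with_marker = rate(counts, [(z, 1)])
    without_marker = rate(counts, [(z, 0)])
    print(f"z={z}: marker present {with_marker:.3f}, marker absent {without_marker:.3f}")
```

In the crude comparison the marker appears to raise the outcome rate substantially. Within each level of the confounder the two rates are nearly equal, which is what one would expect when the marker has no effect of its own.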
What are “variables”? In statistics, a variable is a characteristic of units in a
study. With a study of people, the unit of analysis is the person. Typical
variables include income (dollars per year) and educational level (years of schooling
completed): These variables describe people. With a study of school districts, the
unit of analysis is the district. Typical variables include average family income of
district residents and average test scores of students in the district: These variables
describe school districts.
When investigating a cause-and-effect relationship, the variable that
represents the effect is called the dependent variable, because it depends on the causes.
The variables that represent the causes are called independent variables. With a
study of smoking and lung cancer, the independent variable would be smoking
(e.g., number of cigarettes per day), and the dependent variable would mark the
presence or absence of lung cancer. Dependent variables also are called outcome
variables or response variables. Synonyms for independent variables are risk factors,
predictors, and explanatory variables.
17. For example, a confounding variable may be correlated with the independent variable and
act causally on the dependent variable. If the units being studied differ on the independent variable,
they are also likely to differ on the confounder. The confounder—not the independent variable—could
therefore be responsible for differences seen on the dependent variable.
18. For additional examples and further discussion, see Freedman et al., supra note 12, at 12–28,
150–52; David A. Freedman, From Association to Causation: Some Remarks on the History of Statistics, 14
Stat. Sci. 243 (1999). Some studies find that herpes is a “cofactor,” which increases risk among women
who are also exposed to HPV. Only certain strains of HPV are carcinogenic.
2. Randomized controlled experiments
In randomized controlled experiments, investigators assign subjects to treatment
or control groups at random. The groups are therefore likely to be comparable,
except for the treatment. This minimizes the role of confounding. Minor
imbalances will remain, due to the play of random chance; the likely effect on study
results can be assessed by statistical techniques.19 The bottom line is that causal
inferences based on well-executed randomized experiments are generally more
secure than inferences based on well-executed observational studies.
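Random assignment, and the minor imbalances it leaves behind, can be sketched as follows. The subjects, their ages, and the helper name randomize are hypothetical; the point is only that a baseline characteristic which the investigators never looked at ends up roughly balanced between the two groups.

```python
import random
from statistics import mean

def randomize(subjects, seed=0):
    """Shuffle the subjects and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical baseline covariate: ages of 1,000 subjects between 20 and 80.
age_rng = random.Random(42)
ages = [age_rng.randint(20, 80) for _ in range(1000)]

treatment, control = randomize(ages)
age_gap = mean(treatment) - mean(control)

print(f"mean age, treatment group: {mean(treatment):.1f}")
print(f"mean age, control group:   {mean(control):.1f}")
print(f"difference due to chance:  {age_gap:.2f}")
```

The remaining gap in mean age is the kind of minor imbalance attributed above to the play of chance. It tends to shrink as the groups grow larger.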
The following example should help bring the discussion together. Today, we
know that taking aspirin helps prevent heart attacks. But initially, there was some
controversy. People who take aspirin rarely have heart attacks. This is anecdotal
evidence for a protective effect, but it proves almost nothing. After all, few people
have frequent heart attacks, whether or not they take aspirin regularly. A good
study compares heart attack rates for two groups: people who take aspirin (the
treatment group) and people who do not (the controls). An observational study
would be easy to do, but in such a study the aspirin-takers are likely to be
different from the controls. Indeed, they are likely to be sicker—that is why they
are taking aspirin. The study would be biased against finding a protective effect.
Randomized experiments are harder to do, but they provide better evidence. It
is the experiments that demonstrate a protective effect.20
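The bias from self-selection in the aspirin example can be illustrated with a simulation. Every number below is an assumption made up for the sketch (the share of sick people, the baseline risks, the size of the protective effect, and the chances of taking aspirin), as are the function names; none of it is taken from the aspirin trials. The simulated treatment always lowers risk by the same factor, so any difference between the two designs comes from how people end up in the aspirin group.

```python
import random

def heart_attack_rate_gap(assign, n=200_000, seed=0):
    """Return (rate among aspirin-takers) minus (rate among non-takers).

    assign(sick, rng) decides whether a given person takes aspirin.
    """
    rng = random.Random(seed)
    tally = {True: [0, 0], False: [0, 0]}   # takes_aspirin -> [people, heart attacks]
    for _ in range(n):
        sick = rng.random() < 0.30           # hypothetical share of sicker people
        takes_aspirin = assign(sick, rng)
        risk = 0.20 if sick else 0.05        # hypothetical baseline risks
        if takes_aspirin:
            risk *= 0.70                     # hypothetical protective effect
        attack = rng.random() < risk
        tally[takes_aspirin][0] += 1
        tally[takes_aspirin][1] += attack
    rate = {group: attacks / people for group, (people, attacks) in tally.items()}
    return rate[True] - rate[False]

def self_selected(sick, rng):
    """Observational setting: sicker people are far more likely to take aspirin."""
    return rng.random() < (0.80 if sick else 0.10)

def randomized(sick, rng):
    """Experimental setting: a coin flip decides, regardless of health."""
    return rng.random() < 0.50

observational_gap = heart_attack_rate_gap(self_selected)
randomized_gap = heart_attack_rate_gap(randomized)

print(f"observational comparison: {observational_gap:+.3f}")
print(f"randomized comparison:    {randomized_gap:+.3f}")
```

Under these assumptions the randomized comparison shows a lower heart attack rate among aspirin-takers, while the observational comparison points the other way because the aspirin group is dominated by sicker people.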
In summary, data from a treatment group without a control group generally
reveal very little and can be misleading. Comparisons are essential. If subjects are
assigned to treatment and control groups at random, a difference in the outcomes
between the two groups can usually be accepted, within the limits of statistical
error (infra Section IV), as a good measure of the treatment effect. However, if
the groups are created in any other way, differences that existed before treatment
may contribute to differences in the outcomes or mask differences that otherwise
would become manifest. Observational studies succeed to the extent that the
treatment and control groups are comparable—apart from the treatment.
3. Observational studies
The bulk of the statistical studies seen in court are observational, not
experimental. Take the question of whether capital punishment deters murder. To
conduct a randomized controlled experiment, people would need to be assigned
randomly to a treatment group or a control group. People in the treatment
group would know they were subject to the death penalty for murder; the
19. Randomization of subjects to treatment or control groups puts statistical tests of s

