`IBG 1058
`CBM2015-00179
`
`

`
`

`
`The Elements of Graphing Data
`by William S. Cleveland
`
`REVIEWERS RAVED ABOUT THE 1ST EDITION
`THE REVISED EDITION IS EVEN BETTER
`
`Meteorological Magazine "Ideally, everyone
`interested in getting the most
`out of their data or presenting data
`clearly and concisely should have a
`copy handy."
`
`The American Cartographer "An excellent
`stimulus for deeper thinking
`about display techniques ... "
`
`College and Research Libraries "This
`book is a gem. Buy it, read it and urge
`everyone you know whose job it is to
`convert raw data to meaningful information
`to do the same."
`
`Computing Reviews "This is an admirable
`book. It is clearly written and
`intellectually engaging."
`
`The American Mathematical Monthly
`"Graphical methods in science and
`technology. Many new ideas and
`methods; many not widely known
`before. Excellent methodological resource
`for research workers. Theme
`is communication of information from
`raw data."
`
`Atmospheric Environment " ... certain
`kinds of tendency toward bad graphics
`could be cured if as many authors
`as possible would not just read, but, in
`the words of the Anglican Prayer Book,
`'learn, mark, and inwardly digest' this
`volume."
`
`

`
`[Figure I: scatterplot of log brain weight against log body weight (log 10 grams), with plotting symbols color encoded by category: Birds, Fish, Primates, Nonprimate Mammals.]
`
`Figure I. COLOR ENCODING A CATEGORICAL VARIABLE. Color is a powerful tool that can
`genuinely enhance the visual decoding of information on data displays. Color can also be used to no
`purpose. We need to be hard-boiled in evaluating the efficacy of a visualization tool. It is easy to be
`dazzled by a display of data rendered in a rainbow of colors; our tendency is to be misled into
`thinking we are absorbing relevant information when we see a lot. But the success of a visualization
`tool should be based solely on the amount we learn about the phenomenon under study. There are
`two uses of color that genuinely transmit information from display to viewer. One is the rendering of
`different categories of graphical elements in different colors to provide efficient visual assembly of
`the categories, that is, to allow us to see each category of elements as a whole, mentally filtering out
`the other categories. In this figure, four different categories of plotting symbols are color encoded,
`and we can easily assemble the symbols of each category. The second use of color is illustrated by
`the display on the next page.
`
`

`
`[Figure II: level plot in which color encodes Resistivity (ohm-m), with key values from 20 to 80, as a function of Easting (km) and Northing (km).]
`
`Figure II. COLOR ENCODING A QUANTITATIVE VARIABLE. The second use of color is to
`display a function of two variables on a level plot. In this figure, color encodes soil resistivity to show
`how resistivity varies geographically. The encoding method achieves two goals: effortless
`perception of the order of the encoded quantities, and clearly perceived boundaries between
`adjacent levels. A number of encoding methods, for example the much used rainbow encoding,
`do not achieve both of these goals.
`
`

`
`William S. Cleveland
`
`The Elements
`of Graphing Data
`
`Revised Edition
`
`AT&T Bell Laboratories, Murray Hill, New Jersey
`
`

`
`To my parents
`
`

`
`Acknowledgements
`
`To John Tukey, for ingenious inventions and applications of graphical
`data analysis.
`
`To many colleagues at Bell Labs, for creating an optimal environment to
`study graphical data analysis.
`
`To Marylyn McGill, for relentlessly pursuing perfection in experimenting
`with graphical displays.
`
`To Bob McGill, for our experiments in graphical perception and our many
`experiments with graphical inventions.
`
`To Nick Fisher, for effective communication of graphical principles based
`on scientific enquiry.
`
`To Sue Pope and Tina Sharp, for the considerable word processing skills
`that were needed to produce the text.
`
`To Lisa Cleveland, for days of proofreading in Summit and Abcoude.
`
`To Estelle McKittrick, for help in many forms.
`
`To Gerard Gorman, for the image processing that was needed to produce
`many of the displays.
`
`To Alan Cossa, for a high level of quality control in producing
`camera-ready output.
`
`To many who commented on the manuscript: Paul Anderson, Jon
`Bentley, John Chambers, Nick Cox, Lisa Cleveland, Arnold Court, Mary
`Donnelly, Nick Fisher, Bob Futrelle, Colin Mallows, Bob McGill, Marylyn
`McGill, Brad Murphy, Richard Nuccitelli, James Palmer, Arno Penzias,
`and John Tukey.
`
`

`
`Published by Hobart Press, Summit, New Jersey
`
`Copyright ©1994 AT&T. All rights reserved.
`
`Printed in the United States of America
`
`ISBN 0-9634884-1-4 CLOTH
`
`LIBRARY OF CONGRESS CATALOG CARD NUMBER: 94-075052
`
`PUBLISHER'S CATALOGING IN PUBLICATION
`Cleveland, William S., 1943-
`The elements of graphing data / by William S. Cleveland.
`Revised edition.
`p. cm.
`Includes bibliographical references and index.
`
`1. Graphic methods. 2. Mathematical statistics-Graphic
`methods.
`I. Title.
`
`QA90.C54 1994
`
`511'.5
`
`

`
`Contents
`
`Preface 1
`
`1 Introduction 4
`1.1 The Power of Graphical Data Display 6
`1.2 The Challenge of Graphical Data Display 9
`1.3 The Contents of the Book 16
`
`2 Principles of Graph Construction 22
`2.1 Terminology 23
`2.2 Clear Vision 25
`2.3 Clear Understanding 54
`2.4 Banking to 45° 66
`2.5 Scales 80
`2.6 General Strategy 110
`
`3 Graphical Methods 119
`3.1 Logarithms 120
`3.2 Residuals 126
`3.3 Distributions 132
`3.4 Dot Plots 150
`3.5 Plotting Symbols and Curve Types 154
`3.6 Visual Reference Grids 166
`3.7 Loess 168
`3.8 Time Series 180
`3.9 Scatterplot Matrices 193
`3.10 Coplots of Scattered Data 198
`3.11 Coplots of Surfaces 203
`3.12 Brushing 206
`3.13 Color 209
`3.14 Statistical Variation 212
`
`4 Graphical Perception 221
`4.1 The Model 223
`4.2 Superposed Curves 227
`4.3 Color Encoding 230
`4.4 Texture Symbols 234
`4.5 Visual Reference Grids 240
`4.6 Order on Dot Plots 244
`4.7 Banking to 45° 251
`4.8 Correlation 256
`4.9 Graphing Along a Common Scale 259
`4.10 Pop Charts 262
`
`Bibliography 271
`
`Figure Acknowledgements 281
`
`Colophon 283
`
`Index 285
`
`

`
`Preface
`
`
`This book is about visualizing data in science and technology. It
`contains graphical methods and principles that are powerful tools for
`showing the structure of data. The material is relevant for data analysis,
`when the analyst wants to study data, and for data communication, when
`the analyst wants to communicate data to others.
`
`When a graph is made, quantitative and categorical information is
`encoded by a display method. Then the information is visually decoded.
`This visual perception is a vital link. No matter how clever the choice of
`the information, and no matter how technologically impressive the
`encoding, a visualization fails if the decoding fails. Some display
`methods lead to efficient, accurate decoding, and others lead to
`inefficient, inaccurate decoding. It is only through scientific study of
`visual perception that informed judgments can be made about display
`methods. The display methods of Elements rest on a foundation of
`scientific enquiry.
`
`Except for one small section, there is nothing in this book about
`computer graphics. The basic ideas, the methods, and the principles of
`the book transcend the computing environment used to implement
`them. While graphics technology is moving along at a rapid pace, the
`human visual system has remained the same.
`
`The prerequisites for understanding the book are minimal. A few
`topics require a knowledge of the elementary concepts of probability
`and statistical science, but these topics can be skipped without affecting
`comprehension of the remainder of the book.
`
`The book Visualizing Data is a companion volume [26]. It focuses on
`graphical methods, the topic of Chapter 3 of this book; it presents far
`more methods than covered here and is more advanced, requiring a
`greater knowledge of statistics. But Visualizing Data does not delve into
`graphical perception, and takes Elements as a starting point.
`
`

`
`
`Elements was meant to be read from the beginning and to be enjoyed.
`However, it is possible to read here and there. Winding its way through
`the book is a summary of the material: the figures and their legends.
`Reading this summary can help readers direct themselves to specific
`items.
`
`The graphs in this book are communicating information about
`fascinating subjects, and I have not hesitated to describe the subjects in
`some detail when needed. In many cases some knowledge of the subject
`is required to understand the purpose of a graphical analysis or why a
`graph is not doing what was intended or what a new graphical method
`can show us about data. I hope the reader will share with me the
`excitement of experiencing the increased insight that graphical data
`display brings us about these subjects.
`
`

`
`
`

`
`[Figure 1.1: two panels, each graphing Sunspot Number against Year, with tick labels from 1750 to 1900; the top panel is square and the bottom panel is a narrow rectangle.]
`
`1.1 GRAPHICAL METHODS AND PRINCIPLES. The visualization of data requires
`basic principles and methods. Both panels of this graph show the yearly sunspot
`numbers from 1749 to 1924. A display method, banking to 45°, has been used to
`choose the shape, or aspect ratio, of the bottom panel. The method allows us to perceive
`an important property of the sunspots that is not revealed in the top panel: the
`sunspots rise more rapidly than they fall.
`
`

`
`1 Introduction
`
`
`Data display is critical to data analysis. Graphs allow us to explore
`data to see overall patterns and to see detailed behavior; no other
`approach can compete in revealing the structure of data so thoroughly.
`Graphs allow us to view complex mathematical models fitted to data,
`and they allow us to assess the validity of such models.
`
`But realizing the potential of data visualization requires methods and
`basic principles. Figure 1.1 illustrates this. The top panel graphs the
`yearly sunspot numbers from 1749 to 1924. The dominant frequency
`component of variation in the data is the cycles with periods of about 11
`years. The existence of the cycles is clearly revealed, but an important
`property of them is not. And this property is critical to understanding
`the variation in the cycles, which in turn is critical to developing theories
`of solar physics that explain the origin of the sunspots. The problem is
`the shape, or aspect ratio, of the graph, a square. The data are graphed
`again in the bottom panel; a method called banking to 45°, which will be
`introduced in Chapter 2, is used to determine the aspect ratio, and the
`result is a narrow rectangle. Now the graph reveals the important
`property. The cycles typically rise more rapidly than they fall; this
`behavior is most pronounced for the cycles with high peaks, is less
`pronounced for those with medium peaks, and disappears for those
`cycles with the very lowest peaks.
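
A rough sense of how an aspect ratio can be computed from the data is given by the sketch below. It illustrates only one simple variant, median-absolute-slope banking, and is not necessarily the procedure that Chapter 2 develops. The function name `bank_to_45` and the choice to ignore flat segments are assumptions made here for illustration.

```python
from statistics import median

def bank_to_45(x, y):
    """Return a height/width aspect ratio for a line graph of (x, y).

    The ratio is chosen so that the median absolute slope of the
    line segments, as drawn on the page, equals 1 (that is, 45 degrees).
    Assumes x and y are equal-length sequences that are not constant.
    """
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    # Absolute slopes of consecutive segments, in data units.
    # Vertical and flat segments are skipped (an illustrative choice).
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])
        if x1 != x0 and y1 != y0
    ]
    # On the page, a data slope s is drawn with slope s * aspect * x_range / y_range.
    # Setting the median drawn slope to 1 and solving for aspect gives:
    return y_range / (x_range * median(slopes))

# A sawtooth that climbs or falls 10 units at every unit step.
print(bank_to_45([0, 1, 2, 3, 4], [0, 10, 0, 10, 0]))
```

For this sawtooth the function returns 0.25, so the graph would be drawn four times as wide as it is tall. Steep cycles such as the sunspot series lead to a similarly wide, short panel.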
`
`This book is about methods and basic principles that help the data
`analyst to realize the potential of visualization. The next three chapters
`of the book divide the material into principles of graph construction,
`graphical methods, and graphical perception. In this chapter, Section 1.1
`(pp. 6-9) demonstrates the power of visualization, Section 1.2 (pp. 9-15)
`demonstrates how easy it is for the graphing of data to go wrong, and
`Section 1.3 (pp. 16-21) briefly describes the content of the next three
`chapters.
`
`

`
`
`1.1 The Power of Graphical Data Display
`
`Figure 1.2 illustrates the power of visualization to reveal complex
`patterns in data. The top left panel is a graph of monthly average
`atmospheric carbon dioxide concentrations measured at the Mauna Loa
`Observatory in Hawaii [9,71]. These data woke up the world. Charles
`Keeling pioneered their collection and fostered them amidst the
`adversity of nature at the top of a volcano and the controversy of man
`closer to sea level. The controversy raged first in science and then later
`in politics [108]. Earlier data had hinted that atmospheric CO2 was
`rising due to man-made emissions, but Keeling's data proved the case,
`signaling the danger of global climate change.
`
`The remaining panels of Figure 1.2 show a numerical decomposition
`of the data into four frequency components of variation whose sum is
`equal to the CO2 concentrations. The decomposition was carried out by
`a statistical procedure, STL [21]. On the five vertical scales of the figure,
`the number of units per cm varies. The heights of the bars on the right
`sides of the panels provide a visualization of the relative scaling; the
`heights represent equal changes in parts per million on the five vertical
`scales.
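
What it means for components to sum to the data can be sketched in a few lines. The code below is not STL. It is a much cruder stand-in (a moving-average trend plus a repeating seasonal cycle, giving three components rather than the four in the figure), and the function name `decompose` is hypothetical. It assumes a series at least one period long.

```python
def decompose(series, period):
    """Split series into trend + seasonal + remainder (a crude stand-in, not STL)."""
    n = len(series)
    half = period // 2
    # Trend: moving average over a window of about one period,
    # truncated where the window runs off either end of the series.
    trend = []
    for i in range(n):
        window = series[max(0, i - half): min(n, i + half + 1)]
        trend.append(sum(window) / len(window))
    detrended = [v - t for v, t in zip(series, trend)]
    # Seasonal: mean of the detrended values at each position in the cycle.
    cycle = []
    for k in range(period):
        values = detrended[k::period]
        cycle.append(sum(values) / len(values))
    seasonal = [cycle[i % period] for i in range(n)]
    # Remainder: whatever is left, so the three components sum to the data.
    remainder = [v - t - s for v, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder

# Synthetic series: a steady rise plus a cycle of length 4 (not real CO2 data).
series = [0.1 * i + [0.0, 1.0, 0.0, -1.0][i % 4] for i in range(24)]
trend, seasonal, remainder = decompose(series, 4)
print(all(abs(t + s + r - v) < 1e-9
          for t, s, r, v in zip(trend, seasonal, remainder, series)))
```

Graphing each returned list on its own panel, against a common time scale, gives a display in the spirit of Figure 1.2.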
`
`The component graphed in the upper right panel is a trend
`component that describes the persistent long-term increase in the level
`of the concentrations. This rise, if continued unabated, will eventually
`cause atmospheric temperatures to rise, the polar ice caps to melt, the
`coastal areas of the continents to flood, and the climates of different
`regions of the earth to change radically [57,80,108]. And the graph
`shows that the rate of increase of CO2 is itself increasing through time.
`
`The component graphed in the third panel from the bottom is a
`seasonal component: a yearly cycle in the concentrations due to the
`waxing and waning of foliage in the Northern Hemisphere. When
`foliage grows in the spring, plant tissue absorbs CO2 from the
`atmosphere, depositing some of the carbon in the soil, and atmospheric
`concentrations decline. When the foliage decreases at the end of the
`summer, CO2 returns to the atmosphere, and the atmospheric
`concentrations increase. The graph shows that the amplitudes of these
`seasonal oscillations have increased slightly through time.
`
`

`
`[Figure 1.2: five panels graphed against Year, 1960 to 1990: the monthly CO2 concentrations (ppm) and their four frequency components, each with a bar at the right side of the panel showing the relative scaling.]
`
`1.2 THE POWER OF GRAPHICAL DATA DISPLAY. Visualization provides insight that
`cannot be appreciated by any other approach to learning from data. On this graph, the
`top left panel displays monthly average CO2 concentrations from Mauna Loa, Hawaii.
`The remaining panels show frequency components of variation in the data. The heights
`of the five bars on the right sides of the panels portray the same changes in ppm on the
`five vertical scales.
`
`

`
`
`An oscillatory component, graphed in the second panel from the
`bottom, is made up mostly of variation with periods in a band centered
`near three years. This variation is associated with changes in the
`Southern Oscillation index, a measure of the difference in atmospheric
`pressure between Easter Island in the South Pacific and Darwin,
`Australia. Changes in the index are also associated with changes in
`climate. For example, when the index drops sharply, the trade winds are
`reduced and the temperature of the equatorial Pacific increases. This
`warming, which has important consequences for South America, often
`occurs around Christmas time and is called El Niño, the child [73].
`
`The component shown in the bottom panel has no apparent, strong,
`time pattern and behaves, for the most part, like random noise.
`
`Figure 1.2 conveys a large amount of information about the CO2
`concentrations. We have been able to summarize overall behavior and to
`see detailed information. As the eminent statistician W. Edwards
`Deming would have put it [45], "the graph retains the information in the
`data."
`
`Many techniques of data analysis have data reduction as their first
`step. For example, classical statistical procedures, widely used in science
`and technology, fall in this category. The first step is to take all of the
`data and reduce them to a few statistics such as means, standard
`deviations, correlation coefficients, variance components, and t-tests.
`Then, inferences are based on this very limited collection of values.
`Using only numerical reduction methods in data analyses is far too
`limiting. We cannot expect a small number of numerical values to
`consistently convey the wealth of information that exists in data.
`Numerical reduction methods do not retain the information in the data.
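
A small made-up example makes the point concrete. The two sequences below are invented for illustration and are not data from the book. They contain the same values in a different order, so their mean and standard deviation agree exactly, yet one rises steadily and the other zigzags. The helper names `summary` and `turning_points` are hypothetical.

```python
from statistics import mean, pstdev

# Invented illustrative values: the same eight numbers in two different orders.
rising = [1, 2, 3, 4, 5, 6, 7, 8]
zigzag = [1, 8, 2, 7, 3, 6, 4, 5]

def summary(values):
    """Reduce a sequence to two numbers: mean and population standard deviation."""
    return mean(values), pstdev(values)

def turning_points(values):
    """Count interior points where the sequence switches between rising and falling."""
    count = 0
    for prev, cur, nxt in zip(values, values[1:], values[2:]):
        if (cur - prev) * (nxt - cur) < 0:
            count += 1
    return count

# The two-number summaries cannot tell the sequences apart ...
print(summary(rising) == summary(zigzag))
# ... but the behavior through time, which a graph would show, differs.
print(turning_points(rising), turning_points(zigzag))
```

The reduction to a mean and a standard deviation discards the ordering entirely, which is exactly the kind of structure a graph of the values in sequence retains.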
`
`Contained within the data of any investigation is information that can
`yield conclusions to questions not even originally asked. That is, there
`can be surprises in the data. The progress of science depends heavily on
`formulating hypotheses and probing them by data collection. Darwin,
`in a letter to Henry Fawcett in 1861, writes [54]: "How odd it is that
`anyone should not see that all observation must be for or against some
`view if it is to be of any service." But analyses of data should not
`narrowly focus on just those hypotheses that led to collection. This
`inhibits finding surprises in the data. To regularly miss surprises by
`failing to probe thoroughly with visualization tools is terribly inefficient
`because the cost of intensive data analysis is typically very small
`compared with the cost of data collection.
`
`A graph of CO2 concentrations similar to that of Figure 1.2 produced
`a surprise discovery. For a long time it was thought that the amplitude
`of the seasonal component was stable and not changing through time,
`but eventually three groups (one at CSIRO in Australia [102], a second
`at Scripps Institution of Oceanography in the United States [3], and a
`third at AT&T Bell Laboratories in the United States [30])
`independently discovered the small, but persistent change in the Mauna
`Loa seasonal cycles. For the Bell Labs group, the discovery was
`serendipitous. The goal of the analysis had been to study the
`relationship between CO2 and the Southern Oscillation index. The first
`step in the analysis was to decompose the CO2 concentrations as in
`Figure 1.2 to get the oscillatory component so it could be correlated with
`the index. Fortunately, the group graphed all of the components, and the
`graph showed clearly the persistent change in the amplitude of the
`seasonal component. This surprise was so exciting that the group
`switched its mission to the seasonal behavior of CO2 and abandoned the
`original mission. No one yet has a good understanding of what is
`causing the change. It might be a harbinger of changes in the earth's
`climate or it might be simply part of the natural variation in CO2.
`
`1.2 The Challenge of Graphical Data Display
`
`Visualization is surprisingly difficult. Even the most simple matters
`can easily go wrong. This will be illustrated by three examples where
`seemingly straightforward graphical tasks ran into trouble.
`
`

`
`
`Aerosol Concentrations
`
`Figure 1.3 is a graphical method called a q-q plot which will be
`discussed in detail in Chapter 3; the figure shows the graph as it
`originally appeared in a Science report [31]. As with almost all of the
`reproduced graphs in this book, the size of the graph is the same as that
`of the source. The display compares Sunday and workday
`concentrations of aerosols, or particles in the air. First, the graph has a
`construction error: the 0.0 label on the horizontal scale should be 0.6.
`Unfortunately, the error makes it appear that the left corner is the origin;
`many readers probably wondered why the line y = x, which is drawn
`on the graph, does not go through the origin. A second problem is that
`the scales on the graph are poorly chosen; comparison of the Sunday
`and workday values would have been enhanced by making the
`horizontal and vertical scales the same. Scale issues such as these are
`discussed in Chapter 2. Finally, the display of the data misses an
`opportunity to see the behavior of the data more thoroughly. On this
`single panel it is not easy to compare the vertical distances of the points
`from the line y = x; the solution is a graphical method called the Tukey
`mean-difference plot, which will be introduced in Chapter 3.
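
The arithmetic behind a mean-difference plot is simple enough to sketch ahead of Chapter 3. Assuming the usual definition, in which each pair of values is replaced by its mean and its difference, a minimal version might look like the following. The function name and the sample numbers are placeholders, not values from the aerosol data.

```python
def mean_difference(x, y):
    """Return (mean, difference) pairs for paired values x[i], y[i].

    For a q-q plot, x and y would be the matched quantiles of the two
    samples. A point lying on the line y = x gets a difference of 0.
    """
    return [((a + b) / 2, b - a) for a, b in zip(x, y)]

# Placeholder quantiles for two samples (illustrative numbers only).
workday = [1.0, 2.0, 3.0]
sunday = [1.5, 3.0, 3.0]
for mean_value, diff in mean_difference(workday, sunday):
    print(mean_value, diff)
```

Graphing the differences against the means turns the line y = x into a horizontal line at zero, so deviations are judged as vertical distances from a flat reference rather than from a sloping one.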
`
`1.3 THE CHALLENGE OF GRAPHICAL DATA DISPLAY. This
`graph compares Sunday and workday concentrations of aerosols.
`The line shown is y = x. The graph has problems. There is a
`construction error: the 0.0 label on the horizontal scale is wrong
`and should be 0.6. The horizontal and vertical scales should be
`the same but are not. Furthermore, it is hard to judge the
`deviations of the points from the line y = x.
`
`[Figure 1.3: q-q plot of Sundays (vertical scale) against Workdays (horizontal scale), with the line y = x; panel label: Aerosols, Elizabeth.]
`
`O-Ring Data
`
`On January 27, 1986, the day before the last flight of the space shuttle
`Challenger, a group of engineers met to study an alarm that had been
`raised. The forecast of temperature at launch time the following day was
`31°F. There was a suggestion that the low temperature might affect the
`performance of the O-rings that sealed the joints of the rocket motors.
`
`
`To assess the issue, the engineers studied a graph of the data shown in
`Figure 1.4. Each data point was from a shuttle flight in which the
`O-rings had experienced thermal distress. The horizontal scale is O-ring
`temperature, and the vertical scale is the number of O-rings
`experiencing distress. The graph revealed no effect of temperature on
`the number of stress problems, and Morton Thiokol, the rocket
`manufacturer, communicated to NASA the conclusion that the
`"temperature data [are] not conclusive on predicting primary O-ring
`blowby" [43]. The next day Challenger took off, the O-rings failed, and
`the shuttle exploded, killing the seven people on board.
`
`[Figure 1.4: Number of Incidents against Calculated Joint Temperature (°F), for flights with at least one incident.]
`
`1.4 STATISTICAL REASONING. These
`data were graphed by space shuttle
`engineers the evening before the
`Challenger accident to determine the
`dependence of O-ring failure on
`temperature. Data for no failures was not
`graphed in the mistaken belief that it was
`irrelevant to the issue of dependence.
`The engineers concluded from the graph
`that there is no dependence.
`
`The conclusion of the January 27 analysis was incorrect, in part,
`because the analysis of the data by the graph in Figure 1.4 was faulty. It
`omitted data for flights in which no O-rings experienced thermal
`distress. Figure 1.5 shows a graph with all data included. Now a pattern
`emerges. The Rogers Commission, a group that intensively studied the
`Challenger mission afterward, concluded that the engineers had omitted
`the no-stress data in the mistaken belief that they would contribute no
`information to the thermal-stress question [43].
`
`

`
`[Figure 1.5: Number of Incidents against Calculated Joint Temperature (°F), with the flights that had no incidents included.]
`
`1.5 STATISTICAL REASONING. The
`complete set of O-ring data is now
`graphed, including the observations with
`no failures. A dependence of failure on
`temperature is revealed.
`
`The graphical analysis of the O-ring data failed, not because of the
`display method used, as with the aerosol data, but rather because of a
`poor choice of the statistical information selected for the graph. This
`arose because of a flaw in the statistical reasoning that underlay the
`graph. The flaw violated a basic statistical principle: in the analysis of
`failure data, the values of a causal variable when no failures occur are as
`relevant to the analysis as the values when failures occur. Statistical
`thinking is vital to data display. A number of statistical principles are
`discussed in Chapters 2 and 3.
`
`Brain Masses and Body Masses of Animal Species
`
`Figure 1.6 is a graph from Carl Sagan's intriguing book, The Dragons
`of Eden [107]. The graph shows the brain masses and body masses, both
`on a log scale, of a collection of animal species. We can see that log brain
`mass and log body mass are correlated, but this was not the main reason
`for making the graph.
`
`

`
`[Figure 1.6: Brain mass in grams against Body mass in kilograms, both on log scales, with each point labeled by species, from Hummingbird and Goldfish up to Elephant and Blue whale.]
`
`1.6 THE CHALLENGE OF GRAPHICAL DATA DISPLAY. This graph shows brain and
`body masses of animal species. The intent was for viewers to judge an intelligence
`measure, but the judgments require a visual operation that is too difficult.
`
`What Sagan wanted to describe was an intelligence scale that has
`been investigated extensively by Harry J. Jerison [65]. Sagan writes that
`this measure of intelligence is "the ratio of the mass of the brain to the
`total mass of the organism." Later he adds, referring the reader to the
`graph, "of all the organisms shown, the beast with the largest brain mass
`for its body weight is a creature called Homo sapiens. Next in such a
`ranking are dolphins."
`
`The first problem is that Sagan has made a mistake in describing the
`intelligence measure; it is not the ratio of brain to body mass but rather is
`(brain mass)/(body mass)^{2/3}. If we study a group of related species,
`such as all mammals, brain mass tends to increase as a function of body
`mass. The general pattern of the data is reasonably well described by the
`equation
`
`brain mass = c (body mass)^{2/3}.
`
`

`
`
`Since the densities of different species do not vary radically, we may
`think of the masses as being surrogate measures for volume, and volume
`to the 2/3 power behaves like a surface area. Thus the empirical
`relationship says that brain mass depends on the surface area of the
`body; Stephen Jay Gould conjectures that this is so because body
`surfaces serve as end points for so many nerve channels [52]. Now
`suppose a given species has a greater brain mass than other species with
`the same body mass; what this means is that
`
`(brain mass)/(body mass)^{2/3}
`
`is greater. We might expect that the big-brained species would be more
`intelligent since it has an excess of brain capacity given its body surface.
`This idea leads to measuring intelligence by this ratio.
`
`Let us now return to Figure 1.6 and consider the graphical problem,
`which is a serious one. How do we judge the intelligence measure from
`the graph? Suppose two species have the same intelligence measure;
`then both have the same value of
`
`(brain mass)/(body mass)^{2/3} = r.
`
`Thus
`
`log(brain mass) = (2/3) log(body mass) + log(r)
`
`for both species. This means that in Figure 1.6, the two equally
`intelligent species lie on a line with slope 2/3. Suppose one species has a
`greater value of r than another; then the smarter one lies on a line with
`slope 2/3 that is to the northwest of the line on which the less intelligent
`one lies. In other words, to judge the intelligence measure from
`Figure 1.6 we must mentally superpose a set of parallel lines with
`slope 2/3. (If we attempt to judge Sagan's mistaken ratios, we must
`superpose lines with slope 1.) This visual operation is simply too hard.
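
The slope-2/3 relationship can be checked numerically. The sketch below uses two hypothetical species, with invented body masses and a shared value of r, and confirms that they lie on a line of slope 2/3 on log-log axes. The function names are assumptions made here for illustration, and none of the numbers come from Figure 1.6.

```python
import math

def intelligence_measure(brain_mass, body_mass):
    """r = (brain mass) / (body mass)^(2/3)."""
    return brain_mass / body_mass ** (2 / 3)

def loglog_slope(p, q):
    """Slope of the line through two (body mass, brain mass) points on log-log axes."""
    (body_p, brain_p), (body_q, brain_q) = p, q
    return (math.log10(brain_q) - math.log10(brain_p)) / (
        math.log10(body_q) - math.log10(body_p)
    )

# Two hypothetical species constructed to share the same measure r.
r = 0.5
small = (8.0, r * 8.0 ** (2 / 3))    # (body mass, brain mass)
large = (27.0, r * 27.0 ** (2 / 3))

print(intelligence_measure(small[1], small[0]),
      intelligence_measure(large[1], large[0]))
print(loglog_slope(small, large))
```

Both species return the same measure, up to floating-point rounding, and the slope between them is 2/3. Raising the brain mass of one species while holding its body mass fixed moves its point straight up, onto a parallel line with a larger log(r) intercept.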
`
`Figure 1.6 can be greatly improved, at least for the purpose of
`showing the intelligence measure, by graphing the measure directly on a
`log scale, as is done in the dot plot of Figure 1.7. Now we can see
`strikingly many things not so apparent from Figure 1.6. Happily,
`modern man is at the top. Dolphins are next; interestingly, they are
`ahead of our ancestor Homo habilis.
`
`

`
`
`The problems with Figure 1.6 do not stop here. Five of the labels are
`wrong. The following are the corrections: "saurornithoid" should be
`"wolf," "wolf" should be "saurornithoid," "hummingbird" should be
`"goldfish," "goldfish" should be "mole," and "mole" should be
`"hummingbird." The correct labels yield the satisfying result that a
`hummingbird is smaller than a mole.
`
`It should be emphasized that for some purposes, a corrected version
`of Figure 1.6 is a useful graph. For example, it shows the values of the
`brain and body masses and gives us information about their relationship.
`The point is that it does a poor job of showing the intelligence measure.
`
`[Figure 1.7: dot plot of the intelligence measure on a log scale, one row per species. Rows, from the top: Modern Man, Dolphin, Homo habilis, Gracile Australopithecus, Chimpanzee, Baboon, Crow, Vampire Bat, Wolf, Gorilla, Elephant, Hummingbird, Lion, Mole, Opossum, Blue Whale, Saurornithoid, Goldfish, Ostrich, ...]
