`
`IBG 1058
`CBM2015-00179
`
`
`
`
`
`The Elements of Graphing Data
by William S. Cleveland
`
`REVIEWERS RAVED ABOUT THE 1ST EDITION
THE REVISED EDITION IS EVEN BETTER
`
Meteorological Magazine "Ideally, everyone
interested in getting the most
out of their data or presenting data
clearly and concisely should have a
copy handy."
`
The American Cartographer "An excellent
stimulus for deeper thinking
about display techniques ..."
`
College and Research Libraries "This
book is a gem. Buy it, read it and urge
everyone you know whose job it is to
convert raw data to meaningful information
to do the same."
`
Computing Reviews "This is an admirable
book. It is clearly written and
intellectually engaging."
`
`The American Mathematical Monthly
`"Graphical methods in science and
`technology. Many new ideas and
`methods; many not widely known
before. Excellent methodological resource
for research workers. Theme
`is communication of information from
`raw data."
`
Atmospheric Environment "... certain
kinds of tendency toward bad graphics
could be cured if as many authors
`as possible would not just read, but, in
`the words of the Anglican Prayer Book,
`'learn, mark, and inwardly digest' this
`volume."
`
`
`
[Figure I: scatterplot with four color-coded categories of plotting symbols (Birds, Fish, Primates, Nonprimate Mammals); horizontal axis: Log Body Weight (log 10 grams).]
`
Figure I. COLOR ENCODING A CATEGORICAL VARIABLE. Color is a powerful tool that can
`genuinely enhance the visual decoding of information on data displays. Color can also be used to no
`purpose. We need to be hard-boiled in evaluating the efficacy of a visualization tool. It is easy to be
`dazzled by a display of data rendered in a rainbow of colors; our tendency is to be misled into
`thinking we are absorbing relevant information when we see a lot. But the success of a visualization
`tool should be based solely on the amount we learn about the phenomenon under study. There are
`two uses of color that genuinely transmit information from display to viewer. One is the rendering of
`different categories of graphical elements in different colors to provide efficient visual assembly of
`the categories, that is, to allow us to see each category of elements as a whole, mentally filtering out
`the other categories. In this figure, four different categories of plotting symbols are color encoded,
`and we can easily assemble the symbols of each category. The second use of color is illustrated by
`the display on the next page.
`
`
`
[Figure II: level plot in which color encodes soil resistivity over a geographic region; horizontal axis: Easting (km).]
`
`Figure II. COLOR ENCODING A QUANTITATIVE VARIABLE. The second use of color is to
`display a function of two variables on a level plot. In this figure, color encodes soil resistivity to show
`how resistivity varies geographically. The encoding method achieves two goals: effortless
`perception of the order of the encoded quantities, and clearly perceived boundaries between
adjacent levels. A number of encoding methods (for example, the much used rainbow encoding)
do not achieve both of these goals.
`
`
`
William S. Cleveland
`
`The Elements
`of Graphing Data
`
`Revised Edition
`
AT&T Bell Laboratories, Murray Hill, New Jersey
`
`
`
`To my parents
`
`
`
`Acknowledgements
`
`To John Tukey, for ingenious inventions and applications of graphical
`data analysis.
`
`To many colleagues at Bell Labs, for creating an optimal environment to
`study graphical data analysis.
`
`To Marylyn McGill, for relentlessly pursuing perfection in experimenting
`with graphical displays.
`
`To Bob McGill, for our experiments in graphical perception and our many
`experiments with graphical inventions.
`
`To Nick Fisher, for effective communication of graphical principles based
`on scientific enquiry.
`
`To Sue Pope and Tina Sharp, for the considerable word processing skills
`that were needed to produce the text.
`
`To Lisa Cleveland, for days of proofreading in Summit and Abcoude.
`
To Estelle McKittrick, for help in many forms.
`
`To Gerard Gorman, for the image processing that was needed to produce
`many of the displays.
`
To Alan Cossa, for a high level of quality control in producing
camera-ready output.
`
To many who commented on the manuscript: Paul Anderson, Jon
`Bentley, John Chambers, Nick Cox, Lisa Cleveland, Arnold Court, Mary
`Donnelly, Nick Fisher, Bob Futrelle, Colin Mallows, Bob McGill, Marylyn
`McGill, Brad Murphy, Richard Nuccitelli, James Palmer, Arno Penzias,
`and John Tukey.
`
`
`
`Published by Hobart Press, Summit, New Jersey
`
Copyright © 1994 AT&T. All rights reserved.
`
`Printed in the United States of America
`
`ISBN 0-9634884-1-4 CLOTH
`
`LIBRARY OF CONGRESS CATALOG CARD NUMBER: 94-075052
`
`PUBLISHER'S CATALOGING IN PUBLICATION
`Cleveland, William S., 1943-
The elements of graphing data / by William S. Cleveland.
Revised edition.
p. cm.
`Includes bibliographical references and index.
`
`1. Graphic methods. 2. Mathematical statistics-Graphic
`methods.
`I. Title.
`
`QA90.C54 1994
`
`511'.5
`
`
`
Contents

Preface  1

1 Introduction  4
   1.1 The Power of Graphical Data Display  6
   1.2 The Challenge of Graphical Data Display  9
   1.3 The Contents of the Book  16

2 Principles of Graph Construction  22
   2.1 Terminology  23
   2.2 Clear Vision  25
   2.3 Clear Understanding  54
   2.4 Banking to 45°  66
   2.5 Scales  80
   2.6 General Strategy  110

3 Graphical Methods  119
   3.1 Logarithms  120
   3.2 Residuals  126
   3.3 Distributions  132
   3.4 Dot Plots  150
   3.5 Plotting Symbols and Curve Types  154
   3.6 Visual Reference Grids  166
   3.7 Loess  168
   3.8 Time Series  180
   3.9 Scatterplot Matrices  193
   3.10 Coplots of Scattered Data  198
   3.11 Coplots of Surfaces  203
   3.12 Brushing  206
   3.13 Color  209
   3.14 Statistical Variation  212

4 Graphical Perception  221
   4.1 The Model  223
   4.2 Superposed Curves  227
   4.3 Color Encoding  230
   4.4 Texture Symbols  234
   4.5 Visual Reference Grids  240
   4.6 Order on Dot Plots  244
   4.7 Banking to 45°  251
   4.8 Correlation  256
   4.9 Graphing Along a Common Scale  259
   4.10 Pop Charts  262

Bibliography  271

Figure Acknowledgements  281

Colophon  283

Index  285
`
`
`
`Preface
`
`This book is about visualizing data in science and technology. It
`contains graphical methods and principles that are powerful tools for
`showing the structure of data. The material is relevant for data analysis,
`when the analyst wants to study data, and for data communication, when
`the analyst wants to communicate data to others.
`
`When a graph is made, quantitative and categorical information is
`encoded by a display method. Then the information is visually decoded.
`This visual perception is a vital link. No matter how clever the choice of
`the information, and no matter how technologically impressive the
`encoding, a visualization fails if the decoding fails. Some display
`methods lead to efficient, accurate decoding, and others lead to
`inefficient, inaccurate decoding. It is only through scientific study of
`visual perception that informed judgments can be made about display
`methods. The display methods of Elements rest on a foundation of
`scientific enquiry.
`
`Except for one small section, there is nothing in this book about
`computer graphics. The basic ideas, the methods, and the principles of
`the book transcend the computing environment used to implement
`them. While graphics technology is moving along at a rapid pace, the
`human visual system has remained the same.
`
`The prerequisites for understanding the book are minimal. A few
`topics require a knowledge of the elementary concepts of probability
`and statistical science, but these topics can be skipped without affecting
`comprehension of the remainder of the book.
`
`The book Visualizing Data is a companion volume [26]. It focuses on
`graphical methods, the topic of Chapter 3 of this book; it presents far
`more methods than covered here and is more advanced, requiring a
`greater knowledge of statistics. But Visualizing Data does not delve into
`graphical perception, and takes Elements as a starting point.
`
`
`
`Elements was meant to be read from the beginning and to be enjoyed.
`However, it is possible to read here and there. Winding its way through
`the book is a summary of the material: the figures and their legends.
`Reading this summary can help readers direct themselves to specific
`items.
`
`The graphs in this book are communicating information about
`fascinating subjects, and I have not hesitated to describe the subjects in
`some detail when needed. In many cases some knowledge of the subject
`is required to understand the purpose of a graphical analysis or why a
`graph is not doing what was intended or what a new graphical method
`can show us about data. I hope the reader will share with me the
`excitement of experiencing the increased insight that graphical data
`display brings us about these subjects.
`
`
`
`
`
`
[Figure 1.1: two panels of yearly sunspot numbers against Year (axis ticks 1750 to 1900); the top panel is square and the bottom panel is a narrow rectangle.]
`
1.1 GRAPHICAL METHODS AND PRINCIPLES. The visualization of data requires
basic principles and methods. Both panels of this graph show the yearly sunspot
numbers from 1749 to 1924. A display method, banking to 45°, has been used to
choose the shape, or aspect ratio, of the bottom panel. The method allows us to perceive
an important property of the sunspots that is not revealed in the top panel: the
sunspots rise more rapidly than they fall.
`
`
`
`1 Introduction
`
`Data display is critical to data analysis. Graphs allow us to explore
`data to see overall patterns and to see detailed behavior; no other
`approach can compete in revealing the structure of data so thoroughly.
`Graphs allow us to view complex mathematical models fitted to data,
`and they allow us to assess the validity of such models.
`
`But realizing the potential of data visualization requires methods and
`basic principles. Figure 1.1 illustrates this. The top panel graphs the
yearly sunspot numbers from 1749 to 1924. The dominant frequency
`component of variation in the data is the cycles with periods of about 11
`years. The existence of the cycles is clearly revealed, but an important
`property of them is not. And this property is critical to understanding
`the variation in the cycles, which in turn is critical to developing theories
`of solar physics that explain the origin of the sunspots. The problem is
`the shape, or aspect ratio, of the graph, a square. The data are graphed
`again in the bottom panel; a method called banking to 45°, which will be
`introduced in Chapter 2, is used to determine the aspect ratio, and the
`result is a narrow rectangle. Now the graph reveals the important
`property. The cycles typically rise more rapidly than they fall; this
`behavior is most pronounced for the cycles with high peaks, is less
`pronounced for those with medium peaks, and disappears for those
`cycles with the very lowest peaks.
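
Banking to 45° is not defined until Chapter 2. As a rough sketch only, the code below assumes one simple variant that is not taken from this chapter: choose the aspect ratio (height divided by width) so that the median absolute slope of the line segments, measured on the page, equals 1. The function name `bank_to_45` and the toy series are hypothetical, not the sunspot data.

```python
from statistics import median


def bank_to_45(x, y):
    """Return a height/width aspect ratio for a line graph of (x, y).

    Assumed variant: make the median absolute slope of the segments,
    as drawn on the page, equal to 1 (an orientation of 45 degrees).
    """
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    # Slopes in data units; vertical and flat segments are skipped.
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, x1, y0, y1 in zip(x, x[1:], y, y[1:])
        if x1 != x0 and y1 != y0
    ]
    # A data slope s is drawn with slope s * aspect * x_range / y_range,
    # so setting the median drawn slope to 1 gives the ratio below.
    return y_range / (x_range * median(slopes))


# Toy zigzag series (hypothetical): every segment has absolute slope 2.
years = [0, 1, 2, 3, 4]
values = [0, 2, 0, 2, 0]
aspect = bank_to_45(years, values)
print(aspect)
```

For a rapidly oscillating series the ratio comes out well below 1, which is consistent with the narrow rectangle of the bottom panel of Figure 1.1.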
`
`This book is about methods and basic principles that help the data
`analyst to realize the potential of visualization. The next three chapters
`of the book divide the material into principles of graph construction,
`graphical methods, and graphical perception. In this chapter, Section 1.1
(pp. 6-9) demonstrates the power of visualization, Section 1.2 (pp. 9-15)
demonstrates how easy it is for the graphing of data to go wrong, and
`Section 1.3 (pp. 16-21) briefly describes the content of the next three
`chapters.
`
`
`
`1.1 The Power of Graphical Data Display
`
`Figure 1.2 illustrates the power of visualization to reveal complex
`patterns in data. The top left panel is a graph of monthly average
`atmospheric carbon dioxide concentrations measured at the Mauna Loa
`Observatory in Hawaii [9,71]. These data woke up the world. Charles
`Keeling pioneered their collection and fostered them amidst the
`adversity of nature at the top of a volcano and the controversy of man
`closer to sea level. The controversy raged first in science and then later
in politics [108]. Earlier data had hinted that atmospheric CO2 was
`rising due to man-made emissions, but Keeling's data proved the case,
`signaling the danger of global climate change.
`
`The remaining panels of Figure 1.2 show a numerical decomposition
`of the data into four frequency components of variation whose sum is
equal to the CO2 concentrations. The decomposition was carried out by
a statistical procedure, STL [21]. On the five vertical scales of the figure,
the number of units per cm varies. The heights of the bars on the right
`sides of the panels provide a visualization of the relative scaling; the
`heights represent equal changes in parts per million on the five vertical
`scales.
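
The defining property of such a decomposition is that the components add back to the data. The sketch below illustrates only that property. It is not STL: it assumes a crude running-mean trend and per-month seasonal averages, produces three components rather than the four of Figure 1.2, and runs on a synthetic series rather than the Mauna Loa data. The function name `decompose` is hypothetical.

```python
import math


def decompose(series, period=12):
    """Split a series into trend + seasonal + remainder (crude stand-in, not STL)."""
    n = len(series)
    half = period // 2
    # Trend: running mean over up to period + 1 points, truncated at the ends.
    trend = []
    for i in range(n):
        window = series[max(0, i - half): i + half + 1]
        trend.append(sum(window) / len(window))
    detrended = [v - t for v, t in zip(series, trend)]
    # Seasonal: average detrended value at each position in the cycle.
    cycle = [
        sum(detrended[k::period]) / len(detrended[k::period])
        for k in range(period)
    ]
    seasonal = [cycle[i % period] for i in range(n)]
    # Remainder: whatever the other two components leave over.
    remainder = [v - t - s for v, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, remainder


# Synthetic monthly series (hypothetical): slow rise plus a yearly cycle.
data = [315 + 0.1 * m + 3 * math.sin(2 * math.pi * m / 12) for m in range(120)]
trend, seasonal, remainder = decompose(data)
recombined = [t + s + r for t, s, r in zip(trend, seasonal, remainder)]
```

Because the remainder is defined as what is left over, the sum of the components reproduces the data up to floating-point rounding, whatever smoother is used for the trend.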
`
`The component graphed in the upper right panel is a trend
`component that describes the persistent long-term increase in the level
`of the concentrations. This rise, if continued unabated, will eventually
`cause atmospheric temperatures to rise, the polar ice caps to melt, the
`coastal areas of the continents to flood, and the climates of different
`regions of the earth to change radically [57,80,108]. And the graph
shows that the rate of increase of CO2 is itself increasing through time.
`
`The component graphed in the third panel from the bottom is a
`seasonal component: a yearly cycle in the concentrations due to the
waxing and waning of foliage in the Northern Hemisphere. When
foliage grows in the spring, plant tissue absorbs CO2 from the
`atmosphere, depositing some of the carbon in the soil, and atmospheric
`concentrations decline. When the foliage decreases at the end of the
summer, CO2 returns to the atmosphere, and the atmospheric
`concentrations increase. The graph shows that the amplitudes of these
`seasonal oscillations have increased slightly through time.
`
`
`
[Figure 1.2: five panels plotted against Year (1960 to 1990): the CO2 concentrations (ppm) at top left, and the trend, seasonal, oscillatory, and remaining components in the other panels.]
`
`1.2 THE POWER OF GRAPHICAL DATA DISPLAY. Visualization provides insight that
`cannot be appreciated by any other approach to learning from data. On this graph, the
top left panel displays monthly average CO2 concentrations from Mauna Loa, Hawaii.
`The remaining panels show frequency components of variation in the data. The heights
`of the five bars on the right sides of the panels portray the same changes in ppm on the
`five vertical scales.
`
`
`
`An oscillatory component, graphed in the second panel from the
`bottom, is made up mostly of variation with periods in a band centered
`near three years. This variation is associated with changes in the
`Southern Oscillation index, a measure of the difference in atmospheric
`pressure between Easter Island in the South Pacific and Darwin,
`Australia. Changes in the index are also associated with changes in
`climate. For example, when the index drops sharply, the trade winds are
`reduced and the temperature of the equatorial Pacific increases. This
`warming, which has important consequences for South America, often
occurs around Christmas time and is called El Niño, the child [73].
`
`The component shown in the bottom panel has no apparent, strong,
`time pattern and behaves, for the most part, like random noise.
`
Figure 1.2 conveys a large amount of information about the CO2
`concentrations. We have been able to summarize overall behavior and to
`see detailed information. As the eminent statistician W. Edwards
`Deming would have put it [45], "the graph retains the information in the
`data."
`
`Many techniques of data analysis have data reduction as their first
`step. For example, classical statistical procedures, widely used in science
`and technology, fall in this category. The first step is to take all of the
`data and reduce them to a few statistics such as means, standard
`deviations, correlation coefficients, variance components, and t-tests.
`Then, inferences are based on this very limited collection of values.
`Using only numerical reduction methods in data analyses is far too
`limiting. We cannot expect a small number of numerical values to
`consistently convey the wealth of information that exists in data.
`Numerical reduction methods do not retain the information in the data.
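
A minimal illustration of this point, using two hypothetical series invented for the purpose: they share the same mean and standard deviation, yet one oscillates and the other is a single step.

```python
from statistics import mean, pstdev

# Two hypothetical series of eight observations in time order.
oscillating = [-1, 1, -1, 1, -1, 1, -1, 1]
step = [-1, -1, -1, -1, 1, 1, 1, 1]

# The numerical summaries agree exactly.
same_mean = mean(oscillating) == mean(step)
same_sd = pstdev(oscillating) == pstdev(step)

# The series themselves do not.
same_series = oscillating == step

print(same_mean, same_sd, same_series)
```

A graph of each series against time shows the difference at a glance; the two summary statistics cannot.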
`
`Contained within the data of any investigation is information that can
`yield conclusions to questions not even originally asked. That is, there
`can be surprises in the data. The progress of science depends heavily on
`formulating hypotheses and probing them by data collection. Darwin,
`in a letter to Henry Fawcett in 1861, writes [54]: "How odd it is that
`anyone should not see that all observation must be for or against some
`view if it is to be of any service." But analyses of data should not
`narrowly focus on just those hypotheses that led to collection. This
`inhibits finding surprises in the data. To regularly miss surprises by
`failing to probe thoroughly with visualization tools is terribly inefficient
`because the cost of intensive data analysis is typically very small
`compared with the cost of data collection.
`
A graph of CO2 concentrations similar to that of Figure 1.2 produced
a surprise discovery. For a long time it was thought that the amplitude
of the seasonal component was stable and not changing through time,
but eventually three groups (one at CSIRO in Australia [102], a second
at Scripps Institution of Oceanography in the United States [3], and a
third at AT&T Bell Laboratories in the United States [30])
independently discovered the small but persistent change in the Mauna
Loa seasonal cycles. For the Bell Labs group, the discovery was
serendipitous. The goal of the analysis had been to study the
relationship between CO2 and the Southern Oscillation index. The first
step in the analysis was to decompose the CO2 concentrations as in
Figure 1.2 to get the oscillatory component so it could be correlated with
the index. Fortunately, the group graphed all of the components, and the
graph showed clearly the persistent change in the amplitude of the
seasonal component. This surprise was so exciting that the group
switched its mission to the seasonal behavior of CO2 and abandoned the
original mission. No one yet has a good understanding of what is
causing the change. It might be a harbinger of changes in the earth's
climate or it might be simply part of the natural variation in CO2.
`
`1.2 The Challenge of Graphical Data Display
`
`Visualization is surprisingly difficult. Even the most simple matters
`can easily go wrong. This will be illustrated by three examples where
`seemingly straightforward graphical tasks ran into trouble.
`
`
`
`Aerosol Concentrations
`
`Figure 1.3 is a graphical method called a q-q plot which will be
`discussed in detail in Chapter 3; the figure shows the graph as it
`originally appeared in a Science report [31]. As with almost all of the
`reproduced graphs in this book, the size of the graph is the same as that
`of the source. The display compares Sunday and workday
`concentrations of aerosols, or particles in the air. First, the graph has a
`construction error: the 0.0 label on the horizontal scale should be 0.6.
`Unfortunately, the error makes it appear that the left corner is the origin;
`many readers probably wondered why the line y = x, which is drawn
`on the graph, does not go through the origin. A second problem is that
`the scales on the graph are poorly chosen; comparison of the Sunday
`and workday values would have been enhanced by making the
`horizontal and vertical scales the same. Scale issues such as these are
`discussed in Chapter 2. Finally, the display of the data misses an
`opportunity to see the behavior of the data more thoroughly. On this
`single panel it is not easy to compare the vertical distances of the points
`from the line y = x; the solution is a graphical method called the Tukey
`mean-difference plot, which will be introduced in Chapter 3.
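
As a preview of the idea, a Tukey mean-difference plot graphs the differences y - x against the means (x + y)/2, so that deviations from the line y = x become deviations from a horizontal line at zero. The sketch below computes those pairs; the function name `mean_difference` and the paired values are hypothetical, not the aerosol data.

```python
def mean_difference(x, y):
    """Return Tukey mean-difference pairs ((x + y) / 2, y - x)."""
    return [((a + b) / 2, b - a) for a, b in zip(x, y)]


# Hypothetical paired quantiles, invented for illustration.
workday = [1.0, 1.4, 2.0, 2.6]
sunday = [0.8, 1.1, 1.5, 2.0]

pairs = mean_difference(workday, sunday)
for m, d in pairs:
    print(f"mean = {m:.2f}, difference = {d:.2f}")
```

In this toy example every difference is negative and grows in size with the mean, a pattern that is hard to judge as distances from a sloping line.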
`
1.3 THE CHALLENGE OF GRAPHICAL DATA DISPLAY. This
`graph compares Sunday and workday concentrations of aerosols.
`The line shown is y = x. The graph has problems. There is a
`construction error: the 0.0 label on the horizontal scale is wrong
`and should be 0.6. The horizontal and vertical scales should be
`the same but are not. Furthermore, it is hard to judge the
`deviations of the points from the line y = x.
`
[Figure 1.3: q-q plot of Sunday against workday aerosol concentrations, with the line y = x; horizontal axis: Workdays.]
`
O-Ring Data
`
`On January 27, 1986, the day before the last flight of the space shuttle
`Challenger, a group of engineers met to study an alarm that had been
`raised. The forecast of temperature at launch time the following day was
31°. There was a suggestion that the low temperature might affect the
performance of the O-rings that sealed the joints of the rocket motors.
`
`
`
`To assess the issue, the engineers studied a graph of the data shown in
`Figure 1.4. Each data point was from a shuttle flight in which the
O-rings had experienced thermal distress. The horizontal scale is O-ring
temperature, and the vertical scale is the number of O-rings
experiencing distress. The graph revealed no effect of temperature on
the number of stress problems, and Morton Thiokol, the rocket
manufacturer, communicated to NASA the conclusion that the
"temperature data [are] not conclusive on predicting primary O-ring
blowby" [43]. The next day Challenger took off, the O-rings failed, and
`the shuttle exploded, killing the seven people on board.
`
[Figure 1.4: number of O-rings experiencing distress against Calculated Joint Temperature (°F), for flights with distress only.]
`
1.4 STATISTICAL REASONING. These
data were graphed by space shuttle
engineers the evening before the
Challenger accident to determine the
dependence of O-ring failure on
temperature. Data for flights with no failures
were not graphed in the mistaken belief that
they were irrelevant to the issue of dependence.
The engineers concluded from the graph
that there is no dependence.
`
`The conclusion of the January 27 analysis was incorrect, in part,
`because the analysis of the data by the graph in Figure 1.4 was faulty. It
omitted data for flights in which no O-rings experienced thermal
`distress. Figure 1.5 shows a graph with all data included. Now a pattern
`emerges. The Rogers Commission, a group that intensively studied the
`Challenger mission afterward, concluded that the engineers had omitted
`the no-stress data in the mistaken belief that they would contribute no
`information to the thermal-stress question [43].
`
`
`
[Figure 1.5: number of O-rings experiencing distress against Calculated Joint Temperature (°F), with the flights that had no distress included.]
`
`1.5 STATISTICAL REASONING. The
complete set of O-ring data is now
`graphed, including the observations with
`no failures. A dependence of failure on
`temperature is revealed.
`
The graphical analysis of the O-ring data failed, not because of the
`display method used, as with the aerosol data, but rather because of a
`poor choice of the statistical information selected for the graph. This
`arose because of a flaw in the statistical reasoning that underlay the
`graph. The flaw violated a basic statistical principle: in the analysis of
`failure data, the values of a causal variable when no failures occur are as
`relevant to the analysis as the values when failures occur. Statistical
`thinking is vital to data display. A number of statistical principles are
`discussed in Chapters 2 and 3.
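
The principle can be made concrete with a small calculation. The records below are hypothetical, invented for illustration and not the shuttle data; the function name `share_with_distress` and the 65° cut point are likewise assumptions. The sketch compares the share of flights with any distress in a cold band and a warm band, first with all flights and then with the no-distress flights dropped.

```python
# Hypothetical (temperature in degrees F, number of distressed O-rings) records.
flights = [
    (53, 3), (57, 1), (63, 1), (70, 1), (75, 2),
    (66, 0), (67, 0), (68, 0), (70, 0), (72, 0), (76, 0), (79, 0), (81, 0),
]


def share_with_distress(records, low, high):
    """Fraction of flights with low <= temperature < high that had any distress."""
    in_band = [n for temp, n in records if low <= temp < high]
    return sum(1 for n in in_band if n > 0) / len(in_band)


# Dropping the no-distress flights mimics the flawed selection.
failures_only = [(t, n) for t, n in flights if n > 0]

cold_all = share_with_distress(flights, 50, 65)
warm_all = share_with_distress(flights, 65, 85)
cold_failures_only = share_with_distress(failures_only, 50, 65)
warm_failures_only = share_with_distress(failures_only, 65, 85)

print(cold_all, warm_all)                      # all flights
print(cold_failures_only, warm_failures_only)  # failures only
```

With the no-distress flights dropped, both bands show distress on every flight and the comparison carries no information. With all flights included, the cold band stands apart from the warm band.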
`
`Brain Masses and Body Masses of Animal Species
`
`Figure 1.6 is a graph from Carl Sagan's intriguing book, The Dragons
`of Eden [107]. The graph shows the brain masses and body masses, both
`on a log scale, of a collection of animal species. We can see that log brain
`mass and log body mass are correlated, but this was not the main reason
`for making the graph.
`
`
`
[Figure 1.6: scatterplot of brain mass against body mass, both on log scales; horizontal axis: Body mass in kilograms (ticks 0.001 to 100,000). Points are labeled by species: Dolphin, Elephant, Blue whale, Modern man, Male gorilla, Homo habilis, Tyrannosaurus rex, Gracile Australopithecus, Chimpanzee, Lion, Wolf, Brachiosaurus, Baboon, Saurornithoid, Ostrich, Diplodocus, Mole, Alligator, Coelacanth, Crow, Opossum, Rat, Vampire bat, Goldfish, Eel, Hummingbird, Stegosaurus.]
`
`1.6 THE CHALLENGE OF GRAPHICAL DATA DISPLAY. This graph shows brain and
`body masses of animal species. The intent was for viewers to judge an intelligence
`measure, but the judgments require a visual operation that is too difficult.
`
`What Sagan wanted to describe was an intelligence scale that has
`been investigated extensively by Harry J. Jerison [65]. Sagan writes that
`this measure of intelligence is "the ratio of the mass of the brain to the
`total mass of the organism." Later he adds, referring the reader to the
`graph, "of all the organisms shown, the beast with the largest brain mass
`for its body weight is a creature called Homo sapiens. Next in such a
`ranking are dolphins."
`
The first problem is that Sagan has made a mistake in describing the
intelligence measure; it is not the ratio of brain to body mass but rather is
$(\text{brain mass})/(\text{body mass})^{2/3}$. If we study a group of related species,
such as all mammals, brain mass tends to increase as a function of body
mass. The general pattern of the data is reasonably well described by the
equation

$$\text{brain mass} = c\,(\text{body mass})^{2/3}.$$
`
`
`
`
`Since the densities of different species do not vary radically, we may
`think of the masses as being surrogate measures for volume, and volume
`to the 2/3 power behaves like a surface area. Thus the empirical
`relationship says that brain mass depends on the surface area of the
`body; Stephen Jay Gould conjectures that this is so because body
`surfaces serve as end points for so many nerve channels [52]. Now
`suppose a given species has a greater brain mass than other species with
`the same body mass; what this means is that
`
$$(\text{brain mass})/(\text{body mass})^{2/3}$$
`
`is greater. We might expect that the big-brained species would be more
`intelligent since it has an excess of brain capacity given its body surface.
`This idea leads to measuring intelligence by this ratio.
`
`Let us now return to Figure 1.6 and consider the graphical problem,
`which is a serious one. How do we judge the intelligence measure from
`the graph? Suppose two species have the same intelligence measure;
`then both have the same value of
`
$$\frac{\text{brain mass}}{(\text{body mass})^{2/3}} = r.$$
`
`Thus
`
$$\log(\text{brain mass}) = \tfrac{2}{3}\log(\text{body mass}) + \log(r)$$

`for both species. This means that in Figure 1.6, the two equally
`intelligent species lie on a line with slope 2/3. Suppose one species has a
`greater value of r than another; then the smarter one lies on a line with
`slope 2/3 that is to the northwest of the line on which the less intelligent
`one lies. In other words, to judge the intelligence measure from
`Figure 1.6 we must mentally superpose a set of parallel lines with
`slope 2/3. (If we attempt to judge Sagan's mistaken ratios, we must
`superpose lines with slope 1.) This visual operation is simply too hard.
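
The slope argument can be checked numerically. The sketch below uses two hypothetical species, with masses invented so that both share r = 10 in arbitrary consistent units; the function name `log_measure` is an assumption. It confirms that the two points lie on a line of slope 2/3 in log-log coordinates and that computing the measure directly removes the need for any mental superposition.

```python
import math


def log_measure(brain_mass, body_mass):
    """log10 of (brain mass) / (body mass)^(2/3)."""
    return math.log10(brain_mass) - (2 / 3) * math.log10(body_mass)


# Hypothetical (brain mass, body mass) pairs chosen so that r = 10 for both.
species_a = (10 * 8 ** (2 / 3), 8)
species_b = (10 * 1000 ** (2 / 3), 1000)

# Slope of the line joining the two points on log-log axes.
slope = (math.log10(species_b[0]) - math.log10(species_a[0])) / (
    math.log10(species_b[1]) - math.log10(species_a[1])
)

print(round(slope, 6))
print(round(log_measure(*species_a), 6), round(log_measure(*species_b), 6))
```

Sorting species by `log_measure` and plotting the sorted values gives the kind of direct display the text goes on to recommend.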
`
`Figure 1.6 can be greatly improved, at least for the purpose of
`showing the intelligence measure, by graphing the measure directly on a
`log scale, as is done in the dot plot of Figure 1.7. Now we can see
`strikingly many things not so apparent from Figure 1.6. Happily,
`modern man is at the top. Dolphins are next; interestingly, they are
`ahead of our ancestor Homo habilis.
`
`
`
`The problems with Figure 1.6 do not stop here. Five of the labels are
`wrong. The following are the corrections: "saurornithoid" should be
`"wolf," "wolf" should be "saurornithoid," "hummingbird" should be
`"goldfish," "goldfish" should be "mole," and "mole" should be
`"hummingbird." The correct labels yield the satisfying result that a
`hummingbird is smaller than a mole.
`
`It should be emphasized that for some purposes, a corrected version
`of Figure 1.6 is a useful graph. For example, it shows the values of the
`brain and body masses and gives us information about their relationship.
`The point is that it does a poor job of showing the intelligence measure.
`
[Figure 1.7: dot plot of the intelligence measure on a log scale (tick labels -3 to 0). Species labels recovered, in order of appearance: Modern Man, Dolphin, Homo habilis, Gracile Australopithecus, Chimpanzee, Baboon, Crow, Vampire Bat, Wolf, Gorilla, Elephant, Hummingbird, Lion, Mole, Opossum, Blue Whale, Saurornithoid, Goldfish, Ostrich, ...]