Jeffrey S. Simonoff

Smoothing Methods in Statistics

Springer
PAGE 1 OF 349

SONOS EXHIBIT 1016
IPR of U.S. Pat. No. 8,942,252
`
`
`
`Springer Series in Statistics
`
`Advisors:
`P. Bickel, P. Diggle, S. Fienberg, K. Krickeberg,
`I. Olkin, N. Wermuth, S. Zeger
`
`Springer
`New York
`Berlin
`Heidelberg
`Barcelona
`Budapest
`Hong Kong
`London
`Milan
`Paris
`Santa Clara
`Singapore
`Tokyo
`
`
`
`
`
`Springer Series in Statistics
`
Andersen/Borgan/Gill/Keiding: Statistical Models Based on Counting Processes.
`Andrews/Herzberg: Data: A Collection of Problems from Many Fields for the Student
`and Research Worker.
`Anscombe: Computing in Statistical Science through APL.
`Berger: Statistical Decision Theory and Bayesian Analysis, 2nd edition.
Bolfarine/Zacks: Prediction Theory for Finite Populations.
Borg/Groenen: Modern Multidimensional Scaling: Theory and Applications.
Bremaud: Point Processes and Queues: Martingale Dynamics.
Brockwell/Davis: Time Series: Theory and Methods, 2nd edition.
`Daley/Vere-Jones: An Introduction to the Theory of Point Processes.
`Dzhaparidze: Parameter Estimation and Hypothesis Testing in Spectral Analysis of
`Stationary Time Series.
`Fahrmeir/Tutz: Multivariate Statistical Modelling Based on Generalized Linear
`Models.
`Farrell: Multivariate Calculation.
`Federer: Statistical Design and Analysis for Intercropping Experiments.
`Fienberg/Hoaglin/Kruskal/Tanur(Eds.): A Statistical Model: Frederick Mosteller's
`Contributions to Statistics, Science and Public Policy.
`Fisher/Sen: The Collected Works of Wassily Hoeffding.
`Good: Permutation Tests: A Practical Guide to Resampling Methods for Testing
`Hypotheses.
`Goodman/Kruskal: Measures of Association for Cross Classifications.
`Gourieroux: ARCH Models and Financial Applications.
`Grandell: Aspects of Risk Theory.
`Haberman: Advanced Statistics, Volume I: Description of Populations.
`Hall: The Bootstrap and Edgeworth Expansion.
Härdle: Smoothing Techniques: With Implementation in S.
`Hart: Nonparametric Smoothing and Lack-of-Fit Tests.
`Hartigan: Bayes Theory.
Heyde: Quasi-Likelihood and Its Application: A General Approach to Optimal
`Parameter Estimation.
`Heyer: Theory of Statistical Experiments.
`Huet/Bouvier/Gruet/Jolivet: Statistical Tools for Nonlinear Regression: A Practical
`Guide with S-PLUS Examples.
`Jolliffe: Principal Component Analysis.
`Kolen/Brennan: Test Equating: Methods and Practices.
`Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume I.
`Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume II.
`Kotz/Johnson (Eds.): Breakthroughs in Statistics Volume III.
`Kres: Statistical Tables for Multivariate Analysis.
Küchler/Sørensen: Exponential Families of Stochastic Processes.
`Le Cam: Asymptotic Methods in Statistical Decision Theory.
`
`(continued after index)
`
`
`
`
`
`Jeffrey S. Simonoff
`
`Smoothing Methods
`in Statistics
`
`With 117 Figures
`
`"
`
`Springer
`
`
`
`
`
`Jeffrey S. Simonoff
`Department of Statistics and Operations Research
`Leonard N. Stern School of Business
`New York University
`44 West 4th Street
`New York, NY 10012-1126 USA
`
`Library of Congress Cataloging-in-Publication Data
`Simonoff, Jeffrey S.
Smoothing methods in statistics / Jeffrey S. Simonoff.
    p. cm. - (Springer series in statistics)
Includes bibliographical references and indexes.
ISBN-13: 978-1-4612-8472-7    e-ISBN-13: 978-1-4612-4026-6
DOI: 10.1007/978-1-4612-4026-6
1. Smoothing (Statistics)  I. Title.  II. Series.
QA278.S526 1996
519.5'36-dc20                                        96-11742
`
`Printed on acid-free paper.
`
`© 1996 Springer-Verlag New York, Inc.
Softcover reprint of the hardcover 1st edition 1996
`All rights reserved. This work may not be translated or copied in whole or in part without the
`written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New
`York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly
`analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden.
`The use of general descriptive names, trade names, trademarks, etc., in this publication, even
`if the former are not especially identified, is not to be taken as a sign that such names, as
`understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely
`by anyone.
`
`Production managed by Hal Henglein; manufacturing supervised by Jeffrey Taub.
`Camera-ready copy prepared from the author's TeX files.
`
`9875432 (Corrected second printing. 1998)
`
`
`
`
`
`To Beverly, Robert and Alexandra
`
`
`
`
`
`Preface
`
`The existence of high speed, inexpensive computing has made it easy to look
`at data in ways that were once impossible. Where once a data analyst was
`forced to make restrictive assumptions before beginning, the power of the
`computer now allows great freedom in deciding where an analysis should
`go. One area that has benefited greatly from this new freedom is that of
nonparametric density, distribution, and regression function estimation, or
`what are generally called smoothing methods. Most people are familiar with
`some smoothing methods (such as the histogram) but are unlikely to know
`about more recent developments that could be useful to them.
`If a group of experts on statistical smoothing methods are put in a
`room, two things are likely to happen. First, they will agree that data
analysts seriously underappreciate smoothing methods. Smoothing methods
use computing power to give analysts the ability to highlight unusual
structure very effectively, by taking advantage of people's abilities to draw
conclusions from well-designed graphics. Data analysts should take advantage
of this, they will argue.
`Then, they will strongly disagree about which smoothing methods
`should be disseminated to the public. These conflicts, which often hinge on
`subtle technical points, send a garbled message, since nonexperts naturally
`think that if the experts can't agree, the field must be too underdeveloped
to be of practical use in analyzing real data. Besides being counterproductive,
these arguments are often pointless, since while some methods are
`better at some tasks than others, no method is best on all counts. The
`data analyst must always address issues of conceptual and computational
`simplicity versus complexity.
`In this book, I have tried to sort through some of these controversies
and uncertainties, while always keeping an eye towards practical applications.
Some of the methods discussed here are old and well understood,
`while others are promising but underdeveloped. In all cases, I have been
`guided by the idea of highlighting what seems to work, rather than by the
`elegance (or even existence) of statistical theory.
This book is, first and foremost, for the data analyst. By "data analyst"
I mean the scientist who analyzes real data. This person has a good
`knowledge of basic statistical theory and methodology, and probably knows
`
`
`
`
`
`
`certain areas of statistics very well, but might be unaware of the benefits
`that smoothing methods could bring to his or her problems. Such a person
`should benefit from the decidedly applied focus of the book, as arguments
`generally proceed from actual data problems rather than statistical theory.
`A second audience for this book is statisticians who are interested in
studying the area of smoothing methods, perhaps with the intention of undertaking
research in the field. For these people, the "Background material"
section in each chapter should be helpful. The section is bibliographic, giving
references for the methods described in the chapter, but it also fills in
`some gaps and mentions related approaches and results not discussed in
`the main text. The extensive reference list (with over 750 references) also
`allows researchers to follow up on original sources for more technical details
`on different methods.
`This book also can be used as a text for a senior undergraduate or
`graduate-level course in smoothing. If the course is at an applied level, the
`book can be used alone, but a more theoretical course probably requires
`the use of supplementary material, such as some of the original research
`papers. Each chapter includes exercises with a heavily computational focus
`(indeed, some are quite time consuming computationally) based on the data
`sets used in the book. Appendix A gives details on the data sets and how
`to obtain them electronically.
`I believe that anyone interested in smoothing methods benefits from
`applying the methods to real data, and I have included sources of code for
`methods in a "Computational issues" section in each chapter. New code
`often becomes available, and commercial packages change the functionality
`that they provide, but I still hope that these sources are useful. Many
`software packages include the capability to write macros, which means that
`analysts can write their own code (or perhaps someone else already has).
`I apologize for any omissions or errors in the descriptions of packages and
`code. No endorsement or warranty, express or implied, should be drawn
`from the listing and/or description of the software given in this book. The
`available code is to be used at one's own risk. See Appendix B for more
`details on computational issues.
`In recent years, several good books on different aspects of smoothing
`have appeared, and I would be remiss if I did not acknowledge my debt
`to them. These include, in particular, Silverman (1986), Eubank (1988),
Härdle (1990, 1991), Hastie and Tibshirani (1990), Wahba (1990), Scott
`(1992), Green and Silverman (1994), and Wand and Jones (1995). I own all
`these books, but I believe that this book is different from them.
`The coverage in this book is very broad, including simple and complex
univariate and multivariate density estimation, nonparametric regression
`estimation, categorical data smoothing, and applications of smoothing to
`other areas of statistics. There are strong theoretical connections between
all these methods, which I have tried to exploit, while still only briefly examining
technical details in places. Density estimation (besides its importance
`
`
`
`
`
`
`in its own right) provides a simple framework within which smoothing issues
`can be considered, which then builds the necessary structure for regression
`and categorical data smoothing and allows the latter topics to be covered in
less detail. Even so, the chapter on nonparametric regression is the longest
`in the book, reinforcing the central nature of regression modeling in data
`analysis.
`Despite the broad coverage, I have had to omit certain topics because of
`space considerations. Most notably, I do not describe methods for censored
`data, estimation of curves with sharp edges and jumps, the smoothing of
`time series in the frequency domain (smoothed spectral estimation), and
`wavelet estimators. I hope that the material here provides the necessary
`background so that readers can pick up the essence of that material on
`their own.
`This book originated as notes for a doctoral seminar course at New
`York University, and I would like to thank the students in that course,
David Barg, Hongshik Kim, Kaoru Koyano, Nomi Prins, Abe Schwarz,
`Karen Shane, Yongseok Sohn, and Gang Yu, for helping to sharpen my
thoughts on smoothing methods. I have benefited greatly from many stimulating
conversations about smoothing methods through the years with
`Mark Handcock, Cliff Hurvich, Paul Janssen, Chris Jones, Steve Marron,
`David Scott, Berwin Turlach, Frederic Udina, and Matt Wand. Samprit
`Chatterjee, Ali Hadi, Mark Handcock, Cliff Hurvich, Chris Jones, Bernard
Silverman, Frederic Udina, and Matt Wand graciously read and gave comments
on (close to) final drafts of the manuscript. Cliff Frohlich, Chong Gu,
`Clive Loader, Gary Oehlert, David Scott, and Matt Wand shared code and
`data sets with me. Marc Scott provided invaluable assistance in installing
`and debugging computer code for different methods. I sincerely thank all
`these people for their help. Finally, I would like to thank my two editors
`at Springer, Martin Gilchrist and John Kimmel, for shepherding this book
`through the publication process.
`A World Wide Web (WWW) archive at the URL address
`http://www.stern.nyu.edu/SOR/SmoothMeth
`is devoted to this book, and I invite readers to examine it using a WWW
`browser, such as Netscape or Mosaic. I am eager to hear from readers
`about any aspect of the book. I can be reached via electronic mail at the
Internet address jsimonoff@stern.nyu.edu.
`
`East Meadow, N.Y.
`
`Jeffrey S. Simonoff
`
`
`
`
`
Contents

Preface                                                          vii

1. Introduction                                                    1
   1.1 Smoothing Methods: a Nonparametric/Parametric
       Compromise                                                  1
   1.2 Uses of Smoothing Methods                                   8
   1.3 Outline of the Chapters                                    10
   Background material                                            11
   Computational issues                                           11
   Exercises                                                      12

2. Simple Univariate Density Estimation                           13
   2.1 The Histogram                                              13
   2.2 The Frequency Polygon                                      20
   2.3 Varying the Bin Width                                      22
   2.4 The Effectiveness of Simple Density Estimators             26
   Background material                                            30
   Computational issues                                           37
   Exercises                                                      38

3. Smoother Univariate Density Estimation                         40
   3.1 Kernel Density Estimation                                  40
   3.2 Problems with Kernel Density Estimation                    49
   3.3 Adjustments and Improvements to Kernel Density Estimation  53
   3.4 Local Likelihood Estimation                                64
   3.5 Roughness Penalty and Spline-Based Methods                 67
   3.6 Comparison of Univariate Density Estimators                70
   Background material                                            72
   Computational issues                                           92
   Exercises                                                      94

4. Multivariate Density Estimation                                96
   4.1 Simple Density Estimation Methods                          96
   4.2 Kernel Density Estimation                                 102
   4.3 Other Estimators                                          111
   4.4 Dimension Reduction and Projection Pursuit                117
   4.5 The State of Multivariate Density Estimation              121
   Background material                                           123
   Computational issues                                          131
   Exercises                                                     132

5. Nonparametric Regression                                      134
   5.1 Scatter Plot Smoothing and Kernel Regression              134
   5.2 Local Polynomial Regression                               138
   5.3 Bandwidth Selection                                       151
   5.4 Locally Varying the Bandwidth                             154
   5.5 Outliers and Autocorrelation                              160
   5.6 Spline Smoothing                                          168
   5.7 Multiple Predictors and Additive Models                   178
   5.8 Comparing Nonparametric Regression Methods                190
   Background material                                           191
   Computational issues                                          210
   Exercises                                                     212

6. Smoothing Ordered Categorical Data                            215
   6.1 Smoothing and Ordered Categorical Data                    215
   6.2 Smoothing Sparse Multinomials                             217
   6.3 Smoothing Sparse Contingency Tables                       226
   6.4 Categorical Data, Regression, and Density Estimation      236
   Background material                                           243
   Computational issues                                          250
   Exercises                                                     250

7. Further Applications of Smoothing                             252
   7.1 Discriminant Analysis                                     252
   7.2 Goodness-of-Fit Tests                                     258
   7.3 Smoothing-Based Parametric Estimation                     261
   7.4 The Smoothed Bootstrap                                    266
   Background material                                           268
   Computational issues                                          273
   Exercises                                                     273

Appendices                                                       275
   A. Descriptions of the Data Sets                              275
   B. More on Computational Issues                               288

References                                                       290
Author Index                                                     321
Subject Index                                                    329
`
`
`
`
`
`Chapter 1
`Introduction
`
1.1 Smoothing Methods: a Nonparametric/Parametric Compromise
`
`One thing that sets statisticians apart from other scientists is the general
`public's relative ignorance about what the field of statistics actually is.
People have at least a general idea of what chemistry or biology is - but
what is it exactly that statisticians do?
`One answer to that question is as follows: statistics is the science that
`deals with the collection, summarization, presentation, and interpretation
of data. Data are the key, of course - the stuff from which we gain insights
and make inferences (or, to paraphrase Sherlock Holmes, the clay from
which we make our bricks).
`Consider Table 1.1. This data set represents the three-month certificate
`of deposit (CD) rates for 69 Long Island banks and thrifts, as given in
`the August 23, 1989, issue of Newsday. This table presents a valid data
`collection but clearly is quite inadequate for summarizing or interpreting
`the data. Indeed, it is difficult to glean any information past a feeling for
`the range (roughly 7.5% - 8.8%) and a "typical" value (perhaps around
`8.3%).
`
`Table 1.1. Three-month CD rates for Long Island banks and thrifts.
`
`7.56 7.57 7.71 7.82 7.82 7.90 8.00 8.00 8.00 8.00
`8.00 8.00 8.00 8.05 8.05 8.06 8.11 8.17 8.30 8.33
`8.33 8.40 8.50 8.51 8.55 8.57 8.65 8.65 8.71
`7.51 7.75 7.90 8.00 8.00 8.00 8.15 8.20 8.25 8.25
`8.30 8.30 8.33 8.34 8.35 8.35 8.36 8.40 8.40 8.40
`8.40 8.40 8.40 8.45 8.49 8.49 8.49 8.50 8.50 8.50
`8.50 8.50 8.50 8.50 8.50 8.50 8.52 8.70 8.75 8.78
`
`
`
`
`
`2
`
`Chapterl. Introduction
`
[Figure: density versus CD rate]

Fig. 1.1. Fitted Gaussian density estimate for Long Island CD rate data.
`
The problem is that no assumptions have been made about the underlying
process that generated these data (loosely speaking, the analysis
is purely nonparametric, in the sense that no formal structure is imposed
on the data). Therefore, no true summary is possible. The classical approach
to this difficulty is to assume a parametric model for the underlying
process, specifying a particular form for the underlying density. Then, appropriate
summary statistics can be calculated, and a fitted density can be
presented. For example, a data analyst might hypothesize a Gaussian form
for the density f. Calculation of the sample mean (X̄ = 8.26) and standard
deviation (s = .299) then determines a specific estimate, which is given in
`Fig. 1.1. This curve provides a wealth of information about the pattern of
`CD rates, including typical rates, the likelihood of finding certain rates at
`a randomly selected institution, and so on.
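As a concrete sketch of this calculation, the fitted curve of Fig. 1.1 can be reproduced directly from the Table 1.1 values using only elementary arithmetic (plain Python below; the helper name is the editor's, not the book's):

```python
import math

# Three-month CD rates from Table 1.1 (first three rows: commercial
# banks; last four rows: thrifts).
rates = [
    7.56, 7.57, 7.71, 7.82, 7.82, 7.90, 8.00, 8.00, 8.00, 8.00,
    8.00, 8.00, 8.00, 8.05, 8.05, 8.06, 8.11, 8.17, 8.30, 8.33,
    8.33, 8.40, 8.50, 8.51, 8.55, 8.57, 8.65, 8.65, 8.71,
    7.51, 7.75, 7.90, 8.00, 8.00, 8.00, 8.15, 8.20, 8.25, 8.25,
    8.30, 8.30, 8.33, 8.34, 8.35, 8.35, 8.36, 8.40, 8.40, 8.40,
    8.40, 8.40, 8.40, 8.45, 8.49, 8.49, 8.49, 8.50, 8.50, 8.50,
    8.50, 8.50, 8.50, 8.50, 8.50, 8.50, 8.52, 8.70, 8.75, 8.78,
]

n = len(rates)
xbar = sum(rates) / n                                          # sample mean
s = math.sqrt(sum((x - xbar) ** 2 for x in rates) / (n - 1))   # sample sd

def gaussian_density(x, mu, sigma):
    """N(mu, sigma^2) density: the parametric estimate of f."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

print(f"mean = {xbar:.2f}, sd = {s:.3f}")   # roughly 8.26 and 0.299
print(f"fitted density at 8.3%: {gaussian_density(8.3, xbar, s):.3f}")
```

The two numbers computed here are the entire "model": once the Gaussian form is assumed, the whole fitted curve of Fig. 1.1 follows from them.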
Unfortunately, the strength of parametric modeling is also its weakness.
By linking inference to a specific model, great gains in efficiency are
`possible, but only if the assumed model is (at least approximately) true. If
`the assumed model is not the correct one, inferences can be worse than
`useless, leading to grossly misleading interpretations of the data.
`
`
`
`
`
`
`~
`C\.I
`
`~
`
`>--'iii ~
`
`c
`Q)
`0
`
`I.{)
`c:i
`
`0
`c:i
`
`I
`
`I
`
`7.6
`
`7.8
`
`8.0
`
`8.2
`CD rate
`
`8.4
`
`8.6
`
`8.8
`
`Fig. 1.2. Histogram for Long Island CD rate data.
`
`Smoothing methods provide a bridge between making no assumptions
`on formal structure (a purely nonparametric approach) and making very
`strong assumptions (a parametric approach). By making the relatively weak
`assumption that whatever the true density of CD rates might be, it is
`a smooth curve, it is possible to let the data tell the analyst what the
`pattern truly is. Figure 1.2 gives a histogram for these data, based on
`equally sized bins (discussion of histograms and their variants is the focus
`of Chapter 2). The picture is very different from the parametric curve of
`Fig. 1.1. The density appears to be bimodal, with a primary mode around
`8.5% and a secondary mode around 8.0% (the possibility that the observed
`bimodality could be due to the specific construction of this histogram should
`be addressed, and such issues are discussed in Chapter 2).
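A minimal version of such a histogram amounts to nothing more than counting observations in equal-width bins. In the sketch below the origin (7.5) and bin width (0.2 percentage points) are the editor's arbitrary choices, not the book's; Chapter 2 discusses how such choices affect the estimate:

```python
# Equal-width histogram counts for the Table 1.1 CD rates.
rates = [
    7.56, 7.57, 7.71, 7.82, 7.82, 7.90, 8.00, 8.00, 8.00, 8.00,
    8.00, 8.00, 8.00, 8.05, 8.05, 8.06, 8.11, 8.17, 8.30, 8.33,
    8.33, 8.40, 8.50, 8.51, 8.55, 8.57, 8.65, 8.65, 8.71,
    7.51, 7.75, 7.90, 8.00, 8.00, 8.00, 8.15, 8.20, 8.25, 8.25,
    8.30, 8.30, 8.33, 8.34, 8.35, 8.35, 8.36, 8.40, 8.40, 8.40,
    8.40, 8.40, 8.40, 8.45, 8.49, 8.49, 8.49, 8.50, 8.50, 8.50,
    8.50, 8.50, 8.50, 8.50, 8.50, 8.50, 8.52, 8.70, 8.75, 8.78,
]

def histogram_counts(data, origin, width):
    """Map each observation to an integer bin index, bin k covering
    [origin + k*width, origin + (k+1)*width), and count occupancy."""
    counts = {}
    for x in data:
        k = int((x - origin) / width)
        counts[k] = counts.get(k, 0) + 1
    return counts

counts = histogram_counts(rates, origin=7.5, width=0.2)
for k in sorted(counts):
    left = 7.5 + 0.2 * k
    print(f"[{left:.1f}, {left + 0.2:.1f}): {'*' * counts[k]}")
```

Even this crude text display makes the departure from a single Gaussian visible; the heights of the bars are what a histogram density estimate normalizes.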
`The form of this histogram could suggest to the data analyst that there
are two well-defined subgroups in the data. This is, in fact, the case - the
69 savings institutions include 29 commercial banks and 40 thrift (Savings
`and Loan) institutions (the CD rates for the commercial banks correspond
`to the first three rows of Table 1.1, while those for the thrifts appear in
`the last four rows). These subgroups can be acknowledged parametrically
`by fitting separate Gaussian densities for the two groups (with means 8.15
`
`
`
`
`
`4
`
`Chapter 1. Introduction
`
`/
`
`/
`
`/
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`I
`
`/
`
`I
`
`/
`
`/
`
`,-
`
`~
`
`-
`
`>0-
`'00
`c:
`Q)
`0
`
`Lq
`0
`
`0
`c:i
`
`\
`
`\
`
`\
`
`\
`
`\
`
`\
`
`\
`
`\
`
`\
`
`,
`
`"-
`
`"-
`
`"-
`
`7.5
`
`8.0
`
`8.5
`
`9.0
`
`CD rate
`
`Fig. 1.3. Fitted Gaussian density estimates for Long Island CD rate data: com(cid:173)
`mercial banks (dashed line) and thrifts (solid line).
`
`and 8.35, respectively, and standard deviations .32 and .25, respectively).
`Figure 1.3 gives the resultant fitted densities (the dashed line refers
`to commercial banks, while the solid line refers to thrifts). It is apparent
`that recognizing the distinction between commercial banks and thrifts helps
`to account for the bimodal structure in the histogram. There are several
`plausible hypotheses to explain this pattern. The Savings and Loan bailout
`scandal was just becoming big news at this time, and it is possible that
`many thrifts felt they had to offer higher rates to attract nervous investors.
`Another possibility is that these institutions were trying to encourage an
influx of deposits so as to ward off bankruptcy. Still, Fig. 1.3 is less than
satisfactory, as the modes are not as distinct as they are in Fig. 1.2 (indeed,
the mixture density that combines these two Gaussian densities is
unimodal).
`Figure 1.4 provides still more insight into the data process. It gives two
`kernel density estimates for these data, corresponding to the commercial
`banks (dashed line) and the thrifts (solid line). These estimates can be
`thought of as smoothed histograms with very small bins centered at many
`
`
`
`
`
`
`\
`
`\
`
`7.5
`
`8.0
`CD rate
`
`8.5
`
`9.0
`
`Fig. 1.4. Kernel density estimates for Long Island CD rate data.
`
`different CD rate values. The underlying structure in the data is now even
`clearer. While the distinction between commercial banks and thrifts is a
`key aspect of the data, there is still more going on. The commercial bank
`CD rates are bimodal, with a primary mode at 8.0% and a secondary mode
`around 8.5%, while the distribution for the thrifts has a pronounced left
`tail and a mode around 8.5%. These modes account for the form of the
`histogram in Fig. 1.2.
`Note also the apparent desirability of "round" numbers for CD rates of
`both types, as bumps or modes are apparent at 7.5%, 8.0%, and 8.5%. The
`construction and properties of density estimators of the type constructed
`in Fig. 1.4 will be discussed in Chapter 3.
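A bare-bones version of such an estimator is easy to write down. The sketch below applies a Gaussian-kernel estimator to the 40 thrift rates from Table 1.1; the bandwidth h = 0.1 is an arbitrary choice of the editor (bandwidth selection is a central topic of Chapter 3):

```python
import math

# Thrift CD rates (the last four rows of Table 1.1).
thrifts = [
    7.51, 7.75, 7.90, 8.00, 8.00, 8.00, 8.15, 8.20, 8.25, 8.25,
    8.30, 8.30, 8.33, 8.34, 8.35, 8.35, 8.36, 8.40, 8.40, 8.40,
    8.40, 8.40, 8.40, 8.45, 8.49, 8.49, 8.49, 8.50, 8.50, 8.50,
    8.50, 8.50, 8.50, 8.50, 8.50, 8.50, 8.52, 8.70, 8.75, 8.78,
]

def kernel_density(x, data, h):
    """Average of Gaussian bumps, one centered at each observation;
    conceptually a smoothed histogram with many tiny, shifted bins."""
    n = len(data)
    return sum(
        math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data
    ) / (n * h * math.sqrt(2.0 * math.pi))

# Evaluate on a grid; the estimate should peak near 8.5%.
grid = [7.0 + 0.01 * i for i in range(201)]        # 7.00, ..., 9.00
fhat = [kernel_density(x, thrifts, 0.1) for x in grid]
peak = grid[fhat.index(max(fhat))]
print(f"estimated mode near {peak:.2f}%")
```

Because the estimate is an average of density functions, it is itself a legitimate density (nonnegative, integrating to one), unlike an arbitrary smoothed curve.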
`It is clear that any model that did not take the subgroups into account
`would be doomed to failure for this data set. The subgroups do not exactly
correspond to the observed modes in the original histogram (since the left
mode comes from both subgroups), which shows that there is not necessarily
a one-to-one correspondence between modes and subgroups. Still, the
`observed structure in the histogram was instrumental in recognizing the
`best way to approach the analysis of these data.
`The ability of smoothing methods to identify potentially unexpected
`
`
`
`
`
`6
`
`Chapter 1. Introduction
`
Fig. 1.5. Scatter plot of log C-peptide versus age with linear least squares (dashed
line) and Nadaraya-Watson kernel (solid line) estimates superimposed.
`
structure extends to more complicated data analysis problems as well. The
scatter plot given in Fig. 1.5 comes from a study of the factors affecting
patterns of insulin-dependent diabetes mellitus in children. It shows the
relationship of the logarithm of C-peptide concentration to age for 43 children.
The superimposed dashed line is the ordinary least squares linear
regression line, which does not adequately summarize the apparent relationship
in the data. The solid line is a so-called Nadaraya-Watson kernel
estimate (a smooth, nonparametric representation of the regression relationship),
which shows that log C-peptide increases steadily with age to
about age 7, where it levels off (with perhaps a slight rise around age 14).
Again, the smoothed estimate highlights structure in a nonparametric fashion.
Regression smoothers of this type will be discussed in Chapter 5.
`It might be supposed that the benefits of smoothing occur only when
`analyzing relatively small data sets, but this is not the case. It also can
`happen that a data set can be so large that it overwhelms an otherwise
`useful graphical display.
`Consider, for example, Fig. 1.6. This scatter plot refers to a data set
`examining the geographic pattern of sulfate wet deposition ("acid rain").
`
`
`
`
`
[Figure: correlation versus distance between stations (km)]

Fig. 1.6. Scatter plot of correlation of adjusted wet deposition levels versus
distance with lowess curve superimposed.
`
`The plot relates the distance (in kilometers) between measuring stations
`and the correlation of adjusted deposition levels (adjusted for monthly and
`time trend effects) for the 3321 pairings of 82 stations. It is of interest
`to understand and model this relationship, in order to estimate sulfate
`concentration and trend (and provide information on the accuracy of such
`estimates) on a regional level.
`Unfortunately, the sheer volume of points on the plot makes it difficult
`to tell what structure is present in the relationship between the two variables
`past an overall trend. Superimposed on the plot is a lowess curve, a robust
`nonparametric scatter plot smoother discussed further in Chapter 5. It is
`clear from this curve that there is a nonlinear relationship between distance
`and correlation, with the association between deposition patterns dropping
`rapidly until the stations are roughly 2000 km apart, and then leveling off
`at zero.
`
`
`
`
`
`8
`
`Chapter 1.
`
`Introduction
`
`1.2 Uses of Smoothing Methods
`
The three preceding examples illustrate the two important ways that
smoothing methods can aid in data analysis: by being able to extract more
information from the data than is possible purely nonparametrically, as
long as the (weak) assumption of smoothness is reasonable; and by being
able to free oneself from the "parametric straitjacket" of rigid distributional
assumptions, thereby providing analyses that are both flexible and robust.
There are many applications of smoothing that use one, or both, of
these strengths to aid in analysis and inference. An illustrative (but nonexhaustive)
list follows; many of these applications will be discussed further
in later chapters.
`
`A. Exploratory data analysis
`
The importance of looking at the data in any exploratory analysis cannot be
overemphasized. Smoothing methods provide a way of doing that efficiently -
often even the simplest graphical smoothing methods will highlight important
structure clearly.
`
`B. Model building
`
`Related to exploratory data analysis is the concept of model building. It
`should be recognized that choosing the appropriate model as the basis of
`analysis is an iterative process. Box (1980) stated this point quite explicitly:
`"Known facts (data) suggest a tentative model, implicit or explicit, which
`in turn suggests a particular examination and analysis of data and/ or the
`need to acquire further data; analysis may then suggest a modified model
`that may require further practical illumination and so on." Box termed
`this the criticism stage of model building, and smoothing methods can,
`and should, be an integral part of it.
The earlier example regarding CD rates illustrates this point. A first
hypothesized model for the rates might be Gaussian, yielding Fig. 1.1. Examination
of a histogram (Fig. 1.2) suggests that this model is inadequate
and that a model that takes account of any subgroups in the data could be
more effective. Outside knowledge of the data process then leads to the idea
of using the type of banking institution as those subgroups (without seeing
the histogram, the existence of these subgroups might easily be ignored).
This might suggest a model based on a mixture of two Gaussians, with possibly
different variances, as is presented in Fig. 1.3. The kernel estimates in
`refinement is necessary. In this way, both the data and outside knowledge
`combine to progressively improve understanding of the underlying random
`process.
`
`
`
`
`
`
`C. Goodness-of-fit
`
`Smoothed curves can be used to test formally the adequacy of fit of a
`hypothesized model. It is apparent from Figs. 1.2 and 1.4 that a Gaussian
`model is inadequate for the CD rate data; tests based on the difference
`between those curves and the Gaussian curve (Fig. 1.1) can be defined to
`assess this lack of fit formally. Similarly, the difference between the solid
`and dashed lines in Fig. 1.5 defines a test of the goodness-of-fit of a linear
`model to the diabetes data presented there. Tests constructed this way
`can be more powerful than those based on the empirical distribution alone
`and more robust than those based on a specific distributional form for
`the errors in a regression model. Alternatively, smoothed density estimates
`and regression curves can be used to construct confidence intervals and
`regions for true densities and regression functions, with similar avoidance
`of restrictive parametric assumptions.
`
`D. Parametric estimation
`
`Density and regression estimates can be used in parametric inference as well.
Suppose the mixture of two Gaussians tied to the type of banking institution
`that was graphically presented in Fig. 1.3 was hypothesized for the CD
`rate data (considering the long tails and bimodality apparent in the density
`estimates of Fig. 1.4, this might be a poor choice, however). An alternative
`to the usual maximum likelihood estimates would be to fit the two Gaussian
`densities that are "closest" to the curves in Fig. 1.4, defining closeness
`by some suitable distance metric. Estimators of this type are often fully
`efficient compared to maximum likelihood but are more robust, since a
`density estimate is much less sensitive to an unusual observation (outlier)
`than are maximum likelihood estimates like the sample mean and variance.
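One crude way to realize this idea is a grid search for the Gaussian mean that is closest, in integrated squared difference, to a kernel density estimate. Everything below (the toy sample, the fixed scale, the grids, the function names) is the editor's illustration of the principle, not the book's procedure:

```python
import math

def gauss(x, mu, sigma):
    """N(mu, sigma^2) density."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def kernel_density(x, data, h):
    """Gaussian-kernel density estimate at x."""
    return sum(gauss(x, xi, h) for xi in data) / len(data)

# Toy sample, roughly symmetric about zero.
data = [-1.0, -0.6, -0.3, -0.1, 0.1, 0.3, 0.6, 1.0]
h = 0.4
grid = [0.05 * i for i in range(-60, 61)]      # symmetric grid on [-3, 3]

def l2_dist(mu, sigma):
    """Approximate integrated squared difference between the candidate
    Gaussian and the kernel estimate (Riemann sum on the grid)."""
    return sum((gauss(x, mu, sigma) - kernel_density(x, data, h)) ** 2
               for x in grid) * 0.05

sigma = 0.7                                    # scale held fixed for brevity
best_mu = min([0.1 * i for i in range(-10, 11)],
              key=lambda m: l2_dist(m, sigma))
print(f"minimum-distance mean estimate: {best_mu:.1f}")
```

An outlier added to `data` perturbs the kernel estimate only locally, so the minimum-distance mean moves far less than the sample mean would; that is the robustness advantage described above.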
`
`E. Modification of standard methodology
`
Standard methodologies can be modified using smoothed density estimates
by simply substituting the density estimate for either the empirical or parametric
density function in the appropriate place. For example, discriminant
`analysis is usually based on assuming a multivariate normal distribution
`for each subgroup in the data, with either common (linear discriminant
`analysis) or different (quadratic discriminant analysis) covariance matrices.
`Then, observations are classified to the most probable group based on the
`normal densities. This procedure can be made nonparametric (and robust
`to violations of normality) by substituting smoothed density estimates for
`the normal density and classifying observations accordingly.
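A minimal sketch of this kernel-based classification rule, on made-up one-dimensional data (the group values, bandwidth, and function names are all illustrative rather than taken from the book):

```python
import math

def kernel_density(x, data, h):
    """Gaussian-kernel density estimate at x."""
    return sum(
        math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data
    ) / (len(data) * h * math.sqrt(2.0 * math.pi))

def classify(x, groups, h):
    """Assign x to the group maximizing prior * estimated density,
    with priors proportional to group sample sizes."""
    n_total = sum(len(d) for d in groups.values())
    return max(
        groups,
        key=lambda g: (len(groups[g]) / n_total)
                      * kernel_density(x, groups[g], h),
    )

groups = {
    "bank":   [1.0, 1.2, 1.4, 1.1, 0.9],
    "thrift": [2.9, 3.0, 3.1, 3.2, 3.3],
}
print(classify(1.05, groups, h=0.3))   # "bank"
print(classify(3.05, groups, h=0.3))   # "thrift"
```

The only change from classical discriminant analysis is the density plugged into the rule: a kernel estimate replaces the fitted normal, so no distributional shape is imposed on either group.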
`The bootstrap is another method where improvement via smoothing
`is possible. The ordinary bootstrap is based on repeated sampling from the
`data using the empirical distribution. This can result in bootstrap samples
`
`
`