PATTERN RECOGNITION

STATISTICAL, STRUCTURAL
AND NEURAL APPROACHES

ROBERT SCHALKOFF

ABOUT THE TEXT

This self-contained text is structured in four main sections.

• Part 1 (Chapter 1) introduces the overall subject by asking ‘What is pattern recognition?’ and provides an overview of pattern recognition concerns which are common to all approaches. The three approaches are then briefly introduced and compared.
• Part 2 (Chapters 2-5) concerns the statistical approach, beginning with a simple example and ending with unsupervised learning through clustering.
• Part 3 (Chapters 6-9) develops the syntactic or structural approach, beginning with a look at the representational capability of string grammars and corresponding recognition approaches (parsing), continuing with an exploration of higher-dimensional representations and graphical approaches, and ending with an introduction to supervised learning (grammatical inference).
• Part 4 (Chapters 10-13) details the emerging neural approach, beginning with an overview of neural computing basics, an examination of pattern associators and feedforward nets and associated training algorithms, and concluding with unsupervised learning implementations.

In each case, following development of the theory, examples are shown to illustrate the concepts. Each chapter contains a concluding section which cites relevant literature for more in-depth study of specific topics; approximately 250 references are cited.

Diagram of a Neural Net (experimental), 1989. Collection, The Museum of Modern Art, New York. Gift of Synaptics, Incorporated. Photograph © 1992 The Museum of Modern Art, New York.
Pattern Recognition:
Statistical, Structural
and Neural Approaches

Robert J. Schalkoff
Clemson University

John Wiley & Sons, Inc.
New York • Chichester • Brisbane • Toronto • Singapore

Acquisitions Editor: Steven Elliott
Copy Editor: Genevieve Scandone
Production Manager: Linda Muriello
Senior Production Supervisor: Savoula Amanatidis
Illustration Coordinator: Sigmund Malinowski
Manufacturing Manager: Lorraine Fumoso

Recognizing the importance of preserving what has been written, it is a policy of John Wiley & Sons, Inc. to have books of enduring value published in the United States printed on acid-free paper, and we exert our best efforts to that end.
Copyright © 1992, by John Wiley & Sons, Inc.
All rights reserved. Published simultaneously in Canada.

Reproduction or translation of any part of this work beyond that permitted by Sections 107 and 108 of the 1976 United States Copyright Act without the permission of the copyright owner is unlawful. Requests for permission or further information should be addressed to the Permissions Department, John Wiley & Sons.

Library of Congress Cataloging in Publication Data:

Schalkoff, Robert J.
Pattern recognition : statistical, structural and neural approaches / Robert J. Schalkoff.
p. cm.
Includes bibliographical references and index.
ISBN 0-471-52974-5 (cloth)
1. Pattern perception–Statistical methods. I. Title.
Q327.827 1992
006.4–dc20
91-4751 CIP

Printed in the United States of America

10 9 8 7 6 5 4 3 2 1

About The Author

Robert J. Schalkoff is Professor of Electrical and Computer Engineering at Clemson University in Clemson, South Carolina. His primary research interests are in artificial intelligence and computer vision, with a special emphasis on model-based image understanding, motion and stereovision, and associated computer architectures. He is also the author of Digital Image Processing and Computer Vision (John Wiley & Sons, Inc., 1989) and Artificial Intelligence: An Engineering Approach (McGraw-Hill, 1990). He received the Ph.D. degree in Electrical Engineering from the University of Virginia in 1979.
Preface

The art of being wise is the art of knowing what to overlook.
The Principles of Psychology [1890], Chapter 22
William James 1842-1910

This book brings together an identifiable ‘core’ of pattern recognition (PR) ideas, techniques, and applications using statistical, syntactic, and neural approaches. It is intended for beginning graduate/advanced undergraduate students as well as practicing engineers and scientists. It is the outgrowth of a graduate course taught in the Electrical and Computer Engineering Department at Clemson University. Like many similar medium-sized and ‘mainstream’ departments, we realized the need to introduce graduate students to the growing inventory of PR topics found in this book, including a structured introduction to neural networks for PR applications. The text is suitable for use in a one- or two-semester course and may be supplemented by individual student projects and readings from the current literature. Since there is more material than could comfortably be covered in a single semester, individual instructors may choose to emphasize or deemphasize specific topics.

As with most technical disciplines, the concept of ‘pattern recognition’ means different things to different people. PR researchers are typically fragmented into disjoint communities. It is my belief that PR is becoming a mature technology with roots in diverse areas such as estimation theory, formal languages, and optimization theory. In addition, the interfaces between pattern recognition and other technologies and application areas, such as artificial intelligence and image processing, are quite fuzzy. All three PR approaches (statistical, syntactic, and neural) are currently receiving considerable attention and often share overlapping themes. The choice of one over another is a function of many variables, including constraints of the specific application and
the background of the PR system designer. Therefore, I have attempted an exposition that illustrates the similarities and differences among the three approaches. Although statistical and syntactic pattern recognition are different, they share common ideas. An example is the use of formal models for pattern generation, which leads to formal solution procedures for classification/description. In addition, neural network-based approaches share a number of characteristics with ‘traditional’ pattern recognition techniques. They include hierarchical structures, clustering, pattern association, and learning or training.

I have also tried to emphasize an ‘engineering’ approach to PR, since I believe we should not be teaching PR as esoteric mathematics or as the ‘art’ of developing a set of problem-solving heuristics. An understanding of the underlying models and techniques and their respective limitations is a fundamental aspect of PR system design. Like many engineering and scientific disciplines, PR system design forces the designer to consider trade-offs between exact solutions to approximate models and approximate solutions to exact models. One of the caveats I stress to students is ‘In PR problem formulations, you will get what you ask for in the solution, whether you recognize it or not.’ This allows students to appreciate a number of mathematical results that also appeal to intuition. Examples are as follows:

• When a statistical pattern recognition-based algorithm changes decision regions to account for a change in a priori probabilities.
• When a grammatical approach to describing the structure inherent in a pattern yields a context-sensitive grammar.
• When a feedforward-structure neural net, after training with a well-known algorithm, develops an internal layer that allows the attachment of semantics to the action of the internal units.

After completing this book, students should have a good grounding in the fundamentals of PR, exposure to three diverse PR approaches, and some perspective on the applicability of each approach. They should also be able to begin reading the current literature.

Opportunities abound for the application of pattern recognition techniques to present-day problems. Current examples are automated character recognition, digital image processing and analysis, and recognition and understanding of speech. In fact, many pattern recognition applications arise in situations where they are not expected. Examples of these are weather forecasting and stock market analysis. Individual instructors may wish to emphasize PR application in their respective research domains. My own experience with computer vision problems provided a wealth of examples, some of which are used in this text.

An attempt was made to keep the presentation balanced. The text has four main sections. Part 1 (Chapter 1) introduces the overall subject by asking the question What is pattern recognition? It provides an overview of pattern recognition concerns that are common to all approaches. The three approaches are then briefly introduced and compared. Part 2 (Chapters 2-5) concerns the statistical approach, beginning with a simple example and ending with unsupervised learning through clustering. Part 3
(Chapters 6-9) develops the syntactic or structural approach, beginning with a look at the representational capability of string grammars and corresponding recognition approaches (parsing), continuing with an exploration of higher-dimensional representations and graphical approaches, and ending with an introduction to supervised learning (grammatical inference). Finally, Part 4 (Chapters 10-13) details the emerging neural approach, beginning with an overview of neural computing basics, an examination of pattern associators and feedforward nets and associated training algorithms, and concluding with unsupervised learning implementations. In each case, following development of the theory, examples are shown to illustrate the concepts. Each chapter contains a concluding section that cites relevant literature for more in-depth study of specific topics. Approximately 250 references are cited.

Some background in probability, linear algebra, and discrete mathematics is necessary to fully appreciate the topics considered in this book. The appendices document these and other fundamental background concepts and enable the book to be self-contained. In addition, numerous exercises are presented that will challenge and motivate the reader to further explore relevant concepts. Many of these exercises provide the bases for projects and thesis work. Instructors should contact the publisher regarding the availability of a solution manual.

This book completes a trilogy of books I have written over the last three years. The other two are Digital Image Processing and Computer Vision (John Wiley & Sons, Inc.) and Artificial Intelligence: An Engineering Approach (McGraw-Hill). They cover a broad area of computing technology that I believe (or hope) is likely to receive widespread attention as it continues to mature during the ’90s.

Acknowledgments

A textbook does not just ‘happen,’ but rather requires a significant commitment from its author as well as a supporting environment. I have been quite fortunate in this regard. My wife, Leslie, functioned as ‘the ultimate pattern recognition system’ in transcribing pages of scribbling into an attractive TeX document. Students who used earlier versions of the notes provided both valuable feedback and enhanced motivation, since they proved that the educational objective of this book could be achieved. The environment in the Department of Electrical and Computer Engineering and the College of Engineering at Clemson University was also conducive to this task. The efforts of the professional staff at John Wiley and Sons, especially Steven Elliott and Savoula Amanatidis, deserve special thanks. Finally, in watching my daughter Katie grow, I realize there are lots of things we’ll never want to automate.

Robert J. Schalkoff
Clemson, South Carolina
Contents

PART 1  INTRODUCTION AND GENERAL PATTERN RECOGNITION CONCERNS

CHAPTER 1  PATTERN RECOGNITION (PR) OVERVIEW
What is This Book About?
Overview
‘Engineering’ Approaches to Pattern Recognition
Relationship of PR to Other Areas
Pattern Recognition Applications
Pattern Recognition, Classification, and Description
Abstract Representation of Pattern Mappings
Structure of a ‘Typical’ PR System
Patterns and Feature Extraction with Examples
Patterns and Features
Pattern Distortions—A Fundamental and Difficult Problem
Example: Feature Extraction Using Generalized Cylinders for 3-D Object Description and Classification
Example: Generating RST Invariant Features and Application to 2-D Figure Recognition
Feature Extraction
Numerical Results and Analysis
Analysis
What’s Really Different Among Different Patterns?
The Feature Vector and Feature Space
Definitions
Classifiers, Decision Regions and Boundaries, and Discriminant Functions
Training and Learning in PR Systems
Using A Priori Knowledge or ‘Experience’
Learning Curves (Improved ‘Performance’)
Training Approaches
Pattern Recognition Approaches
The Statistical Pattern Recognition Approach (StatPR)
The Syntactic Pattern Recognition Approach (SyntPR)
The Neural Pattern Recognition Approach (NeurPR)
Comparing and Relating StatPR, SyntPR, and NeurPR
Examples of Pattern Recognition Approaches
StatPR
SyntPR
Example: Combinations of StatPR and SyntPR
Procedures for PR System Engineering
Other Approaches to PR
‘Black Box’ Approaches
Reasoning-Driven Pattern Recognition
Overview of PR Literature and Resources
Books
Journals
Conferences
Exercises

PART 2  STATISTICAL PATTERN RECOGNITION (StatPR)

CHAPTER 2  INTRODUCTION TO STATISTICAL PATTERN RECOGNITION
Introduction to Statistical Pattern Recognition
Approaches to Developing StatPR Classifiers
Simple Examples of Statistical Models and Applications
The Gaussian Case and Class Dependence
Gaussian Models for p(x|w_i)
Discriminant Functions
Generalized Results (Gaussian Case)
Decision Surfaces for Specific Cases
A General Look at the StatPR Approach Structure
Additional Examples
Example: Uniform Densities, c = 2, d = 1
Example: c = 3, d = 2, Gaussian Case
Extensions
Training
Alternative Classification Procedures
Unsupervised Approaches
Classifier Performance, Risk, and Errors
Measures of Classification Performance
General Measures of Classification Risk
Bibliographical Remarks
Exercises

CHAPTER 3  SUPERVISED LEARNING (TRAINING) USING PARAMETRIC AND NONPARAMETRIC APPROACHES
Introduction
Parametric Estimation and Supervised Learning
Approaches to Parameter Estimation
Supervised versus Unsupervised Learning
Maximum Likelihood (ML) Estimation
Formulation
Use of the Training Set
The Likelihood Function
The Bayesian Parameter Estimation Approach
Supervised Learning Using Nonparametric Approaches
Nonparametric Approaches
General Aspects of Nonparametric Density Estimation
The Basic Approach
Parzen Windows
Unit Step Function φ(x)
Extension to More General Interpolation Functions
k-nn Nonparametric Estimation
A Problem with V_n
Estimation of P(w_i|x) Directly
Examples of Nonparametric Learning
Direct Classification Using the Training Set [The Nearest Neighbor Rule (NNR)]
The NNR Approach
Bibliographical Remarks
Exercises

CHAPTER 4  LINEAR DISCRIMINANT FUNCTIONS AND THE DISCRETE AND BINARY FEATURE CASES
Introduction
Linear Discriminant Functions
Fisher’s Linear Discriminant
Discrete and Binary Classification Problems
Classification Procedures for Discrete Feature Data
Formulation for the Binary Case
Classification Procedures for the Binary Case
Nonparametric Estimation of P(x|w_i)
Techniques to Directly Obtain Linear Classifiers
The Concept of Linear Separability
Design of Linear Classifiers
Bibliographical Remarks
Exercises

CHAPTER 5  UNSUPERVISED LEARNING AND CLUSTERING
Formulation of Unsupervised Learning Problems
A Modified Training Set
Unsupervised Learning Approaches
Clustering for Unsupervised Learning and Classification
The Clustering Concept and the Search for ‘Natural Clusters’
The c-means Algorithm
Learning Vector Quantization (LVQ)
Formal Characterization of General Clustering Procedures
Clustering Strategies
‘Cluster Swapping’ Approaches
Extended Example: A Hierarchical Clustering Procedure
Bibliographical Remarks
Exercises

PART 3  SYNTACTIC PATTERN RECOGNITION (SyntPR)

CHAPTER 6  OVERVIEW
Syntactic Pattern Recognition (SyntPR) Overview
Quantifying Structure in Pattern Description and Recognition
Hierarchical Approaches
Relational Models
Grammar-Based Approach and Applications
Using Formal Linguistics to Model and Describe Patterns
Structural Analysis and Description Applications
Elements of Formal Grammars
Definitions and Conventions
Grammars and Languages Using Strings
Grammar Application Modes
Grammar Types and Productions
Other Production Constraints in String Grammars
Additional Introductory Concepts
Example: Context-Sensitive, Context-Free, and Finite State (Regular) Grammar Productions
Examples of String Generation as Pattern Description
Example: 2-D Line Drawing Description Grammar [Shaw 1970]
Example: Character Description Using PDL
Example: Object Description Using Projected Cylinder Models
Extended Example: Blocks World Description
Remarks on the Heuristic Generation of Grammars
Bibliographical Remarks
Exercises

CHAPTER 7  SYNTACTIC RECOGNITION VIA PARSING AND OTHER GRAMMARS
Recognition of Syntactic Descriptions
Recognition by String Matching
Recognition by Parsing
Parsing
Fundamental Concepts
An Abstract View of the Parsing Problem
Parsing Approaches
The Cocke-Younger-Kasami (CYK) Parsing Algorithm
The Approach
The CYK Table
(Augmented) Transition Networks (ATNs) in Parsing
Transition Networks
Augmented Transition Nets (ATNs)
Higher Dimensional Grammars
Introduction
Tree Grammars
Traversing and Describing Trees
Productions and Derivations with Trees
Stochastic Grammars and Applications
Stochastic Grammars
Derivations in a Stochastic Language
Bibliographical Remarks
Exercises

CHAPTER 8  GRAPHICAL APPROACHES TO SyntPR
Graph-Based Structural Representations
From Digraphs to Semantic Nets to Relational Graphs
Application of Relational Graphs to PR
Graph Isomorphism
Uniqueness of Digraph-Based Structural Description and Isomorphism
Determining Isomorphism
Extensions to the Elementary Graph Matching Approach
A Formal Characterization of the Relational Graph Similarity Problem
Extensions of Relational Graphs (Attributed Graphs)
A Structured Strategy to Compare Attributed Graphs
Relaxing Requirements for Pattern Isomorphism and Subisomorphism
Match Graphs (MGs)
Recursive Procedure to Find Cliques [cliques(X, Y)]
Other Attributed Graph Distance or Similarity Measures
Design and Selection of Similarity Measures
Transforming G_i into G_j
Extended Example: Structural Unification Using Attributed Graphs
Bibliographical Remarks
Exercises

CHAPTER 9  LEARNING VIA GRAMMATICAL INFERENCE
Learning Grammars
Problem Overview
Difficulties in Structural Learning
Problem Formulation
Characterizing the Grammar Source
Concerns Related to the Training Set
Grammatical Inference (GI) Approaches
Grammatical Inference Objectives
Intuitive Sample GI Procedure
Procedures to Generate Constrained Grammars
Using a Single String to Partially Infer G_learn
Generation of Canonical Definite Grammars
General Procedure to Generate CDFSGs (G_c)
Bibliographical Remarks
Exercises

PART 4  NEURAL PATTERN RECOGNITION (NeurPR)

CHAPTER 10  INTRODUCTION TO NEURAL NETWORKS
Neurons and Neural Nets
Introduction
History
Neural Networks as a Black Box Approach and ‘Artificial’ Neural Systems
Key Neural Network Concepts
Characteristics of Neural Computing Applications
‘Connectionist’ Models and Computing
Neural Network Structures for PR Applications
Neural Network Structure (How Should I Connect My Neurons?)
Learning in Neural Networks
The Neural PR Application Design Phase
Why Is Neural Computation So Important?
Physical Neural Networks
The Physical Neuron
Dynamics of a Biological Neural System
The Artificial Neural Network Model
Artificial Neuron Activation and Output Characteristics
Neural Unit Interconnection Strengths (Weights)
Other Neural Network Parameters
Network Implementations for Constraint Satisfaction and Matching Problems
Hardware Realizations of Neural Networks
Bibliographical Remarks

CHAPTER 11  INTRODUCTION TO NEURAL PATTERN ASSOCIATORS AND MATRIX APPROACHES
Neural Network-Based Pattern Associators
Introduction
Design Procedure
‘Black Box’ Structure
CAM and Other Neural Memory Structures
Desirable Pattern Associator (PA) Properties
Matrix Approaches (Linear Associative Mappings) and Examples
An Elementary Linear Network Structure and Mathematical Representation
Approach 1: A Linear CAM (‘Hopfield’) Network
Approach 2: Matched Filters/Adaptive Filters
Approach 3: Outer Product Formulations
Approach 4: Generalized Inverse Applications
Extended Example: Heteroassociative Memory Design
Hebbian or Correlation-Based Learning
Bibliographical Remarks
Exercises

CHAPTER 12  FEEDFORWARD NETWORKS AND TRAINING BY BACKPROPAGATION
Multilayer, Feedforward Network Structure
Introduction
Overall Feedforward Structure
The Role of Internal/Hidden Layers (What’s ‘Hidden’?)
Training Considerations
Training the Feedforward Network: The Delta Rule (DR) and Generalized Delta Rule (GDR)
Overview
Gradient Descent Approaches for Training
Training by Sample or Training by Epoch
Derivation of the DR
A General Assessment of Error Sensitivity
Extension of the DR for Units in the Hidden Layers [The Generalized Delta Rule (GDR)]
Back Propagation—Summary of the Multistep Procedure
Adding Momentum to the Training Procedure
Training the Unit Bias Inputs
Example: ‘Handworked’ Training of a Single Unit Using the GDR
Extended Example: Pattern Associator for Character Classification
Chosen Features
Case 1: Single 4-Valued Output, Pseudoinverse Solution
Case 2: Single 4-Valued Output, 2-Layer Neural Network Solution
Case 3: 4-Output (‘One-of’) 2-Layer Neural Network Solution
Case 3: Summary of Results
Case 4: 4-Output, 3-Layer (Hidden) Neural Network Solution
Other Ramifications of Hidden Units
Bibliographical and Historical Remarks
Exercises

CHAPTER 13  CONTENT ADDRESSABLE MEMORY APPROACHES AND UNSUPERVISED LEARNING IN NeurPR
Introduction
The Hopfield Approach to Neural Computing
Network Parameters
Network Dynamics
Hamming Distance and Its Significance in CAM
Example: Hamming Distances Between Stored States
Example: Design of a Simple Hopfield Network—Storing and Accessing Stable States
Why the Hopfield Strategy Should Work
Additional Examples of CAM Applications in PR
Example: Character Recognition
Example: Relational Constraint Satisfaction (Coloring)
Example: Symbolic Constraint Satisfaction (Labeling)
Unsupervised Learning in NeurPR: Self-Organizing Networks
Introduction
Adaptive Resonance Architectures
Self-Organizing Feature Maps (Kohonen)
Summary
Bibliographical Remarks
Exercises

APPENDIX 1  Linear Algebra Review
APPENDIX 2  Probability and Random Variables/Vectors
APPENDIX 3  Discrete Mathematics Review
APPENDIX 4  Descent Procedures: Examples and Error Function Contours
APPENDIX 5  Similarity Measures, Matching Techniques, and Scale-Space Approaches
APPENDIX 6  Geometry for Classification and State-Space Visualization

REFERENCES
PERMISSION SOURCE NOTES
INDEX

Part 1

Introduction and General
Pattern Recognition Concerns

Pattern Recognition
(PR) Overview

Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.
From James Boswell, Life of Johnson 1791
Samuel Johnson 1709-1784

WHAT IS THIS BOOK ABOUT?

Overview

Machine intelligence will be a dominant technology in the 1990s. Pattern recognition (PR) techniques are often an important component of intelligent systems and are used for both data preprocessing and decision making. Broadly speaking, pattern recognition (PR) is the science that concerns the description or classification (recognition) of measurements. There is little doubt that PR is an important, useful, and rapidly developing technology with cross-disciplinary interest and participation. PR is not comprised of one approach, but rather is a broad body of often loosely related knowledge and techniques. Historically, the two major approaches to pattern recognition are the statistical (or decision theoretic) and the syntactic (or structural) approaches. Recently, the emerging technology of neural networks has provided a third approach, especially for ‘black box’ implementation of PR algorithms. Since no single technology is always the optimum solution for a given PR problem, all three are explored in this text.

Each of the three interrelated pattern recognition approaches:

• Statistical Pattern Recognition (StatPR)
• Syntactic Pattern Recognition (SyntPR)
• Neural Pattern Recognition (NeurPR)
is defined in the following sections. Where appropriate, computational issues related to practical or even ‘real-time’ implementation of the corresponding PR systems is considered. In order to achieve this breadth of coverage, the scope of some PR topics is restricted.

‘Engineering’ Approaches to Pattern Recognition

An engineering approach to problem solving involves incorporating all available and relevant problem information in a structured fashion to formulate a solution. This includes the development or modification of models that perhaps incorporate structural and a priori information, including ‘training’ data.¹ Therefore, throughout this book, our examination of PR approaches will be guided by these three questions:

1. Are pattern recognition techniques suitable, or even applicable, to the problem at hand? (Can the problem be solved?)
2. Can we develop or modify useful models for the situation and, if necessary, determine the model parameters?
3. If so, are there formal tools and heuristics that may be applied, and does a computationally practical solution procedure exist?

Relationship of PR to Other Areas

Pattern recognition techniques overlap with other areas, such as:

• (Adaptive) signal processing and systems
• Artificial intelligence
• Neural modeling
• Optimization/estimation theory
• Automata theory
• Fuzzy sets
• Structural modeling
• Formal languages

Pattern Recognition Applications

PR applications include:

• Image preprocessing, segmentation, and analysis
• Computer vision
• Seismic analysis
• Radar signal classification/analysis
• Face recognition
• Speech recognition/understanding
• Fingerprint identification
• Character (letter or number) recognition
• Handwriting analysis (‘notepad’ computers)
• Electrocardiographic signal analysis/understanding
• Medical diagnosis

¹Pattern recognition (especially statistical) books are replete with the terms ‘a priori’ and ‘a posteriori.’ We will define ‘a priori’ as ‘known before’ (usually ‘before’ meaning ‘before feature measurement’) and ‘a posteriori’ to mean ‘derived from subsequent measurements.’

PATTERN RECOGNITION, CLASSIFICATION, AND DESCRIPTION

Abstract Representation of Pattern Mappings

PR may be characterized as an information reduction, information mapping, or information labeling process. An abstract view of the PR classification/description problem is shown in Figure 1. We postulate a mapping between class-membership space, C, and pattern space, P. This mapping is done via a relation, G_i, for each class, and may be probabilistic. Each class, w_i, generates a subset of ‘patterns’ in pattern space, where the ith pattern is denoted p_i. Note, however, that these subspaces overlap, to allow patterns from different classes to share attributes. Another relation, M, maps patterns from subspaces of P into observations or measured patterns or features, denoted m_i. Using this concept, the characterization of many PR problems is simply that, given measurement m_j, we desire a method to identify and invert mappings M and G_i for all i. Unfortunately, in practice, these mappings are not functions. Even if they were, they are seldom 1:1, onto, or invertible. For example, Figure 1 shows that identical measurements or observations may result from different p_i, which in turn correspond to different underlying classes. This suggests a potential problem with ambiguity. Nevertheless, it seems reasonable to attempt to model and understand these processes, in the hope that this leads to better classification/description techniques.

The abstract formulation of Figure 1 is well suited for modeling in both StatPR and SyntPR. Looking ahead, Figure 13 shows how more detailed formalizations of these approaches may be related to Figure 1. We may view the realized patterns in Figure 1 as basic ‘world data,’ which is then measured. Thus, another important aspect of Figure 1 concerns M. This mapping reflects our choice of measurement system. Measurement system design is an important aspect of PR system design, in the sense that good ‘features’ or ‘primitives,’ to be derived subsequently, probably require good, or at least adequate, measurements. It is unlikely that erroneous or incomplete measurements will facilitate good PR system performance. Finally, note that patterns that are generated by the same class (p_1 and p_2 from w_1, for example) and ‘close’ in pattern space, P, do not necessarily yield measurements (m_1 and m_2 in this case) that are also ‘close.’ This is significant when ‘clustering’ of measurement (or feature or primitive) data is used for measuring pattern similarity.
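
These mappings can be made concrete with a small simulation. The following is a minimal sketch, not from the text: the two class labels, their attribute distributions, and the quantized measurement map are illustrative assumptions. Each class w_i generates patterns p_i through a stochastic G_i, and M maps each pattern to a single coarse reading. Because the pattern subspaces overlap and M discards information, the same reading arises from different classes, which is exactly the ambiguity noted above.

```python
import random

random.seed(1)

# G_i: stochastic pattern generation for two classes whose pattern
# subspaces overlap (attribute ranges are illustrative assumptions).
def G(class_label):
    if class_label == "w1":
        return {"size": random.uniform(2.0, 6.0), "texture": "smooth"}
    else:  # "w2"
        return {"size": random.uniform(4.0, 9.0), "texture": "rough"}

# M: the measurement system observes only coarsely quantized size,
# so the 'texture' attribute is lost before classification begins.
def M(pattern):
    return round(pattern["size"])

# Tabulate which classes can produce each measurement value.
sources = {}
for label in ("w1", "w2"):
    for _ in range(1000):
        m = M(G(label))
        sources.setdefault(m, set()).add(label)

for m in sorted(sources):
    print(m, sorted(sources[m]))  # readings 4-6 map back to BOTH classes
```

Inverting M and G_i from such readings alone is impossible for the overlapping values; this is the sense in which the mappings are not 1:1 or invertible.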
[Figure 1: Mappings in an abstract representation of pattern generation/classification/interpretation systems. The figure depicts class membership or description space, (realized) pattern space, and world observation or measurement space.]

Example Use of the Abstract Representation. Consider a two-class problem where w_1 corresponds to the class ‘apple’ and w_2 corresponds to the class ‘hand grenade.’ In realized pattern space, there will be some difference among the p_i resulting from apples, since not all apples are the same size, shape, weight, color, and so on. A similar remark holds true for realizations of class w_2. Also, some realizations of ‘apple’ and ‘hand grenade’ share similar attributes (e.g., mass, volume, weight). Therefore, basing our PR system on a measurement consisting simply of realized pattern weight is likely to:

• (Mis)classify some apples and hand grenades as the same.
• (Mis)classify or distinguish between heavy apples and light apples (and similarly for w_2).

Ironically, at the same time we strive to understand and to ‘undo’ these mappings, alternative (so-called black box) approaches, principally implemented by using neural networks, are receiving increased attention. These black box approaches attempt to eliminate the need for detailed information on both the mappings and inverses and, instead, require a good set of training data (sample mappings between C and M) and a trainable system.
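
The black box idea can be sketched in a few lines. The text does not prescribe a particular trainable system here (neural implementations are developed in Part 4); the fragment below uses a nearest-neighbor rule, treated later in Chapter 3, purely as the simplest stand-in, and the (weight, class) training samples are invented for illustration.

```python
# A minimal 'black box' stand-in: instead of modeling or inverting G_i and M,
# store sample (measurement, class) mappings and classify a new measurement
# by the nearest stored example. The training weights are illustrative.
training_data = [
    (120, "apple"), (160, "apple"), (180, "apple"),
    (300, "hand grenade"), (420, "hand grenade"), (500, "hand grenade"),
]

def classify(weight_g):
    # Return the class of the training sample whose weight is closest.
    _, label = min(training_data, key=lambda s: abs(s[0] - weight_g))
    return label

print(classify(150))   # 'apple'
print(classify(240))   # near the class overlap; any answer is less trustworthy
```

Note that no knowledge of the underlying mappings is used; the quality of the answers depends entirely on how representative the training data are, which is precisely the trade-off the black box approaches accept.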

Structure of a ‘Typical’ PR System

The structure of a typical pattern recognition system is shown in Figure 2. Notice that it consists of a sensor (an image sensor or camera, for example), a feature extraction mechanism (algorithm), and a classification or description algorithm (depending