Page 1 of 224
`
`FORD 1009
`
`

`
Introduction to Knowledge Systems
`
`Mark Stefik
`
`Morgan Kaufmann Publishers, Inc.
`San Francisco, California
`
`Sponsoring Editor Michael B. Morgan
`Production Manager Yonie Overton
`Production Editor Elisabeth Beller
`Editorial Coordinator Marilyn Uffner Alan
`Text Design, Project Management,
`Electronic Illustrations and Composition Professional Book Center
`Cover Design Carron Design
`Copyeditor Anna Huff
`Printer Quebecor Fairfield
`
`Morgan Kaufmann Publishers, Inc.
`Editorial and Sales Office
`340 Pine Street, Sixth Floor
`San Francisco, CA 94104-3205 USA
`Telephone 415/392-2665
`Facsimile 415/982-2665
`Internet mkp@mkp.com
`
`Library of Congress Cataloging-in-Publication Data is available for this book.
`
`ISBN 1-55860-166-X
`
`© 1995 by Morgan Kaufmann Publishers, Inc.
`
`All rights reserved
`
`Printed in the United States of America
`
`99 98 97 96 95
`
`5 4 3 2 1
`
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written
permission of the publisher.

Brand and product names referenced in this book are trademarks or registered trademarks of their
respective holders and are used here for informational purposes only.
`
Contents

Foreword Edward A. Feigenbaum
Preface
Notes on the Exercises

INTRODUCTION AND OVERVIEW

PART I FOUNDATIONS

CHAPTER 1 Symbol Systems

1.1 Symbols and Symbol Structures
    1.1.1 What Is a Symbol?
    1.1.2 Designation
    1.1.3 Causal Coupling
    1.1.4 Cognitive and Document Perspectives of Symbols
    1.1.5 Summary and Review
    Exercises for Section 1.1
1.2 Semantics: The Meanings of Symbols
    1.2.1 Model Theory and Proof Theory
    1.2.2 Reductionist Approaches for Composing Meanings
    1.2.3 Terminology for Graphs and Trees
    1.2.4 Graphs as Symbol Structures
    1.2.5 The Annotation Principle and Metalevel Notations
    1.2.6 Different Kinds of Semantics
    1.2.7 Summary and Review
    Exercises for Section 1.2
`
1.3 Modeling: Dimensions of Representation
    1.3.1 Fidelity and Precision
    1.3.2 Abstractions and Implementations
    1.3.3 Primitive and Derived Propositions
    1.3.4 Explicit and Implicit Representations
    1.3.5 Representation and Canonical Form
    1.3.6 Using Multiple Representations
    1.3.7 Representation and Parallel Processing
    1.3.8 Space and Time Complexity
    1.3.9 Structural Complexity
    1.3.10 Summary and Review
    Exercises for Section 1.3
1.4 Programs: Patterns, Simplicity, and Expressiveness
    1.4.1 Using Rules to Manipulate Symbols
    1.4.2 Treating Programs as Data
    1.4.3 Manipulating Expressions for Different Purposes
    1.4.4 Pattern Matching
    1.4.5 Expressiveness, Defaults, and Epistemological Primitives
    1.4.6 The Symbol Level and the Knowledge Level
    1.4.7 Summary and Review
    Exercises for Section 1.4
1.5 Quandaries and Open Issues

CHAPTER 2 Search and Problem Solving

2.1 Concepts of Search
    2.1.1 Solution Spaces and Search Spaces
    2.1.2 Terminology about Search Criteria
    2.1.3 Representing Search Spaces as Trees
    2.1.4 Preview of Search Methods
    2.1.5 Summary and Review
    Exercises for Section 2.1
2.2 Blind Search
    2.2.1 Depth-First and Breadth-First Search
    2.2.2 Top-Down and Bottom-Up Search: A Note on Terminology
    2.2.3 Simple and Hierarchical Generate-and-Test
    2.2.4 A Sample Knowledge System Using Hierarchical Generate-and-Test
    2.2.5 Simple and Backtracking Constraint Satisfaction
    2.2.6 Summary and Review
    Exercises for Section 2.2
`
2.3 Directed Search
    2.3.1 Simple Match
    2.3.2 Means-Ends Analysis
    2.3.3 Hierarchical Match and Skeletal Planning
    2.3.4 Hill Climbing and Best-First Search
    2.3.5 Shortest-Path Methods
    2.3.6 A* and Related Methods
    2.3.7 Summary and Review
    Exercises for Section 2.3
2.4 Hierarchical Search
    2.4.1 Two-Level Planning
    2.4.2 Planning with Multiple Abstraction Levels
    2.4.3 Planning with Imperfect Abstractions
    2.4.4 Summary and Review
    Exercises for Section 2.4
2.5 Quandaries and Open Issues

CHAPTER 3 Knowledge and Software Engineering

3.1 Understanding Knowledge Systems in Context
    3.1.1 The Terminology of Knowledge Systems and Expertise
    3.1.2 Knowledge Systems and Document Systems: Five Scenarios
    3.1.3 Preview of Knowledge Acquisition Topics
    3.1.4 Summary and Review
    Exercises for Section 3.1
3.2 Formulating Expertise
    3.2.1 Conducting Initial Interviews
    3.2.2 Taking Protocols
    3.2.3 Characterizing Search Spaces
    3.2.4 Adapting Protocol Analysis for Knowledge Systems
    3.2.5 Summary and Review
    Exercises for Section 3.2
3.3 Collaboratively Articulating Work Practices
    3.3.1 Variations in Processes for Interview and Analysis
    3.3.2 Documenting Expertise
    3.3.3 Engineering Software and Organizations
    3.3.4 Summary and Review
    Exercises for Section 3.3
`
3.4 Knowledge versus Complexity
    3.4.1 MYCIN: Study of a Classic Knowledge System
    3.4.2 The Knowledge Hypothesis and the Qualification Problem
    3.4.3 Summary and Review
    Exercises for Section 3.4
3.5 Open Issues and Quandaries

PART II THE SYMBOL LEVEL

CHAPTER 4 Reasoning about Time

4.1 Temporal Concepts
    4.1.1 Timeline Representations
    4.1.2 A Discrete Model of Transactions in the Balance of an Account
4.2 Continuous versus Discrete Temporal Models
4.3 Temporal Uncertainty and Constraint Reasoning
    4.3.1 Partial Knowledge of Event Times
    4.3.2 Arc Consistency and Endpoint Constraints
    4.3.3 Time Maps and Scheduling Problems
    4.3.4 The Interface between a Scheduler and a Temporal Database
4.4 Branching Time
4.5 Summary and Review
Exercises for Chapter 4
4.6 Open Issues and Quandaries

CHAPTER 5 Reasoning about Space

5.1 Spatial Concepts
5.2 Spatial Search
    5.2.1 Simple Nearest-First Search
    5.2.2 Problems with Uniform-Size Regions
    5.2.3 Quadtree Nearest-First Search
    5.2.4 Multi-Level Space Representations
5.3 Reasoning about Shape
5.4 The Piano Example: Using Multiple Representations of Space
    5.4.1 Reasoning for the Piano Movers
    5.4.2 Rendering a Piano
    5.4.3 The Action of a Piano
`
5.5 Summary and Review
Exercises for Chapter 5
5.6 Open Issues and Quandaries

CHAPTER 6 Reasoning about Uncertainty and Vagueness

6.1 Representing Uncertainty
    6.1.1 Concepts about Uncertainty
    6.1.2 The Certainty-Factor Approach
    6.1.3 The Dempster-Shafer Approach
    6.1.4 Probability Networks
    6.1.5 Summary and Conclusions
    Exercises for Section 6.1
6.2 Representing Vagueness
    6.2.1 Basic Concepts of Fuzzy Sets
    6.2.2 Fuzzy Reasoning
    6.2.3 Summary and Conclusions
    Exercises for Section 6.2
6.3 Open Issues and Quandaries

PART III THE KNOWLEDGE LEVEL

CHAPTER 7 Classification

7.1 Introduction
    7.1.1 Regularities and Cognitive Economies
7.2 Models for Classification Domains
    7.2.1 A Computational Model of Classification
    7.2.2 Model Variations and Phenomena
    7.2.3 Pragmatics in Classification Systems
    7.2.4 Summary and Review
    Exercises for Section 7.2
7.3 Case Studies of Classification Systems
    7.3.1 Classification in MYCIN
    7.3.2 Classification in MORE
    7.3.3 Classification in MOLE
    7.3.4 Classification in MDX
    7.3.5 Classification in PROSPECTOR
    7.3.6 Summary and Review
    Exercises for Section 7.3
`
7.4 Knowledge and Methods for Classification
    7.4.1 Knowledge-Level and Symbol-Level Analysis of Classification Domains
    7.4.2 MC-1: A Strawman Generate-and-Test Method
    7.4.3 MC-2: Driving from Data to Plausible Candidates
    7.4.4 MC-3: Solution-Driven Hierarchical Classification
    7.4.5 MC-4: Data-Driven Hierarchical Classification
    7.4.6 Method Variations for Classification
    7.4.7 Summary and Review
    Exercises for Section 7.4
7.5 Open Issues and Quandaries

CHAPTER 8 Configuration

8.1 Introduction
    8.1.1 Configuration Models and Configuration Tasks
    8.1.2 Defining Configuration
8.2 Models for Configuration Domains
    8.2.1 Computational Models of Configuration
    8.2.2 Phenomena in Configuration Problems
    8.2.3 Summary and Review
    Exercises for Section 8.2
8.3 Case Studies of Configuration Systems
    8.3.1 Configuration in XCON
    8.3.2 Configuration in M1/MICON
    8.3.3 Configuration in MYCIN's Therapy Task
    8.3.4 Configuration in VT
    8.3.5 Configuration in COSSACK
    8.3.6 Summary and Review
    Exercises for Section 8.3
8.4 Methods for Configuration Problems
    8.4.1 Knowledge-Level and Symbol-Level Analysis of Configuration Domains
    8.4.2 MCF-1: Expand and Arrange
    8.4.3 MCF-2: Staged Subtasks with Look-Ahead
    8.4.4 MCF-3: Propose-and-Revise
    8.4.5 Summary and Review
    Exercises for Section 8.4
8.5 Open Issues and Quandaries
`
CHAPTER 9 Diagnosis and Troubleshooting

9.1 Introduction
    9.1.1 Diagnosis and Troubleshooting Scenarios
    9.1.2 Dimensions of Variation in Diagnostic Tasks
9.2 Models for Diagnosis Domains
    9.2.1 Recognizing Abnormalities and Conflicts
    9.2.2 Generating and Testing Hypotheses
    9.2.3 Discriminating among Hypotheses
    9.2.4 Summary and Review
    Exercises for Section 9.2
9.3 Case Studies of Diagnosis and Troubleshooting Systems
    9.3.1 Diagnosis in DARN
    9.3.2 Diagnosis in INTERNIST
    9.3.3 Diagnosis in CASNET/GLAUCOMA
    9.3.4 Diagnosis in SOPHIE III
    9.3.5 Diagnosis in GDE
    9.3.6 Diagnosis in SHERLOCK
    9.3.7 Diagnosis in XDE
    9.3.8 Summary and Review
    Exercises for Section 9.3
9.4 Knowledge and Methods for Diagnosis
    9.4.1 Plan Models for Diagnosis
    9.4.2 Classification Models for Diagnosis
    9.4.3 Causal and Behavioral Models for Systems
    9.4.4 Summary and Review
    Exercises for Section 9.4
9.5 Open Issues and Quandaries

APPENDIX A Annotated Bibliographies by Chapter

APPENDIX B Selected Answers to Exercises

Index
`
Introduction and Overview
The Building of a Knowledge System to Identify Wild Plants
`
There is always a tension between top-down and bottom-up presentations. A top-down presentation
starts with goals and then establishes a framework for pursuing the parts in depth. Bottom-up
presentations start with fundamental and primitive concepts and then build to higher-level
ones. Top-down presentations can be motivating but they risk lack of rigor; bottom-up
presentations can be principled but they risk losing sight of goals and direction.

Most of this book is organized bottom-up. This reflects my desire for clarity in a field that
is entering its adolescence, metaphorically if not chronologically. The topics are arranged so a
reader starting at the beginning is prepared for concepts along the way. Occasionally I break out
of the bottom-up rhythm and step-at-a-time development to survey where we are, where we have
been, and where we are going. This introduction serves that purpose.

The following overview traces the steps of building a hypothetical knowledge system.
Woven into the story are some notes that connect it with sections in this book that develop the
concepts further. Many of the questions and issues of knowledge engineering that are mysterious
in a bottom-up presentation seem quite natural when they are encountered in the context of
building a knowledge system. In particular, it becomes easy to see why they arise.

I made up the following story, so it does not require a disclaimer saying that the names
have been changed to protect the innocent. Nonetheless, the phenomena in the story are familiar
to anyone who has developed a knowledge system. Imagine that we work for a small software
company that builds popular software packages including knowledge systems. This is a story of a
knowledge system: how it was conceived, built, introduced, used, and later extended.
`
`To Build a Knowledge System
`It all began when we were approached by an entrepreneur who enjoys hiking and camping in the
`hills, mountains, and deserts of California. Always looking for a new market opportunity, he
`noticed that campers and hikers like to identify wild plants but that they are not very good at it.
`Identifying wild plants can be useful for survival in the woods ("What can I eat?") and it also has
`recreational value. He was convinced that conservationists, environmentalists, and well-heeled
`hikers have a common need.
`The entrepreneur proposed that we build a portable knowledge system for identifying
`wildlife. He had consulted a California hiking club and a professional naturalist. He suggested
`that we begin by constructing a hypertext database about different kinds of plants, describing
`their appearance, habitats, relations to other wildlife, and human uses. Our initial project team is
`as follows:
`
A hiker representing the user community and customer.
A naturalist, our domain expert on wildlife identification.
A knowledge engineer, our expert in acquiring knowledge and knowledge representation.
A software engineer, the team leader having overall responsibility for the development of software.
`
`After some discussion within the company we agreed to develop a prototype version of the
`knowledge system using the latest palmsize or "backpack" computers. If the technical project
`seemed feasible, we would then consider the next steps of commercialization. We planned to use
`the process of building the prototype to help us determine the feasibility of a larger project. We
`recruited the naturalist and a prominent member of one of the hiking clubs to our project team.
`We called the group together and started to learn about each other's ideas and terminology.
`
Notes The participants are just getting started. They need to size up the task, develop
their goals, and determine their respective roles on the project. They need to consider
many questions about the nature of the knowledge system they would build. They ask
"Who wants it?" because the situation and people matter for shaping the system. They
also ask "What do these people do?" and "What role should the system play?" because
these issues arise in all software engineering projects.

Connections See Chapter 3 for a discussion of the initial interview concepts and
background on software engineering.
`
`Our Initial Interviews
Our naturalist tells us he wants to focus on native trees of California. We begin with the famous
California redwood trees, the Sequoia sempervirens or coastal redwood and the Sequoiadendron
giganteum or giant redwood that thrives in the Sierras. Our naturalist is a stickler for
completeness. He also adds the Metasequoia glyptostroboides or dawn redwood, which once grew in
most parts of North America. The dawn redwood was thought to have become extinct until a grove was
discovered in China in 1944. At a blackboard he draws a chart of plant families as shown in Figure
1.1. He tells us about the history of the plant kingdom:
`
[Figure 1.1: taxonomy chart. Entries recovered from the chart, listed by rank.]
DIVISION: Gymnosperms; Angiosperms
ORDER: Taxale; Coniferale; Dicotyledon
FAMILY: Yew; Taxodiaceae (swamp cypress); Pinus (pine); Abies (fir); Picea (spruce); Oak; Maple; Elm
GENUS: Taxodium; Metasequoia; Sequoia; Sequoiadendron; Cunninghamia; Sciadopitys; Others
SPECIES: Taxodium distichum (swamp bald cypress); Metasequoia glyptostroboides (dawn redwood);
Sequoia sempervirens (coastal redwood); Sequoiadendron giganteum (giant redwood);
Cunninghamia lanceolata (Chinese fir); Sciadopitys verticillata (Japanese umbrella pine)

FIGURE 1.1. A partial taxonomy showing relations among plants closely related to California redwood
trees. In our scenario and thought experiment, the naturalist was asked about redwoods and started
lecturing about plant families.
`
`Plants evolved on Earth from earlier one-celled animals. About 200 million years
`ago was the age of conifers, the cone-bearing trees. Redwoods are members of the
`conifers, which were the dominant plant species at that time. They are among the
`gymnosperms, plants that release their seeds without a protective coating or shell.
`
Our naturalist is a gifted teacher but he tends to slip into what we have started to call his
"lecture mode." After an hour of exploring taxonomies of the plant kingdom we begin to get
restless. One of the team members interrupts him to ask the proper location in the taxonomy for the
"albino redwoods," which are often visited in Muir Woods. This question jars the naturalist.
Albinos do not fit into the plant taxonomy because they are not a true species, but rather a
mutated parasite from otherwise normal coastal redwoods. Redwoods propagate by both seeds
and roots. Sometimes something goes wrong in the root propagation, resulting in a tree that lacks
the capability to make chlorophyll. Such rare plants would normally die, except a few that
continue to live parasitically off the parent. Albino trees have extra pores on their leaves that make
them efficient for moving quantities of water and nutrients from their host and parent trees.

At this point another member of the group, a home gardener, wants to know about trees she
had purchased at a local nursery called Sequoia sempervirens soquel and Sequoia sempervirens
aptos blue. Again, the naturalist explains that these trees are not really species either. Rather,
they are clones of registered individual coastal redwoods, propagated by cuttings and popular
with nurseries because they grow to be predictable "twins" to the parent tree, having the same
shape and color. These registered clones are sometimes called cultivars. They would be
impossible to identify reliably by visual examination alone and they are not found in the wild.

This leads us to a discussion of exactly what a taxonomic chart means, what a species is,
what the chart is useful for, and whether it is really a good starting place for plant identification.
Clearly the chart does not contain all the information we need about plants because some plants
of apparent interest do not appear in it. We also have learned that there are some plants about
which there is debate as to their lineage. After discussion we decide that the information is
interesting and that it would be a good base for establishing names of plants, but that it would not be
appropriate for us to proceed by just filling out more and more of the taxonomic chart. We decide
to focus on actual cases of plant identification at our next session.
`
Notes In this part of the story the participants are beginning to build bridges into each
other's areas to understand how they will work together. They bring to the discussion
some preexisting symbol structures or representations, such as the plant taxonomic chart.
Often there needs to be discussion about just what the symbols mean and whether those
meanings are useful for the task at hand. As in this story, it sometimes turns out that these
symbols and representations need to be modified. When the end product includes a
knowledge system, then conventions about symbol structures must be made precise
enough for clear communication and also expressive enough for the distinctions made in
performing the task.

Connections See Chapter 1 for an introduction to symbols and symbol structures and
the assignment of meaning to them. See Chapter 3 for a discussion of tools and methods
for incremental formalization of knowledge.
`
1. The specimen is tall,
2. I'd guess about 30 or 35 feet tall.
3. So it's a tree ...
4. symmetrical in shape.
5. From the needles in the foliage, it's obviously a pine,
6. but not one of the coastal pines since we're at too high an elevation in these mountains.
7. Could be either a Pinus ponderosa, a jeffreyi, or a torreyana.
8. Let's see (walking in closer) ... dark green needles, not yellow-green,
9. about 7 inches long, and
10. in clusters of three.
11. Rather grayish bark, not cinnamon-brown.
12. Medium-sized cones.
13. Seems to be a young tree. Others like it are near, reaching heights of over a hundred feet tall.
14. It's probably a Pinus jeffreyi, that is, a Jeffrey pine.

FIGURE 1.2. Transcription of our naturalist talking through the identification of one of the plants.
`
`The Naturalist in the Woods
We prepare to study the naturalist's classification process on some sample cases. One member of
the group sets up portable video and audio recorders at a local state park. Our hiking club
member is our prototype user. We define his job as walking into the woods and selecting a plant to be
identified. In this way we hope to gain insight into what plants he finds interesting and to test the
relevance of the plant taxonomy. We ask the naturalist to "think out loud" as he identifies plants.
Recording such a session is called taking a protocol. This results in verbal data where the
naturalist talks about the bark coloration, surface roots, and leaf shapes. After these dialogs are recorded
and pictures of the plants are taken, we transcribe all of the tapes. Figure 1.2 shows a sample
transcript.

After the session in the park, we go over the transcripts carefully with the naturalist, trying
to reconstruct any intermediate aspects of his thought process that were not verbalized. We ask
him a variety of questions. "What else did you consider here? How did you know that it was not
a manzanita? Why couldn't it have been a fir tree or a digger pine? Why did you ask about the
coloring of the needles?" Our goal is not so much to capture exactly what his reasoning was in
every case, but rather to develop a set of case examples that we could use as benchmarks for
testing our computer system. As it turns out, the naturalist does different things on different cases.
He does not always start out with exactly the same set of questions, so his method is not one of
just working through a fixed decision tree or discrimination network.
`
Notes At this point the group has begun a process of collecting knowledge about the
task in terms of examples of problem-solving behavior. As we will see below, it is
possible to make some false starts in this, and it is also possible to recover from such false
starts.

Connections See Chapter 3 for discussion of the assumptions and methods of the
"transfer of expertise" approach.
`
`Characterizing the Task
The knowledge engineer begins a tentative analysis of the protocols. He tells us he might need to
analyze these sessions several different ways before we are done. He wants to characterize the
actions of the naturalist in terms of problem-solving steps. His approach is to model the
problem-solving task as a search problem, in which the naturalist's steps carry out different operations in
the search. Figure 1.3 shows his first tentative analysis of the session from Figure 1.2. In this, he
`
Collect Initial Data
    Determine height of plant: Plant is more than 30 feet tall. (1)
    Shape is symmetrical. (4)
    Foliage has needles. (5)

Determine General Classification
    Infer: Plant is a tree. (3) Plant is a pine tree or a close relative.
    Knowledge: Only pines and close relatives have needle-shaped leaves. (5)

Collect Data about Location
    Mountain location.
    Knowledge: Trees from the low areas and the coast do not grow in the mountains. (6)
    Rule out candidates that do not grow in this region.

Form Specific Candidate Hypotheses
    Mountain pine trees include the Pinus ponderosa, the jeffreyi, and the torreyana. (7)

Determine Data to Discriminate among Hypotheses
    Knowledge: The hypotheses make different predictions about needle color and bark color.
    Species:          ponderosa        jeffreyi     torreyana
    Bark color:       cinnamon-brown   grey         brown
    Needle color:     yellow-green     dark green   dull green
    Needle clusters:  three            three        five
    Needle length:    8"               7"           10"

Collect Discriminating Data
    Needles are dark green. (8)
    Needles are 7 inches long. (9)
    Needles are clustered in threes. (10)
    Bark is grayish. (11)
    Cones are medium-size. (12)

Consider Reliability of Data
    There are other trees in the area of the same character. (13)
    Mature height of others is more than 100 feet. (13)
    Infer that the specimen is representative but not yet full grown. (13)

Determine Whether Unique Solution Is Found
    Only a Pinus jeffreyi fits the data. (14)

Retrieve Common Name
    Knowledge: A Pinus jeffreyi is commonly called a Jeffrey pine.

FIGURE 1.3. Preliminary analysis of the protocol from the transcript in Figure 1.2. The numbers in
parentheses refer to the corresponding steps in Figure 1.2.
`
[Figure 1.4: diagram. In the data space, data (examples: tree is 30 feet tall; rainfall is 4 inches
per year; berries are red) are turned by data abstraction into abstracted data (examples: tall tree;
desert region; spring bloomer ...). A heuristic match links abstracted data to abstracted solutions
in the solution space (examples: Pinus family; vine in bush ...), and solution refinement leads to
specific solutions (examples: Arbutus menziesii (Madrone) ...).]

FIGURE 1.4. The search spaces for classification. This method reasons about data, which may be
abstracted into general features. The data are associated heuristically with abstracted solutions and
ultimately specific solutions.
`
characterized operations such as "determining the general classification," "collecting data,"
"forming specific candidate hypotheses," and so on. These operations constitute a sketch of a
computational model for the plant identification, which searches through a catalog of possible
answers.

This tentative analysis of the protocol is consistent with a computational model that the
knowledge engineer calls "classification." Someone in the group objects, arguing that the
naturalist was not "classifying." Instead, he was merely "identifying" plants because the classes of
possible plants were predetermined. The knowledge engineer agrees but explains that this is
exactly what classification systems do. He draws Figure 1.4 to illustrate the basic concepts used
in this method.
To use this method, we needed to identify the kinds of data that could be collected in the
field (the data space) as well as the kinds of solutions (the solution space). Data consist of such
observations as the number of needles in a cluster. A final solution is a plant species.
Classification uses abstractions of both data and solutions. A datum such as "3 inches of rain falls in the
region annually" might be generalized to "this is a dry, inland region." A solution and species
description such as Pinus contorta murrayana (lodgepole pine) might be generalized to pine tree.
There are variations of classification, but they all proceed by ruling out candidate solutions that
do not fit the data. Further analysis of protocols on multiple cases would be needed to determine
what kinds of knowledge were being used and how they were used.
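
The idea of ruling out candidates that do not fit the data can be illustrated with a minimal
sketch. It is not the method developed later in the book. The candidate table is transcribed from
Figure 1.3, while the dictionary layout, the synonym mapping used for data abstraction, and the
function names abstract_observations and rule_out are assumptions made only for this example.

```python
# Candidate table transcribed from Figure 1.3; the dictionary layout is an assumption.
CANDIDATES = {
    "Pinus ponderosa": {"bark": "cinnamon-brown", "needle_color": "yellow-green",
                        "needles_per_cluster": 3, "needle_length_in": 8},
    "Pinus jeffreyi": {"bark": "grey", "needle_color": "dark green",
                       "needles_per_cluster": 3, "needle_length_in": 7},
    "Pinus torreyana": {"bark": "brown", "needle_color": "dull green",
                        "needles_per_cluster": 5, "needle_length_in": 10},
}

# Hypothetical data abstraction: map raw field wording onto the table's vocabulary.
BARK_SYNONYMS = {"grayish": "grey", "gray": "grey"}


def abstract_observations(raw):
    """Return a copy of the raw observations with the bark wording normalized."""
    abstracted = dict(raw)
    if "bark" in abstracted:
        abstracted["bark"] = BARK_SYNONYMS.get(abstracted["bark"], abstracted["bark"])
    return abstracted


def rule_out(candidates, observations):
    """Keep only the candidates whose predictions agree with every observation."""
    return {
        name: traits
        for name, traits in candidates.items()
        if all(traits.get(key) == value for key, value in observations.items())
    }


# Observations from steps 8 to 11 of the transcript in Figure 1.2.
field_notes = {"needle_color": "dark green", "needle_length_in": 7,
               "needles_per_cluster": 3, "bark": "grayish"}
remaining = rule_out(CANDIDATES, abstract_observations(field_notes))
print(sorted(remaining))  # ['Pinus jeffreyi']
```

With only the cluster count observed, two candidates would survive, which is why the transcript
goes on to collect needle color and bark color before settling on a species.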
The knowledge engineer now has some questions for the naturalist. Suppose the solution
space is given by a catalog of possibilities, such as the charts in the botany books we used on the
project. The protocol analysis in Figure 1.3 shows that the naturalist quickly ruled out the coastal
varieties of the pine tree. But how about the many other species of pine that grow in the
mountains? With book in hand, he asks why the naturalist had not considered a coulter pine (Pinus
coulteri). The naturalist is taken aback. He answers that the coulter pine actually is a plausible
candidate and asks to see the pictures of the specimen. After looking at it, he says the pine cones
are too small and that the specimen does not have a characteristic open tree shape like an oak
tree. Continuing, the knowledge engineer asks about the sugar pine. The naturalist answers that
the cross-examination feels like "lesson time," but that sugar pines are the tallest pine trees in the
world, being more than 200 feet tall, and that you would know immediately if you were in a sugar
pine forest. However, the idea of systematically going through the catalog to analyze the
protocols is appealing, so the two of them start working over them. The naturalist suggests that all of
this post-protocol explanation and introspection might make him more systematic about his own
methods.
As we continue to work on this, the significant size of the search space becomes clearer to
everyone. One could be "systematic" by asking leading questions about each possible plant
species. However, there are about 50 common species of just pine trees in California. Species of
trees represent only a small fraction of the native plants. A quick check of some catalogs suggests
that there are about 7,000 plant species of interest in California, not counting 300 or 400 species
of wildflowers that are often discounted as weeds. It is clear that any identification process needs
a means to focus its search, and that we need to be economical about asking questions. We begin
to examine the protocols for clues about search strategy. We want to understand not only what he
knows about particular plants, but also how he narrows the search, using knowledge about the
families of plants and other things to quickly focus on a relatively small set of candidates.
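
One way to picture being economical about questions is to ask first about the attribute that
splits the remaining candidates most evenly. The sketch below is an assumption for illustration
only; the story does not say the naturalist works this way. The toy catalog reuses values from
Figure 1.3, and the names PINES, worst_case_remaining, and best_question are hypothetical.

```python
from collections import Counter

# Toy catalog using the three mountain pines from Figure 1.3.
PINES = {
    "Pinus ponderosa": {"needles_per_cluster": 3, "bark": "cinnamon-brown"},
    "Pinus jeffreyi": {"needles_per_cluster": 3, "bark": "grey"},
    "Pinus torreyana": {"needles_per_cluster": 5, "bark": "brown"},
}


def worst_case_remaining(candidates, attribute):
    """Largest number of candidates that could survive one answer about `attribute`."""
    counts = Counter(traits[attribute] for traits in candidates.values())
    return max(counts.values())


def best_question(candidates, attributes):
    """Pick the attribute whose worst-case answer leaves the fewest candidates."""
    return min(attributes, key=lambda attr: worst_case_remaining(candidates, attr))


print(best_question(PINES, ["needles_per_cluster", "bark"]))  # bark
```

In this toy catalog, asking about bark color always leaves one candidate, while asking about
cluster size can leave two. A catalog of thousands of species would need richer knowledge, such
as the plant families mentioned above, to keep the number of questions small.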
`
Notes The group is developing a systematic approach for gathering and analyzing the
domain knowledge. The protocol analysis has led to a framework based on heuristic
classification. Usually protocol analysis and selection of a framework are done together. It is
not unusual for the analysis to reveal aspects that were not articulated. Experts sometimes
forget to say things out loud and sometimes make mistakes. For these reasons, it is good
practice to compare many examples of protocols on related cases. Knowledge needed for
a task is seldom revealed all at once.

Connections See Chapter 2 for characterizations of problem solving as search and for
the terminology of data spaces, search spaces, and solution spaces. That chapter focuses
on basic methods for search. To build a computational model of a task domain, we need
to identify the search spaces and to determine what knowledge is needed and how it is
used. See Chapter 3 for a discussion about approaches and psychological assumptions for
the analysis of protocols. See Chapters 7, 8, and 9 for examples of the knowledge-level
analysis and computational models for different tasks.
`
`A "Naturalist in a Box"
As we build up a collection of cases and study the transcripts, we become aware of some
difficulties with our approach. The first problem is that the naturalist is depending a great deal on
properties of the plants that he can see and smell. Much of the knowledge he is using in doing
this is not articulated in the transcripts.

Our hiking club representative kids the naturalist, saying he is "cheating" by just looking at
the plants. We decide
