`
`FORD 1107
`
`
`
Introduction to
Knowledge Systems
`
`Mark Stefik
`
`Morgan Kaufmann Publishers, Inc.
`San Francisco, California
`
`
`Sponsoring Editor Michael B. Morgan
`Production Manager Yonie Overton
`Production Editor Elisabeth Beller
`Editorial Coordinator Marilyn Uffner Alan
`Text Design, Project Management,
`Electronic Illustrations and Composition Professional Book Center
`Cover Design Carron Design
`Copyeditor Anna Huff
`Printer Quebecor Fairfield
`
`Morgan Kaufmann Publishers, Inc.
`Editorial and Sales Office
`340 Pine Street, Sixth Floor
`San Francisco, CA 94104-3205 USA
`Telephone 415/392-2665
`Facsimile 415/982-2665
`Internet mkp@mkp.com
`
`Library of Congress Cataloging-in-Publication Data is available for this book.
`
`ISBN 1-55860-166-X
`
`© 1995 by Morgan Kaufmann Publishers, Inc.
`
`All rights reserved
`
`Printed in the United States of America
`
`99 98 97 96 95
`
`5 4 3 2 1
`
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written
permission of the publisher.

Brand and product names referenced in this book are trademarks or registered trademarks of their
respective holders and are used here for informational purposes only.
`
`
Contents

Foreword Edward A. Feigenbaum
Preface
Notes on the Exercises

INTRODUCTION AND OVERVIEW

PART I FOUNDATIONS

CHAPTER 1 Symbol Systems

1.1 Symbols and Symbol Structures
    1.1.1 What Is a Symbol?
    1.1.2 Designation
    1.1.3 Causal Coupling
    1.1.4 Cognitive and Document Perspectives of Symbols
    1.1.5 Summary and Review
    Exercises for Section 1.1

1.2 Semantics: The Meanings of Symbols
    1.2.1 Model Theory and Proof Theory
    1.2.2 Reductionist Approaches for Composing Meanings
    1.2.3 Terminology for Graphs and Trees
    1.2.4 Graphs as Symbol Structures
    1.2.5 The Annotation Principle and Metalevel Notations
    1.2.6 Different Kinds of Semantics
    1.2.7 Summary and Review
    Exercises for Section 1.2
`
`
1.3 Modeling: Dimensions of Representation
    1.3.1 Fidelity and Precision
    1.3.2 Abstractions and Implementations
    1.3.3 Primitive and Derived Propositions
    1.3.4 Explicit and Implicit Representations
    1.3.5 Representation and Canonical Form
    1.3.6 Using Multiple Representations
    1.3.7 Representation and Parallel Processing
    1.3.8 Space and Time Complexity
    1.3.9 Structural Complexity
    1.3.10 Summary and Review
    Exercises for Section 1.3

1.4 Programs: Patterns, Simplicity, and Expressiveness
    1.4.1 Using Rules to Manipulate Symbols
    1.4.2 Treating Programs as Data
    1.4.3 Manipulating Expressions for Different Purposes
    1.4.4 Pattern Matching
    1.4.5 Expressiveness, Defaults, and Epistemological Primitives
    1.4.6 The Symbol Level and the Knowledge Level
    1.4.7 Summary and Review
    Exercises for Section 1.4

1.5 Quandaries and Open Issues

CHAPTER 2 Search and Problem Solving

2.1 Concepts of Search
    2.1.1 Solution Spaces and Search Spaces
    2.1.2 Terminology about Search Criteria
    2.1.3 Representing Search Spaces as Trees
    2.1.4 Preview of Search Methods
    2.1.5 Summary and Review
    Exercises for Section 2.1

2.2 Blind Search
    2.2.1 Depth-First and Breadth-First Search
    2.2.2 Top-Down and Bottom-Up Search: A Note on Terminology
    2.2.3 Simple and Hierarchical Generate-and-Test
    2.2.4 A Sample Knowledge System Using Hierarchical Generate-and-Test
    2.2.5 Simple and Backtracking Constraint Satisfaction
    2.2.6 Summary and Review
    Exercises for Section 2.2
`
`
2.3 Directed Search
    2.3.1 Simple Match
    2.3.2 Means-Ends Analysis
    2.3.3 Hierarchical Match and Skeletal Planning
    2.3.4 Hill Climbing and Best-first Search
    2.3.5 Shortest-Path Methods
    2.3.6 A* and Related Methods
    2.3.7 Summary and Review
    Exercises for Section 2.3

2.4 Hierarchical Search
    2.4.1 Two-Level Planning
    2.4.2 Planning with Multiple Abstraction Levels
    2.4.3 Planning with Imperfect Abstractions
    2.4.4 Summary and Review
    Exercises for Section 2.4

2.5 Quandaries and Open Issues

CHAPTER 3 Knowledge and Software Engineering

3.1 Understanding Knowledge Systems in Context
    3.1.1 The Terminology of Knowledge Systems and Expertise
    3.1.2 Knowledge Systems and Document Systems: Five Scenarios
    3.1.3 Preview of Knowledge Acquisition Topics
    3.1.4 Summary and Review
    Exercises for Section 3.1

3.2 Formulating Expertise
    3.2.1 Conducting Initial Interviews
    3.2.2 Taking Protocols
    3.2.3 Characterizing Search Spaces
    3.2.4 Adapting Protocol Analysis for Knowledge Systems
    3.2.5 Summary and Review
    Exercises for Section 3.2

3.3 Collaboratively Articulating Work Practices
    3.3.1 Variations in Processes for Interview and Analysis
    3.3.2 Documenting Expertise
    3.3.3 Engineering Software and Organizations
    3.3.4 Summary and Review
    Exercises for Section 3.3
`
`
3.4 Knowledge versus Complexity
    3.4.1 MYCIN: Study of a Classic Knowledge System
    3.4.2 The Knowledge Hypothesis and the Qualification Problem
    3.4.3 Summary and Review
    Exercises for Section 3.4

3.5 Open Issues and Quandaries

PART II THE SYMBOL LEVEL

CHAPTER 4 Reasoning about Time

4.1 Temporal Concepts
    4.1.1 Timeline Representations
    4.1.2 A Discrete Model of Transactions in the Balance of an Account
4.2 Continuous versus Discrete Temporal Models
4.3 Temporal Uncertainty and Constraint Reasoning
    4.3.1 Partial Knowledge of Event Times
    4.3.2 Arc Consistency and Endpoint Constraints
    4.3.3 Time Maps and Scheduling Problems
    4.3.4 The Interface between a Scheduler and a Temporal Database
4.4 Branching Time
4.5 Summary and Review
    Exercises for Chapter 4

4.6 Open Issues and Quandaries

CHAPTER 5 Reasoning about Space

5.1 Spatial Concepts
5.2 Spatial Search
    5.2.1 Simple Nearest-First Search
    5.2.2 Problems with Uniform-Size Regions
    5.2.3 Quadtree Nearest-First Search
    5.2.4 Multi-Level Space Representations
5.3 Reasoning about Shape
5.4 The Piano Example: Using Multiple Representations of Space
    5.4.1 Reasoning for the Piano Movers
    5.4.2 Rendering a Piano
    5.4.3 The Action of a Piano
`
`
5.5 Summary and Review
    Exercises for Chapter 5

5.6 Open Issues and Quandaries

CHAPTER 6 Reasoning about Uncertainty and Vagueness

6.1 Representing Uncertainty
    6.1.1 Concepts about Uncertainty
    6.1.2 The Certainty-Factor Approach
    6.1.3 The Dempster-Shafer Approach
    6.1.4 Probability Networks
    6.1.5 Summary and Conclusions
    Exercises for Section 6.1

6.2 Representing Vagueness
    6.2.1 Basic Concepts of Fuzzy Sets
    6.2.2 Fuzzy Reasoning
    6.2.3 Summary and Conclusions
    Exercises for Section 6.2

6.3 Open Issues and Quandaries

PART III THE KNOWLEDGE LEVEL

CHAPTER 7 Classification

7.1 Introduction
    7.1.1 Regularities and Cognitive Economies
7.2 Models for Classification Domains
    7.2.1 A Computational Model of Classification
    7.2.2 Model Variations and Phenomena
    7.2.3 Pragmatics in Classification Systems
    7.2.4 Summary and Review
    Exercises for Section 7.2

7.3 Case Studies of Classification Systems
    7.3.1 Classification in MYCIN
    7.3.2 Classification in MORE
    7.3.3 Classification in MOLE
    7.3.4 Classification in MDX
    7.3.5 Classification in PROSPECTOR
    7.3.6 Summary and Review
    Exercises for Section 7.3
`
`
7.4 Knowledge and Methods for Classification
    7.4.1 Knowledge-Level and Symbol-Level Analysis of Classification Domains
    7.4.2 MC-1: A Strawman Generate-and-Test Method
    7.4.3 MC-2: Driving from Data to Plausible Candidates
    7.4.4 MC-3: Solution-Driven Hierarchical Classification
    7.4.5 MC-4: Data-Driven Hierarchical Classification
    7.4.6 Method Variations for Classification
    7.4.7 Summary and Review
    Exercises for Section 7.4

7.5 Open Issues and Quandaries

CHAPTER 8 Configuration

8.1 Introduction
    8.1.1 Configuration Models and Configuration Tasks
    8.1.2 Defining Configuration
8.2 Models for Configuration Domains
    8.2.1 Computational Models of Configuration
    8.2.2 Phenomena in Configuration Problems
    8.2.3 Summary and Review
    Exercises for Section 8.2

8.3 Case Studies of Configuration Systems
    8.3.1 Configuration in XCON
    8.3.2 Configuration in M1/MICON
    8.3.3 Configuration in MYCIN's Therapy Task
    8.3.4 Configuration in VT
    8.3.5 Configuration in COSSACK
    8.3.6 Summary and Review
    Exercises for Section 8.3

8.4 Methods for Configuration Problems
    8.4.1 Knowledge-Level and Symbol-Level Analysis of Configuration Domains
    8.4.2 MCF-1: Expand and Arrange
    8.4.3 MCF-2: Staged Subtasks with Look-Ahead
    8.4.4 MCF-3: Propose-and-Revise
    8.4.5 Summary and Review
    Exercises for Section 8.4

8.5 Open Issues and Quandaries
`
`
CHAPTER 9 Diagnosis and Troubleshooting

9.1 Introduction
    9.1.1 Diagnosis and Troubleshooting Scenarios
    9.1.2 Dimensions of Variation in Diagnostic Tasks
9.2 Models for Diagnosis Domains
    9.2.1 Recognizing Abnormalities and Conflicts
    9.2.2 Generating and Testing Hypotheses
    9.2.3 Discriminating among Hypotheses
    9.2.4 Summary and Review
    Exercises for Section 9.2

9.3 Case Studies of Diagnosis and Troubleshooting Systems
    9.3.1 Diagnosis in DARN
    9.3.2 Diagnosis in INTERNIST
    9.3.3 Diagnosis in CASNET/GLAUCOMA
    9.3.4 Diagnosis in SOPHIE III
    9.3.5 Diagnosis in GDE
    9.3.6 Diagnosis in SHERLOCK
    9.3.7 Diagnosis in XDE
    9.3.8 Summary and Review
    Exercises for Section 9.3

9.4 Knowledge and Methods for Diagnosis
    9.4.1 Plan Models for Diagnosis
    9.4.2 Classification Models for Diagnosis
    9.4.3 Causal and Behavioral Models for Systems
    9.4.4 Summary and Review
    Exercises for Section 9.4

9.5 Open Issues and Quandaries

APPENDIX A Annotated Bibliographies by Chapter

APPENDIX B Selected Answers to Exercises

Index
`
`
Introduction and Overview
The Building of a Knowledge System
to Identify Wild Plants
`
There is always a tension between top-down and bottom-up presentations. A top-down presentation
starts with goals and then establishes a framework for pursuing the parts in depth. Bottom-up
presentations start with fundamental and primitive concepts and then build to higher-level
ones. Top-down presentations can be motivating but they risk lack of rigor; bottom-up presentations
can be principled but they risk losing sight of goals and direction.
`Most of this book is organized bottom-up. This reflects my desire for clarity in a field that
`is entering its adolescence, metaphorically if not chronologically. The topics are arranged so a
`reader starting at the beginning is prepared for concepts along the way. Occasionally I break out
`of the bottom-up rhythm and step-at-a-time development to survey where we are, where we have
`been, and where we are going. This introduction serves that purpose.
`The following overview traces the steps of building a hypothetical knowledge system.
`Woven into the story are some notes that connect it with sections in this book that develop the
`concepts further. Many of the questions and issues of knowledge engineering that are mysterious
`in a bottom-up presentation seem quite natural when they are encountered in the context of
`building a knowledge system. In particular, it becomes easy to see why they arise.
`I made up the following story, so it does not require a disclaimer saying that the names
`have been changed to protect the innocent. Nonetheless, the phenomena in the story are familiar
`to anyone who has developed a knowledge system. Imagine that we work for a small software
`company that builds popular software packages including knowledge systems. This is a story of a
`knowledge system: how it was conceived, built, introduced, used, and later extended.
`
`To Build a Knowledge System
`It all began when we were approached by an entrepreneur who enjoys hiking and camping in the
`hills, mountains, and deserts of California. Always looking for a new market opportunity, he
`noticed that campers and hikers like to identify wild plants but that they are not very good at it.
`Identifying wild plants can be useful for survival in the woods ("What can I eat?") and it also has
`recreational value. He was convinced that conservationists, environmentalists, and well-heeled
`hikers have a common need.
`The entrepreneur proposed that we build a portable knowledge system for identifying
`wildlife. He had consulted a California hiking club and a professional naturalist. He suggested
`that we begin by constructing a hypertext database about different kinds of plants, describing
`their appearance, habitats, relations to other wildlife, and human uses. Our initial project team is
`as follows:
`
A hiker representing the user community and customer.
`A naturalist, our domain expert on wildlife identification.
`A knowledge engineer, our expert in acquiring knowledge and knowledge representation.
`A software engineer, the team leader having overall responsibility for the development of
`software.
`
`After some discussion within the company we agreed to develop a prototype version of the
`knowledge system using the latest palmsize or "backpack" computers. If the technical project
`seemed feasible, we would then consider the next steps of commercialization. We planned to use
`the process of building the prototype to help us determine the feasibility of a larger project. We
`recruited the naturalist and a prominent member of one of the hiking clubs to our project team.
`We called the group together and started to learn about each other's ideas and terminology.
`
`Notes
The participants are just getting started. They need to size up the task, develop
their goals, and determine their respective roles on the project. They need to consider
many questions about the nature of the knowledge system they would build. They ask
"Who wants it?" because the situation and people matter for shaping the system. They
also ask "What do these people do?" and "What role should the system play?" because
these issues arise in all software engineering projects.

Connections See Chapter 3 for a discussion of the initial interview concepts and
background on software engineering.
`
`Our Initial Interviews
Our naturalist tells us he wants to focus on native trees of California. We begin with the famous
California redwood trees, the Sequoia sempervirens or coastal redwood and the Sequoiadendron
giganteum or giant redwood that thrives in the Sierras. Our naturalist is a stickler for completeness.
He also adds the Metasequoia glyptostroboides or dawn redwood, which grew in most parts
of North America. The dawn redwood was thought to have become extinct until a grove was
discovered in China in 1944. At a blackboard he draws a chart of plant families as shown in Figure
1.1. He tells us about the history of the plant kingdom:
`
`
[Figure 1.1: taxonomy chart, listed here by rank]

DIVISION: Gymnosperms; Angiosperms
ORDER: Taxale (yew); Coniferale; Dicotyledon
FAMILY: Taxodiaceae (swamp cypress); Pinus (pine); Abies (fir); Picea (spruce); Oak; Maple; Elm
GENUS: Taxodium; Metasequoia; Sequoia; Sequoiadendron; Cunninghamia; Sciadopitys; Others
SPECIES:
    Taxodium distichum (swamp bald cypress)
    Metasequoia glyptostroboides (dawn redwood)
    Sequoia sempervirens (coastal redwood)
    Sequoiadendron giganteum (giant redwood)
    Cunninghamia lanceolata (Chinese fir)
    Sciadopitys verticillata (Japanese umbrella pine)

FIGURE 1.1. A partial taxonomy showing relations among plants closely related to California redwood trees. In our scenario and
thought experiment, the naturalist was asked about redwoods and started lecturing about plant families.
`
`
`Plants evolved on Earth from earlier one-celled animals. About 200 million years
`ago was the age of conifers, the cone-bearing trees. Redwoods are members of the
`conifers, which were the dominant plant species at that time. They are among the
`gymnosperms, plants that release their seeds without a protective coating or shell.
`
Our naturalist is a gifted teacher but he tends to slip into what we have started to call his
"lecture mode." After an hour of exploring taxonomies of the plant kingdom we begin to get restless.
One of the team members interrupts him to ask the proper location in the taxonomy for the
"albino redwoods," which are often visited in Muir Woods. This question jars the naturalist.
Albinos do not fit into the plant taxonomy because they are not a true species, but rather a
mutated parasite from otherwise normal coastal redwoods. Redwoods propagate by both seeds
and roots. Sometimes something goes wrong in the root propagation, resulting in a tree that lacks
the capability to make chlorophyll. Such rare plants would normally die, except a few that continue
to live parasitically off the parent. Albino trees have extra pores on their leaves that make
them efficient for moving quantities of water and nutrients from their host and parent trees.
At this point another member of the group, a home gardener, wants to know about trees she
`had purchased at a local nursery called Sequoia sempervirens soquel and Sequoia sempervirens
`aptos blue. Again, the naturalist explains that these trees are not really species either. Rather,
`they are clones of registered individual coastal redwoods, propagated by cuttings and popular
`with nurseries because they grow to be predictable "twins" to the parent tree, having the same
shape and color. These registered clones are sometimes called cultivars. They would be impossible
to identify reliably by visual examination alone and they are not found in the wild.
`This leads us to a discussion of exactly what a taxonomic chart means, what a species is,
`what the chart is useful for, and whether it is really a good starting place for plant identification.
`Clearly the chart does not contain all the information we need about plants because some plants
`of apparent interest do not appear in it. We also have learned that there are some plants about
which there is debate as to their lineage. After discussion we decide that the information is
interesting and that it would be a good base for establishing names of plants, but that it would not be
`appropriate for us to proceed by just filling out more and more of the taxonomic chart. We decide
`to focus on actual cases of plant identification at our next session.
`
`Notes
`In this part of the story the participants are beginning to build bridges into each
`other's areas to understand how they will work together. They bring to the discussion
`some preexisting symbol structures or representations, such as the plant taxonomic chart.
`Often there needs to be discussion about just what the symbols mean and whether those
`meanings are useful for the task at hand. As in this story, it sometimes turns out that these
`symbols and representations need to be modified. When the end product includes a
`knowledge system, then conventions about symbol structures must be made precise
`enough for clear communication and also expressive enough for the distinctions made in
performing the task.
`
`Connections See Chapter 1 for an introduction to symbols and symbol structures and
`the assignment of meaning to them. See Chapter 3 for a discussion of tools and methods
`for incremental formalization of knowledge.
`
`
`1. The specimen is tall,
`2. I'd guess about 30 or 35 feet tall.
`3. So it's a tree ...
`4. symmetrical in shape.
`5. From the needles in the foliage, it's obviously a pine,
`6. but not one of the coastal pines since we're at too high an elevation in these mountains.
7. Could be either a Pinus ponderosa, a jeffreyi, or a torreyana.
`8. Let's see (walking in closer) ... dark green needles, not yellow-green,
`9. about7inches1ong,and
`10. in clusters of three.
`11. Rather grayish bark, not cinnamon-brown.
`12. Medium-sized cones.
`13. Seems to be a young tree. Others like it are near, reaching heights of over a hundred feet tall.
`14. It's probably a Pinus jeffreyi, that is, a Jeffrey pine.
`
FIGURE 1.2. Transcription of our naturalist talking through the identification of one of the plants.
`
`The Naturalist in the Woods
We prepare to study the naturalist's classification process on some sample cases. One member of
the group sets up portable video and audio recorders at a local state park. Our hiking club member
is our prototype user. We define his job as walking into the woods and selecting a plant to be
identified. In this way we hope to gain insight into what plants he finds interesting and to test the
relevance of the plant taxonomy. We ask the naturalist to "think out loud" as he identifies plants.
Recording such a session is called taking a protocol. This results in verbal data where the naturalist
talks about the bark coloration, surface roots, and leaf shapes. After these dialogs are recorded
and pictures of the plants are taken, we transcribe all of the tapes. Figure 1.2 shows a sample
transcript.
After the session in the park, we go over the transcripts carefully with the naturalist, trying
`to reconstruct any intermediate aspects of his thought process that were not verbalized. We ask
`him a variety of questions. "What else did you consider here? How did you know that it was not
`a manzanita? Why couldn't it have been a fir tree or a digger pine? Why did you ask about the
`coloring of the needles?" Our goal is not so much to capture exactly what his reasoning was in
every case, but rather to develop a set of case examples that we could use as benchmarks for testing
our computer system. As it turns out, the naturalist does different things on different cases.
`He does not always start out with exactly the same set of questions, so his method is not one of
`just working through a fixed decision tree or discrimination network.
`
Notes At this point the group has begun a process of collecting knowledge about the
task in terms of examples of problem-solving behavior. As we will see below, it is possible
to make some false starts in this, and it is also possible to recover from such false
starts.

Connections See Chapter 3 for discussion of the assumptions and methods of the
"transfer of expertise" approach.
`
`
`Characterizing the Task
`The knowledge engineer begins a tentative analysis of the protocols. He tells us he might need to
`analyze these sessions several different ways before we are done. He wants to characterize the
actions of the naturalist in terms of problem-solving steps. His approach is to model the
problem-solving task as a search problem, in which the naturalist's steps carry out different operations in
`the search. Figure 1.3 shows his first tentative analysis of the session from Figure 1.2. In this, he
`
Collect Initial Data
Determine height of plant: Plant is more than 30 feet tall. (1)
Shape is symmetrical. (4)
Foliage has needles. (5)

Determine General Classification
Infer: Plant is a tree. (3) Plant is a pine tree or a close relative.
Knowledge: Only pines and close relatives have needle-shaped leaves. (5)

Collect Data about Location
Mountain location.
Knowledge: Trees from the low areas and the coast do not grow in the mountains. (6)
Rule out candidates that do not grow in this region.

Form Specific Candidate Hypotheses
Mountain pine trees include the Pinus ponderosa, the jeffreyi, and the torreyana. (7)

Determine Data to Discriminate among Hypotheses
Knowledge: The hypotheses make different predictions about needle color and bark color.

Species:          ponderosa        jeffreyi      torreyana
Bark color:       cinnamon-brown   grey          brown
Needle color:     yellow-green     dark green    dull green
Needle clusters:  three            three         five
Needle length:    8"               7"            10"

Collect Discriminating Data
Needles are dark green. (8)
Needles are 7 inches long. (9)
Needles are clustered in threes. (10)
Bark is grayish. (11)
Cones are medium-size. (12)

Consider Reliability of Data
There are other trees in the area of the same character. (13)
Mature height of others is more than 100 feet. (13)
Infer that the specimen is representative but not yet full grown. (13)

Determine Whether Unique Solution Is Found
Only a Pinus jeffreyi fits the data. (14)

Retrieve Common Name
Knowledge: A Pinus jeffreyi is commonly called a Jeffrey pine.

FIGURE 1.3. Preliminary analysis of the protocol from the transcript in Figure 1.2. The numbers in
parentheses refer to the corresponding steps in Figure 1.2.
`
`
[Figure 1.4: diagram of the two search spaces and the steps that link them]

Data Space
    Data (examples: Tree is 30 feet tall; Rainfall is 4 inches per year; Berries are red)
    Data abstraction leads from data to abstracted data.
    Abstracted data (examples: Tall tree; Desert region; Spring bloomer ...)

Heuristic match links abstracted data to abstracted solutions.

Solution Space
    Abstracted solutions (examples: Pinus family; Vine in bush ...)
    Solution refinement leads from abstracted solutions to specific solutions.
    Solutions (examples: Arbutus menziesii (Madrone) ...)

FIGURE 1.4. The search spaces for classification. This method reasons about data, which may be
abstracted into general features. The data are associated heuristically with abstracted solutions and
ultimately specific solutions.
`
`characterized operations such as "determining the general classification," "collecting data,"
`"forming specific candidate hypotheses," and so on. These operations constitute a sketch of a
computation model for the plant identification, which searches through a catalog of possible
`answers.
`This tentative analysis of the protocol is consistent with a computational model that the
knowledge engineer calls "classification." Someone in the group objects, arguing that the naturalist
was not "classifying." Instead, he was merely "identifying" plants because the classes of
`possible plants were predetermined. The knowledge engineer agrees but explains that this is
`exactly what classification systems do. He draws Figure 1.4 to illustrate the basic concepts used
`in this method.
To use this method, we needed to identify the kinds of data that could be collected in the
field (the data space) as well as the kinds of solutions (the solution space). Data consist of such
observations as the number of needles in a cluster. A final solution is a plant species. Classification
uses abstractions of both data and solutions. A datum such as "3 inches of rain falls in the
region annually" might be generalized to "this is a dry, inland region." A solution and species
description such as Pinus contorta murrayana (lodgepole pine) might be generalized to pine tree.
`There are variations of classification, but they all proceed by ruling out candidate solutions that
`do not fit the data. Further analysis of protocols on multiple cases would be needed to determine
`what kinds of knowledge were being used and how they were used.
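
The rule-out step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system the team goes on to build: the feature values are copied from the three hypotheses in Figure 1.3, while the names `CANDIDATES`, `abstract_observation`, and `rule_out`, and the small synonym list used for data abstraction, are hypothetical choices made for the example.

```python
# Feature table transcribed from Figure 1.3 (needle length in inches).
CANDIDATES = {
    "Pinus ponderosa": {"bark": "cinnamon-brown", "needle_color": "yellow-green",
                        "cluster": 3, "needle_len": 8},
    "Pinus jeffreyi": {"bark": "grey", "needle_color": "dark green",
                       "cluster": 3, "needle_len": 7},
    "Pinus torreyana": {"bark": "brown", "needle_color": "dull green",
                        "cluster": 5, "needle_len": 10},
}


def abstract_observation(raw):
    """Data abstraction: map raw field observations onto the table's vocabulary."""
    bark_synonyms = {"grayish": "grey", "gray": "grey"}  # hypothetical synonym list
    features = dict(raw)
    if "bark" in features:
        features["bark"] = bark_synonyms.get(features["bark"], features["bark"])
    return features


def rule_out(candidates, features):
    """Keep only the candidates that agree with every observed feature."""
    return [
        name
        for name, expected in candidates.items()
        if all(expected.get(key) == value for key, value in features.items())
    ]


# Observations taken from steps 8 through 11 of the transcript in Figure 1.2.
observed = {"needle_color": "dark green", "needle_len": 7, "cluster": 3, "bark": "grayish"}
remaining = rule_out(CANDIDATES, abstract_observation(observed))
print(remaining)  # ['Pinus jeffreyi']
```

Exact matching on needle length is a simplification; a fielded system would presumably compare against ranges, which is the kind of detail that further protocol analysis would need to settle.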
`The knowledge engineer now has some questions for the naturalist. Suppose the solution
`space is given by a catalog of possibilities, such as the charts in the botany books we used on the
project. The protocol analysis in Figure 1.3 shows that the naturalist quickly ruled out the coastal
varieties of the pine tree. But how about the many other species of pine that grow in the mountains?
With book in hand, he asks why the naturalist had not considered a coulter pine (Pinus
coulteri). The naturalist is taken aback. He answers that the coulter pine actually is a plausible
candidate and asks to see the pictures of the specimen. After looking at it, he says the pine cones
are too small and that the specimen does not have a characteristic open tree shape like an oak
tree. Continuing, the knowledge engineer asks about the sugar pine. The naturalist answers that
the cross-examination feels like "lesson time," but that sugar pines are the tallest pine trees in the
world, being more than 200 feet tall and that you would know immediately if you were in a sugar
pine forest. However, the idea of systematically going through the catalog to analyze the protocols
is appealing, so the two of them start working over them. The naturalist suggests that all of
this post-protocol explanation and introspection might make him more systematic about his own
methods.
`As we continue to work on this, the significant size of the search space becomes clearer to
everyone. One could be "systematic" by asking leading questions about each possible plant species.
However, there are about 50 common species of just pine trees in California. Species of
`trees represent only a small fraction of the native plants. A quick check of some catalogs suggests
`that there are about 7,000 plant species of interest in California, not counting 300 or 400 species
`of wildflowers that are often discounted as weeds. It is clear that any identification process needs
`a means to focus its search, and that we need to be economical about asking questions. We begin
`to examine the protocols for clues about search strategy. We want to understand not only what he
`knows about particular plants, but also how he narrows the search, using knowledge about the
`families of plants and other things to quickly focus on a relatively small set of candidates.
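
One way to picture "being economical about asking questions" is to ask first about the feature that splits the remaining candidates most evenly. The sketch below is an assumption-laden illustration, not a strategy taken from the naturalist's protocols: it reuses the three hypotheses from Figure 1.3, and the greedy worst-case rule and the names `worst_case_remaining` and `best_question` are hypothetical.

```python
from collections import Counter

# Feature table transcribed from Figure 1.3 (needle length in inches).
CANDIDATES = {
    "Pinus ponderosa": {"bark": "cinnamon-brown", "needle_color": "yellow-green",
                        "cluster": 3, "needle_len": 8},
    "Pinus jeffreyi": {"bark": "grey", "needle_color": "dark green",
                       "cluster": 3, "needle_len": 7},
    "Pinus torreyana": {"bark": "brown", "needle_color": "dull green",
                        "cluster": 5, "needle_len": 10},
}


def worst_case_remaining(candidates, attribute):
    """Size of the largest group of candidates that share one value of `attribute`."""
    counts = Counter(features[attribute] for features in candidates.values())
    return max(counts.values())


def best_question(candidates, attributes):
    """Greedy choice: the attribute whose answer leaves the fewest candidates at worst."""
    return min(attributes, key=lambda a: worst_case_remaining(candidates, a))


attributes = ["cluster", "bark", "needle_color", "needle_len"]
print(worst_case_remaining(CANDIDATES, "cluster"))  # 2: ponderosa and jeffreyi both have three
print(best_question(CANDIDATES, attributes))        # bark
```

With only three candidates the saving is trivial, but the same idea matters when the catalog holds thousands of species: a question about needle clusters cannot separate ponderosa from jeffreyi, whereas a question about bark color can.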
`
Notes The group is developing a systematic approach for gathering and analyzing the
domain knowledge. The protocol analysis has led to a framework based on heuristic classification.
Usually protocol analysis and selection of a framework are done together. It is
not unusual for the analysis to reveal aspects that were not articulated. Experts sometimes
forget to say things out loud and sometimes make mistakes. For these reasons, it is good
practice to compare many examples of protocols on related cases. Knowledge needed for
a task is seldom revealed all at once.
`
Connections See Chapter 2 for characterizations of problem solving as search and for
the terminology of data spaces, search spaces, and solution spaces. This chapter focuses
on basic methods for search. To build a computational model of a task domain, we need
to identify the search spaces and to determine what knowledge is needed and how it is
used. See Chapter 3 for a discussion about approaches and psychological assumptions for
the analysis of protocols. See Chapters 7, 8, and 9 for examples of the knowledge-level
analysis and computational models for different tasks.
`
`A "Naturalist in a Box"
As we build up a collection of cases and study the transcripts, we become aware of some difficulties
with our approach. The first problem is that the naturalist is depending a great deal on
`properties of the plants that he can see and smell. Much of the knowledge he is using in doing
`this is not articulated in the transcripts.
`Our hiking club representative kids the naturalist, saying he is "cheating" by just looking at
`the plants. We decide