`an Intelligent Interface for
`Information Retrieval
`
Roger Howard Thompson
Ph.D. Thesis
Computer and Information Science Department
University of Massachusetts

COINS Technical Report H8-RH
`
`Computer and Information Science
`
`001
`
`Facebook Inc. Ex. 1214 Part 1
`
`
`
`
`
`
`The Design and Implementation of
`an Intelligent Interface for
`Information Retrieval
`
`A Dissertation Presented
`
`by
`
`ROGER HOWARD THOMPSON
`
`Submitted to the Graduate School of the
`
`University of Massachusetts in partial fulfillment
`
`of the requirements for the degree of
`
`Doctor of Philosophy
`
`February 1989
`
Department of Computer and Information Science
`
`
`© Copyright by ROGER HOWARD THOMPSON 1989
`
`All Rights Reserved
`
`
`
`
`The Design and Implementation of
`
`an Intelligent Interface for
`
`Information Retrieval
`
`A Dissertation Presented
`
`by
`
`Roger Howard Thompson
`
`
W. Richards Adrion, Department Head
Department of Computer and Information Science
`
`
`
`
`DEDICATION
`
`This work is dedicated to the memory of
`
`Dr. Victor Paul Wierwille.
`
`
`
`
`ACKNOWLEDGMENTS
`
`I would like to thank the following people, who helped me greatly to accomplish
`
`this work. First, I would like to thank my advisor Bruce Croft, whose constant
`
encouragement and thoughtful constructive criticism were instrumental in helping me see this
`
`project through. Professors Dave Stemple and Nick Belkin provided me with different
`
perspectives that enabled me to think more clearly about the subject.
`
`I would like to thank some of the residents of the Wombat Research Lab, Larry
`
Lefkowitz, Carol Broverman, Tom Parenty, Norm Carver, and Al Hough for making the
`
`time bearable.
`
I would like to thank my supervisors at Hughes Aircraft, Bill King and Jim

Blackburn, for their understanding while finishing my writing.
`
`I would like to thank my friend, Andy Zitelli, for his wise counsel throughout my
`
`entire undergraduate and graduate academic career.
`
Finally, I would like to express my gratitude to my wife, Darlene, for her unfailing
`
`encouragement and support, and my daughter Rebeca for the joy that only a young child
`
`can bring.
`
`
`
`
`ABSTRACT
`
`THE DESIGN AND IMPLEMENTATION OF
`
`AN INTELLIGENT INTERFACE FOR
`
`INFORMATION RETRIEVAL
`
`February, 1989
`
`ROGER HOWARD THOMPSON, B.A., UNIVERSITY OF CALIFORNIA AT
`
`BERKELEY
`
`M.S., NEW MEXICO STATE UNIVERSITY
`
Ph.D., UNIVERSITY OF MASSACHUSETTS
`
`Directed by: Professor W. Bruce Croft
`
`Commercial information (text) retrieval systems have been available since the early
`
`1960's. While they have provided a service allowing individuals to find useful documents
`
out of the millions of documents contained in online databases, there are a number of
`
`problems that prevent the user from being more effective. The primary problems are an
`
`inadequate means for specifying information needs, a single way of responding to all users
`
`and their information needs, and an inadequate user interface.
`
This thesis describes the design and implementation of I3R, an intelligent interface

for information retrieval, the purpose of which is to overcome the limitations of current
`
`information retrieval systems by providing multiple ways of assisting the user to precisely
`
`specify his information need and to search for information. The system organization is
`
`based on a blackboard architecture and consists of a number of "experts" that work
`
`cooperatively to assist the user. The operation of the experts is coordinated by a control
`
`expert that makes its decisions based on a plan derived from the analysis of human search
`
intermediaries, end user dialogues, and a user model. The experts provide multiple formal

search strategies, the use and collection of domain knowledge, and browsing assistance.
`
`The operation of the system is demonstrated by four scenarios.
`
`
`
`
TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES

CHAPTER

1 Overview
    1.1 Introduction
    1.2 Retrieval Problems
    1.3 Intermediary Model
    1.4 System Analysis and Requirements
    1.5 Architecture
    1.6 Organization
    1.7 Contributions

2 Background and Related Work
    2.1 Introduction
    2.2 Traditional Information Retrieval
    2.3 Retrieval Problems
    2.4 Intelligent Text Retrieval
    2.5 Analysis of Systems
    2.6 Summary

3 The Basis of I3R
    3.1 Introduction
    3.2 Discussion
    3.3 Conclusion

4 Design and Implementation
    4.1 Introduction
    4.2 Design
    4.3 Implementation of the Blackboard System
    4.4 Summary

5 Browsing Expert
    5.1 Introduction
    5.2 Definition
    5.3 Browsing Operation
    5.4 Browsing Implementation
    5.5 Summary

6 Example Scenarios
    6.1 Introduction
    6.2 Evaluation
    6.3 Scenarios
    6.4 Possible Behavioral Changes
    6.5 Summary

7 Conclusion
    7.1 Summary
    7.2 Future Directions

BIBLIOGRAPHY
`
`
`
LIST OF FIGURES

1.1 System organization of I3R
2.1 Showing how a cluster search can retrieve different documents
2.2 "Contingency" table for computing evaluation measures
2.3 Typical precision/recall graph
3.1 I3R/IPM functional correspondence
4.1 Hearsay II high level architecture
4.2 Hearsay II control function
4.3 High level I3R design
4.4 Basic structure of document collection, showing the relationships between the levels
4.5 Sample Conceptual structure
4.6 The user is making a connection between "concurrent processes" and "parallel processes."
4.7 After selecting Entry OK from the Content menu, the phrase "concurrent processes" is transferred to the Related Window
4.8 The user has keyed return in the Text Entry window (which then disappears), causing the word "parallel" to appear in the Phrase window, and has selected "processes" from the text, which also appears in the Phrase window
4.9 The user selects Entry OK from the Content menu, which causes the phrase to be transferred to the Related window
4.10 Representation of the domain knowledge added to the user's model
4.11 The Document level
4.12 Document neighborhood taken from the CACM collection
4.13 Control Expert States
4.14 Summary of control expert expectation values based on user stereotypes
4.15 Organization of Interface Manager data
5.1 Sample Neighborhood Map
5.2 Sample Context Map
5.3 Grid for browsing maps showing different node markings, but without labels
5.4 User chooses the References selection
5.5 Neighborhood Map showing the addition of Reference and Journal Issue nodes
5.6 Context Map showing configuration if the Reference and the Journal Issue Nodes are expanded
5.7 Example term neighborhoods
5.8 Expansion of a document list
6.1 First portion accomplished of the CE plan
6.2 Initial state of the interface
6.3 System prompting the user to answer questions that will determine the appropriate stereotypes
6.4 These choices determine domain knowledge expertise
6.5 These choices determine search orientation
6.6 New portion of CE Plan accomplished
6.7 System asks the user for the kind of input form to initially specify his query
6.8 The user has entered his query in a free text format
6.9 The CE's plan after CE operation in cycle 29
6.10 User selecting phrases and important words
6.11 Concepts presented for user evaluation
6.12 Information about analysis of programs
6.13 CE moves from $DNC to $SD using control expert rules 21, 15, 5 and 25, making the search controller active
6.14 Top five documents of initial search
6.15 The control expert moves the system to state $ER, evaluate results, for evaluation of the search results
6.16 User makes relevance judgements of documents terms and phrases in the retrieved documents
6.17 The exception transition back to $SD to enable the search controller
6.18 CE moving back to $ER
6.19 The CE moves the system to the $Finish state
6.20 CE moving back to the state $DNC to allow the DKE to search the domain knowledge for other concepts
6.21 Message advising the user on the next activity
6.22 CE moves system state to $SD to allow the SC to make another search
6.23 Search results windows after the user is done with the second search
6.24 Query elaboration with more choices for the expert user
6.25 Domain knowledge entry by a domain and system expert
6.26 Concepts from the user's domain knowledge
6.27 Results of the first two searches (window menus not shown)
6.28 Initial display on the Neighborhood Map (the document 2889 and Context window is not shown)
6.29 User selects a recommended node to view its contents, and selects terms that are particularly relevant or interesting
6.30 Neighborhood Map with expanded document neighborhood
6.31 User views document 2722 (text is incomplete)
6.32 User selects a term to examine from document 2722
6.33 Display for the concept multidimensional
6.34 User selects Documents option
6.35 Context Map after examining document #2846
6.36 Context Map showing crowded region around node "A," and user desires to expand node "B."
6.37 Use of connector to expand node "B."
`
`
`
`
`CHAPTER 1
`
`OVERVIEW
`
1.1 Introduction
`
In this chapter, an overview of this dissertation is presented. We begin by discussing the problems of traditional information retrieval systems and how they are usually overcome. These problems form the basis for the requirements of a more sophisticated system called I3R, an Intelligent Interface for Information Retrieval. A design is then outlined that will meet the specified requirements. The design has two major aspects: the first is facilities that should be provided; the second is how these facilities are to be supported in ways that allow easy modification.
`
1.2 Retrieval Problems
`
Commercial retrieval systems have been available since the early 1960's. At that time, they were a significant breakthrough in the use of computers for non-numeric applications. They allowed scientists and engineers to sort through the many journals, technical reports, and other written works to find information that might be useful in helping them solve their problems. The utility of these systems has been recognized in other professions such as law and medicine, where major retrieval services are now available.
`
While developments in storage technology, such as ever increasing densities in disk storage, and developments in communications technology, such as relatively inexpensive 2400 baud modems, have made these systems more widely available, the interface technology has remained for the most part stagnant, reflecting the designs of the original systems. These interfaces were designed to operate with simple input/output devices such as 110 character/second printing terminals. This significantly limits the kind of information that can be displayed. Furthermore, the operation of the system has a command-line orientation, which is reflected in the use of specialized languages for query specification.
`
These languages are based on Boolean logic and are usually augmented with proximity operators and "don't care" or wildcard characters. The former specify how close words must be in sentences or paragraphs. The latter handle alternative spellings and inflected forms of words. The use of these languages requires specialized training to teach users the semantics of AND, OR, and NOT. While the basic concepts are relatively simple, use of these languages is mastered only after a significant amount of experience. Furthermore, different systems have different query languages and many users do not have the time or the inclination to learn Boolean logic.
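
To make the Boolean query style concrete, the following is a minimal sketch, not the query language of any particular commercial system. It assumes a toy inverted index over three invented documents; the function names (`build_inverted_index`, `term`, `and_`, `or_`, `not_`, `wildcard`) are hypothetical, and the wildcard is simplified to a prefix match.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def term(index, word):
    """Documents containing exactly this word."""
    return set(index.get(word.lower(), set()))


def and_(a, b):
    """AND: documents present in both result sets."""
    return a & b


def or_(a, b):
    """OR: documents present in either result set."""
    return a | b


def not_(all_ids, a):
    """NOT: documents in the collection that are absent from the result set."""
    return all_ids - a


def wildcard(index, prefix):
    """Simplified "don't care" match: every indexed word starting with prefix."""
    result = set()
    for word, ids in index.items():
        if word.startswith(prefix.lower()):
            result |= ids
    return result


# Invented sample collection (illustrative only).
docs = {
    1: "parallel processes in operating systems",
    2: "concurrent processing of database queries",
    3: "operating systems for personal computers",
}
index = build_inverted_index(docs)
all_ids = set(docs)

# (parallel OR concurrent) AND process*
hits = and_(or_(term(index, "parallel"), term(index, "concurrent")),
            wildcard(index, "process"))

# NOT operating
not_operating = not_(all_ids, term(index, "operating"))
```

Even in this small form, the user has to know that alternatives are grouped with OR before being combined with AND, which is the kind of training burden described above.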
`
Boolean logic cannot precisely specify many relationships between words. For example, AND can be used to describe phrases or words that are required; OR may specify alternative words, synonyms, or components of "higher" level concepts. In addition, AND and OR in some situations in everyday language can be used synonymously. This lack of precision or multiple meaning can be overcome by adding other operators to specify relationships more exactly or adding weights to the AND and OR to give "soft" Boolean operators [Salton 83].
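
One well-known way to soften AND and OR is the p-norm formulation of extended Boolean retrieval. The sketch below assumes that formulation for illustration; the text above does not say which weighting scheme is meant, and the function names `soft_or` and `soft_and` are hypothetical. Each input is a document's weight in [0, 1] for one query term, and the parameter `p` controls how strict the operator is.

```python
def soft_or(weights, p=2.0):
    """p-norm OR: approaches max(weights) as p grows, the mean when p == 1."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def soft_and(weights, p=2.0):
    """p-norm AND: approaches min(weights) as p grows, the mean when p == 1."""
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


# A document that matches one of two query terms fully and the other not at all.
one_of_two = [1.0, 0.0]
or_score = soft_or(one_of_two, p=2.0)    # partial credit, above one half
and_score = soft_and(one_of_two, p=2.0)  # partial credit, below one half
```

With `p = 1` the two operators coincide, which mirrors the observation that AND and OR are sometimes used synonymously in everyday language; larger values of `p` move them toward strict Boolean behavior.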
`
Both solutions, while feasible, simply add to the amount of knowledge that the user must know in order to use a system effectively. This increases the potential for confusion and, hence, frustration on the part of the end user. The casual user or "permanent novice" will, in all likelihood, never bother to learn how to use the advanced features of the query language.
`
Compounding the problem of using the query language, which is a matter of query form, is the problem of determining precisely what is the content of the query. This is a problem of selecting the proper words to express what the user wants. Two potential problems arise here. The first is that the user may not know exactly what he wants, and the second is that he may not know the precise terminology required to express the need. In some systems, the user has recourse to an online thesaurus, which is a collection of words that is structured to show the relationships between them, to find the proper descriptive terms and to give the structure of the knowledge of a domain. In others, the best that he can do is get an alphabetical list of terms occurring in the database.
`
The problems of query form and content are manifestations of the inflexible nature of retrieval systems. They have only one way to respond to every type of user and every type of problem.
`
To overcome this inflexibility, end-users, the persons with the information need, often resort to using the services of a search intermediary. Intermediaries have received specialized training in the use of retrieval systems. They often have a degree in librarianship or a degree in the field in which they search, or by constant use have developed a knowledge of the terminology of a domain. For example, an intermediary that searches Chemical Abstracts might have a Ph.D. in chemistry. This background allows them to concentrate on getting the best possible results from the retrieval system by knowing the correct terminology.
`
One of the main advantages of using intermediary services is that the intermediary, being a person, can be much more flexible than the current commercial systems. The intermediary can adapt to the needs of different users. If the session is the end-user's first experience, the intermediary can help the user understand the search process by explaining what he is doing as he goes along. The intermediary can adjust his explanations to match the kind of user that he is dealing with. A college freshman with an orientation to the humanities would require a different kind of assistance than a medical doctor with some computing experience. Another advantage of an intermediary is that he can continue to learn about the domains that people consistently search in and he can learn about the needs of the people that consistently use his services.
`
`
While the use of intermediary services removes the burden from the end-user of having to deal with the query language, and often provides him with terminological assistance, it adds a new difficulty, since the user is now often removed from participating directly in the search process. The user must now, as before, try to express his information need to the intermediary, but, in general, cannot take advantage of the recognition ability that humans have in the search process. This is due to the fact that often intermediaries will search without the user present. The preferred situation is to have the end-user present with the search intermediary while the search is taking place. This, however, often slows the intermediary down, since he often has to explain his actions to the end user. This situation is not always possible due to considerations such as scheduling, among others. Other factors such as the availability of intermediary services also come into play. These services may not be free, adding further to the cost of using the system. Furthermore, with the advent of extremely high density storage such as CD-ROM (Compact Disk-Read Only Memory), end-users may be searching for information in their own home, where search intermediaries are not available.
`
1.3 The Intermediary Model
`
The search intermediary provides a model that can be useful in designing systems that can help overcome the problems of using IR systems. There are two ways that this concept can be used. One way is to simulate the activity of an intermediary, that is, to attempt to provide the same services as the intermediary. This has been the basis of a number of expert systems that provide such services as a common command language to multiple retrieval systems [Marcus 81a, Marcus 81b, Marcus 83] and rudimentary query formulation assistance [Yip 79, Pollitt 84]. More sophisticated systems [Brajnik 85, Brajnik 87, Chiaramella 87, Defude 85] that take this approach attempt to implement the strategies and tactics used by intermediaries for searching [Bates 79a, 79b] and attempt to incorporate a natural language dialogue with the user. All of the systems that attempt this kind of simulation have been designed to work with Boolean systems and therefore have the limitations on retrieval effectiveness [Salton 83] that plague Boolean systems.
`
The approach taken in this thesis is to look at the intermediary concept as an intelligent interface system which is composed of the intermediary and the retrieval system. Analysis can then be made of the kinds of facilities that this system provides or should provide to assist the user in expressing his need and finding information that will meet it. The system designer can then determine how best to implement those facilities, taking advantage of the current research in information retrieval, and not be limited to ineffective, immature, or inappropriately applied technologies in an effort to exactly simulate the human intermediary.

1.4 System Analysis and Requirements
`
In analyzing the combined intermediary/retrieval system, the four basic elements of a retrieval system are the basis of the analysis. These basic elements are:

1. a representation of the content or meaning of the documents and the queries,
2. a process, usually called indexing, that maps the content of the document and the queries into the content representation,
3. a decision method, usually called a search strategy, that the system uses to determine whether or not a document should be retrieved,
4. a user interface.
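
The four elements listed above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the I3R design: the representation is assumed to be a term-frequency table, indexing is assumed to be lowercasing with a small invented stopword list, the search strategy is assumed to be a count of shared terms, and the names `index_text`, `search`, and `show` are hypothetical.

```python
from collections import Counter

# Assumed stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "of", "the", "for", "in", "to"}


def index_text(text):
    """Indexing: map raw text (document or query) to the representation,
    here a table of term -> frequency."""
    words = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return Counter(w for w in words if w and w not in STOPWORDS)


def search(query_rep, doc_reps, min_shared=1):
    """Search strategy: retrieve documents sharing at least min_shared terms
    with the query, ranked by the number of shared terms."""
    scored = []
    for doc_id, rep in doc_reps.items():
        shared = len(set(query_rep) & set(rep))
        if shared >= min_shared:
            scored.append((shared, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


def show(results):
    """User interface: format the ranked list for display."""
    return "\n".join(f"{rank}. {doc_id}" for rank, doc_id in enumerate(results, start=1))


# Invented documents; the same indexing function is applied to documents and query.
doc_reps = {
    "d1": index_text("The design of an intelligent interface"),
    "d2": index_text("Disk storage densities"),
    "d3": index_text("Interface design for retrieval"),
}
query_rep = index_text("intelligent interface design")
results = search(query_rep, doc_reps)
listing = show(results)
```

Keeping the four elements as separate functions reflects the point of the analysis: each one can be replaced (for example, a different search strategy) without touching the others.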
`
The user interface element in the combined intermediary/retrieval system is composed of the services that the search intermediary provides, and the actual method (i.e. how the query is typed in, how results are displayed, etc.) of interacting with the system. The essential services that the intermediary provides are:

1. explanation of system operation,
2. term selection assistance,
3. construction of a model of the information need, which consists of the query and the documents that have been retrieved,
4. execution of the searches,
5. overall control of the course of a session.
`
To adapt to the different kinds of end-users, the intermediary must make some assessment with regard to the end-user's familiarity with the domain, his familiarity with the search process, and the kind of results that he wants, such as whether he wants a few specific documents or a comprehensive collection. Essentially, the intermediary forms a model of the end-user and adapts the session to that model.
`
While the intermediary aspect of the system addresses most of the issues of inflexibility, some of them are rooted in the retrieval portion of the system. In the past, systems have been limited to a single decision method (retrieval method) for determining what documents ought to be retrieved. By having different methods for different kinds of queries the effectiveness of a system can be increased substantially [Croft 85]. A system's effectiveness can also be increased by providing direct access to the documents by browsing, a heuristically driven incremental search and evaluation technique [Oddy 77]. Browsing need not be limited to just the examination of documents; it can also be used to find the appropriate concepts to describe the information need.
`
The preceding high level analysis of the elements of the combined intermediary/retrieval system has pointed out the need for the system to support a number of facilities or functions that either provide services similar to that of an intermediary or support functions that are part of the underlying retrieval system. These functions or services can be summarized in the following modules:

1. Explainer - explains system operation to the user,
2. Domain Knowledge Expert - suggests additional concepts to the user and acquires domain knowledge from the user,
3. Request Model Builder - maintains information about the current state of the session such as relevant concepts and relevant documents,
4. Search Controller - chooses search techniques that are appropriate to the current state of the session and information need,
5. User Model Builder - determines what kind of end-user is currently interacting with the system,
6. Browsing Expert - provides recommendations to the user about information to view that is likely to be relevant when the user is browsing, and remembers the path that the user has taken during browsing,
7. Control Module - determines the direction of the "dialogue" that the system has with the user.
`
The representation for the documents must contain all the information necessary to support multiple search strategies and browsing. Traditional systems have usually maintained simple inverted files that would be inadequate in this case. In addition, thesaurus information in most systems has not been integrated into the overall retrieval process.
`
A number of other factors come into play in determining what the requirements of the retrieval system should be. One important factor of traditional information retrieval systems that is desirable to maintain is their domain independence. This means that the system cannot depend on having a significant amount of domain specific knowledge. However, since domain knowledge is very useful in assisting the user to precisely express his information need, the system should have the ability to use whatever domain specific knowledge is available, and should be able to acquire this knowledge from the user.
`
`1.5 Architecture
`
In order to build a system that provides the kinds of facilities that the combined search intermediary/retrieval system does, it must have an architecture that allows it to be flexible. This flexibility is manifested in a number of different ways. First, the system must adapt itself to different kinds of users and different kinds of information needs; this is external flexibility. Second, it must be flexible enough so that it can incorporate new techniques as they are developed; this is internal flexibility.
`
`
The first kind of flexibility requires that the system change the way it interacts with the user, as does the intermediary. For a novice user, it should offer more explanation and assistance, and it should limit his choices so that he does not get into a situation that he cannot handle; for an expert user it should not interfere with his use of the system, and should provide him access to all of the system's functionality. Another aspect of this flexibility is that different kinds of information needs require different kinds of searches. The system must be able to respond appropriately.
`
The second kind of flexibility requires an architecture that is modular in nature. This modularity should be at two levels. It should be able to support the addition of new large pieces of functionality. This would allow it to take advantage of new developments in information retrieval research. Each large scale function should also be modular, so that it can be adjusted to operate more effectively as the pattern of system usage is established. It also allows for the integration of new developments. For example, if a new search technique is developed that is particularly good at retrieving relevant information for one kind of information need, it can be incorporated into the search function of the system.
`
The architecture that best supports the requirements of an intelligent IR interface is a modified blackboard architecture [Erman 80, Nii 86a, Nii 86b]. A blackboard architecture, of which Hearsay II is a typical example, consists of a number of independently operating modules, called knowledge sources, that work together to solve a problem. Each works on a particular aspect of a problem. The results of their work are posted on a shared data structure called a blackboard. This blackboard is typically organized as a series of levels that represent abstraction levels of the problem. The operation of the knowledge sources is coordinated by a scheduler.
`
The basic operation of a blackboard system is as follows. First, each expert examines the state of the blackboard in its area of interest. It then decides if it has any action that it would like to perform based on the current conditions. If it does, it places an action (called an instantiation) on the system agenda. The agenda is examined by the scheduler and is sorted in order of importance based on criteria that are problem dependent. The scheduler then takes the most important action and runs it. The cycle then begins again.
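
The cycle just described can be sketched as a short control loop. This is a minimal illustration under stated assumptions, not the I3R implementation: the blackboard is assumed to be a plain dictionary, importance is assumed to be a single integer priority, and the two experts (`QueryParser`, `Searcher`) and every other name in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Instantiation:
    """An action that an expert would like to perform."""
    expert: str
    priority: int
    action: Callable[[dict], None]


class QueryParser:
    """Hypothetical expert: turns the raw query into a list of terms."""
    name = "query-parser"

    def propose(self, blackboard: dict) -> Optional[Instantiation]:
        if "query" in blackboard and "terms" not in blackboard:
            def run(bb: dict) -> None:
                bb["terms"] = bb["query"].lower().split()
            return Instantiation(self.name, 10, run)
        return None


class Searcher:
    """Hypothetical expert: retrieves documents once terms are available."""
    name = "searcher"

    def __init__(self, docs: dict):
        self.docs = docs

    def propose(self, blackboard: dict) -> Optional[Instantiation]:
        if "terms" in blackboard and "results" not in blackboard:
            def run(bb: dict) -> None:
                bb["results"] = sorted(
                    doc_id for doc_id, text in self.docs.items()
                    if any(t in text.lower().split() for t in bb["terms"])
                )
            return Instantiation(self.name, 5, run)
        return None


def run_blackboard(experts, blackboard, max_cycles=20):
    """Repeat: collect proposals, sort the agenda, run the most important one."""
    trace = []
    for _ in range(max_cycles):
        agenda = []
        for expert in experts:
            proposal = expert.propose(blackboard)  # expert examines the blackboard
            if proposal is not None:
                agenda.append(proposal)            # places an instantiation on the agenda
        if not agenda:
            break                                  # nothing left to do
        agenda.sort(key=lambda inst: inst.priority, reverse=True)
        chosen = agenda[0]
        chosen.action(blackboard)                  # scheduler runs the top action
        trace.append(chosen.expert)
    return trace


docs = {1: "parallel processes", 2: "file systems"}
blackboard = {"query": "Parallel algorithms"}
trace = run_blackboard([Searcher(docs), QueryParser()], blackboard)
```

Because each expert only reads and writes the shared blackboard, a new expert can be added to the list passed to `run_blackboard` without changing the existing ones, which is the modularity argument made for this architecture.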
`
The blackboard architecture is appropriate since it supports the easy addition of large scale functions by means of knowledge sources. In addition, the way that knowledge sources are to be implemented is not specified, so they can be implemented in the way that is most appropriate for their specific task. The knowledge sources in I3R are called experts since they are implemented as individual rule based systems. This provides a means of incrementally developing the experts. These experts correspond to the functions that were derived in the system analysis.
`
The basic blackboard architecture must be adapted to fit the nature of the information retrieval problem. The first adaptation is to the structure of the blackboard; it is not structured into abstraction levels, since there is no single overall hierarchical representation that can be applied to IR. Instead, the blackboard, called the short term memory, consists of different models built by the experts in the course of the session.
`
The purpose of the control function in I3R also differs from that of the scheduler in a typical blackboard system. In a typical system, the scheduler manages the system's resources to come to the solution of the problem in the shortest time possible. In I3R the control expert manages the dialogue the system has with the user, so that it is consistent and coherent. This difference stems from the fact that information retrieval is better likened to a process than to a problem to be solved. The control expert makes sure that the process is conducted correctly.
`
The control function uses information provided by the user model builder and the request model to determine the course of a session. The information for the user model builder is based on the stereotypes that the UMB decides apply to the particular user for the particular session. Stereotypes are models of different kinds of typical users. In the current system three general categories are used, with two values for each category. The categories are domain expertise, search system expertise, and search type. The values are novice and expert for the first two categories, and selective or exhaustive for the third category.
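
The three categories and their values map naturally onto a small record type. The sketch below is one possible encoding, assumed for illustration; the class and function names are hypothetical, and combining the three categories into a single record is a simplification of how the stereotypes are applied. The rule in `needs_extra_explanation` restates the earlier point that a novice user should be offered more explanation.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Expertise(Enum):
    NOVICE = "novice"
    EXPERT = "expert"


class SearchType(Enum):
    SELECTIVE = "selective"
    EXHAUSTIVE = "exhaustive"


@dataclass(frozen=True)
class UserStereotype:
    """One value per category: domain expertise, search system expertise, search type."""
    domain_expertise: Expertise
    system_expertise: Expertise
    search_type: SearchType


# Three categories with two values each give eight possible combinations.
ALL_STEREOTYPES = [
    UserStereotype(domain, system, search)
    for domain, system, search in product(Expertise, Expertise, SearchType)
]


def needs_extra_explanation(user: UserStereotype) -> bool:
    """Assumed rule: users who are novices with the search system get more explanation."""
    return user.system_expertise is Expertise.NOVICE


chemist = UserStereotype(Expertise.EXPERT, Expertise.NOVICE, SearchType.EXHAUSTIVE)
```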
`
The documents, concepts, and user histories are kept in a long term memory. The user histories store information about the user obtained from previous sessions with the system. This includes the original query, concepts that were judged relevant, documents that were judged relevant, and the stereotypes that were in effect at the end of the session. Also included in the user histories is a model of whatever domain knowledge the user has contributed in the course of his interaction with the system.
`
The system also maintains a store of global domain knowledge that is derived from available sources such as thesauri, and domain experts that use the system. This store is organized as a semantic net [Quillian 68] with the concepts being the nodes and the links being the relationships. Stored with the concept nodes is their frequency of occurrence in the document collection. Included with the normal conceptual relationships is a statistical nearest neighbor relationship that reflects the occurrence of concepts together in documents of the collection.
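
A semantic net of this kind can be sketched as concept nodes carrying a frequency, plus labeled links. The class below is a hypothetical illustration rather than the structure used in I3R; the relation labels (`"synonym"`, `"nearest_neighbor"`), the concept names, and the frequencies are all invented for the example.

```python
from collections import defaultdict


class SemanticNet:
    """Concept nodes with collection frequencies, connected by labeled links."""

    def __init__(self):
        self.frequency = {}             # concept -> occurrences in the collection
        self.links = defaultdict(set)   # (concept, relation) -> related concepts

    def add_concept(self, name, frequency=0):
        self.frequency[name] = frequency

    def relate(self, source, relation, target, symmetric=False):
        """Add a labeled link; unknown concepts are created with frequency 0."""
        for concept in (source, target):
            self.frequency.setdefault(concept, 0)
        self.links[(source, relation)].add(target)
        if symmetric:
            self.links[(target, relation)].add(source)

    def related(self, concept, relation):
        """Concepts reachable from `concept` over links with the given label."""
        return sorted(self.links.get((concept, relation), set()))


net = SemanticNet()
net.add_concept("concurrent processes", frequency=12)   # invented frequency
net.add_concept("parallel processes", frequency=30)     # invented frequency
net.relate("concurrent processes", "synonym", "parallel processes", symmetric=True)
net.relate("parallel processes", "nearest_neighbor", "multiprocessing")
```

Treating the statistical nearest neighbor relationship as just another link label means the same lookup serves both the conceptual and the statistical relationships.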
`
The documents are represented by lists of concepts that occur in them (authors are also considered concepts) and their frequency in the document. The lists are determined using a standard automatic indexing technique [Porter 80]. Additionally, citation information is retained along with the document nearest neighbors, which is a link based on the similarity of the representations of two documents. Other information such as the date and journal is included. The combination of the user domain knowledge model, global domain knowledge model, and document database forms the concept/document knowledge base.
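
A document nearest neighbor link can be derived from the concept-frequency representations. The sketch below assumes cosine similarity as the measure, which the text does not specify; the document ids, concepts, and frequencies are invented, and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two concept -> frequency tables."""
    shared = set(a) & set(b)
    dot = sum(a[c] * b[c] for c in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(doc_id, docs, k=1):
    """The k documents most similar to doc_id, ties broken by document id."""
    scored = [
        (cosine(docs[doc_id], rep), other_id)
        for other_id, rep in docs.items()
        if other_id != doc_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _, other_id in scored[:k]]


# Invented representations: concept -> frequency in the document.
docs = {
    "d1": {"parallel": 2, "process": 3},
    "d2": {"parallel": 1, "process": 1, "database": 1},
    "d3": {"thesaurus": 4},
}
neighbors_of_d1 = nearest_neighbors("d1", docs)
```

Precomputing such links is what gives the user a structure to browse: from any document, the neighbors are available without issuing a new query.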
`
The concept/document knowledge base supports all of the traditional search techniques, as well as providing a structure that the user can browse. Browsing is considered
`
`an important alterna