`
`example of how a portion of the conceptual knowledge might be structured. The node
`
`labeled "document collection" indicates that the attached single word concepts are
`
`components of the document representatives.
`
`89
`
`bt/nt
`
`phrase
`
`phrase
`
`phrase
`
`j3
`phrase
`~---OC;_- - ~-<§,~--~-~ --- -----
`
`phrase
`
`phrase
`
`~
`
`j
`
`...·······synonym
`
`~--o.....
`
`Document
`-~ Collection
`
`Figure 4.5: Sample Conceptual structure.
`
`The concept level, during a session, is a combination of three knowledge bases, the
`
`global domain knowledge, the user's domain knowledge, and the text database. The global
`
`domain knowledge is derived from published thesauri or other classification information .
`
`001
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`90
`
`In the case of our test data, for example, it was derived from the old and new Computing
`
`Reviews classifications. The user's domain knowledge is gathered from him as he in(cid:173)
`
`teracts with the system. When a user is developing his query initially or is examining re(cid:173)
`
`trieved text, he may make connections between words. For example, a user examining a
`
`document may indicate that the words "concurrent" and "processes" form the concept
`
`"concurrent processing." This phrase is entered in to his domain knowledge model by the
`
`Domain Knowledge Expert. He may then think of the: concept "parallel processes," and
`
`decide that it is not close enough to be a synonym, but is certainly related. Figures 4.6
`
`through 4.9 show the user selecting these phrases and connecting them with the "related"
`
`link.
`
`The information from the text database simply gives the mapping from terms to the
`
`documents in which they occur. These links are represented by the dotted links in figure
`
`4.5.
`
`4.2.5.1.2
`
`Implementation of the Concept level
`
`The semantic net representation of the concept level is implemented by means of a
`
`bash table of record structures. Hash tables are a convenient structure in Common Lisp.
`
`They are expanded in size automatically when they reach a specified percentage of their to(cid:173)
`
`tal capacity, the equality test used to determine if a hit has been made can be user specified,
`
`and there is a special function, maphash, for iterating through all the values in a table.
`
`The definition of record structure used for storing each entry is:
`
`(Defstruct
`
`(Concept
`(: Include Commo n -Con t e nt -Info )
`;; the a b ov e is u sed for DK e n t r y
`( : Conc -N ame nil)
`(:Predicate Concept ?»
`
`002
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`91
`
`Id
`
`St ern
`Tex t
`Synonym
`Relate d
`Broade r
`Narrowe r
`Phras e
`
`a t erm n u mber o r if a phr a s e a list
`; either
`;of t erm n umbers
`;either a stern or a l i st o f sterns
`; fu ll t ext of t he con cept
`;l i s t of s yn o n yms
`;li s t o f related words
`;list of broader te r ms
`; l ist of n a rrower terms
`t ha t make up p hrase or
`;te x t of si n g le wo rds
`in)
`;lis t o f ph r a s e s
`that t h i s word i s
`
`Each concept is stored using its full text as the key to the hash table. So, when the domain
`
`knowledge expert wants to find all of the phrases that a single word concept is a part of, it
`
`executes the function Get - DK which is defined as follows:
`(De fun Ge t-DK (Connec tion Wo rd Table )
`(Apply Connection (Getha sh Word Ha sh-Tab l e )) --)-
`
`- - -- --
`
`-
`
`-
`
`- -
`
`where Connection is the particular link, Word is the concept of interest and Ta bl e is
`
`either the user's domain knowledge or the global domain knowledge. The id field pro(cid:173)
`
`vides the connection into the document representations, which are stored in YMS/RMS
`
`files.
`
`003
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`92
`
`e at Entry
`Suspend
`Help
`
`amport, L.
`
`Brooder
`Narrower
`Rela t ed
`Synonym
`Phrase
`Entry Ok:
`Cancel
`he problem of sharing data among asynchronous
`Done
`rncessess is considered. It is assumed that only
`Releuant
`Help
`ne process at a time can modify the data, but
`L..-.-----+oncurrent reading and writing is permitted.
`uro general theorems are proued, and some
`
`Figure 4.6: The user is making a connection between "concurrent
`processes" and " parallel processes." The first step is to select Related
`from the Content menu. This causes the related window to appear. The
`user then selects Phrase from the Content menu, causing the Phrase
`window to appear above the document. Selecting the words "concurrent"
`and "processes" using the mouse causes them to appear in the Phrase
`window.
`
`004
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`93
`
`.
`
`.
`
`. __.__
`
`eut Entry
`Suspend
`Help
`
`arnport, l.
`
`Brooder
`Narrower
`Related
`Synonym
`Phrose
`En_~!JJ_.!L~ _
`Cancel
`he problem of sharing data among asynchronous
`Done
`Relevont
`rocessess is considered. I t is assumed that only
`Help
`ne process at a time can modify the data, but
`L-.------t'oncurrent reading and writing is permitted.
`wo general theorems are pruned, and some
`
`Figure 4.7: After selecting Entry OK from the Content menu,
`the phrase "concurrent processes" is transferred to the Related Window.
`The user then selects Phrase again from the Content menu and then
`Text Entry from the Window menu to allow him to enter the word
`"parallel."
`
`005
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`94
`
`eut Entry
`Suspend
`Help
`
`lamport, l.
`
`2912
`
`roo er
`Narrower
`Related
`Synonym
`Phrase
`Entry Ole
`Cancel
`The problem of sharing data among asynchronous
`Done
`processess is considered!. I t is assumed that only
`IRe l eu o n t
`one process at a time can modify the data, but
`Help
`' - - - - - - - -1 concurrent reading and writing is permitted.
`Two general theorems are proued , and some
`
`..
`
`•
`
`I
`
`concurrent processes,
`
`Figure 4.8: The user has keyed return in the Text Entry window
`(which then disappears), causing the word "parallel" to appear in the Phrase
`window, and has selected "processes" from the text, which also appears in
`the Phrase window.
`
`~
`
`~ .
`
`,
`
`':,' 0 ~ .,Win 'd.0 IV ,
`', . ~
`e at Entry
`Suspend
`Help
`
`0 " , : "
`
`jo>" ,: _.-" _;
`·r .'
`
`'.-. .: »
`' .'
`
`Lamport, l.
`
`, ; Do'dlm (;11,• .0
`Con t.e nf o'- , :
`: : ': , co,
`"
`"
`."
`"'.
`,
`.. t ,- ,
`_ '.
`.
`' . .
`.
`~ :
`Broader
`Concurrent Reading and Writing
`Narrower
`Related
`Synonym
`Phrase
`Entry Ole
`2912
`Cancel
`Done
`The problem of sharing data among asynchronous
`Releuant
`processess is considered. It is assumed that only
`Help
`one process at 0 time ca n modify the data, but
`'-------'concurrent reading and writing is permitted.
`Two 0 eneral theorems are proued, and some
`
`concurrent processes, parallel processes
`
`Figure 4.9: The user selects Entry OK from the Content menu,
`which causes the phrase to be transferred to the Related window. When the
`user selects Entry OK again, the Related window disappears and the domain
`knowledge is entered into the user's domain knowledge model.
`
`006
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`95
`
`9mre~
`processes <J-- related --t> rocesses
`parallel
`
`/
`~~j~= P~'V
`
`GOC£
`
`Figure 4.10: Representation of the domain knowledge added to the
`user's model
`
`4.~~.5.1.3 Document Level
`
`The next level is the document level.
`
`In this system, documents are the fun-
`
`darnental units of information. It is in a document that the user will find the desired in-
`
`__ _ ______ .- -
`
`fa nnation. _Each documenLis..rep.resented-h)'-<LCollection....oLc.oncepls.ibaLindic.ale_its
`
`_
`
`content. These are generally derived from the title, abstract, and whatever keywords are
`
`supplied by the author. Because of the automatic indexing method [Porter 80J used to de-
`
`- rive the document representations, they consist only of single word concepts (terms). The
`
`document level is, like the concept level, highly interconnected. Figure 4.11 shows the
`
`kinds of links that can be found. A document nearest neighbor link, which is similar to the
`
`concept nearest neighbor link, is used. It is based on the overlap of the terms contained in
`
`representations of two documents. Only the most highly related documents are linked by
`
`the nearest neighbor link. The algorithm used for generating these links was described in
`
`chapter two.
`
`007
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`Many
`Docs
`
`96
`
`~MC
`
`C
`
`NN
`
`Figure 4.11: The Document level. The markings on the links are as
`follows: BC = Bibliographic Coupl ing, CC = Cocitation, NN = Nearest
`Neighbor, C = Citation.
`
`Each document also contains citation links, which connect it to other documents;
`
`these are important since they represent author judgements of what other documents are
`
`related to the contents of their documents. They can be used directly to facilitate finding
`
`other documents, or they can be used to generate two other kinds of document-document
`
`similarity links called bibliographic coupling (BC) links and co-citation (CC) links [Salton
`
`83, Mansur 80]. The first is based on the overlap of two documents' reference lists, and
`
`the second is based on the number of times that two documents appear together in reference
`
`lists of other documents. Due to some deficiencies in the test collection that is currently
`used by r3R, the BC and CC links are not generated. Specifically, the citation information
`
`available for eac h document only references other documents in the collection.
`
`Consequently, accurate determination of bibliographic coupling links could not be made,
`
`since much many references were missing.
`
`Even without bibliographic coupling and cocitation links, the document level even
`
`in the collection used is still highly interconn ected. Figure 4.12 shows a region of the test
`
`collection, containing documents that deal with tree data structures.
`
`In a few cases, the
`
`008
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`nearest neighbor links link the same documents as the citation links do. In others, they link:
`
`documents whose connection by citations is more circuitous, or is not made at all.
`
`97
`
`2278
`
`2455
`
`2138
`
`3096
`
`3065
`
`1429
`
`2388 K"'---\-~
`
`2968
`
`2722
`
`1116
`
`NN
`~ - - from & to
`
`-t
`
`Cit.ations
`-----------1[>
`
`Figure 4.12: Document neighborhood taken from the CACM col(cid:173)
`lection. This neighborhood will be used in the fourth scenario in chapter
`SIX.
`
`4.2.5.1.4
`
`Implementation of
`
`the Document
`
`level
`
`The document level is implemented using VAXfRMS file structures. There are 9
`
`files which are:
`
`1.
`
`2.
`
`3.
`
`4.
`
`5.
`
`DOC_TERM - holds the basic document representations,
`
`DOC_DOC_CITATION - holds the citation links,
`
`DOC_DOC - holds nearest neighbor links,
`
`DOC_TEXT- holds the full text of the document as well as the dale,
`
`DOC_AUTHOR - maps documents to authors,
`
`009
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`6.
`
`7.
`
`8.
`
`9.
`
`TERM_DOC - hold the inverted file of #1,
`
`TERM_TERM - holds term nearest neighbors,
`
`98
`
`TERM_PHRASE - maps the text of terms to the term numbers,
`
`COLLECtION_INFO - holds information about the size of the collection and
`the most frequently occurring term.
`
`These files are accessed directly by the search program. The rest of the system uses access
`
`programs that are written in C that can be called form Lisp. Originally, these files were
`
`stored as relations in a relational database, VAXfRDB. This database system, however,
`
`was very inefficient, causing the searches to take about five minutes. Searches in the
`
`current system take about 50 seconds.
`
`4.2.5.1.5
`
`Journal
`
`Issue Level and Abovle
`
`The journal issue level is based on the observation that many journals will have
`
`from time to time whole issues devoted to a specific topic. These issues might be the pro-
`
`ceedings from a particular conference, or tutorial articles on a specific area. Journal issues
`
`may also have sections which are topic specific. The primary motivation for including this
`
`level is to support browsing. A typical browsing heuristic is: "If this document is interest-
`
`ing, then are there any more documents in this issue that are interesting."
`
`There are other possible connections above the journal issue level. The most ob-
`
`vious in a collection that has more than one source of information is connecting the journal
`
`issues together into journals. The journals can then be categorized by higher level classifi-
`
`cation structures like the Library of Congress system.
`
`In the current system, the journal issues are not represented by a separate file
`
`structure. This information is available from the DOC__TEXT file which has fields repre-
`
`senting the month and year that the document appeared.
`
`010
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`4.2.5.1.6
`
`The Test Collection
`
`99
`
`The test collection used to test the implementation of I3R consists of 3204 doc(cid:173)
`
`uments taken from the Communications of the ACM, from the years 1960 to 1979.
`
`4.2.5.2
`
`User Histories
`
`The other part of the long term memory is the user histories, which consists of two
`
`paris, the user's domain knowledge and the records of the user's previous searches. The
`
`user's domain knowledge is organized the same way as the global domain knowledge. The
`
`user record has the following structure.
`
`{Defstruet {User-Re c o rd
`(: Cone -Name UR-)
`(:Predie a t e UR?»
`(User-Name n il)
`(S ession- Re c ords nil)
`(Domain-Knowledge»
`
`This
`
`record is
`
`stored in the VMS file system In a file with the name
`
`<username >. model. The user name corresponds to what the user's account name is on
`
`the particular machine.
`
`The session records contain information that was in the short term memory when
`
`the session was suspended or ended. Briefly, a session record consists of:
`
`1.
`
`2.
`
`3.
`
`The stereotypes that were in effect when the session was closed. (These will
`bediscussed in the next section on experts under the user model builder.)
`
`The request model consisting of the concepts that were judged relevant, any
`relevant phrases, and the documents that have been judged relevant.
`
`If the session was suspended, the browse map if there was one, and the state
`-history.
`
`A session record has the following structure.
`
`{De f s t ru e t
`
`(Se ssi on- Re c o r d
`( : Cone - Name SR- )
`
`011
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`100
`
`(:Predicate SR?))
`(Date nil)
`(Stereotypes nil)
`(Initial-Need nil)
`(Document-Evaluations nil)
`(Term-Weights nil)
`(Browsing-Path nil)
`(Session-Complete»
`
`4.2.6
`
`Experts and Short Term Memory Structure
`
`In the current implementation of 13R, six of the eight specified experts were im(cid:173)
`
`plemented. The natural language expert (NLE) was not implemented since the needed
`
`techniques are still under development [Croft 87]. The explainer (EXP) was not imple(cid:173)
`
`mented since it represented a significant piece of work in itself and it was felt that the utility
`of the 13R could be demonstrated without it. The browsing expert will be discussed in
`
`chapter five.
`
`4.2.6.1
`
`Control Expert
`
`(Scheduler)
`
`4.2.6.1.1
`
`Purpose
`
`One of the major differences between the traditional blackboard system and 13R is
`
`the purpose of the control expert or scheduler. In most blackboard systems, the purpose of
`
`control is to constrain the system's activity to processing those KS instantiations that will
`
`contribute the most to solution of the problem.
`
`In other words, the scheduler's purpose is
`
`to conserve the resource of time.
`
`In information retrieval, the purpose of control is to constrain the course of a
`
`session, so that it appears logical and consistent to the user, rather than being chaotic.
`
`In
`
`this sense, the control function could be considered a dialogue manager. What the control
`
`012
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`101
`
`function determines is the relative importance of the activity of the system experts during a
`
`given part of the session.
`
`The control function must be organized to provide a flexible dialogue with the user.
`
`The work by Belkin [83], Brooks [83], and Daniels [85J has shown that, while the same
`
`basic steps are accomplished in every session, the order may vary considerably. Con-
`
`sequently, the control function has to be organized so that it can handle this variability.
`
`Their analysis goes fairly deep, examining the changes in focus down to the utterance level.
`
`In I3R these lower level utterances are determined by the individual experts. For example,
`
`the domain knowledge expert would decide what concepts should be presented to the user
`
`for approval, the control expert would determine when the domain knowledge expert
`
`should be engaged in this activity.
`
`Because the actual requests for and presentations of information, which correspond
`
`to the utterances of a human intermediary, are determined by the different experts, the
`
`control expert does not have to engage in a sophisticated process to determine the course of
`
`a session. Therefore, the control expert can be implemented as a state/transition network
`
`with relatively few states that encodes the general plan or sequence of activities for the
`
`information retrieval process.
`
`In this network, there are two types of states, intermediate
`
`and leaf states. The leaf states are where priority orderings for the experts are determined.
`
`The transitions are also of two types, normal and exception. Normal transitions encode the
`
`standard path through the states; exception transitions are taken when the session is not
`
`proceeding as expected. Figure 4.13 shows the structure of the network. Selection of
`
`what transition to take is based on parameters derived from the stereotypes determined by
`
`the UMB and by completion of certain functions; for example, the user has entered his
`
`query.
`
`Another interpretation of these states are that they are goals to be met. When all the
`
`goals are met then the session is complete. Associated with each state, then, are the criteria
`
`013
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`to determine when that goal is satisfied. At present there are two criteria, the number of
`
`relevant documents expected to be found and the number of searches expected to find them.
`
`102
`
`----I.ll"i~
`
`RMB
`
`DKE
`BE
`RMB
`
`SC 111'
`'"
`
`....
`... ",
`'.'
`
`.... BE
`RMB
`DKE
`
`Figure 4.13: Control Expert States. Shaded states are leaf states
`where the relative priorities of the experts are established.
`
`4 .2.6.1.2 Conceptual Operation
`
`The top level goal of Conduct Retrieval Session (CRS) simply represents the goal
`
`of the system, and it is met by the three subgoals of Characterize User (CU), Characterize
`
`Information Need (CrN), and Search for Relevant Documents (SRD) being satisfied, or the
`
`user indicating that he wants to quit or suspend the session.
`
`The purpose of the first subgoal, Characterize User, is to allow the UMB to engage
`
`the user in a dialogue to determine what stereotypes apply to the user for the session. This
`
`allows the search and document expectations to be posted in the STM . Posting of these
`
`signals the satisfaction of the goal. Figure 4.14 shows the values that the control expert
`
`uses based on the user's stereotypes. These values are based on the following assump-
`
`tions.
`
`014
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`1.
`
`2.
`
`103
`A domain expert will be able to specify the information that he is looking for
`precisely.
`
`A domain novice will not be able to specify the information he is looking for
`with the same precision as an expert.
`
`Because of the inability of the domain novice, it will require more searching to find relevant
`
`information. Furthermore, the domain novice may not recognize relevant documents. The
`
`values chosen reflect the abilities of these kinds of users.
`
`Search
`_Emphasis
`
`Exhaustive
`(recall
`oriented)
`
`Domain Knowledge Expertise
`
`Novice
`
`D = 15
`
`S =4
`
`Expert
`
`D=20
`
`S=2
`
`Selective
`tprecision- -
`oriented)
`
`-- --
`
`D=5
`------------ -
`S=2
`
`1--
`
`-
`
`-- --
`
`0=5
`- -
`S = 1
`
`-
`-
`
`-
`
`-
`
`--- - ---
`
`-
`
`--- _
`
`. _ -
`
`-
`
`-
`
`Figure 4.14: Summary of control expert expectation values based
`on user stereotypes. D is the number of relevant documents expected, and S
`is the maximum number of searches needed to find the relevant documents
`
`The next goal is Characterize Information Need (CIN) which is divided into two
`
`subgoals of Get Information Need (GIN) and Develop Need Context (DNC). The first
`
`goal (state) only allows the RMB to operate, during which the user enters his query in one
`
`of the query entry forms supplied by the RMB (see section 4.2.6.3). Completion of the
`
`state is marked by the internal form of the initial need being posted. The ONC state is
`
`characterized by an interaction between the domain knowledge expert (OKE) and the user
`
`where the DKE suggests, for user approval, additional concepts to be added to the devel -
`
`oping information need. Any terms approved will be added to the request model by the
`
`RMR.
`
`The browsing expert (BE) may also be active during thi s state, ONe. Whether or
`
`not it is depends on the user model and the history of the session. If control expert returns
`
`015
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`104
`
`to this state as a result of failure to retrieve the expected number of relevant documents in
`
`the number of searches allowed or if two searches in a row fail to retrieve any relevant
`
`documents, then browsing will be enabled.
`
`It will also be enabled if the user is categorized
`
`as a system expert. The state is completed when either the DKE has no more terms to
`
`suggest, or the user quits browsing.
`
`Once the states of GIN and DNC are finished, then so is CIN, and the system
`
`passes to Search for Relevant Documents (SRD).
`
`If the search and relevant document
`
`expectations have not been met the control expert passes to the Search for Documents (SD)
`
`state where the search controller selects an appropriate strategy. When finished the SC
`
`posts the results in the STM and passes them to the interface manager (1M) for evaluation
`
`by the user.
`
`The control expert (CE) then goes to the Evaluate Results (ER) state. Here the user
`
`will determine what documents and concepts are relevant. This information is of interest to
`
`the RMB, DKE, and Sc. The user is also able to browse at this point depending on the
`
`context of the situation. If the user is browsing the RMB and the DKE will record judge(cid:173)
`
`ments made during the process. The ER state is finished when either the user exits
`
`browsing after having evaluated at least one document as relevant or after having evaluated
`
`the search results.
`
`After the results of the search are evaluated and the user has not found the expected
`
`number of documents the exception transition back to SD is taken, so that the SC can use
`
`the revised request model for a new search. If the expected number of searches has been
`
`taken and the expected number of documents has not been found, the CE will take the ex(cid:173)
`
`ception transition from SRD to CIN and then take the normal transition from CIN to DNC.
`
`This transition embodies the idea that the information need is still not defined sufficiently
`
`and needs further refinement by the DKE. If no new concepts are added the system will
`
`016
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`suggest that the user browse if he has not already done so. If concepts are added, the CE
`
`will return to the SD to make use of the revised need model.
`
`105
`
`4.2.6.1.3
`
`Implementation
`
`The Control expert is implemented using Lisp symbols to represent the states (or
`
`goals) and rules to represent the transitions. State related information is kept on the prop(cid:173)
`
`erty list of a state. This information includes the expert priority list for those states where
`
`the experts are operable. It also includes the substates that must be completed for the state
`
`to be completed; for example, the state CRS has three substates, GIN, ONC, and SRD that
`
`must be completed for it to be completed.
`
`Since the transitions are implemented by rules, it is easy to add new transitions to
`
`the control expert. For example, say that the user model builder is expanded so that it also
`
`monitors how the user interacts with the system. At some point, the user model builder
`
`may wish to ask the user about how he works with the system, since the user's activity
`
`differs from that expected by the system based on the original model. A transition can be
`
`added that will take the system state back to the Characterize User state, so the user model
`
`builder can pose questions to make new determinations. Addition of this transition only
`
`requires a new rule that would respond to the conditions of the system and expectations set
`
`up by the UMB.
`
`4.2.6.2
`
`User Model Builder
`
`The purpose of the user model builder is to determine where the user fits into the
`
`kinds of users that the system expects to find . Recognition of these stereotypical users is
`
`embedded in the rules that make up the UMB . The second purpose is to perform house(cid:173)
`
`keeping functions for orher experts by maintaining the user models kept in the long term
`
`memory.
`
`017
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`106
`
`In ,3R there are three kinds of information about the user that arc important to
`
`maintain. The first is the user's domain knowledge. Since this information is primarily
`
`used by and manipulated by the DKE, the UMB simply performs the housekeeping func(cid:173)
`
`tions of storing it at the end of a session and retrieving it at the beginning. The second kind
`
`of information is the characterization of the user along three lines, experience with comput(cid:173)
`
`ers, experience with the domain of the search topic, and interest in exhaustive or precise
`
`search. The third is the user's history, which is a record of the search sessions that the
`
`user has with the system. This information is used primarily by the request model builder,
`
`so here too the UMB performs housekeeping.
`
`The primary reason for the inclusion of the user model builder is to demonstrate
`
`how a user model can be used to modify the behavior of the system. This is different from
`
`the use that user models were put to in GRUNDY, where the purpose was to assess the
`
`user and find books that would match or fit with that assessment. To this end, what the
`
`model should contain is determined by what behaviors are important to modify. Adapt(cid:173)
`
`ability based on user models is another significant contribution of this thesis to the design
`
`and implementation of information retrieval systems.
`
`The behaviors in I3R that are important to modify are how the search controller re(cid:173)
`
`sponds to the information need, and how the interface changes with the respect to the abil(cid:173)
`
`ity and experience of the user. The second aspect is made up of a number of the kinds of
`
`choices that the user has for domain knowledge entry, the amount of information to be dis(cid:173)
`
`played on the two browsing maps, and how quickly the search controller will initiate a
`
`search while the user is browsing.
`
`The UMB determines what stereotypes apply to the user by questioning him di(cid:173)
`
`rectly. It asks directly whether the user is interested in an exhaustive or a specific search to
`
`determine the interest in recall or precision. It poses a number of choices to determine the
`
`user's domain and system experience. These choices are for domain experience:
`
`018
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`107
`
`1.
`
`2.
`
`3.
`
`4.
`
`5 .
`
`6.
`
`7.
`
`Know very little.
`
`Have read a few news magazine articles on the subject.
`
`Have read a few science magazine articles on the subject.
`
`Have read a textbook on the subject.
`
`Have read a few journal articles on the subject.
`
`Have written journal articles on the subject.
`
`Have written a textbook on the subject.
`
`And for system experience are:
`
`1.
`
`2.
`
`3.
`
`Seldom use a computer.
`
`Use a word processor.
`
`Own a personal computer.
`
`.. - --. -
`
`- - ..-....---.- - ---4..~- - -Ha\le_ne¥_er_user- anjnformationJetrie Ya1 .s)'stem-before~_
`
`5.
`
`6.
`
`Have used an information retrieval service before.
`
`Frequently use an information retrieval service.
`
`To these the user can answer yes or no; the default answer is no. If the user answer yes to
`
`questions 5, 6, or 7 of the first group, he is considered a domain expert. If he answers yes
`
`to 3, 5, or 6 of the second group, he is considered a system expert. From these determina(cid:173)
`
`tions, the UMB posts on the STM on the *User-Model * place, its evaluations in the
`
`form of a list. For example,
`
`((Domain Novice)
`
`(System Expert)
`
`(Search
`
`Recall) ).
`
`4.2.6.3
`
`Request Model Builder
`
`The primary purpose of the Request Model Builder (RMB) is to keep track of the
`
`information provided by the user that pertains to the developing query. This includes the
`
`initial definition of the query, the single word concepts (also called terms) and multi-word
`
`concepts derived (called phrases) that the user considers important from the initial query,
`
`from concepts presented by the domain knowledge expert, from documents presented by
`
`019
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`the search controller, and from browsing. Each term is stored in a hash table of records
`
`that is keyed by the term's number and its stem. The following is the record structure kept
`
`in the table.
`
`108
`
`(Defstruct
`(Term-Representative
`( : Cone-Name TR-)
`(:Predicate TR?»
`(Full-Text nil : Type String)
`: Type String)
`(Stem
`nil
`(TermNo
`0 : Type Integer)
`(Cfreq
`0 : Type Integer)
`0 : Type Integer)
`(Qfreq
`0 : Type Integer)
`(RelFreq
`(User-Judgement nil)
`(Source nil)
`(Concept nil)
`(DK-Checked nil)
`(B-Recommended) )
`
`; Collection Freq
`; Query Freq
`; Relevant Freq
`
`The RMB also performs stemming of any input text (see the example in chapter
`
`two), as well as providing different ways of initially specifying the information need.
`
`These specifications are:
`
`1.
`
`2.
`
`3.
`
`4.
`
`free text,
`
`a known document,
`
`terms occurring on the same line on the input form
`a simple Boolean query -
`are implicitly ORed together, the separate lines are implicitly ANDed, and
`NOT is not allowed,
`
`a complex Boolean query -
`operators and parentheses.
`
`an arbitrarily complex query using all three
`
`Choices 3 and 4 are provided because many users are used to this form of specification or
`
`they may have previously developed queries on another system that they wish to transport.
`
`Boolean-queries in either form would be processed in-the-following way [Croft 86a].
`
`020
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`1.
`
`2.
`
`3.
`
`The component words are stemmed.
`
`109
`
`The query is transformed into a tree which is then used to generate candidate
`phrases for user approval. For example, the query, (parallel OR distributed)
`AND (processes OR programs) yields the phrases: parallel processes, parallel
`programs, distributed processes, and distributed programs.
`
`The candidate phrases are presented to the user for evaluation. This step is
`required because the combinatorial nature of step 2 can produce phrases that
`do not make sense.
`
`4.
`
`Phrases approved by the user are added to the request model.
`
`The RMB also maintains a list of documents, on the STM place * Doc-Eva 1 s * that have
`
`been seen by the user. Each document is represented by a record:
`(Defstruct
`(Document-Representative
`( : Cone-Name DR-)
`( : Predicate DR?)
`(ID a :Type Integer)
`···_- -(tJs-e---r=--J ud ge me-rrc-n-t t 1
`(Nearest-Neighbors nil)
`(Terms nil)
`(B-Recommend nil))
`
`Documents that have been judged relevant have their component terms retrieved and put in
`
`the appropriate field; documents that have been retrieved and are not judged relevant do not
`
`have their terms retrieved.
`
`4.2.6.4
`
`Domain Knowledge Expert
`
`The DKE is one of the major functions in I3R devoted to query refinement. Its re-
`
`sponsibilities are twofold. The first is to build a model of the user's domain knowledge.
`
`The second is to search the available domain knowledge models for concepts that are re-
`
`lated to those supplied by the user while describing his information need.
`
`An underlying assumption in the operation of the DKE is that the user is the final
`
`authority on what is and is not relevant to his need. Therefore, the DKE acts as an advisor
`
`021
`
`Facebook Inc. Ex. 1214 Part 3
`
`
`
`110
`
`to the user while the query is being developed. This means that the DKE will only suggest
`
`concepts for the user's approval, and will never automatically add any.
`
`The DKE has, in the current design of the system, two sources in which to look for
`
`concepts, the global domain knowledge and, if present, the user's domain knowledge.
`
`There will be at least a minimal amount of information in the global domain knowledge
`
`consisting of term nearest neighbors. The order in searching for candidate concepts, is to
`
`search the user's knowledge before the global knowledge.
`
`Searching for concepts proceeds by taking the terms in the request model that have
`
`not been checked previously by the DKE, and using them as entry points into the knowl(cid:173)
`
`edge. From these entry points a modified form of spreading activation is performed. The
`
`links emanating from them are chosen in a specific order to find candidate concepts. The
`
`basis for the order is to find the words that are most likely to be, in the mind of the user,
`
`associated with the information that he is looking for.
`
`The first links taken are the nearest neighbor links. These are relatively rare,
`
`therefore, if they exist it is very likely that t