`
`21
`
`Integrating Search and Retrieval
`with Hypertext
`
`By Edward A. Fox, Qi Fan Chen,
`and Robert K. France
`Dept. of Computer Science,
`Virginia Polytechnic Institute and
`State University
`
`Introduction
`
While hypertext and hypermedia have numerous applications, they cannot solve all
our problems relating to information access. Such access can build upon manually
or automatically created indices, search algorithms, feedback procedures, and other
techniques whose value has been proven on collections much larger than those that
can be managed solely by browsing and following links. We propose a reference
model for what we believe is an urgent need, that of integrating search and
retrieval with hypertext and hypermedia. Further, we describe a prototype system
designed around that model which would facilitate access to a wide variety of
types of information.
`
`Current Status
`
In this chapter, we focus on the problem of how to efficiently store and retrieve a
wide range of disparate information (including data stored in databases, text bases,
media archives, knowledge bases, and other organized collections). Once such
collections were managed almost exclusively by mainframes, which ensured
centralization and often enforced vendor, and sometimes industry-wide, standards. With
the shift to personal computers and workstations, there has been a rapid
proliferation of specialized packages, which make interchange of information between
software and hardware platforms difficult.
Accumulations of information in forms that are hard to integrate are being
created and expanded at a rapid rate. Large online database vendors such as Dialog,
Maxwell Online, STN, and others provide access to enormous collections of
bibliographic records (usually titles, abstracts, and other citation data) or full text
(reference works, newspapers, legal documents). In house, corporate employees
are generating electronic mail messages, database files (separately organized for
geographic information, CAD/CAM, chemical registry data, and huge
databanks), and word processing documents. The serious problems of providing
democratic access to all this information and of supporting manipulation
techniques as versatile and natural as speech or pencil-and-paper methods have yet to
be solved.
`
`Prospects for Improvement
`
The emergence of networked computer systems and optical storage devices has
exacerbated the problem of incompatible data formats and, as a result, forced
vendors to seek solutions. For example, software designed to access data scattered
across a heterogeneous network needs to follow rigorously enforced conventions.
Optical publishing systems increase the amount of storage capacity available at
a reasonable cost. This allows in-house access to huge databases of text, graphics,
images, and even interactive digital video [Fox, 1988d, 1989d, and 1990a]. The
huge amount of information that can be stored optically can make particular facts
difficult to find. Fortunately, locally stored information can be organized and more
easily accessed with more modern software than is provided with on-line services
such as Dialog. This allows more advanced information retrieval techniques to be
implemented [Fox, 1986a].
Our work is motivated by an analysis of current potentials and problems with
information access. In the first part of this chapter, we describe a framework for
integrating hypertext systems with techniques for discovering information quickly,
leading readers to particular nodes efficiently even in very large, unfamiliar
hyperbases. In the second section, we consider some of the capabilities that must
be provided by information systems, pointing out which are provided by various
types of existing systems. The third section discusses progress made in that
direction and its sequel provides further background. The fourth section discusses the
VPI&SU CODER project, which provides a means of exploring a broad class of
information access approaches. The final section of the chapter outlines how we
are extending CODER for hypermedia and interoperability, and includes
recommendations for conceptual and practical integration of information retrieval with
hypertext/hypermedia.
`
Necessary Capabilities
`
Computer systems that are intended to facilitate users' access to information
should:
`
`• Include help systems that can explain how to perform a search and what is done
`as part of that search;
`• Adapt to user requirements;
`• Help users describe and elaborate their information needs;
`• Be built in a flexible and modular fashion so that they can be easily extended.
`[Belkin, et al., 1983; Belkin, et al., 1987a]
`
A user-centered, mixed initiative interaction (where both user and computer
operate concurrently and can each seize control of input or output activities at almost
any time) is preferred, so that:
`
`• the user can request an explanation or follow a new train of thought at any time,
`• the computer can request clarification or display tentative results as appropriate
`to further what it believes are the user's current goals.
`
Since these goals often relate to finding relevant items in some information
repository, it is important to pay special attention to efficient and effective
searching, as is discussed in the next subsection.
`
`Finding Barn Doors and Stringing Pearls
`
The current generation of information retrieval systems relies on various search
techniques to find relevant items. These can be viewed at both the abstract and
system-specific levels [Bates, 1981]. One approach is to seek information by first
looking for a neighborhood where useful items can be found and then selecting
related or linked items from among the items at hand.
One can think of first finding the barn door and then, after glimpsing something
of interest, i.e., a valuable pearl, following leads to closely related items. This is
analogous to searching and then browsing, or using information retrieval
techniques and then applying hypertext methods.
Most information being accessed has been processed to some degree in order to
make access easier. Outlines, tables of contents, and indices (e.g., for author
names, or specially selected descriptors) are usually developed by humans
according to some theory or organizing principles. Readers who can relate to existing
organizational systems can often find items of interest; computer menus are
sometimes applied to map these structures to available regions of display screens.
Related items may be found if they occur nearby, if an index points to a series of
items, or if explicit cross references are provided.
`
Hypertext / Hypermedia Handbook
`
Unfortunately, the searcher's perspective and/or vocabulary does not always
match that of the author or indexer. This is particularly important when a complex
idea is involved and when a very large collection of items is being searched. Here
precoordinated systems (where some of the most important combinations of ideas
are precombined and added to indices, sometimes as phrases like "information
retrieval") may fail and force readers to build post-coordinated queries (like
"multimedia and retrieval"). Narrower queries can be formed using proximity
operators. These include:
`
• adjacency (two words placed side by side);
• same sentence (two or more words located in a single sentence);
• same field (e.g., two words located within a title line);
• same paragraph (two or more words located within a paragraph).
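
The proximity operators listed above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names, the regular expressions used to split sentences and paragraphs, and the sample text are all hypothetical, and a production system would normally evaluate such operators against positional indices rather than by rescanning text.

```python
import re

def tokenize(text):
    """Lowercase word tokens, in order of appearance."""
    return re.findall(r"[a-z0-9]+", text.lower())

def adjacent(text, first, second):
    """True if `first` is immediately followed by `second`."""
    words = tokenize(text)
    return any(a == first and b == second for a, b in zip(words, words[1:]))

def within(text, first, second, unit_pattern):
    """True if both words occur inside one unit (sentence, paragraph, ...)."""
    for unit in re.split(unit_pattern, text):
        words = set(tokenize(unit))
        if first in words and second in words:
            return True
    return False

# Assumed, deliberately naive unit boundaries.
SENTENCE = r"(?<=[.!?])\s+"
PARAGRAPH = r"\n\s*\n"

doc = ("Multimedia systems are growing. Retrieval of stored items is hard.\n\n"
       "Information retrieval helps readers.")

print(adjacent(doc, "information", "retrieval"))
print(within(doc, "multimedia", "retrieval", SENTENCE))
print(within(doc, "multimedia", "retrieval", PARAGRAPH))
```

A within-field test follows the same pattern: pass only the text of the field of interest (for example, the title line) to `adjacent` or `within`.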
`
Boolean query construction (i.e., using ANDs, ORs and/or NOTs to specify
matches) can be exceedingly difficult, and indeed ten-to-one variations have been
observed in the ability of different individuals (and even of the same individual at
different times) to form good queries. Frequently, repeated efforts are required to
form queries that do not retrieve many nonrelevant items (i.e., have high
precision) and yet do retrieve a significant fraction of the relevant items (i.e., have high
recall).
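
Precision and recall, as used above, can be made concrete with a small calculation. The sketch below uses the standard set-based definitions; the document identifiers and the relevance judgments are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = fraction of retrieved items that are relevant
    recall    = fraction of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query result: 4 items retrieved, 5 items actually relevant.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]
precision, recall = precision_recall(retrieved, relevant)
print(precision, recall)
```

Here two of the four retrieved items are relevant (precision 0.5) and two of the five relevant items were found (recall 0.4), which shows why a query can look tidy and still miss most of what the searcher wanted.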
`Finding relevant items often involves one or more of the following steps:
`
• Translating an information need into the organization of concepts represented by
a table of contents, index or thesaurus.
• Adopting a scheme to descend through some hierarchical organization of topics
or descriptors to find specific section(s) of interest.
• Combining words or terms used within an index or thesaurus or forming the free
text into a formal Boolean query.
• Refining the query by adding synonyms or related terms (broadening), by
increasing specificity (narrowing) or by reorganizing the query structure.
• Visually examining all selected items to find those of value.
`
Sometimes the task at hand makes the choice of search method obvious. If you
are looking for a single known item, a Boolean search will work best. Looking for
multiple items on the same topic calls for browsing. Attempts to retrace a
previously viewed presentation suggest the use of links. Sometimes a combination of
methods works best.
`
Unified Metaphors
`
The steps involved in searching and browsing hyperdocuments should be hidden
behind a task-oriented interface that lets readers search at a conceptual, descriptive
level instead of at either a word-oriented or a procedural level. Boolean queries,
for instance, fail to fulfill readers' information access needs because they focus
attention on the wrong objects and force users to think in terms of the mechanics
of search algorithms. Conversely, properly implemented browsing systems allow
readers to explore a range of conceptual dimensions easily and intuitively. A
graphical presentation coupling the spatial metaphor of nearness to the hypermedia
metaphor of links supports such explorations using simple pointing operations.
Most importantly, both searching and browsing should be subsumed by a set of
consistent metaphors and abstractions that encourage the integration of these
access techniques.
Using naturally occurring metaphors as the backbone of the user interface offers
several advantages. Unfortunately, while paper sketches and natural language input
may be easy for humans to work with, they are often beyond the scope of all but
the most expensive of today's computer systems.
Another possibility is to have a standardized unified digital representation and
manipulation system. We all take for granted the fact that telephones connect
people all around the world, and that they do this because of international agreements
on hardware and software standards. We can envision a collection of standards for
hardware, software, and information structures or data formats [see Devlin, 1991]
built around both existing and exciting new technologies for digital encoding. This
requires management of multimedia including text, graphics, images, speech,
audio, and video. It involves interconnection of multiple computers and diverse
networks, and use of a variety of peripheral and input/output devices. Finally, it
requires intelligent processing to improve efficiency and effectiveness and to
ensure smooth human-computer interaction.
`
`Progress Toward Unified Access
`
Networking
`
A great deal of progress has been made towards integration of computer networks
[Tannenbaum, 1981]. Thousands of computers are internetworked, able to
exchange electronic mail. Many are connected to allow even higher levels of
integration: passing of files, remote log-ins, and even communication between programs.
In the area of library information retrieval, the Z39.50 standard has been
developed so that a user of one library system can cause that system to have a query
processed on another system, and then indirectly receive the search results.
Various organizations, including universities such as VPI&SU, are developing and
testing prototype systems for information interchange that follow Z39.50
specifications. In addition, the X.500 standard has enabled many prototype systems to
combine into a partial global directory system, a computer analog of the telephone
white pages.
Much of the success in internetworking can be credited to the use of a layered
approach to software, whereby physical connection, data connection, session
management, presentation handling, and application program operation have separate
interfaces and specifications. The International Organization for Standardization
(ISO) Open Systems Interconnection (OSI) model uses physical, data link, network,
transport, session, presentation, and application levels to encourage interoperability
of software on heterogeneous computers [Zimmerman, 1980]. OSI has been the
basis for Z39.50, X.500, and many other related standards.
This layering approach also applies to storage systems. On the one hand, there
are levels of physical units, operating system-identified devices, and network-wide
file systems. On the other hand, there are software divisions, such as the proposed
standard of the Air Traffic and Air Transport Associations, which separates the
search engine component of CD-ROM software (which itself builds upon the
volume and file description layer specified by the ISO 9660 standard) from the user
interface layer.
`
`Object Model
`
While computer users generally search for relevant concepts, computer scientists
have tried to meet users' needs by studying and building specialized processing
systems for particular classes of information. Thus, the term "data" is often
associated with database management systems, "information retrieval" is often limited to
text databases, and "knowledge bases" are viewed as the proper repositories for
collections of complex list structures and rules.
To unify these approaches, object-oriented databases have been proposed that
might contain and support processing of any class of information. In hypertext
systems that build upon these object bases, completing a search or following a link
would lead to presentation of any of various objects: a screen of text, a bitmap
display, an audio segment, a spreadsheet entry, an individual plane of a complex
image, an explanation of some expert rules, or a tabular report.
However, to date only limited progress has been made in integrating
database management systems (which at present usually follow the relational
model) with text retrieval systems (which frequently employ inverted files and
require access by way of Boolean queries). One issue is the differences in
items being manipulated; tables of numbers or short character strings have few
data types as opposed to the variety encountered in multimedia systems, and
database structures are much more regular than those found in the world of
compound documents (e.g., books, magazines, letters, reports, and
encyclopedias). A possible solution is to use abstract data types (e.g., sets, lists, vectors)
as elements of a relational DBMS [Fox, 1981]. Some of these ideas were used
in the redesign of the SMART system for the UNIX environment [Fox, 1983a],
which eventually led to a version of that system that is widely used by
document retrieval researchers. Another approach is to directly use the relational
model, with performance tuning and limited extensions where needed, to
handle bibliographic records [Lynch, 1987].
Artificial intelligence (AI) research extends these efforts to solving the problems
of knowledge representation. Initially, languages like LISP and Prolog focused on
symbol and list manipulation. Real world objects were associated with atoms that
could be located by their names. Property lists (sets of property type/value pairs)
were attached to these atoms.
But to better match human ability to deal with default values, groupings of key
attributes with various types of objects, and taxonomic classification of similar
classes of objects, the concept of a frame was developed. This concept has proven
useful for information retrieval [Fox, 1987b, Weaver, et al., 1989].
Essentially, a frame class (e.g., U.S. postal address) has various slots for
important aspects of the object (e.g., state, zip code), and may inherit from more general
classes (e.g., postal address with slots for name and locality), as well as be
instantiated for an individual (e.g., John Smith's U.S. postal address). Frames can be
grouped together into extremely regular knowledge bases.
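
The postal address example can be sketched as a tiny frame system in Python. This is a minimal illustration of slots, defaults, inheritance, and instantiation, not a description of any particular frame language; the `Frame` class, the `country` default, and the slot values for John Smith are assumptions made for the example.

```python
class Frame:
    """A frame with named slots and an optional more general parent frame."""

    def __init__(self, label, parent=None, slots=None):
        self.label = label
        self.parent = parent
        self.slots = dict(slots or {})

    def get(self, slot):
        """Look up a slot locally, then in ever more general frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def instantiate(self, label, values):
        """Create an individual whose unfilled slots fall back to this class."""
        return Frame(label, parent=self, slots=values)

# General class: slots for name and locality, no defaults.
postal_address = Frame("postal address", slots={"name": None, "locality": None})

# More specific class: adds state and zip code, and an assumed default country.
us_postal_address = Frame(
    "U.S. postal address",
    parent=postal_address,
    slots={"state": None, "zip_code": None, "country": "USA"},
)

# Individual instance with illustrative values.
john = us_postal_address.instantiate(
    "John Smith's U.S. postal address",
    {"name": "John Smith", "locality": "Anytown", "state": "VA"},
)

print(john.get("name"), john.get("country"), john.get("zip_code"))
```

The instance supplies its own name, inherits the default country from the U.S. class, and reports an unfilled zip code, which is the default-handling behavior that motivated frames in the first place.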
A semantic network is an alternate knowledge representation, closer in
orientation to hypertext [Findler, 1979, Morgado, 1986]. A semantic network can be
envisioned as a graph where nodes are used to represent objects, and links to indicate
significant, meaningful relationships. One well-known semantic network system,
SNePS [Shapiro, 1989], supports knowledge representation and inference (i.e.,
proving or drawing conclusions based on knowledge recorded previously in the
system). In SNePS a distinction is often made between path-based inference,
whereby the reader follows a succession or chain of links from some node to
identify some relationship, and node-based inference, whereby logical propositions
are represented in the network and are combined with other propositions in varied
regions of the network in accord with the rules of logic. Path-based inference can
be very efficient, and corresponds to following links in hypertexts. Semantic
networks can also support spreading activation, where several concepts are located in
the network and all paths of length 1, 2, etc. from each are explored until paths from
several concepts meet, a type of parallel search for important associations.
Spreading activation has been used in the GRANT system to retrieve documents
related to various topics, projects, and grant proposals [Cohen and Kjeldsen,
1987].
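
Spreading activation, as described above, can be approximated by running a breadth-first search outward from each starting concept and looking for the first nodes that every search reaches. The sketch below is a simplified, unweighted version written for illustration; it is not the GRANT or SNePS algorithm, and the toy network is hypothetical.

```python
from collections import deque

def spread(graph, start, max_depth):
    """Breadth-first link distances from `start`, out to `max_depth` links."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def meeting_points(graph, concepts, max_depth):
    """Nodes reached from every starting concept, nearest first."""
    reached = [spread(graph, c, max_depth) for c in concepts]
    common = set(reached[0]).intersection(*reached[1:]) - set(concepts)
    return sorted(common, key=lambda n: (sum(r[n] for r in reached), n))

def first_meeting(graph, concepts, limit):
    """Explore paths of length 1, 2, ... until the activations meet."""
    for depth in range(1, limit + 1):
        found = meeting_points(graph, concepts, depth)
        if found:
            return depth, found
    return None, []

# Hypothetical semantic network stored as an undirected adjacency list.
graph = {
    "ship": ["captain", "vessel"],
    "officer": ["captain", "general"],
    "captain": ["ship", "officer"],
    "vessel": ["ship"],
    "general": ["officer", "army"],
    "army": ["general"],
}

print(first_meeting(graph, ["ship", "officer"], 4))
print(first_meeting(graph, ["ship", "army"], 4))
```

Each call to `spread` is an instance of path-based inference (following chains of links); combining several of them is what turns link following into a search for associations.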
Semantic networks are useful for handling the inter-relationships in language.
At VPI&SU, for example, we have taken machine-readable dictionaries, extracted
and restructured the important data [Fox, 1986b], archived them in the form of
Prolog facts, and loaded them into a semantic network to facilitate further
processing and access [Fox, 1988a; France, et al., 1989a]. In so doing we have found it
helpful to build an elaborate taxonomy of relations (i.e., links) common between
word senses in our lexicon. One goal is to extend earlier experiments that
indicated improvement in retrieval effectiveness could result by finding words that are
lexically related to query terms [Fox, 1988e]. Thus, the system knows enough
about the meaning of a query to search for "captain" if a query asks for "ship's
officers," or "general" when "army leader" is sought. Others have considered
networks as important support structures for probabilistic (Bayesian) inference
systems that prove when a document is relevant to a query [Croft and Turtle, 1989].
Ultimately, semantic networks can be combined with hypertext and hypermedia to
lead to a uniform representation of objects, as discussed below.
`
`Integration of Various Search Techniques
`
Research suggests that the best retrieval performance results when many different
forms of information about documents are utilized by a variety of search methods
[Belkin and Croft, 1987b]. Integration of various approaches has this same goal
[Fox, 1987b]. Thus, one might merge the results from using Boolean, vector, and
probabilistic searches for a given information statement. Any of these search
methods can also be modified by the use of feedback, where a reader indicates
what documents or sections thereof are relevant, and the computer uses all the
information it has available about that sample to perform another, better search.
One can even throw in use of AI techniques, as discussed in the section on our
work with CODER. Reference models provide a clear framework for integration
and thus have become popular in regard to networking, where different cabling,
interconnection, specialized hardware, firmware, and layers of software allow
users of similar applications on different computers to collaborate.
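
One simple way to merge the results of several search methods is to rescale each method's scores to a common range and add them. The chapter does not say how the merging is done, so the sketch below is an assumption: min-max normalization followed by a weighted sum, with made-up scores for a Boolean, a vector, and a probabilistic run.

```python
def normalize(scores):
    """Rescale one method's scores to [0, 1] so methods are comparable."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    if high == low:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - low) / (high - low) for doc, s in scores.items()}

def merge(runs, weights=None):
    """Combine several {doc: score} runs into one ranked list of documents."""
    weights = weights or [1.0] * len(runs)
    combined = {}
    for run, weight in zip(runs, weights):
        for doc, score in normalize(run).items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    # Highest combined score first; ties broken by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))

# Hypothetical output of three methods for the same information statement.
boolean_run = {"d1": 1.0, "d3": 1.0}            # unranked matching set
vector_run = {"d1": 0.9, "d2": 0.3, "d3": 0.6}  # similarity scores
probabilistic_run = {"d2": 0.8, "d3": 0.7, "d4": 0.1}

ranking = merge([boolean_run, vector_run, probabilistic_run])
print(ranking)
```

Document `d3` rises to the top because all three methods support it, even though no single method ranked it first. The `weights` argument leaves room for trusting one method more than another.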
`
`Reference Models for Hypertext
`
In order to develop interoperable information systems and a unified representation
model, a reference model for information management systems must first be
developed. Work on hypertext reference models has arisen in part as a result of
standardization efforts for hypertext/hypermedia [see Chapter 26 by Devlin on
standards]. We have found the Dexter model [Halasz and Schwartz, 1990] and the
r-model [Furuta and Stotts, 1989; Furuta and Stotts, 1990; Stotts and Furuta, 1989]
to be particularly insightful. However, interoperable information systems should
properly include not only hypertext/hypermedia, but also searching, networking,
and other applications and levels of manipulation.
`
`A More Complete Reference Model
`
Our proposal for a reference model that integrates hypertext and other sources of
information is shown in Figure 21.1, and explained in more detail in subsequent
sections. Essentially it combines the seven OSI layers with seven other layers
relating to hypertext/hypermedia and other types of information systems.
The bottom four layers are OSI layers that together support the secure transport
of messages. Atop the transport layer is a layer that provides essential support for
files and process communication (messages). Processes operate above this layer to
support high-level communication between machines, including necessary
translations between data representations. These six layers complete an extended
groundwork for communication among machines, languages, and environments at a
highly abstract level.
The node layer comes next, supporting atomic, structured, and multimedia
objects. The anchor layer allows points or spans inside nodes to be addressed in
ways appropriate for each node type. Links connect anchors, thus providing a
layer that is essentially a graph or network of information items. The view layer
allows various aggregations and associations as well as common data, information,
and knowledge structures to be manipulated (e.g., have a view of relations, a
view of vectors, a view of frames, a view as linked collections of nodes). The base
layer allows coordinated access to data, information, and knowledge bases
through search, navigation, and related operations. Higher levels correspond to
upper level OSI layers, but are organized to support common information access,
hypertext/hypermedia, and presentation/application programming activities.

Layer              Typical Contents

Application        hypertext, hypermedia, information retrieval,
                   image management, authoring, training, tutoring,
                   CAD/CAM, interactive digital video
Presentation       devices: windows, pointing devices, speakers;
                   media: text, audio, video;
                   operations: generalization, editing, translation, access,
                   browsing, selection, picking, querying, sequencing
Session            base selection, user identification, history management,
                   versioning
Base               base: knowledge, information, data;
                   operations: search, inference
View               graphs, lists, sets, vectors, relations, frames
Link               link Ids, labels
Anchor             anchor Ids, span descriptions
Node               textual strings, integers, object Ids
Communication      processes
Physical machine   file systems, storage media, message support
Transport          ISO lower level equivalent
Network            ISO lower level equivalent
Data Link          ISO lower level equivalent
Physical           ISO lower level equivalent

Figure 21.1  Information Management System Reference Model.
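
The node, anchor, and link layers of the model can be sketched as three small record types, each building only on the layer beneath it. The class and field names below are assumptions chosen to mirror the "typical contents" in Figure 21.1 (object Ids, span descriptions, link Ids and labels); nodes hold plain text here, although the model allows any media type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    node_id: str
    content: str      # text only in this sketch

@dataclass(frozen=True)
class Anchor:
    anchor_id: str
    node_id: str      # the node this anchor lives in
    start: int        # span description: character offsets into the node
    end: int

@dataclass(frozen=True)
class Link:
    link_id: str
    label: str
    source: str       # anchor id
    target: str       # anchor id

class Hyperbase:
    """Links connect anchors; anchors address spans; spans live in nodes."""

    def __init__(self, nodes, anchors, links):
        self.nodes = {n.node_id: n for n in nodes}
        self.anchors = {a.anchor_id: a for a in anchors}
        self.links = list(links)

    def span(self, anchor_id):
        """Resolve an anchor to the text it addresses."""
        anchor = self.anchors[anchor_id]
        return self.nodes[anchor.node_id].content[anchor.start:anchor.end]

    def follow(self, anchor_id):
        """Return (label, target text) for every link leaving an anchor."""
        return [(link.label, self.span(link.target))
                for link in self.links if link.source == anchor_id]

base = Hyperbase(
    nodes=[Node("n1", "Frames support retrieval."),
           Node("n2", "A frame has slots.")],
    anchors=[Anchor("a1", "n1", 0, 6), Anchor("a2", "n2", 2, 7)],
    links=[Link("l1", "defined-by", "a1", "a2")],
)

print(base.span("a1"), base.follow("a1"))
```

Because links refer only to anchor Ids, and anchors only to node Ids, a different node type (an image, say) would need a new kind of span description but no change to the link layer.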
`
We view this model as providing a framework for integrating information
management systems. Our aim is to validate this framework, to prove it adequate for a
wide variety of activities. Towards the end of this chapter, we explain how our
work with the CODER system is being extended and layered along the lines of
this integrated model.
`
`Previous Work
`
In the broadest sense, a reference model for information management should
encompass work in electronic publishing, document representation, text analysis,
storage, information retrieval, and hypertext/hypermedia.
`
`Electronic Publishing
`
In connection with ACM Press Database and Electronic Products [Fox, 1988c], we
have undertaken a variety of electronic publishing ventures [see also Fox, Rous,
Marchionini, 1991]. In Hypertext on Hypertext we dealt with three versions of a
single hypertext, in order to accommodate user preferences for equipment and
operating systems. Further, in one of the versions [Shneiderman, 1988], the original
hypertext system was extended to include more comprehensive searching
capabilities, in part because of our experiences with online and CD-ROM publishing.
These experiences also pointed out the need for standards and integrated or
interoperable systems to allow wide-scale electronic publishing [Fox, 1990b].
For example, while developing the Virginia Disc series of CD-ROMs, we
purposely provided data and information bases in multiple forms to allow comparison
of approaches and software packages. The Master Gardener Handbook on
Virginia Disc One [Fox, 1990c] is available in hypertext form, or can be searched
with a bibliographic retrieval system. Though the former is handy for reading and
browsing, the latter is more convenient for finding specific facts. Similar
comparisons can be made between alternative retrieval systems (Personal Librarian applied
to searching the King James Bible at either the chapter or verse level).
`
`Related Work
`
The most popular database management software today, the relational DBMS, is
based on the simple mathematical concept of a relation. A database relation can
be thought of as a flat table, but is much more precisely defined: a relation is a
set of individual items, called "tuples" and sometimes thought of as rows, each
containing values for a fixed collection of attributes, usually associated with
columns of the table. Queries in the relational model are answered by combining
these relations using a restricted set of operations that depend on the flat and
uniform structure of the tuples. Traditional information retrieval also uses the
notion of sets [Artandi, 1971], but regards each item as a nonuniform (and often
structured) collection of attributes. Information retrieval queries combine desired
attributes using sentence-forming operators. Thus, a Boolean query specifies a set
of documents by combining simpler queries (ultimately, individual indexed words
or concepts) using logical operators.
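
The set view of Boolean retrieval can be shown directly with an inverted file, in which each indexed word maps to the set of documents containing it and each logical operator becomes a set operation. The sketch below is illustrative only; the nested-tuple query notation, the function names, and the three sample documents are assumptions.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Inverted file: word -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def evaluate(query, index, all_docs):
    """Evaluate a query that is either a word or a tuple (op, operand, ...)."""
    if isinstance(query, str):
        return set(index.get(query, set()))
    op, *operands = query
    results = [evaluate(q, index, all_docs) for q in operands]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    if op == "NOT":
        return all_docs - results[0]
    raise ValueError(f"unknown operator: {op}")

docs = {
    1: "multimedia retrieval systems",
    2: "information retrieval with hypertext",
    3: "hypertext browsing and links",
}
index = build_index(docs)
all_docs = set(docs)

print(evaluate(("AND", "retrieval", ("NOT", "hypertext")), index, all_docs))
print(evaluate(("OR", "multimedia", "browsing"), index, all_docs))
```

Note that the result is always an unordered set: a document either satisfies the query or it does not, which is the strictness that the ranking models discussed next are meant to soften.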
Other systems of operators are possible. For instance, significantly more
effective retrieval results from a simple change related to human perception of language
and Boolean logic: making the AND and OR operators softer or less strict. One
realization of this change, the p-norm model, allows a parameter, p, to vary so
there is a continuum between strict ANDs and strict ORs [Salton, et al., 1983].
This model allows retrieval results to be ranked in descending order of presumed
value, and gives users flexibility to indicate relative importance of search terms.
Further extension is possible, whereby user judgments regarding which previously
retrieved documents are relevant can be used to automatically build better Boolean
or p-norm queries [Salton, et al., 1985].
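
The effect of the parameter p can be seen in a short calculation. The sketch below uses the commonly cited form of the p-norm similarity with all query terms weighted equally; that simplification, and the sample term weights, are assumptions made for illustration. Each input is a document's weight in [0, 1] for one query term.

```python
def pnorm_or(term_weights, p):
    """Soft OR: grows toward max(term_weights) as p increases."""
    n = len(term_weights)
    return (sum(w ** p for w in term_weights) / n) ** (1.0 / p)

def pnorm_and(term_weights, p):
    """Soft AND: shrinks toward min(term_weights) as p increases."""
    n = len(term_weights)
    return 1.0 - (sum((1.0 - w) ** p for w in term_weights) / n) ** (1.0 / p)

# A document that matches one query term strongly and the other weakly.
weights = [0.8, 0.2]

for p in (1, 2, 100):
    print(p, round(pnorm_or(weights, p), 3), round(pnorm_and(weights, p), 3))
```

At p = 1 the two operators coincide (both reduce to the average weight), so AND and OR are indistinguishable. As p grows, OR approaches the largest weight and AND the smallest, which recovers strict Boolean behavior. Intermediate values give the ranked, softer matching described above.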
Feedback is also fundamental to the probabilistic model [Robertson and Jones,
1976], whereby documents are also ranked according to estimated probability of
relevance. Ranking from probabilistic retrieval and various fuzzy set-based
schemes [Bookstein, 1985] is of particular value to users so they need not be
concerned with retrieval set size, but rather can let the computer carry out pattern
matches to find closely related items.
Another enhancement to retrieval results from adding information. This has
been demonstrated using the vector space model, where linear algebra can be used
to allow descriptions of term occurrences in document collections [Salton, 1989].
Many people are familiar with keyword-based search approaches [called syntactic
searches in Littleford, 1991], in which the reader searches for a particular string of
characters in a database or uses entries from a controlled vocabulary for searching.
Keyword-based searches are usually only available as alternatives to rather than
complements to citation-based searching (e.g., as is done with the Science Citation
Index). Yet if term and citation data are both made available [Fox, 1983b],
especially when used in feedback searching, better results can be obtained than from
either type of data alone [Fox, 1988b]. In the vector approach, readers select
documents or groups of documents which can be treated as queries, so the system can
search for these matches.
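
Treating a document as a query is straightforward in the vector space model, because queries and documents share one representation. The sketch below is a minimal illustration using raw term counts and cosine similarity; term weighting and citation data are left out, and the sample documents are made up.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Represent a text as a sparse vector of raw term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_like(selected_id, docs):
    """Rank the other documents by similarity to the selected document."""
    query = term_vector(docs[selected_id])
    scored = [(cosine(query, term_vector(text)), doc_id)
              for doc_id, text in docs.items() if doc_id != selected_id]
    return [doc_id for score, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]

docs = {
    "d1": "hypertext links and browsing",
    "d2": "boolean retrieval of documents",
    "d3": "ranked retrieval of documents by vector similarity",
}

print(rank_like("d2", docs))
```

The same `cosine` function would apply unchanged to vectors built from citation data, or to a concatenation of term and citation components, which is one way the two kinds of evidence could be combined.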
Results with these advanced retrieval methods have been demonstrated with
collections of up to tens of thousands of documents. A pilot study indicated
applicability to roughly 300,000 documents [Fox, 1987d], and results are now being
analyzed from a recent study at VPI&SU with 500,000 library catalog records and
216 users, that compared Boolean, p-norm, vector, and vector with probabilistic
feedback approaches. It seems that advanced methods for retrieval can be applied
to very large collections.
`
`CODER
`
Work on the COmposite Document Expert/extended/effective Retrieval (CODER)
system began in 1985, to develop a testbed for the integration of AI and
information retrieval methods [Fox, 1986a]. The aim was to develop a next-generation
retrieval system: a system that would have greater effectiveness and efficiency
than previous systems, and one that would allow comparison and integration of
advanced retrieval methods. Our plan is to integrate the various retrieval models
into a unified system [Fox, 1989a].
`
`CODER Architecture
`
As can be seen in Figure 21.2, CODER was conceived as a comprehensive
standalone system that could deal directly with end users and could receive texts
of bibliographic and full text documents ready for analysis. The architecture
follows principles of software engineering, AI prototyping, and object oriented
construction [Fox, 1987a].
CODER assumes a coarse-grained parallel operation of the sort now provided
by computer networks. A central spine holds information for shared use by the
analysis and retrieval subsystems. Each subsystem is specified as a collection of
modules acting as a community and communicating through a central blackboard
(a module with areas that experts can post to or read postings from). Attached to
each blackboard is a strategist, with rules to indicate which modules should be
requested to examine which types of postings.
The retrieval subsystem is broken into modules along functional lines, based on
studies of user and search intermediary interaction [Belkin, et al., 1983], as a
"distributed expert based information system" [Belkin, et al., 1987a]. Thus, there are
modules for user modeling, dialog and discourse management, search, planning,
morphological analysis, and other operations. These can operate on a single
computer, or can be spread across a network of machines on which they run in
parallel.
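
The blackboard and strategist arrangement can be sketched as follows. This is a deliberately small, sequential illustration and not CODER's implementation: the class names, the area names (`query`, `stems`, `results`), the crude suffix-stripping "morphology" module, and the two-document collection are all hypothetical, and real modules would run in parallel rather than in a single loop.

```python
class Blackboard:
    """Shared area that modules post to and read postings from."""

    def __init__(self):
        self.postings = []   # (area, content), in the order posted

    def post(self, area, content):
        self.postings.append((area, content))

    def read(self, area):
        return [content for a, content in self.postings if a == area]

class Strategist:
    """Holds rules saying which module should examine which type of posting."""

    def __init__(self, blackboard, rules):
        self.blackboard = blackboard
        self.rules = rules   # area name -> module (a callable)

    def run(self):
        handled = 0
        # Postings made by modules are picked up on later iterations.
        while handled < len(self.blackboard.postings):
            area, content = self.blackboard.postings[handled]
            handled += 1
            module = self.rules.get(area)
            if module is not None:
                module(self.blackboard, content)

# Hypothetical collection: document id -> indexed stems.
DOCS = {"d1": {"ship", "captain"}, "d2": {"army", "general"}}

def morphology(board, text):
    """Toy morphological analysis: lowercase and strip a trailing 's'."""
    board.post("stems", [word.lower().rstrip("s") for word in text.split()])

def search(board, stems):
    """Toy search: documents sharing at least one stem with the query."""
    hits = sorted(d for d, words in DOCS.items() if words & set(stems))
    board.post("results", hits)

board = Blackboard()
board.post("query", "Ships captains")
Strategist(board, {"query": morphology, "stems": search}).run()
print(board.read("results"))
```

The modules never call each other directly. Adding a new expert (a user-modeling module, for instance) means adding one rule to the strategist, which is the modularity the architecture is after.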
The spine stores document and other text collections, relational data, and more
complex knowledge structures. Frames are used extensively to describe
documents, real-worl