Chapter 21

Integrating Search and Retrieval with Hypertext

By Edward A. Fox, Qi Fan Chen, and Robert K. France
Dept. of Computer Science, Virginia Polytechnic Institute and State University
`
`Introduction
`
While hypertext and hypermedia have numerous applications, they cannot solve all
our problems relating to information access. Such access can build upon manually
or automatically created indices, search algorithms, feedback procedures, and other
techniques whose value has been proven on collections much larger than those that
can be managed solely by browsing and following links. We propose a reference
model for what we believe is an urgent need, that of integrating search and
retrieval with hypertext and hypermedia. Further, we describe a prototype system
designed around that model, which would facilitate access to a wide variety of
types of information.
`
`Current Status
`
In this chapter, we focus on the problem of how to efficiently store and retrieve a
wide range of disparate information (including data stored in databases, text bases,
media archives, knowledge bases, and other organized collections). Once, such
collections were managed almost exclusively by mainframes, which ensured
centralization and often enforced vendor, and sometimes industry-wide, standards. With
the shift to personal computers and workstations, there has been a rapid proliferation
of specialized packages, which make interchange of information between software
and hardware platforms difficult.
Accumulations of information in forms that are hard to integrate are being created
and expanded at a rapid rate. Large online database vendors such as Dialog,
Maxwell Online, STN, and others provide access to enormous collections of
bibliographic records (usually titles, abstracts, and other citation data) or full text
(reference works, newspapers, legal documents). In house, corporate employees
are generating electronic mail messages, database files (separately organized for
geographic information, CAD/CAM, chemical registry data, and ~~ge
databanks), and word processing documents. The serious problems of providing
democratic access to all this information and of supporting manipulation
techniques as versatile and natural as speech or pencil-and-paper methods have yet to
be solved.
`
`Prospects for Improvement
`
The emergence of networked computer systems and optical storage devices has
exacerbated the problem of incompatible data formats and, as a result, forced vendors
to seek solutions. For example, software designed to access data scattered
across a heterogeneous network needs to follow rigorously enforced conventions.
Optical publishing systems increase the amount of storage capacity available at
a reasonable cost. This allows in-house access to huge databases of text, graphics,
images, and even interactive digital video [Fox, 1988d, 1989d, and 1990a]. The
huge amount of information that can be stored optically can make particular facts
difficult to find. Fortunately, locally stored information can be organized and more
easily accessed with more modern software than is provided with on-line services
such as Dialog. This allows more advanced information retrieval techniques to be
implemented [Fox, 1986a].
Our work is motivated by an analysis of current potentials and problems with
information access. In the first part of this chapter, we describe a framework for
integrating hypertext systems with techniques for discovering information quickly,
leading readers to particular nodes efficiently even in very large, unfamiliar
hyperbases. In the second section, we consider some of the capabilities that must
be provided by information systems, pointing out which are provided by various
types of existing systems. The third section discusses progress made in that
direction, and its sequel provides further background. The fourth section discusses the
VPI&SU CODER project, which provides a means of exploring a broad class of
information access approaches. The final section of the chapter outlines how we
are extending CODER for hypermedia and interoperability, and includes
recommendations for conceptual and practical integration of information retrieval with
hypertext/hypermedia.
`
Necessary Capabilities

Computer systems that are intended to facilitate users' access to information
should:
`
`• Include help systems that can explain how to perform a search and what is done
`as part of that search;
`• Adapt to user requirements;
`• Help users describe and elaborate their information needs;
`• Be built in a flexible and modular fashion so that they can be easily extended.
`[Belkin, et al., 1983; Belkin, et al., 1987a]
`
A user-centered, mixed initiative interaction (where both user and computer
operate concurrently and can each seize control of input or output activities at almost
any time) is preferred, so that:
`
`• the user can request an explanation or follow a new train of thought at any time,
`• the computer can request clarification or display tentative results as appropriate
`to further what it believes are the user's current goals.
`
Since these goals often relate to finding relevant items in some information
repository, it is important to pay special attention to efficient and effective
searching, as is discussed in the next subsection.
`
`Finding Barn Doors and Stringing Pearls
`
The current generation of information retrieval systems relies on various search
techniques to find relevant items. These can be viewed at both the abstract and
system-specific levels [Bates, 1981]. One approach is to seek information by first
looking for a neighborhood where useful items can be found and then selecting
related or linked items from among the items at hand.
One can think of first finding the barn door and then, after glimpsing something
of interest, i.e., a valuable pearl, following leads to closely related items. This is
analogous to searching and then browsing, or using information retrieval
techniques and then applying hypertext methods.
Most information being accessed has been processed to some degree in order to
make access easier. Outlines, tables of contents, and indices (e.g., for author
names, or specially selected descriptors) are usually developed by humans
according to some theory or organizing principles. Readers who can relate to existing
organizational systems can often find items of interest; computer menus are
sometimes applied to map these structures to available regions of display screens.
Related items may be found if they occur nearby, if an index points to a series of
items, or if explicit cross references are provided.
`
Hypertext / Hypermedia Handbook
`
Unfortunately, the searcher's perspective and/or vocabulary does not always
match that of the author or indexer. This is particularly important when a complex
idea is involved and when a very large collection of items is being searched. Here
precoordinated systems (where some of the most important combinations of ideas
are precombined and added to indices, sometimes as phrases like "information
retrieval") may fail and force readers to build post-coordinated queries (like
"multimedia and retrieval"). Narrower queries can be formed using proximity
operators. These include:
`
• adjacency (two words placed side by side);
• within-sentence co-occurrence (two or more words located in a single sentence);
• within-field combinations (e.g., two words located within a title line);
• within-paragraph co-occurrence.
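
The proximity operators listed above can be illustrated with a minimal sketch. It assumes naive tokenization, sentences ending in ".", "!" or "?", and paragraphs separated by blank lines; the function names and the sample record are hypothetical and do not come from any particular retrieval system.

```python
import re

def tokenize(text):
    """Return the lowercase word tokens of `text`, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def adjacent(text, first, second):
    """Adjacency: `second` immediately follows `first`."""
    words = tokenize(text)
    return any(a == first and b == second for a, b in zip(words, words[1:]))

def co_occur(units, first, second):
    """True if some unit (sentence, paragraph, field) contains both words."""
    return any({first, second} <= set(tokenize(unit)) for unit in units)

def same_sentence(text, first, second):
    # Naive sentence split; a real system would need better rules.
    return co_occur(re.split(r"[.!?]+", text), first, second)

def same_paragraph(text, first, second):
    # Assumption: paragraphs are separated by blank lines.
    return co_occur(re.split(r"\n\s*\n", text), first, second)

def within_field(record, field, first, second):
    # `record` maps field names to text, e.g. {"title": ...}.
    return co_occur([record.get(field, "")], first, second)

# Hypothetical sample record, for illustration only.
record = {
    "title": "Multimedia retrieval systems",
    "body": "Information retrieval is hard. Hypertext helps browsing.\n\n"
            "Multimedia adds new problems.",
}
body = record["body"]
print(adjacent(body, "information", "retrieval"))
print(same_sentence(body, "retrieval", "hypertext"))
print(same_paragraph(body, "retrieval", "hypertext"))
```

Each operator narrows the unit in which the two words must co-occur, which is why the sentence test fails for a pair of words that the paragraph test accepts.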
`
Boolean query construction (i.e., using ANDs, ORs and/or NOTs to specify
matches) can be exceedingly difficult, and indeed ten-to-one variations have been
observed in the ability of different individuals (and even of the same individual at
different times) to form good queries. Frequently, repeated efforts are required to
form queries that do not retrieve many nonrelevant items (i.e., have high
precision) and yet do retrieve a significant fraction of the relevant items (i.e., have high
recall).
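
Precision and recall, as defined above, can be computed directly once the retrieved set and the set of relevant items are known. The following is a minimal sketch; the document identifiers are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = fraction of retrieved items that are relevant
    recall    = fraction of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result: 4 items retrieved, 5 relevant items exist, 2 overlap.
precision, recall = precision_recall([1, 2, 3, 4], [2, 4, 5, 6, 7])
print(precision, recall)
```

Broadening a query tends to raise recall at the cost of precision, and narrowing it does the reverse, which is why repeated reformulation is often needed.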
`Finding relevant items often involves one or more of the following steps:
`
`• Translating an information need into the organization of concepts represented by
`a table of contents, index or thesaurus.
• Adopting a scheme to descend through some hierarchical organization of topics
or descriptors to find specific section(s) of interest.
• Combining words or terms used within an index or thesaurus or forming the free
text into a formal Boolean query.
• Refining the query by adding synonyms or related terms (broadening), by
increasing specificity (narrowing) or by reorganizing the query structure.
• Visually examining all selected items to find those of value.
`
Sometimes the task at hand makes the choice of search method obvious. If you
are looking for a single known item, a Boolean search will work best. Looking for
multiple items on the same topic calls for browsing. Attempts to retrace a
previously viewed presentation suggest the use of links. Sometimes a combination of
methods works best.
`
Unified Metaphors

The steps involved in searching and browsing hyperdocuments should be hidden
behind a task-oriented interface that lets readers search at a conceptual, descriptive
level instead of at either a word-oriented or a procedural level. Boolean queries,
for instance, fail to fulfill readers' information access needs because they focus
attention on the wrong objects and force users to think in terms of the mechanics
of search algorithms. Conversely, properly implemented browsing systems allow
readers to explore a range of conceptual dimensions easily and intuitively. A
graphical presentation coupling the spatial metaphor of nearness to the hypermedia
metaphor of links supports such explorations using simple pointing operations.
Most importantly, both searching and browsing should be subsumed by a set of
consistent metaphors and abstractions that encourage the integration of these
access techniques.
Using naturally occurring metaphors as the backbone of the user interface offers
several advantages. Unfortunately, while paper sketches and natural language input
may be easy for humans to work with, they are often beyond the scope of all but
the most expensive of today's computer systems.
Another possibility is to have a standardized unified digital representation and
manipulation system. We all take for granted the fact that telephones connect
people all around the world, and that they do this because of international agreements
on hardware and software standards. We can envision a collection of standards for
hardware, software, and information structures or data formats [see Devlin, 1991]
built around both existing and exciting new technologies for digital encoding. This
requires management of multimedia including: text, graphics, images, speech,
audio, and video. It involves interconnection of multiple computers and diverse
networks, and use of a variety of peripheral and input/output devices. Finally, it
requires intelligent processing to improve efficiency and effectiveness and to
ensure smooth human-computer interaction.
`
`Progress Toward Unified Access
`
Networking

A great deal of progress has been made towards integration of computer networks
[Tannenbaum, 1981]. Thousands of computers are internetworked, able to
exchange electronic mail. Many are connected to allow even higher levels of
integration: passing of files, remote log-ins, and even communication between programs.
In the area of library information retrieval, the Z39.50 standard has been
developed so that a user of one library system can cause that system to have a query
processed on another system, and then indirectly receive the search results.
Various organizations, including universities such as VPI&SU, are developing and
testing prototype systems for information interchange that follow Z39.50
specifications. In addition, the X.500 standard has enabled many prototype systems to
combine into a partial global directory system, a computer analog of the telephone
white pages.
Much of the success in internetworking can be credited to the use of a layered
approach to software, whereby physical connection, data connection, session
management, presentation handling, and application program operation have separate
interfaces and specifications. The International Organization for Standardization
(ISO) Open Systems Interconnection (OSI) model uses physical, data link, network,
transport, session, presentation, and application levels to encourage interoperability
of software on heterogeneous computers [Zimmerman, 1980]. OSI has been the
basis for Z39.50, X.500, and many other related standards.
This layering approach also applies to storage systems. On the one hand, there
are levels of physical units, operating system-identified devices, and network-wide
file systems. On the other hand, there are software divisions, such as the proposed
standard of the Air Traffic and Air Transport Associations, which separates the
search engine component of CD-ROM software (which itself builds upon the
volume and file description layer specified by the ISO 9660 standard) from the user
interface layer.
`
`Object Model
`
While computer users generally search for relevant concepts, computer scientists
have tried to meet users' needs by studying and building specialized processing
systems for particular classes of information. Thus, the term "data" is often
associated with database management systems, "information retrieval" is often limited to
text databases, and "knowledge bases" are viewed as the proper repositories for
collections of complex list structures and rules.
To unify these approaches, object-oriented databases have been proposed that
might contain and support processing of any class of information. In hypertext
systems that build upon these object bases, completing a search or following a link
would lead to presentation of any of various objects: a screen of text, a bitmap
display, an audio segment, a spreadsheet entry, an individual plane of a complex
image, an explanation of some expert rules, or a tabular report.
However, to date only limited progress has been made in integrating
database management systems (which at present usually follow the relational
model) with text retrieval systems (which frequently employ inverted files and
require access by way of Boolean queries). One issue is the difference in the
items being manipulated: tables of numbers or short character strings have few
data types as opposed to the variety encountered in multimedia systems, and
database structures are much more regular than those found in the world of
compound documents (e.g., books, magazines, letters, reports, and
encyclopedias). A possible solution is to use abstract data types (e.g., sets, lists, vectors)
as elements of a relational DBMS [Fox, 1981]. Some of these ideas were used
in the redesign of the SMART system for the UNIX environment [Fox, 1983a],
which eventually led to a version of that system that is widely used by
document retrieval researchers. Another approach is to directly use the relational
model, with performance tuning and limited extensions where needed, to
handle bibliographic records [Lynch, 1987].
Artificial intelligence (AI) research extends these efforts to solving the problems
of knowledge representation. Initially, languages like LISP and Prolog focused on
symbol and list manipulation. Real world objects were associated with atoms that
could be located by their names. Property lists (sets of property type/value pairs)
were attached to these atoms.
`But to better match human ability to deal with default values, groupings of key
`attributes with various types of objects, and taxonomic classification of similar
`classes of objects, the concept of a frame was developed. This concept has proven
`useful for information retrieval [Fox, 1987b, Weaver, et al., 1989].
Essentially, a frame class (e.g., U.S. postal address) has various slots for
important aspects of the object (e.g., state, zip code), and may inherit from more general
classes (e.g., postal address with slots for name and locality), as well as be
instantiated for an individual (e.g., John Smith's U.S. postal address). Frames can be
grouped together into extremely regular knowledge bases.
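
The postal address example can be sketched as a small frame system. This is an illustration under stated assumptions, not the representation used in any system cited here: the `Frame` class, the `country` default, and the slot values for John Smith are all hypothetical.

```python
class Frame:
    """A frame with named slots; missing slots are inherited from the parent."""

    def __init__(self, label, parent=None, **slots):
        self.label = label
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        # Look in this frame first, then climb toward more general classes.
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def instantiate(self, label, **values):
        # An instance is a frame whose parent is its class.
        return Frame(label, parent=self, **values)

# General class: slots for name and locality (no values yet).
postal = Frame("postal address", name=None, locality=None)

# More specific class: adds state and zip code, plus an assumed default value.
us_postal = Frame("U.S. postal address", parent=postal,
                  state=None, zip_code=None, country="USA")

# Instance for an individual; the slot values are made up.
john = us_postal.instantiate("John Smith's U.S. postal address",
                             name="John Smith", locality="Blacksburg", state="VA")

print(john.get("name"), john.get("country"), john.get("zip_code"))
```

Slot lookup climbs from the instance to its class and then to the more general class, which is how default values and taxonomic classification are both supported by one mechanism.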
A semantic network is an alternate knowledge representation, closer in
orientation to hypertext [Findler, 1979, Morgado, 1986]. A semantic network can be
envisioned as a graph where nodes are used to represent objects, and links to indicate
significant, meaningful relationships. One well-known semantic network system,
SNePS [Shapiro, 1989], supports knowledge representation and inference (i.e.,
proving or drawing conclusions based on knowledge recorded previously in the
system). In SNePS a distinction is often made between path-based inference,
whereby the reader follows a succession or chain of links from some node to
identify some relationship, and node-based inference, whereby logical propositions
are represented in the network and are combined with other propositions in varied
regions of the network in accord with the rules of logic. Path-based inference can
be very efficient, and corresponds to following links in hypertexts. Semantic
networks can also support spreading activation, where several concepts are located in
the network, and all paths of length 1, 2, etc. from each are explored until paths from
several concepts meet (a type of parallel search for important associations).
Spreading activation has been used in the GRANT system to retrieve documents
related to various topics, projects, and grant proposals [Cohen and Kjeldsen,
1987].
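
Spreading activation as described above can be sketched as a breadth-first expansion from each starting concept, stopping at the smallest radius at which the explored regions share a node. This is a simplified illustration, not the GRANT or SNePS algorithm; the tiny network, the function names, and the `max_radius` cutoff are assumptions.

```python
from collections import defaultdict

def build_network(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def spreading_activation(graph, concepts, max_radius=3):
    """Return (radius, meeting_nodes) where paths from all concepts first meet.

    Returns (None, set()) if nothing meets within `max_radius` links.
    """
    reached = {c: {c} for c in concepts}    # everything activated so far
    frontier = {c: {c} for c in concepts}   # nodes activated in the last step
    for radius in range(max_radius + 1):
        meeting = set.intersection(*reached.values())
        if meeting:
            return radius, meeting
        for c in concepts:
            neighbors = set()
            for node in frontier[c]:
                neighbors |= graph.get(node, set())
            frontier[c] = neighbors - reached[c]
            reached[c] |= frontier[c]
    return None, set()

# Hypothetical network of related word senses.
network = build_network([
    ("ship", "captain"), ("captain", "officer"),
    ("general", "officer"), ("general", "army"),
])
print(spreading_activation(network, ["ship", "army"]))
```

Here the activations spreading from "ship" and "army" first meet after two links, at "officer", which is the kind of association a path-based search would surface.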
Semantic networks are useful for handling the inter-relationships in language.
At VPI&SU, for example, we have taken machine-readable dictionaries, extracted
and restructured the important data [Fox, 1986b], archived them in the form of
Prolog facts, and loaded them into a semantic network to facilitate further
processing and access [Fox, 1988a; France, et al., 1989a]. In so doing we have found it
helpful to build an elaborate taxonomy of relations (i.e., links) common between
word senses in our lexicon. One goal is to extend earlier experiments that
indicated improvement in retrieval effectiveness could result by finding words that are
lexically related to query terms [Fox, 1988e]. Thus, the system knows enough
about the meaning of a query to search for "captain" if a query asks for "ship's
officers," or "general" when "army leader" is sought. Others have considered
networks as important support structures for probabilistic (Bayesian) inference
systems that prove when a document is relevant to a query [Croft and Turtle, 1989].
Ultimately, semantic networks can be combined with hypertext and hypermedia to
lead to a uniform representation of objects, as discussed below.
`
`
`Integration of Various Search Techniques
`
Research suggests that the best retrieval performance results when many different
forms of information about documents are utilized by a variety of search methods
[Belkin and Croft, 1987b]. Integration of various approaches has this same goal
[Fox, 1987b]. Thus, one might merge the results from using Boolean, vector, and
probabilistic searches for a given information statement. Any of these search
methods can also be modified by the use of feedback, where a reader indicates
what documents or sections thereof are relevant, and the computer uses all the
information it has available about that sample to perform another, better search.
One can even throw in use of AI techniques, as discussed in the section on our
work with CODER. Reference models provide a clear framework for integration
and thus have become popular in regard to networking, where different cabling,
interconnection, specialized hardware, firmware, and layers of software allow
users of similar applications on different computers to collaborate.
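
The idea of merging results from several search methods can be sketched as follows. The source does not say how results are merged; this sketch assumes one simple scheme (scale each method's scores to a maximum of 1, then sum them), and the run names and scores are invented for illustration.

```python
def normalize(scores):
    """Scale one method's scores so that its best document scores 1.0."""
    top = max(scores.values(), default=0.0)
    return {doc: (s / top if top else 0.0) for doc, s in scores.items()}

def merge_results(*runs):
    """Sum normalized scores across runs; return doc ids, best first."""
    combined = {}
    for run in runs:
        for doc, score in normalize(run).items():
            combined[doc] = combined.get(doc, 0.0) + score
    # Ties are broken by document id so the order is deterministic.
    return sorted(combined, key=lambda d: (-combined[d], d))

# Hypothetical outputs of three search methods for one information statement.
boolean_run = {"d1": 1.0, "d3": 1.0}            # Boolean: matched or not
vector_run = {"d1": 0.9, "d2": 0.6, "d3": 0.3}  # vector: similarity values
probabilistic_run = {"d2": 2.0, "d1": 1.0}      # probabilistic: term weights

merged = merge_results(boolean_run, vector_run, probabilistic_run)
print(merged)
```

A document that several methods agree on rises to the top, even if no single method ranks it first.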
`
`Reference Models for Hypertext
`
In order to develop interoperable information systems and a unified representation
model, a reference model for information management systems must first be
developed. Work on hypertext reference models has arisen in part as a result of
standardization efforts for hypertext/hypermedia [see Chapter 26 by Devlin on
standards]. We have found the Dexter model [Halasz and Schwartz, 1990] and the
r-model [Furuta and Stotts, 1989; Furuta and Stotts, 1990; Stotts and Furuta, 1989]
to be particularly insightful. However, interoperable information systems should
properly include not only hypertext/hypermedia, but also searching, networking,
and other applications and levels of manipulation.
`
`A More Complete Reference Model
`
Our proposal for a reference model that integrates hypertext and other sources of
information is shown in Figure 21.1, and explained in more detail in subsequent
sections. Essentially it combines the seven OSI layers with seven other layers
relating to hypertext/hypermedia and other types of information systems.
The bottom four layers are OSI layers that together support the secure transport
of messages. Atop the transport layer is a layer that provides essential support for
files and process communication (messages). Processes operate above this layer to
support high-level communication between machines, including necessary
translations between data representations. These six layers complete an extended
groundwork for communication among machines, languages, and environments at a
highly abstract level.
Layer              Typical Contents

Application        hypertext, hypermedia, information retrieval,
                   image management, authoring, training, tutoring,
                   CAD/CAM, interactive digital video
Presentation       devices: windows, pointing devices, speakers;
                   media: text, audio, video;
                   operations: generalization, editing, translation, access,
                   browsing, selection, picking, querying, sequencing
Session            base selection, user identification, history management,
                   versioning
Base               base: knowledge, information, data;
                   operations: search, inference
View               graphs, lists, sets, vectors, relations, frames
Link               link Ids, labels
Anchor             anchor Ids, span descriptions
Node               textual strings, integers, object Ids
Communication      processes
Physical machine   file systems, storage media, message support
Transport          ISO lower level equivalent
Network            ISO lower level equivalent
Data Link          ISO lower level equivalent
Physical           ISO lower level equivalent

Figure 21.1 Information Management System Reference Model.

The node layer comes next, supporting atomic, structured, and multimedia
objects. The anchor layer allows points or spans inside nodes to be addressed in
ways appropriate for each node type. Links connect anchors, thus providing a
layer that is essentially a graph or network of information items. The view layer
allows various aggregations and associations as well as common data, information,
and knowledge structures to be manipulated (e.g., have a view of relations, a
view of vectors, a view of frames, a view as linked collections of nodes). The base
layer allows coordinated access to data, information, and knowledge bases
through search, navigation, and related operations. Higher levels correspond to
upper level OSI layers, but are organized to support common information access,
hypertext/hypermedia, and presentation/application programming activities.
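
The node, anchor, and link layers can be illustrated with a small data-structure sketch. It assumes text-only nodes and character-offset spans; the class names, fields, and sample content are hypothetical and are not taken from CODER or from the Dexter model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    node_id: str
    content: str        # text only here; other media need other types

@dataclass(frozen=True)
class Anchor:
    anchor_id: str
    node_id: str
    start: int          # span description: character offsets in the node
    end: int

@dataclass(frozen=True)
class Link:
    link_id: str
    label: str
    source: str         # anchor id
    target: str         # anchor id

class Hyperbase:
    """Holds nodes, anchors, and links, and resolves them layer by layer."""

    def __init__(self, nodes, anchors, links):
        self.nodes = {n.node_id: n for n in nodes}
        self.anchors = {a.anchor_id: a for a in anchors}
        self.links = list(links)

    def anchored_text(self, anchor_id):
        a = self.anchors[anchor_id]
        return self.nodes[a.node_id].content[a.start:a.end]

    def follow(self, anchor_id):
        """Return (label, target text) for every link leaving the anchor."""
        return [(l.label, self.anchored_text(l.target))
                for l in self.links if l.source == anchor_id]

base = Hyperbase(
    nodes=[Node("n1", "Frames describe documents."),
           Node("n2", "A frame has slots.")],
    anchors=[Anchor("a1", "n1", 0, 6), Anchor("a2", "n2", 2, 7)],
    links=[Link("l1", "defined-by", "a1", "a2")],
)
print(base.anchored_text("a1"), base.follow("a1"))
```

Because links refer only to anchor ids and anchors refer only to node ids, each layer can change its internal representation without disturbing the layer above it.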
`
`
We view this model as providing a framework for integrating information
management systems. Our aim is to validate this framework, to prove it adequate for a
wide variety of activities. Towards the end of this chapter, we explain how our
work with the CODER system is being extended and layered along the lines of
this integrated model.
`
`Previous Work
`
In the broadest sense, a reference model for information management should
encompass work in electronic publishing, document representation, text analysis,
storage, information retrieval, and hypertext/hypermedia.
`
`Electronic Publishing
`
In connection with ACM Press Database and Electronic Products [Fox, 1988c], we
have undertaken a variety of electronic publishing ventures [see also Fox, Rous,
Marchionini, 1991]. In Hypertext on Hypertext we dealt with three versions of a
single hypertext, in order to accommodate user preferences for equipment and
operating systems. Further, in one of the versions [Shneiderman, 1988], the original
hypertext system was extended to include more comprehensive searching
capabilities, in part because of our experiences with online and CD-ROM publishing.
These experiences also pointed out the need for standards and integrated or
interoperable systems to allow wide-scale electronic publishing [Fox, 1990b].
For example, while developing the Virginia Disc series of CD-ROMs, we
purposely provided data and information bases in multiple forms to allow comparison
of approaches and software packages. The Master Gardener Handbook on
Virginia Disc One [Fox, 1990c] is available in hypertext form, or can be searched
with a bibliographic retrieval system. Though the former is handy for reading and
browsing, the latter is more convenient for finding specific facts. Similar
comparisons can be made between alternative retrieval systems (Personal Librarian applied
to searching the King James Bible at either the chapter or verse level).
`
`Related Work
`
The most popular database management software today, the relational DBMS, is
based on the simple mathematical concept of a relation. A database relation can
be thought of as a flat table, but is much more precisely defined: a relation is a
set of individual items, called "tuples" and sometimes thought of as rows, each
containing values for a fixed collection of attributes, usually associated with
columns of the table. Queries in the relational model are answered by combining
these relations using a restricted set of operations that depend on the flat and
uniform structure of the tuples. Traditional information retrieval also uses the
notion of sets [Artandi, 1971], but regards each item as a nonuniform (and often
structured) collection of attributes. Information retrieval queries combine desired
attributes using sentence-forming operators. Thus, a Boolean query specifies a set
of documents by combining simpler queries (ultimately, individual indexed words
or concepts) using logical operators.
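
The way a Boolean query specifies a set of documents can be sketched with an inverted file and ordinary set operations. This is a minimal illustration; the nested-tuple query form, the function names, and the three sample documents are assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_file(docs):
    """Map each word to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def evaluate(query, index, all_ids):
    """Evaluate a query given as a word or as (operator, subquery, ...).

    Operators: "AND" and "OR" take one or more subqueries, "NOT" takes one.
    """
    if isinstance(query, str):
        return set(index.get(query, set()))
    op, *args = query
    sets = [evaluate(arg, index, all_ids) for arg in args]
    if op == "AND":
        return set.intersection(*sets)
    if op == "OR":
        return set.union(*sets)
    if op == "NOT":
        return all_ids - sets[0]
    raise ValueError(f"unknown operator: {op}")

# Hypothetical three-document collection.
docs = {1: "multimedia retrieval", 2: "information retrieval", 3: "hypertext browsing"}
index = build_inverted_file(docs)
all_ids = set(docs)

print(evaluate(("AND", "retrieval", ("NOT", "multimedia")), index, all_ids))
print(evaluate(("OR", "hypertext", "multimedia"), index, all_ids))
```

Every subquery denotes a set, so the result is always an unranked set of documents; the softer operators discussed next are one way to recover a ranking.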
Other systems of operators are possible. For instance, significantly more
effective retrieval results from a simple change related to human perception of language
and Boolean logic: making the AND and OR operators softer or less strict. One
`realization of this change, the p-norm model, allows a parameter, p, to vary so
`there is a continuum between strict ANDs and strict ORs [Salton, et al., 1983].
`This model allows retrieval results to be ranked in descending order of presumed
`value, and gives users flexibility to indicate relative importance of search terms.
`Further extension is possible, whereby user judgments regarding which previously
`retrieved documents are relevant can be used to automatically build better Boolean
`or p-norm queries [Salton, et al., 1985].
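
The continuum between strict and soft operators can be illustrated numerically. The sketch below uses the form in which the p-norm similarity functions are usually stated for [Salton, et al., 1983]; it should be read as an illustration under that assumption. Document term weights are taken to lie in [0, 1], and the function and parameter names are hypothetical.

```python
def pnorm_or(doc_weights, p, query_weights=None):
    """Soft OR: grows toward max(doc_weights) as p increases."""
    a = query_weights or [1.0] * len(doc_weights)
    num = sum((ai ** p) * (di ** p) for ai, di in zip(a, doc_weights))
    den = sum(ai ** p for ai in a)
    return (num / den) ** (1.0 / p)

def pnorm_and(doc_weights, p, query_weights=None):
    """Soft AND: shrinks toward min(doc_weights) as p increases."""
    a = query_weights or [1.0] * len(doc_weights)
    num = sum((ai ** p) * ((1.0 - di) ** p) for ai, di in zip(a, doc_weights))
    den = sum(ai ** p for ai in a)
    return 1.0 - (num / den) ** (1.0 / p)

# A document containing the first query term but not the second.
weights = [1.0, 0.0]
for p in (1, 2, 100):
    print(p, round(pnorm_or(weights, p), 3), round(pnorm_and(weights, p), 3))
```

At p = 1 the two operators coincide and behave like a simple average of term weights; as p grows, OR approaches "any term suffices" and AND approaches "every term is required". Intermediate values let a document that matches only some terms still receive a rank.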
Feedback is also fundamental to the probabilistic model [Robertson and Jones,
1976], whereby documents are also ranked according to estimated probability of
relevance. Ranking from probabilistic retrieval and various fuzzy set-based
schemes [Bookstein, 1985] is of particular value to users so they need not be
concerned with retrieval set size, but rather can let the computer carry out pattern
matches to find closely related items.
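
How relevance feedback feeds a probabilistic ranking can be sketched with a term relevance weight. The formula below is the commonly used form associated with [Robertson and Jones, 1976], with 0.5 added to each count to avoid division by zero; the counts in the example are invented, and the function name is hypothetical.

```python
import math

def relevance_weight(N, R, n, r):
    """Weight of one term, estimated from feedback.

    N: documents in the collection     R: documents judged relevant
    n: documents containing the term   r: relevant documents containing it
    """
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_nonrelevant = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_nonrelevant)

# Hypothetical feedback: 100 documents, 10 judged relevant, term in 20 documents.
good_term = relevance_weight(N=100, R=10, n=20, r=8)   # mostly in relevant docs
poor_term = relevance_weight(N=100, R=10, n=20, r=0)   # never in relevant docs
print(round(good_term, 2), round(poor_term, 2))
```

A document can then be scored by summing the weights of the query terms it contains, so terms that concentrate in the documents a reader marked relevant pull similar documents up the ranking.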
Another enhancement to retrieval results from adding information. This has
been demonstrated using the vector space model, where linear algebra can be used
to allow descriptions of term occurrences in document collections [Salton, 1989].
Many people are familiar with keyword-based search approaches [called syntactic
searches in Littleford, 1991], in which the reader searches for a particular string of
characters in a database or uses entries from a controlled vocabulary for searching.
Keyword-based searches are usually only available as alternatives to, rather than
complements to, citation-based searching (e.g., as is done with the Science Citation
Index). Yet if term and citation data are both made available [Fox, 1983b],
especially when used in feedback searching, better results can be obtained than from
either type of data alone [Fox, 1988b]. In the vector approach, readers select
documents or groups of documents which can be treated as queries, so the system can
search for these matches.
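
Treating a selected document as a query in the vector approach can be sketched with raw term counts and cosine similarity. This is a minimal illustration that assumes no term weighting beyond counts; the sample collection and function names are hypothetical.

```python
import math
from collections import Counter

def vector(text):
    """Represent a text as a term -> occurrence count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine of the angle between two term vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query_vector, docs):
    """Return document ids ordered by decreasing similarity to the query."""
    return sorted(docs, key=lambda d: (-cosine(query_vector, vector(docs[d])), d))

# Hypothetical collection; the reader picks d1 and asks for similar documents.
docs = {
    "d1": "hypertext links and browsing",
    "d2": "boolean retrieval queries",
    "d3": "browsing hypertext nodes and links",
}
others = {doc_id: text for doc_id, text in docs.items() if doc_id != "d1"}
print(rank(vector(docs["d1"]), others))
```

Because queries and documents share one representation, the same ranking function serves typed queries, selected documents, and groups of documents whose vectors have been added together.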
Results with these advanced retrieval methods have been demonstrated with
collections of up to tens of thousands of documents. A pilot study indicated
applicability to roughly 300,000 documents [Fox, 1987d], and results are now being
analyzed from a recent study at VPI&SU with 500,000 library catalog records and
216 users, that compared Boolean, p-norm, vector, and vector with probabilistic
feedback approaches. It seems that advanced methods for retrieval can be applied
to very large collections.
`
`CODER
`
Work on the COmposite Document Expert/extended/effective Retrieval (CODER)
system began in 1985, to develop a testbed for the integration of AI and
information retrieval methods [Fox, 1986a]. The aim was to develop a next-generation
retrieval system: a system that would have greater effectiveness and efficiency
than previous systems, and one that would allow comparison and integration of
advanced retrieval methods. Our plan is to integrate the various retrieval models
into a unified system [Fox, 1989a].
`
`CODER Architecture
`
As can be seen in Figure 21.2, CODER was conceived as a comprehensive
standalone system that could deal directly with end users and could receive texts
of bibliographic and full text documents ready for analysis. The architecture
follows principles of software engineering, AI prototyping, and object oriented
construction [Fox, 1987a].
`CODER assumes a coarse-grained parallel operation of the sort now provided
`by computer networks. A central spine holds information for shared use by the
`analysis and retrieval subsystems. Each subsystem is specified as a collection of
`modules acting as a community and communicating through a central blackboard
`(a module with areas that experts can post to or read postings from). Attached to
`each blackboard is a strategist, with rules to indicate which modules should be
`requested to examine which types of postings.
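
The blackboard and strategist arrangement can be sketched in a few lines. This is a toy illustration of the pattern, not CODER's implementation: the area names, the two experts, the crude suffix stripping, and the sample documents are all hypothetical.

```python
from collections import defaultdict, deque

class Blackboard:
    """A module with named areas that experts can post to or read from."""

    def __init__(self):
        self.areas = defaultdict(list)

    def post(self, area, item):
        self.areas[area].append(item)

    def read(self, area):
        return list(self.areas[area])

class Strategist:
    """Holds rules saying which experts examine postings in which area."""

    def __init__(self, blackboard, rules):
        self.blackboard = blackboard
        self.rules = rules  # area name -> list of expert functions

    def run(self, area, item):
        pending = deque([(area, item)])
        while pending:
            area, item = pending.popleft()
            self.blackboard.post(area, item)
            for expert in self.rules.get(area, []):
                # Each expert returns new (area, item) postings.
                pending.extend(expert(item))

DOCS = {"d1": "the ship and its captain", "d2": "army officer training"}

def morphology_expert(query):
    # Crude stand-in for morphological analysis: strip trailing "s".
    return [("terms", [word.rstrip("s") for word in query.lower().split()])]

def search_expert(terms):
    hits = sorted(d for d, text in DOCS.items()
                  if any(t in text.split() for t in terms))
    return [("results", hits)]

board = Blackboard()
strategist = Strategist(board, {"query": [morphology_expert],
                                "terms": [search_expert]})
strategist.run("query", "Ships officers")
print(board.read("terms"), board.read("results"))
```

The experts never call one another; they communicate only through postings, so a module can be replaced or moved to another machine without changing the rest of the community.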
The retrieval subsystem is broken into modules along functional lines, based on
studies of user and search intermediary interaction [Belkin, et al., 1983], as a
"distributed expert based information system" [Belkin, et al., 1987a]. Thus, there are
modules for user modeling, dialog and discourse management, search, planning,
morphological analysis, and other operations. These can operate on a single
computer, or can be spread across a network of machines on which they run in
parallel.
The spine stores document and other text collections, relational data, and more
complex knowledge structures. Frames are used extensively to describe
documents, real-worl
