Integrating Search and Retrieval
with Hypertext

By Edward A. Fox, Qi Fan Chen,
and Robert K. France
Dept. of Computer Science,
Virginia Polytechnic Institute and
State University

While hypertext and hypermedia have numerous applications, they cannot solve
all problems relating to information access. Such access can build upon
manually or automatically created indices, search algorithms, feedback
procedures, and other techniques whose value has been proven on collections
much larger than those that can be managed solely by browsing and following
links. We propose a reference model for what we believe is an urgent need,
that of integrating search and retrieval with hypertext and hypermedia.
Further, we describe a prototype system built around that model which would
facilitate access to a wide variety of types of information.

In this chapter, we focus on the problem of how to efficiently store and
retrieve a range of disparate information (including data stored in
databases, text bases, archives, knowledge bases, and other organized
collections). Once such collections were managed almost exclusively by
mainframes, which ensured centralization and often enforced vendor, and
sometimes industry-wide, standards. With the shift to personal computers and
workstations, there has been a rapid proliferation of specialized packages,
which make interchange of information between software and hardware
platforms difficult.

Accumulations of information in forms that are hard to integrate are being
created and expanded at a rapid rate. Large online database vendors such as
Dialog, Maxwell Online, STN, and others provide access to enormous
collections of bibliographic records (usually titles, abstracts, and other
citation data) or full text (reference works, newspapers, legal documents).
In house, corporate employees are generating electronic mail messages,
database files (separately organized for information, CAD/CAM, chemical
registry data, and databanks), and word processing documents. The serious
problems of providing democratic access to all this information and of
supporting manipulation techniques as versatile and natural as speech or
pencil-and-paper methods have yet to be solved.

Prospects for Improvement

The emergence of networked computer systems and optical storage devices has
exacerbated the problem of incompatible data formats and, as a result, forced
vendors to seek solutions. For example, software designed to access data
scattered across a heterogeneous network needs to follow rigorously enforced
conventions.

Optical publishing systems increase the amount of storage capacity available
at a reasonable cost. This allows in-house access to huge databases of text,
graphics, images, and even interactive digital video [Fox, 1988d, 1989d, and
1990a]. The huge amount of information that can be stored optically can make
particular facts difficult to find. Fortunately, locally stored information
can be organized and more easily accessed with more modern software than is
provided with on-line services such as Dialog. This allows more advanced
information retrieval techniques to be implemented [Fox, 1986a].

Our work is motivated by an analysis of current potentials and problems with
information access. In the first part of this chapter, we describe a
framework for integrating hypertext systems with techniques for discovering
information quickly, leading readers to particular nodes efficiently even in
very large, unfamiliar hyperbases. In the second section, we consider some of
the capabilities that must be provided by information systems, pointing out
which are provided by various types of existing systems. The third section
discusses progress made in that direction and its sequel provides further
background. The fourth section discusses the VPI&SU CODER project, which
provides a means of exploring a broad class of information access approaches.
The final section of the chapter outlines how we are extending CODER for
hypermedia and interoperability, and includes recommendations for conceptual
and practical integration of information retrieval with hypertext.

Necessary Capabilities

Computer systems which are intended to facilitate users' access to
information should:
• Include help systems that can explain how to perform a search and what is
  done as part of that search;
• Adapt to user requirements;
• Help users describe and elaborate their information needs;
• Be built in a flexible and modular fashion so that they can be easily
  extended [Belkin, et al., 1983; Belkin, et al., 1987a].

A user-centered, mixed initiative interaction (where both user and computer
operate concurrently and can each seize control of input or output activities
at almost any time) is preferred, so that:
• the user can request an explanation or follow a new train of thought at any
  time,
• the computer can request clarification or display tentative results as
  appropriate to further what it believes are the user's current goals.

Since these goals often relate to finding relevant items in some information
repository, it is important to pay special attention to efficient and
effective searching, as is discussed in the next subsection.

Finding Barn Doors and Stringing Pearls

The current generation of information retrieval systems relies on various
search techniques to find relevant items. These can be viewed at both the
abstract and system-specific levels [Bates, 1981]. One approach is to seek
information by first looking for a neighborhood where useful items can be
found and then selecting related or linked items from among the items at
hand.

One can think of first finding the barn door and then, after glimpsing
something of interest, i.e., a valuable pearl, following leads to closely
related items. This is analogous to searching and then browsing, or using
information retrieval techniques and then applying hypertext methods.

Most information being accessed has been processed to some degree in order to
make access easier. Outlines, tables of contents, and indices (e.g., for
author names, or specially selected descriptors) are usually developed by
humans according to some theory or organizing principles. Readers who can
relate to existing organizational systems can often find items of interest;
computer menus are sometimes applied to map these structures to available
regions of display screens. Related items may be found if they occur nearby,
if an index points to a series of items, or if explicit cross references are
provided.


Unfortunately, the searcher's perspective and/or vocabulary does not always
match that of the author or indexer. This is particularly important when a
[...] idea is involved and when a very large collection of items is being
searched. Here precoordinated systems (where some of the most important
combinations of terms are precombined and added to indices, sometimes as
phrases like "information retrieval") may fail and force readers to build
post-coordinated queries (like "multimedia and retrieval"). Narrower queries
can be formed using proximity operators. These include searches for:
• adjacent words (two words placed side by side);
• two or more words located in a single sentence;
• within-field combinations (e.g., two words located within a title line);
• two or more words located within a paragraph.
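
The proximity operators listed above can be sketched in a few lines of
Python. This is only an illustration under simplifying assumptions: the
function names, the regular expressions used to split sentences and
paragraphs, and the dictionary-style record with a "title" field are all
hypothetical, and a production system would answer such queries from a
positional index rather than by scanning text.

```python
import re

# Hypothetical splitters: sentences end at . ! or ?, paragraphs at blank lines.
SENTENCE = r"(?<=[.!?])\s+"
PARAGRAPH = r"\n\s*\n"


def tokenize(text):
    """Return the lowercase word tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())


def adjacent(text, first, second):
    """True if `first` is immediately followed by `second`."""
    words = tokenize(text)
    return any(a == first and b == second for a, b in zip(words, words[1:]))


def within_unit(text, terms, splitter):
    """True if all terms occur together in one unit produced by `splitter`."""
    for unit in re.split(splitter, text):
        words = set(tokenize(unit))
        if all(term in words for term in terms):
            return True
    return False


def within_sentence(text, terms):
    return within_unit(text, terms, SENTENCE)


def within_paragraph(text, terms):
    return within_unit(text, terms, PARAGRAPH)


def within_field(record, field, terms):
    """True if all terms occur in one named field of a record (a dict)."""
    words = set(tokenize(record.get(field, "")))
    return all(term in words for term in terms)
```

Each operator narrows a query by demanding that the terms co-occur in a
smaller unit of text: a field or paragraph is a looser constraint than a
sentence, and a sentence is looser than strict adjacency.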

Boolean query construction (i.e., using ANDs, ORs and/or NOTs to specify
matches) can be exceedingly difficult, and indeed ten-to-one variations have
been observed in the ability of different individuals (and even of the same
individual at different times) to form good queries. Frequently, repeated
efforts are required to form queries that do not retrieve many nonrelevant
items (i.e., have high precision) and yet do retrieve a significant fraction
of the relevant items (i.e., have high recall).
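
The two measures can be stated concretely. The sketch below computes them
for one query, assuming that the set of relevant items is known in advance
(as it is in a test collection); the function name and the document
identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = fraction of retrieved items that are relevant
    recall    = fraction of relevant items that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four items retrieved, two of them relevant; five relevant items exist.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d2", "d4", "d5", "d6", "d7"])
# p is 0.5 (2 of 4 retrieved are relevant); r is 0.4 (2 of 5 relevant found)
```

Broadening a query tends to raise recall at the expense of precision, and
narrowing it does the reverse, which is why repeated reformulation is so
often needed.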

Finding relevant items often involves one or more of the following steps:
• Translating an information need into the organization of concepts
  represented by a table of contents, index or thesaurus.
• Adopting a scheme to descend through some hierarchical organization of
  topics or descriptors to find specific section(s) of interest.
• Combining words or terms used within an index or thesaurus or forming the
  free text into a formal Boolean query.
• Refining the query by adding synonyms or related terms (broadening), by
  increasing specificity (narrowing) or by reorganizing the query structure.
• Visually examining all selected items to find those of value.

Sometimes the task at hand makes the choice of search method obvious. If you
are looking for a single known item, a Boolean search will work best. Looking
for multiple items on the same topic calls for browsing. An attempt to
retrace a previously viewed presentation suggests the use of links. Sometimes
a combination of methods works best.

Unified Metaphors

The steps involved in searching and browsing hyperdocuments should be hidden
behind a task-oriented interface that lets readers search at a conceptual,
descriptive level instead of at either a word-oriented or a procedural level.
Boolean queries, for instance, fail to fulfill readers' information access
needs because they focus attention on the wrong objects and force users to
think in terms of the mechanics of search algorithms. Conversely, properly
implemented browsing systems allow readers to explore a range of conceptual
dimensions easily and intuitively. A graphical presentation coupling the
spatial metaphor of nearness to hypermedia links supports such explorations
using simple pointing operations. Most importantly, both searching and
browsing should be subsumed by a set of consistent metaphors and abstractions
that encourage the integration of these activities.

Using naturally occurring metaphors as the backbone of the user interface
offers several advantages. Unfortunately, while paper sketches and natural
language input may be easy for humans to work with, they are often beyond the
scope of all but the most expensive of today's computer systems.

Another possibility is to have a standardized unified digital representation
and manipulation system. We all take for granted the fact that telephones
connect people all around the world, and that they do this because of
international agreements on hardware and software standards. We can envision
a collection of standards for hardware, software, and information structures
or data formats [see Devlin, 1991] built around both existing and exciting
new technologies for digital encoding. This requires management of multimedia
including: text, graphics, images, speech, audio, and video. It involves
interconnection of multiple computers and diverse networks, and use of a
variety of peripheral and input/output devices. Finally, it requires
intelligent processing to improve efficiency and effectiveness and to ensure
smooth human-computer interaction.

Progress Toward Unified Access

A great deal of progress has been made towards integration of computer
networks [Tannenbaum, 1981]. Thousands of computers are internetworked, able
to exchange electronic mail. Many are connected to allow even higher levels
of integration: passing of files, remote log-ins, and even communication
between programs.

In the area of library information retrieval, the Z39.50 standard has been
developed so that a user of one library system can cause that system to have
a query processed on another system, and then indirectly receive the search
results. Various organizations, including universities such as VPI&SU, are
developing and testing prototype systems for information interchange that
follow Z39.50 specifications. In addition, the X.500 standard has enabled
many prototype systems to combine into a partial global directory system, a
computer analog of the telephone white pages.

Much of the success in internetworking can be credited to the use of a
layered approach to software, whereby physical connection, data connection,
session management, presentation handling, and application program operation
have separate interfaces and specifications. The International Organization
for Standardization (ISO) Open Systems Interconnection (OSI) model uses
physical, data link, network, transport, session, presentation, and
application levels to encourage interoperability of software on heterogeneous
computers [Zimmerman, 1980]. OSI has been the basis for Z39.50, X.500, and
many other related standards.

This layering approach also applies to storage systems. On the one hand,
there are levels of physical units, operating system-identified devices, and
network-wide file systems. On the other hand, there are software divisions,
such as the proposed standard of the Air Traffic and Air Transport
Associations, which separates the search engine component of CD-ROM software
(which itself builds upon the volume and file description layer specified by
the ISO 9660 standard) from the user interface layer.

Object Model

While computer users generally search for relevant concepts, computer
scientists have tried to meet users' needs by studying and building
specialized processing systems for particular classes of information. Thus,
the term "data" is often associated with database management systems,
"information retrieval" is often limited to text databases, and "knowledge
bases" are viewed as the proper repositories for collections of complex list
structures and rules.

To unify these approaches, object-oriented databases have been proposed that
might contain and support processing of any class of information. In
hypertext systems that build upon these object bases, completing a search or
following a link would lead to presentation of any of various objects: a
screen of text, a bitmap display, an audio segment, a spreadsheet entry, an
individual plane of a complex image, an explanation of some expert rules, or
a tabular report.

However, to date only limited progress has been made in integrating database
management systems (which at present usually follow the relational model)
with text retrieval systems (which frequently employ inverted files and
require access by way of Boolean queries). One issue is the difference in the
items being manipulated: tables of numbers or short character strings have
few data types as opposed to the variety encountered in multimedia systems,
and database structures are much more regular than those found in the world
of compound documents (e.g., books, magazines, letters, reports, and
encyclopedias). A possible solution is to use abstract data types (e.g.,
sets, lists, vectors) as elements of a relational DBMS [Fox, 1981]. Some of
these ideas were used in the redesign of the SMART system for the UNIX
environment [Fox, 1983a], which eventually led to a version of that system
that is widely used by document retrieval researchers. Another approach is to
directly use the relational model, with performance tuning and limited
extensions where needed, to handle bibliographic records [Lynch, 1987].

Artificial intelligence (AI) research extends these efforts to solving the
problems of knowledge representation. Initially, languages like LISP and
Prolog focused on symbol and list manipulation. Real world objects were
associated with atoms that could be located by their names. Property lists
(sets of property type/value pairs) were attached to these atoms.

But to better match human ability to deal with default values, groupings of
key attributes with various types of objects, and taxonomic classification of
similar classes of objects, the concept of a frame was developed. This
concept has proven useful for information retrieval [Fox, 1987b, Weaver, et
al., 1989].

Essentially, a frame class (e.g., U.S. postal address) has various slots for
important aspects of the object (e.g., state, zip code), and may inherit from
more general classes (e.g., postal address with slots for name and locality),
as well as be instantiated for an individual (e.g., John Smith's U.S. postal
address). Frames can be grouped together into extremely regular knowledge
bases.
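
The postal address example can be made concrete with a minimal frame
sketch. It is an illustration only, not the representation used in any
particular system: the Frame class, its method names, the "country" slot
with its default, and all slot values are assumptions introduced here.

```python
class Frame:
    """A frame: a named set of slots that inherits from a more general frame."""

    def __init__(self, name, parent=None, slots=None):
        self.name = name
        self.parent = parent
        self.slots = dict(slots or {})

    def get(self, slot):
        """Look a slot up locally, then in ever more general frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)

    def instantiate(self, name, values):
        """Create an individual whose unfilled slots fall back to this class."""
        return Frame(name, parent=self, slots=values)


# General class with slots for name and locality (no defaults).
postal_address = Frame("postal address",
                       slots={"name": None, "locality": None})

# More specific class; the "country" default is a hypothetical addition.
us_postal_address = Frame("U.S. postal address", parent=postal_address,
                          slots={"state": None, "zip_code": None,
                                 "country": "USA"})

# An individual instance with invented slot values.
john = us_postal_address.instantiate(
    "John Smith's U.S. postal address",
    {"name": "John Smith", "state": "VA"})

john.get("name")      # filled on the instance itself
john.get("country")   # default inherited from the U.S. frame class
john.get("locality")  # unfilled slot inherited from the general class
```

Slot lookup walks from the individual up through its classes, which is how
default values and taxonomic classification fall out of one mechanism.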

A semantic network is an alternate knowledge representation, closer in
orientation to hypertext [Findler, 1979, Morgado, 1986]. A semantic network
can be envisioned as a graph where nodes are used to represent objects, and
links to indicate significant, meaningful relationships. One well-known
semantic network system, SNePS [Shapiro, 1989], supports knowledge
representation and inference (i.e., proving or drawing conclusions based on
knowledge recorded previously in the system). In SNePS a distinction is often
made between path-based inference, whereby the reader follows a succession or
chain of links from some node to identify some relationship, and node-based
inference, whereby logical propositions are represented in the network and
are combined with other propositions in varied regions of the network in
accord with the rules of logic. Path-based inference can be very efficient,
and corresponds to following links in hypertexts. Semantic networks can also
support spreading activation, where several concepts are located in the
network and all paths of length 1, 2, etc. from each are explored until paths
from several concepts meet, a type of parallel search for important
associations. Spreading activation has been used in the GRANT system to
retrieve documents related to various topics, projects, and grant proposals
[Cohen and Kjeldsen ...].
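
Spreading activation as described above can be sketched as a breadth-first
expansion from each starting concept, one path length at a time. This is a
simplified illustration and not the GRANT algorithm: the function name, the
adjacency-list graph, and the small network of word nodes are assumptions,
and the sketch stops only when paths from all of the starting concepts meet.

```python
def spreading_activation(graph, sources, max_depth=3):
    """Expand outward from every source until their neighborhoods meet.

    graph   -- dict mapping a node to an iterable of linked nodes
    sources -- the concepts from which activation starts
    Returns (path_length, meeting_nodes), or None if nothing meets
    within max_depth links.
    """
    reached = {s: {s} for s in sources}    # everything activated so far
    frontier = {s: {s} for s in sources}   # nodes activated in the last step
    for depth in range(max_depth + 1):
        meeting = set.intersection(*reached.values())
        if meeting:
            return depth, meeting
        if depth == max_depth:
            break
        for s in sources:
            next_nodes = set()
            for node in frontier[s]:
                next_nodes.update(graph.get(node, ()))
            frontier[s] = next_nodes - reached[s]
            reached[s] |= frontier[s]
    return None


# A tiny hypothetical network; each link is listed in both directions.
network = {
    "ship": ["officer", "vessel"],
    "vessel": ["ship"],
    "officer": ["ship", "captain", "army"],
    "captain": ["officer"],
    "army": ["officer", "general"],
    "general": ["army"],
}

result = spreading_activation(network, ["captain", "general"])
# result is (2, {"officer", "army"}): the two concepts meet within two links
```

Because every source is expanded one link at a time, the first meeting
point found is also the one reached by the shortest combined paths.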

Semantic networks are useful for handling the inter-relationships in
language. At VPI&SU, for example, we have taken machine-readable
dictionaries, extracted and restructured the important data [Fox, 1986b],
archived them in the form of Prolog facts, and loaded them into a semantic
network to facilitate further processing and access [Fox, 1988a; France, et
al., 1989a]. In so doing we have found it helpful to build an elaborate
taxonomy of relations (i.e., links) common between word senses in our
lexicon. One goal is to extend earlier experiments that indicated that
improvement in retrieval effectiveness could result from finding words that
are lexically related to query terms [Fox, 1988e]. Thus, the system knows
enough about the meaning of a query to search for "captain" if a query asks
for "ship's officers," or "general" when "army leader" is sought. Others have
considered networks as important support structures for probabilistic
(Bayesian) inference systems that prove when a document is relevant to a
query [Croft and Turtle, 1989]. Ultimately, semantic networks can be combined
with hypertext and hypermedia to lead to a uniform representation of objects,
as discussed below.


Integration of Various Search Techniques

Research suggests that the best retrieval performance results when many
different forms of information about documents are utilized by a variety of
search methods [Belkin and Croft, 1987b]. Integration of various approaches
has this same goal [Fox, 1987b]. Thus, one might merge the results from using
Boolean, vector, and probabilistic searches for a given information
statement. Any of these search methods can also be modified by the use of
feedback, where a reader indicates what documents or sections thereof are
relevant, and the computer uses all the information it has available about
that sample to perform another, better search. One can even throw in use of
AI techniques, as discussed in the section on our work with CODER. Reference
models provide a clear framework for integration and thus have become popular
in regard to networking, where different cabling, interconnection,
specialized hardware, firmware, and layers of software allow users of similar
applications on different computers to collaborate.

Reference Models for Hypertext

In order to develop interoperable information systems and a unified
representation model, a reference model for information management systems
must first be developed. Work on hypertext reference models has arisen in
part as a result of standardization efforts for hypertext/hypermedia [see
Chapter 26 by Devlin on standards]. We have found the Dexter model [Halasz
and Schwartz, 1990] and the r-model [Furuta and Stotts, 1989; Furuta and
Stotts, 1990; Stotts and Furuta, 1989] to be particularly insightful.
However, interoperable information systems should properly include not only
hypertext/hypermedia, but also searching, networking, and other applications
and levels of manipulation.

A More Complete Reference Model

Our proposal for a reference model that integrates hypertext and other
sources of information is shown in Figure 21.1, and explained in more detail
in subsequent sections. Essentially it combines the seven OSI layers with
seven other layers relating to hypertext/hypermedia and other types of
information systems.

    Typical Contents

    hypertext, hypermedia, information retrieval,
    image management, authoring, training, tutoring,
    CAD/CAM, interactive digital video
    devices: windows, pointing devices, speakers;
    media: text, audio, video;
    operations: generalization, editing, translation, access,
    browsing, selection, picking, querying, sequencing.
    base selection, user identification, history management,
    base: knowledge, information, data;
    operations: search, inference.
    graphs, lists, sets, vectors, relations, frames.
    link Ids, labels.
    anchor Ids, span descriptions.
    textual strings, integers, object Ids.
    file systems, storage media, message support.
    ISO lower level equivalent
    ISO lower level equivalent
    ISO lower level equivalent
    ISO lower level equivalent

    Figure 21.1
    Information Management System Reference Model.

The bottom four layers are OSI layers that together support the secure
transport of messages. Atop the transport layer is a layer that provides
essential support for files and process communication (messages). Processes
operate above this layer to support high-level communication between
machines, including necessary translations between data representations.
These six layers complete an extended groundwork for communication among
machines, languages, and environments at a highly abstract level.

The node layer comes next, supporting atomic, structured, and multimedia
objects. The anchor layer allows points or spans inside nodes to be addressed
in ways appropriate for each node type. Links connect anchors, thus providing
a layer that is essentially a graph or network of information items. The view
layer allows various aggregations and associations as well as common data,
information, and knowledge structures to be manipulated (e.g., have a view of
relations, a view of vectors, a view of frames, a view as linked collections
of nodes). The base layer allows coordinated access to data, information and
knowledge bases through search, navigation, and related operations. Higher
levels correspond to upper level OSI layers, but are organized to support
common information access, hypertext/hypermedia, and presentation/application
programming activities.
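
The relationship among the node, anchor, and link layers can be
illustrated with a few small data types. This is a sketch of the idea
rather than a specification of the model: the class and field names, the
character-offset form of a span, and the Hyperbase container are all
assumptions, and only text nodes are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    content: str        # a textual string; other media are omitted here


@dataclass(frozen=True)
class Anchor:
    anchor_id: str
    node_id: str        # the node in which the anchor lies
    start: int          # span description: character offsets in the node
    end: int


@dataclass(frozen=True)
class Link:
    link_id: str
    label: str
    source: str         # anchor id
    target: str         # anchor id


class Hyperbase:
    """Holds the three layers and answers simple navigation requests."""

    def __init__(self):
        self.nodes, self.anchors, self.links = {}, {}, {}

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_anchor(self, anchor):
        if anchor.node_id not in self.nodes:
            raise KeyError(anchor.node_id)
        self.anchors[anchor.anchor_id] = anchor

    def add_link(self, link):
        for anchor_id in (link.source, link.target):
            if anchor_id not in self.anchors:
                raise KeyError(anchor_id)
        self.links[link.link_id] = link

    def span_text(self, anchor_id):
        """Return the text addressed by an anchor."""
        anchor = self.anchors[anchor_id]
        return self.nodes[anchor.node_id].content[anchor.start:anchor.end]

    def follow(self, node_id):
        """Return (label, target node id) for links leaving a node."""
        return [(link.label, self.anchors[link.target].node_id)
                for link in self.links.values()
                if self.anchors[link.source].node_id == node_id]


base = Hyperbase()
base.add_node(Node("n1", "Search finds the barn door."))
base.add_node(Node("n2", "Browsing strings the pearls."))
base.add_anchor(Anchor("a1", "n1", 0, 6))
base.add_anchor(Anchor("a2", "n2", 0, 8))
base.add_link(Link("l1", "see also", "a1", "a2"))
```

Keeping anchors separate from both nodes and links means that a link never
needs to know how a particular node type addresses its own interior.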


We view this model as providing a framework for integrating information
management systems. Our aim is to validate this framework, to prove it
adequate for a wide variety of activities. Towards the end of this chapter,
we explain how our work with the CODER system is being extended and layered
along the lines of this integrated model.

Previous Work

In the broadest sense, a reference model for information management should
encompass work in electronic publishing, document representation, text
analysis, storage, information retrieval, and hypertext/hypermedia.

Electronic Publishing

In connection with ACM Press Database and Electronic Products [Fox, 1988c],
we have undertaken a variety of electronic publishing ventures [see also Fox,
Rous, and Marchionini, 1991]. In Hypertext on Hypertext we dealt with three
versions of a single hypertext, in order to accommodate user preferences for
equipment and operating systems. Further, in one of the versions
[Shneiderman, 1988], the original hypertext system was extended to include
more comprehensive searching capabilities, in part because of our experiences
with online and CD-ROM publishing. These experiences also pointed out the
need for standards and integrated or interoperable systems to allow
wide-scale electronic publishing [Fox, 1990b].

For example, while developing the Virginia Disc series of CD-ROMs, we
purposely provided data and information bases in multiple forms to allow
comparison of approaches and software packages. The Master Gardener Handbook
on Virginia Disc One [Fox, 1990c] is available in hypertext form, or can be
searched with a bibliographic retrieval system. Though the former is handy
for reading and browsing, the latter is more convenient for finding specific
facts. Similar comparisons can be made between alternative retrieval systems
(e.g., Personal Librarian applied to searching the King James Bible at either
the chapter or verse level).

Related Work

The most popular database management software today, the relational DBMS, is
based on the simple mathematical concept of a relation. A database relation
can be thought of as a flat table, but is much more precisely defined: a
relation is a set of individual items, called "tuples" and sometimes thought
of as rows, each containing values for a fixed collection of attributes,
usually associated with columns of the table. Queries in the relational model
are answered by combining these relations using a restricted set of
operations that depend on the flat and uniform structure of the tuples.
Traditional information retrieval also uses the notion of sets [Artandi,
1971], but regards each item as a nonuniform (and often unstructured)
collection of attributes. Information retrieval queries combine desired [...]
using sentence-forming operators. Thus, a Boolean query specifies a set of
documents by combining simpler queries (ultimately, individual indexed words
or concepts) using logical operators.
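
The set-based view of a Boolean query can be sketched directly with an
inverted file that maps each indexed word to the set of documents
containing it. The function names, the nested-tuple query notation, and the
three sample documents are assumptions made for the illustration.

```python
from collections import defaultdict


def build_index(docs):
    """Build an inverted file: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def evaluate(query, index, all_docs):
    """Evaluate a query given as a word or as (operator, operand, ...)."""
    if isinstance(query, str):
        return set(index.get(query, set()))
    operator, *operands = query
    sets = [evaluate(operand, index, all_docs) for operand in operands]
    if operator == "AND":
        return set.intersection(*sets)
    if operator == "OR":
        return set.union(*sets)
    if operator == "NOT":
        return all_docs - sets[0]
    raise ValueError(f"unknown operator: {operator}")


docs = {
    1: "hypertext browsing links",
    2: "hypertext search retrieval",
    3: "database retrieval",
}
index = build_index(docs)
all_docs = set(docs)

evaluate(("AND", "hypertext", ("OR", "search", "retrieval")), index, all_docs)
# -> {2}
evaluate(("AND", "retrieval", ("NOT", "hypertext")), index, all_docs)
# -> {3}
```

Each operator is an ordinary set operation, so the result of any query is
again an unranked set: a document either satisfies the expression or it does
not.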

Other systems of operators are possible. For instance, significantly more
effective retrieval results from a simple change related to human perception
of language and Boolean logic, namely making the AND and OR operators softer
or less strict. One realization of this change, the p-norm model, allows a
parameter, p, to vary so that there is a continuum between strict ANDs and
strict ORs [Salton, et al., 1983]. The model allows retrieval results to be
ranked in descending order of presumed value, and gives users flexibility to
indicate relative importance of search terms. Further extension is possible,
whereby user judgments regarding which previously retrieved documents are
relevant can be used to automatically build better Boolean or p-norm queries
[Salton, et al., 1985].
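
A small sketch shows how the parameter p softens the operators. It uses
the p-norm similarity functions in their simplest form, assuming that all
query terms carry equal weight and that each document term weight lies
between 0 and 1; the function names and the sample weights are illustrative.

```python
def pnorm_or(weights, p):
    """Similarity of a document to (t1 OR ... OR tn).

    weights -- the document's weights (0..1) for the n query terms
    p       -- p = 1 gives a plain average; large p approaches max()
    """
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)


def pnorm_and(weights, p):
    """Similarity of a document to (t1 AND ... AND tn).

    p = 1 gives a plain average; large p approaches min()
    """
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)


doc = [0.2, 0.8]          # one weak term and one strong term

pnorm_or(doc, 1)          # AND and OR coincide at p = 1
pnorm_and(doc, 1)
pnorm_or(doc, 100)        # close to max(doc): nearly a strict OR
pnorm_and(doc, 100)       # close to min(doc): nearly a strict AND
```

At p = 1 the distinction between AND and OR disappears and the score is a
simple average, while increasing p moves the two operators toward their
strict Boolean behavior. Because the score is a number rather than a yes/no
decision, retrieved documents can be ranked.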

Feedback is also fundamental to the probabilistic model [Robertson and Jones,
1976], whereby documents are also ranked according to estimated probability
of relevance. Ranking from probabilistic retrieval and various fuzzy
set-based schemes [Bookstein, 1985] is of particular value to users so they
need not be concerned with retrieval set size, but rather can let the
computer carry out pattern matches to find closely related items.
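
As an illustration of how relevance feedback feeds a probabilistic
ranking, the sketch below computes one commonly used form of the term
relevance weight associated with this line of work, with 0.5 added to each
count to avoid division by zero. The function names, the argument names, and
the sample counts are assumptions; a document is scored here simply by
summing the weights of the query terms it contains.

```python
import math


def relevance_weight(r, n, R, N):
    """Relevance weight of one term, estimated from feedback.

    N -- documents in the collection
    R -- documents the user has judged relevant
    n -- documents containing the term
    r -- relevant documents containing the term
    """
    odds_in_relevant = (r + 0.5) / (R - r + 0.5)
    odds_in_nonrelevant = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(odds_in_relevant / odds_in_nonrelevant)


def score(document_terms, term_weights):
    """Score a document by summing the weights of query terms it contains."""
    return sum(weight for term, weight in term_weights.items()
               if term in document_terms)


# Hypothetical counts: 100 documents, 10 judged relevant.
weights = {
    "hypertext": relevance_weight(r=8, n=20, R=10, N=100),  # mostly relevant
    "system": relevance_weight(r=0, n=20, R=10, N=100),     # never relevant
}
```

A term that occurs mainly in documents judged relevant receives a positive
weight, and a term that occurs only in nonrelevant documents receives a
negative one, so each round of judgments reshapes the ranking.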

Another enhancement to retrieval results from adding information. This has
been demonstrated using the vector space model, where linear algebra can be
used to allow descriptions of term occurrences in document collections
[Salton, 1989]. Many people are familiar with keyword-based search approaches
[called syntactic searches in Littleford, 1991], in which the reader searches
for a particular string of characters in a database or uses entries from a
controlled vocabulary for searching. Keyword-based searches are usually only
available as alternatives to, rather than complements to, citation-based
searching (e.g., as is done with the Science Citation Index). Yet if term and
citation data are both made available [Fox, 1983b], especially when used in
feedback searching, better results can be obtained than from either type of
data alone [Fox, 1988b]. In the vector approach, readers select documents or
groups of documents which can be treated as queries, so the system can search
for these matches.
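
Treating a document as a query is straightforward once both are vectors.
The sketch below uses raw term counts as vector components and the cosine of
the angle between vectors as the measure of similarity; both choices, along
with the function names and the sample texts, are assumptions made to keep
the example short, and practical systems use more refined term weights.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a vector of raw term counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two term vectors (0 if either is empty)."""
    dot = sum(u[term] * v[term] for term in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query_text, docs):
    """Return ids of documents sharing terms with the query, best first."""
    query = vectorize(query_text)
    scored = [(cosine(query, vectorize(text)), doc_id)
              for doc_id, text in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for similarity, doc_id in ranked if similarity > 0]


docs = {
    "a": "hypertext links and browsing",
    "b": "boolean query retrieval",
    "c": "vector retrieval and ranking",
}

rank("vector retrieval ranking", docs)   # -> ["c", "b"]
```

Any text can serve as the query, including a document the reader has just
marked as relevant, which is what makes feedback searching natural in this
model.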

Results with these advanced retrieval methods have been demonstrated with
collections of up to tens of thousands of documents. A pilot study indicated
applicability to roughly 300,000 documents [Fox, 1987d], and results are now
being analyzed from a recent study at VPI&SU with 500,000 library catalog
records and 216 users, which compared Boolean, p-norm, vector, and vector
with probabilistic feedback approaches. It seems that advanced methods for
retrieval can be applied to very large collections.

Work on the COmposite Document Expert/extended/effective Retrieval (CODER)
system began in 1985, to develop a testbed for the integration of AI and
information retrieval methods [Fox, 1986a]. The aim was to develop a
next-generation retrieval system: a system that would have greater
effectiveness and efficiency than previous systems, and one that would allow
comparison and integration of advanced retrieval methods. Our plan is to
integrate the various retrieval models into a unified system [Fox, 1989a].

CODER Architecture

As can be seen in Figure 21.2, CODER was conceived as a comprehensive
standalone system that could deal directly with end users and could receive
texts of bibliographic and full text documents ready for analysis. The
architecture follows principles of software engineering, AI prototyping, and
object oriented construction [Fox, 1987a].

CODER assumes a coarse-grained parallel operation of the sort now provided by
computer networks. A central spine holds information for shared use by the
analysis and retrieval subsystems. Each subsystem is specified as a
collection of modules acting as a community and communicating through a
central blackboard (a module with areas that experts can post to or read
postings from). Attached to each blackboard is a strategist, with rules to
indicate which modules should be requested to examine which types of
postings.
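
The blackboard and strategist arrangement can be sketched as follows.
This is a toy, sequential illustration of the idea and not CODER's
implementation: the class and method names, the area names, the two example
modules, and the sample documents are all hypothetical, and the modules run
one after another rather than in parallel.

```python
class Blackboard:
    """A shared store divided into named areas that modules post to and read."""

    def __init__(self):
        self.areas = {}

    def post(self, area, posting):
        self.areas.setdefault(area, []).append(posting)

    def read(self, area):
        return list(self.areas.get(area, []))


class Strategist:
    """Holds rules saying which modules should examine which postings."""

    def __init__(self, blackboard):
        self.blackboard = blackboard
        self.rules = {}               # area -> modules to notify

    def add_rule(self, area, module):
        self.rules.setdefault(area, []).append(module)

    def post(self, area, posting):
        """Record a posting, then ask the interested modules to examine it."""
        self.blackboard.post(area, posting)
        for module in self.rules.get(area, []):
            module(self, posting)


DOCUMENTS = {"d1": "hypertext link", "d2": "database table"}  # invented data


def morphology_module(board, query):
    """Reduce query words to crude stems and post them as terms."""
    board.post("terms", [word.rstrip("s") for word in query.lower().split()])


def search_module(board, terms):
    """Post the ids of documents containing any of the terms."""
    hits = sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if any(term in text for term in terms))
    board.post("results", hits)


blackboard = Blackboard()
strategist = Strategist(blackboard)
strategist.add_rule("query", morphology_module)
strategist.add_rule("terms", search_module)

strategist.post("query", "Hypertext links")
blackboard.read("results")   # -> [["d1"]]
```

The modules never call one another directly. Each reacts only to postings
in the areas it has been assigned, so a module can be replaced, or moved to
another machine, without changing the rest of the community.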

The retrieval subsystem is broken into modules along functional lines, based
on studies of user and search intermediary interaction [Belkin, et al.,
1983], as a "distributed expert based information system" [Belkin, et al.,
1987a]. Thus, there are modules for user modeling, dialog and discourse
management, search, planning, morphological analysis, and other operations.
These can operate on a single computer, or can be spread across a network of
machines on which they run in parallel.

The spine stores document and other text collections, relational data, and
more complex knowledge structures. Frames are used extensively to describe
documents, real-world entities, sessio

