`Integration of Heterogeneous Information Sources
`
`Sudarshan Chawathe, Hector Garcia-Molina, Joachim Hammer,
Kelly Ireland, Yannis Papakonstantinou, Jeffrey Ullman, Jennifer Widom
`
`Department of Computer Science
`Stanford University
`Stanford, CA -
`
`last-name@cs.stanford.edu
`
`Abstract
`
`The goal of the Tsimmis Project is to develop tools that
`facilitate the rapid integration of heterogeneous information
`sources that may include both structured and unstructured
`data. This paper gives an overview of the project, describ-
`ing components that extract properties from unstructured
`objects, that translate information into a common object
`model, that combine information from several sources, that
`allow browsing of information, and that manage constraints
`across heterogeneous sites. Tsimmis is a joint project be-
`tween Stanford and the IBM Almaden Research Center.
`
`
`
1 Overview
`
A common problem facing many organizations today
is that of multiple, disparate information sources and
repositories, including databases, object stores, knowledge
bases, file systems, digital libraries, information
retrieval systems, and electronic mail systems. Decision
makers often need information from multiple sources,
but are unable to get and fuse the required information
in a timely fashion due to the difficulties of accessing the
different systems, and due to the fact that the information
obtained can be inconsistent and contradictory.
`
Research sponsored by the Wright Laboratory, Aeronautical
Systems Center, Air Force Material Command, USAF, under
Grant Number F - - - . The US Government is
authorized to reproduce and distribute reprints for Government
purposes notwithstanding any copyright notation thereon. The
views and conclusions contained in this document are those of the
authors and should not be interpreted as necessarily representing
the official policies or endorsements, either express or implied, of
Wright Laboratory or the US Government. This work was also
supported by the Reid and Polly Anderson Faculty Scholar Fund,
the Center for Integrated Systems at Stanford University, and by
Equipment Grants from Digital Equipment Corporation and IBM
Corporation.
`
The goal of the TSIMMIS project is to provide
tools for accessing, in an integrated fashion, multiple
information sources, and to ensure that the information
obtained is consistent. Numerous other recent projects
have similar goals, of course. Before describing the
differences between Tsimmis and other data integration
projects, let us give an overview of the Tsimmis
architecture, describing the functions of the various
components and the philosophy of our approach. Refer
to Figure 1.
`
1.1 Translators and Common Model

Figure 1 shows a collection of disk-shaped heterogeneous
information sources. Above each source is a translator
or wrapper that logically converts the underlying
data objects to a common information model. To do
this logical translation, the translator converts queries
over information in the common model into requests
that the source can execute, and it converts the data
returned by the source into the common model.
For the Tsimmis project we have adopted a simple
self-describing or tagged object model. Similar models
have been in use for years; we call our version the Object
Exchange Model, or OEM. OEM allows simple nesting of
objects, and a complete specification is given in Section
2. The fundamental idea is that all objects, and their
subobjects, have labels that describe their meaning. For
example, the following object represents a Fahrenheit
temperature of degrees:

⟨temp-in-Fahrenheit, int, ⟩

where the string "temp-in-Fahrenheit" is a human-readable
label, "int" indicates an integer value, and " "
is the value itself.
If we wish to represent a complex
object, then each component of the object has its own
label. For example, an object representing a set of two
temperatures may look like:

⟨set-of-temps, set, {cmp1, cmp2}⟩
  cmp1: ⟨temp-in-Fahrenheit, int, ⟩
  cmp2: ⟨temp-in-Celsius, int, ⟩

As an acronym, TSIMMIS stands for "The Stanford-IBM
Manager of Multiple Information Sources." In addition, Tsimmis
is a Yiddish word for a stew with "heterogeneous" fruits and
vegetables integrated into a surprisingly tasty whole.

Microsoft Corp. Exhibit 1316

Figure 1: Tsimmis Architecture
[Diagram showing the components: Application, Constraint Manager, Mediators, Local Constraint Managers, Translators, Info Sources, Classifier/Extractors, Plain Text, and a Mediator Generator and a Translator Generator, each with a Definition.]
`
OEM is very simple, while providing the expressive
power and flexibility needed for integrating information
from disparate sources. We also have developed a
query language, OEM-QL, for requesting OEM objects.
OEM-QL is an SQL-like language extended to deal with
labels and object nesting; see Section 2.1.
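
As an illustration only, the tagged objects described above could be modelled in Python roughly as follows. The class name `OEMObject` and the helper `render` are hypothetical, and the two temperature values are placeholders chosen for this sketch, not values taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class OEMObject:
    """Hypothetical in-memory form of one self-describing OEM object."""
    label: str                 # human-readable meaning of the object
    type: str                  # atom type such as "int" or "str", or "set"/"list"
    value: Any                 # atom value, or a list of OEMObject for set/list
    oid: Optional[str] = None  # Object-ID; None plays the role of null


def render(obj: OEMObject, indent: int = 0) -> List[str]:
    """Return one text line per object, indenting nested subobjects."""
    pad = "  " * indent
    if obj.type in ("set", "list"):
        lines = [f"{pad}{obj.label} ({obj.type})"]
        for child in obj.value:
            lines.extend(render(child, indent + 1))
        return lines
    return [f"{pad}{obj.label} ({obj.type}): {obj.value!r}"]


# Placeholder temperatures, chosen only for this illustration.
temps = OEMObject("set-of-temps", "set", [
    OEMObject("temp-in-Fahrenheit", "int", 68),
    OEMObject("temp-in-Celsius", "int", 20),
])
print("\n".join(render(temps)))
```

Because every object carries its own label and type, a consumer such as `render` can walk a structure it has never seen before without consulting a schema.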
`
1.2 Mediators

Above the translators in Figure 1 lie the mediators. A
mediator is a system that refines in some way information
from one or more sources. A mediator embeds
the knowledge that is necessary for processing a
specific type of information. For example, a mediator
for "current events" might know that relevant information
sources are the AP Newswire and the New York
Times database. When the mediator receives a query,
say for articles on "Bosnia," it will know to forward the
query to those sources. The mediator may also process
answers before forwarding them to the user, say
by converting dates to a common format, or by eliminating
articles that duplicate information. While the
task of converting dates is probably straightforward, the
task of eliminating duplicate information could be very
complex: figuring out that two articles written by different
authors "say the same thing" requires real intelligence.
In Tsimmis we are focusing on relatively simple
mediators based on patterns or rules. Still, even simple
mediators can perform very useful information processing
and merging tasks.
Implementing a mediator can be complicated and
time-consuming, but we believe that much of the coding
involved in mediators can be automated. Hence,
one important goal of the Tsimmis project is to automatically
or semi-automatically generate mediators
from high level descriptions of the information processing
they need to do. This is illustrated by the mediator
generator box on the right side of Figure 1. Similarly, we
provide a translator generator that can generate OEM
translators based on a description of the conversions
that need to take place for queries received and results
returned. This component, also illustrated in Figure 1,
significantly facilitates the task of implementing a new
translator.
`
1.3 System and User Interfaces

Mediators export an interface to their clients that is
identical to that of translators. Both translators and
mediators take as input OEM-QL queries and return
OEM objects. Hence, end users and mediators can
obtain their information from translators and/or
other mediators. This approach allows new sources to
become useful as soon as a translator is supplied, it
allows mediators to access new sources transparently,
and it allows mediators to be "stacked," performing
more and more processing and refinement of the relevant
information.
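
The shared interface is what makes stacking possible. The following is a minimal sketch of that idea, not the Tsimmis implementation: the names `Source`, `ListTranslator`, `UnionMediator`, and `DedupMediator` are hypothetical, a plain keyword stands in for an OEM-QL query, and dictionaries stand in for OEM objects.

```python
from typing import Dict, List, Protocol

Obj = Dict[str, str]  # stand-in for an OEM object: label -> atom value


class Source(Protocol):
    """Interface shared by translators and mediators (hypothetical)."""
    def query(self, keyword: str) -> List[Obj]: ...


class ListTranslator:
    """Toy translator over an in-memory list of records."""
    def __init__(self, records: List[Obj]) -> None:
        self.records = records

    def query(self, keyword: str) -> List[Obj]:
        return [r for r in self.records if keyword.lower() in r["title"].lower()]


class UnionMediator:
    """Forwards the query to every source and concatenates the answers."""
    def __init__(self, sources: List[Source]) -> None:
        self.sources = sources

    def query(self, keyword: str) -> List[Obj]:
        return [obj for s in self.sources for obj in s.query(keyword)]


class DedupMediator:
    """Stacked on another source; drops answers whose title was already seen."""
    def __init__(self, source: Source) -> None:
        self.source = source

    def query(self, keyword: str) -> List[Obj]:
        seen, out = set(), []
        for obj in self.source.query(keyword):
            key = obj["title"].lower()
            if key not in seen:
                seen.add(key)
                out.append(obj)
        return out


wire = ListTranslator([{"title": "Talks resume in Bosnia"}])
paper = ListTranslator([{"title": "Talks Resume in Bosnia"},
                        {"title": "Markets rally"}])
stack = DedupMediator(UnionMediator([wire, paper]))
print(stack.query("bosnia"))
```

The client of `stack` cannot tell whether it is talking to a translator or to several layers of mediators. Matching titles case-insensitively is only a placeholder for duplicate detection, which can be far harder in practice.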
End users (top of Figure 1) can access information either
by writing applications that request OEM objects,
or by using one of the generic browsing tools we have
developed. Our most recent browsing tool provides access
through Mosaic or other World-Wide-Web viewers.
The user writes a query on an interactive World-Wide-Web
page, or selects a query from a menu. The
answer is received as a hypertext document. The root
of this document shows one or more levels of the answer
object, with hypertext links available to take the user
to portions of the answer that did not appear on the
root document. This tool provides a mechanism for exploring
heterogeneous information sources that is easy
to interact with and that is based on a commonly used
interface. The browser is described in more detail in
Section 3.
`
1.4 Labels and Mediator Processing

It is important to note that there is no global database
schema, and that mediators can work independently.
For instance, to build a mediator it is only necessary
to understand the sources that the mediator will use.
In fact, it is not even necessary to fully understand the
sources used. For example, returning to our "current
events" mediator, suppose one source exports objects
with subobjects labeled by title, date, author, and
country. The mediator might always pass the author
and country subobjects to its client with no additional
processing. Now suppose a second source provides topic
and date subobjects. The mediator might convert the
dates from both sources into a common format, and it
will know how to convert a mediator query about the
subject of an article into the appropriate topic or title
queries to be sent to the sources.
When a mediator simply passes subobjects to its
clients (as in author and country above), it might
prepend the source name to the labels so that the client
can interpret the objects correctly. For example, a
mediator subobject might have label NYTimes.author,
indicating that this author is from the New York Times
source and follows its conventions for authors. Another
object might have the label AP.author. Of course, the
mediator could also make the formats consistent and
export subobjects with label author, but here we are
illustrating a simple mediator that does not do such
processing.
The key points are that a mediator does not need to
understand all of the data it handles, and no person or
software component needs to have a global view of all
the information handled by the system.
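
A rough sketch of this kind of partial understanding follows. The function `mediate`, the `DATE_FORMATS` table, and the date formats themselves are assumptions made for illustration; the paper does not say how either source formats its dates.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Sub = Tuple[str, str]  # (label, value) stand-in for an atomic subobject

# Hypothetical per-source date formats; the paper does not specify any.
DATE_FORMATS: Dict[str, str] = {"NYTimes": "%m/%d/%Y", "AP": "%d.%m.%Y"}
# Labels the mediator forwards without interpreting them.
PASS_THROUGH = {"author", "country"}


def mediate(source: str, subobjects: List[Sub]) -> List[Sub]:
    """Normalize dates, qualify pass-through labels, keep the rest unchanged."""
    out: List[Sub] = []
    for label, value in subobjects:
        if label == "date":
            parsed = datetime.strptime(value, DATE_FORMATS[source])
            out.append(("date", parsed.strftime("%Y-%m-%d")))
        elif label in PASS_THROUGH:
            out.append((f"{source}.{label}", value))
        else:
            out.append((label, value))
    return out


print(mediate("NYTimes", [("date", "03/15/1994"), ("author", "A. Writer")]))
print(mediate("AP", [("date", "15.03.1994"), ("author", "B. Reporter")]))
```

The mediator interprets only the `date` label. Author values travel through untouched, and the qualified label tells the client which source's conventions apply.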
`
1.5 Constraint Management

Another important component in the Tsimmis architecture
is constraint management, illustrated in Figure 1
by a Constraint Manager and two Local Constraint
Managers. Integrity constraints specify semantic
consistency requirements over stored information;
such constraints arise even when the information resides
in loosely coupled, heterogeneous systems. For example,
a construction company keeps data about a building
under construction. This data must be consistent
with the architect’s design (e.g., walls must be in the
same places), which may be stored in an entirely different
system. Constraint management in the distributed,
heterogeneous environments addressed by Tsimmis is
a much more difficult and complex problem than constraint
management in centralized systems: transactions
across multiple information sources usually are not
provided, and each information source may support different
capabilities for accessing and monitoring the data
involved in a constraint.
In current environments, constraints across heterogeneous
information sources usually are monitored or enforced
by humans, in an ad-hoc fashion or, frequently,
not checked at all. For example, an architect may freeze
the building design and send the latest specifications to
the construction company so that consistency is "guaranteed."
Of course, it is clear that these ad-hoc mechanisms
do not work well in general; in our example, it
is likely that the building may eventually not meet its
specifications.
Since in a loosely coupled environment it is generally
not possible to guarantee that every user or application
sees consistent data every time it interacts with
the system, the Tsimmis constraint manager enforces
constraints with weaker guarantees than what a centralized
system may provide. Tsimmis makes "relaxed"
guarantees, e.g., a constraint is true from am to pm
every day, or a constraint is true if some "Flag" is set.
Ensuring relaxed consistency is especially challenging
because one now has to deal with the timing of actions
and of guarantees. However, the advantages of being
able to handle relaxed guarantees in heterogeneous systems
are significant; knowing precisely what holds and
what does not hold, and when, will clearly lead to more
trustworthy systems.
The Tsimmis constraint manager supports the definition
of the interfaces that a source supports for the
information involved in a constraint (e.g., can a trigger
be set on a data item?), specification of the desired
constraint (e.g., two items should have the same value),
and specification of the strategy that is to be followed
for enforcing the constraint or for detecting violations.
The Local Constraint Managers in Figure 1 are responsible
for describing and supporting interfaces, while the
Constraint Manager processes constraints and executes
strategies. Note that the Constraint Manager actually is
not centralized as illustrated in Figure 1, but rather is a
set of distributed components that jointly manage constraints.
Constraint management is described in more
detail in Section 4.
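
To make the notion of a relaxed guarantee concrete, here is a minimal sketch of how a client might represent when a cross-source constraint is promised to hold. The class `RelaxedGuarantee` and its fields are hypothetical and do not describe the Tsimmis toolkit's actual interfaces; the window bounds used in the example are arbitrary.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class RelaxedGuarantee:
    """When a cross-source constraint is promised to hold (hypothetical)."""
    start: Optional[time] = None   # daily window start, inclusive
    end: Optional[time] = None     # daily window end, exclusive
    requires_flag: bool = False    # guarantee is valid only while a flag is set

    def holds(self, now: time, flag_set: bool = False) -> bool:
        """Return True if the constraint is guaranteed at this moment."""
        if self.requires_flag and not flag_set:
            return False
        if self.start is not None and self.end is not None:
            return self.start <= now < self.end
        return True


# Arbitrary example window; the bounds are not taken from the paper.
office_hours = RelaxedGuarantee(start=time(9, 0), end=time(17, 0))
flagged = RelaxedGuarantee(requires_flag=True)
print(office_hours.holds(time(12, 30)), flagged.holds(time(3, 0)))
```

An application consulting `holds` learns only whether consistency is promised at that moment. Outside the window, or while the flag is clear, the data may or may not be consistent.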
`
1.6 Classification and Extraction

The final component of the Tsimmis architecture consists
of the Classifier/Extractors shown at the bottom
of Figure 1. Many important information sources are
completely unstructured, consisting of plain files or incoming
bit strings (e.g., from a newswire). Often it
is possible to automatically classify the objects in such
sources (e.g., is the file an email message, a text file,
or a gif image?), and to extract key properties (e.g.,
creation date, author). The Classifier/Extractor performs
this task, based on identifying simple patterns in
the objects. The information collected by the
Classifier/Extractor can then be exported (via a translator, if
necessary) to the rest of the Tsimmis system, together
with the raw data. The Classifier/Extractor component
is based on the Rufus system developed at the IBM Almaden
Research Center and is not discussed further
in this paper.
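
Pattern-based classification of this kind can be sketched in a few lines. The sketch below is not how Rufus works; the functions `classify` and `extract` and the chosen patterns (the GIF signature bytes and two mail header names) are assumptions made for illustration.

```python
import re
from typing import Dict


def classify(data: bytes) -> str:
    """Guess the class of an unstructured object from simple patterns."""
    if data.startswith((b"GIF87a", b"GIF89a")):   # GIF files begin with this signature
        return "gif-image"
    head = data[:1024].decode("ascii", errors="replace")
    if re.search(r"^From: .+$", head, re.MULTILINE) and \
            re.search(r"^Subject: ", head, re.MULTILINE):
        return "email"
    return "text"


def extract(data: bytes) -> Dict[str, str]:
    """Return the class plus any key properties the patterns can recover."""
    props = {"class": classify(data)}
    if props["class"] == "email":
        head = data.decode("ascii", errors="replace")
        for label, header in (("author", "From"), ("date", "Date")):
            match = re.search(rf"^{header}: (.+)$", head, re.MULTILINE)
            if match:
                props[label] = match.group(1).strip()
    return props


message = b"From: someone@example.org\nDate: 1 Jan 1994\nSubject: hello\n\nBody text"
print(extract(message))
print(extract(b"GIF89a..."))
```

The extracted properties are exactly the kind of labeled values that a translator could then export alongside the raw data.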
`
1.7 Related Work

There are a number of differences between integration
of information sources in the Tsimmis project and other
database integration efforts:

• Tsimmis focuses on providing integrated access to
  very diverse and dynamic information. The information
  may be unstructured or semi-structured, often
  having no regular schema to describe it. The
  components of objects may vary in unpredictable
  ways (e.g., some pictures may be color, others black
  and white, others missing, some with captions and
  some without). Furthermore, the available sources,
  their contents, and the meaning of their contents
  may change frequently.

• Tsimmis assumes that information access and integration
  are intertwined. In a traditional integration
  scenario, there are two phases: an integration
  phase where data models and schemas (or parts
  thereof) are merged, and an access phase where data
  is fetched. In our environment, it may not be clear
  how information is merged until samples are viewed,
  and the integration strategy may change if certain
  unexpected data is encountered.

• Integration in our environment requires more human
  participation. In the extreme case, integration is
  performed manually by the end user. For example,
  a stock broker may read a report saying that IBM
  has named a new CEO, then retrieve recent IBM
  stock prices from a database to deduce that stock
  prices will rise. In other cases, integration may
  be automated by a mediator, but only after a
  human studies samples of the data, determines the
  procedure to follow, and develops an appropriate
  specification for the mediator generator.

In summary, the Tsimmis goal is not to perform
fully automated information integration that hides
all diversity from the user, but rather to provide a
framework and tools to assist humans (end users and/or
humans programming integration software) in their
information processing and integration activities.
Regarding the constraint management aspects, there
has been substantial prior work on database constraints,
focusing on centralized databases, tightly-coupled
homogeneous distributed databases, or loosely-coupled
heterogeneous databases with special constraint
enforcement capabilities. The multidatabase transaction
approach weakens the traditional notion of correctness
of schedules, but this approach cannot handle a situation in
which different databases support different capabilities.
In its modeling of time, our work has some similarity
to work in temporal databases and temporal logic
programming, although our approach is closer to the
event-based specification language in Rapide.

1.8 Remainder of Paper

In the rest of this paper we provide additional details
on some of the Tsimmis components. In Section 2 we
describe the OEM object model and its query language.
In Section 3 we present the Tsimmis/Mosaic object
browser. In Section 4 we outline the main components
of the constraint management toolkit. In Section 5 we
conclude, describe the status of the Tsimmis prototype,
and discuss future directions of our work.

2 Object Exchange

As described in Section 1.1, our Object Exchange
Model (OEM) is used as the unifying object model for
information processed by Tsimmis components. Note
that information need not actually be stored using
OEM; rather, OEM is used for the processing of logical
queries, and for providing results to the user.
Each object in OEM has the following structure:

| Label | Type | Value | Object-ID |

where the four fields are:
`
• Label: A variable-length character string describing
  what the object represents. For each label a
  translator or mediator exports, it should provide a
  "help" page that describes to a human the meaning
  and use of the label. These help pages can be very
  useful during exploration of information sources, and
  for deciding how to integrate information.

• Type: The data type of the object’s value. Each
  type is either an atom (or basic) type (such as
  integer, string, real number, etc.), or the type set or
  list. The possible atom types are not fixed and may
  vary from information source to information source.

• Value: A variable-length value for the object.

• Object-ID: A unique variable-length identifier for
  the object, or null. The use of this field is
  described below.

In denoting an object on paper, we often drop the
Object-ID field, i.e., we write ⟨label, type, value⟩, as in
the examples in Section 1.1.
Suppose an object representing an employee has
label employee and a set value. The set consists of
three subobjects, a name, an office, and a photo. All
four objects are exported by an information source IS
through a translator, and they are being examined by
a client C. The only way C can retrieve the employee
object is by posing a query that returns the object as
an answer.
Assume for the moment that the employee object is
fetched into C’s memory along with its three subobjects.
The value field of the employee object will be a set of
object references, say {o1, o2, o3}. Reference o1 will be
the memory location for the name subobject, o2 for the
office, and o3 for the photo. Thus, on the client side,
the retrieved object will look like:

⟨employee, set, {o1, o2, o3}⟩
  o1: location of ⟨name, str, "some name"⟩
  o2: location of ⟨office, str, "some office"⟩
  o3: location of ⟨photo, bitmap, "some bits"⟩

We assume that identifiers are unique for each information
source. Uniqueness across information sources can be achieved
by, e.g., prepending each object identifier with a unique ID for
the information source.

On the information source side, the employee object
may map to a real object of the same structure, or
it may be an "illusion" created by the translator from
other information. If IS is an object database, and the
employee object is stored as four objects with object
identifiers id0 (employee), id1 (name), id2 (office), and
id3 (photo), then the retrieved object on the client side
would have id0 in the Object-ID field for the employee
object, id1 in the Object-ID field for the name object,
and so on. The non-null Object-ID fields tell client C
that the objects it has correspond to identifiable objects
at IS. Suppose instead that IS is a relational database,
and that the employee "object" is actually a tuple.
Then, the name, office, and photo objects (attributes
of the tuple) will not have object identifiers, and their
Object-ID fields at the client side will be null.
So far we have assumed that the client retrieves the
employee object and all of its subobjects. However, for
performance reasons, the translator may prefer not to
copy all subobjects. For example, if the photo subobject
is a large bitmap with a unique identifier, it may be
preferable to retrieve the name and office subobjects
in their entirety, but retrieve only a "placeholder" for
the photo object. In this case, the value field for the
employee object at the client will contain {o1, o2, id3}.
This indicates that the name and office subobjects can
be found at memory locations o1 and o2, but the photo
subobject must be explicitly retrieved using id3.
Note that, regardless of the representation used in set
and list values, the translator always gives the client the
illusion of an object repository. Thus, we can think of
our employee object as:

⟨employee, set, {cmp1, cmp2, cmp3}⟩
  cmp1: ⟨name, str, "some name"⟩
  cmp2: ⟨office, str, "some office"⟩
  cmp3: ⟨photo, bits, "some bits"⟩

where each cmp_i is some mnemonic identifier for the
subobject. We use this generic notation for examples
throughout the remainder of this section.
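
The mix of in-memory references and placeholders can be hidden behind a small helper so that client code always sees complete objects. The sketch below is an assumption about how a client library might do this; `Placeholder`, `resolve`, `fetch_from_source`, and the identifier `photo-oid` are hypothetical names.

```python
from typing import Callable, Dict, List, Union

Obj = Dict[str, str]  # stand-in for an atomic OEM object


class Placeholder:
    """Marks a subobject left at the source, identified by its Object-ID."""
    def __init__(self, oid: str) -> None:
        self.oid = oid


def resolve(ref: Union[Obj, Placeholder], fetch: Callable[[str], Obj]) -> Obj:
    """Return the subobject itself, fetching it from the source only if needed."""
    if isinstance(ref, Placeholder):
        return fetch(ref.oid)
    return ref


# Toy source keyed by a hypothetical Object-ID.
SOURCE: Dict[str, Obj] = {
    "photo-oid": {"label": "photo", "type": "bitmap", "value": "some bits"},
}
fetched: List[str] = []


def fetch_from_source(oid: str) -> Obj:
    fetched.append(oid)   # record each explicit retrieval
    return SOURCE[oid]


employee = [
    {"label": "name", "type": "str", "value": "some name"},
    {"label": "office", "type": "str", "value": "some office"},
    Placeholder("photo-oid"),
]
labels = [resolve(ref, fetch_from_source)["label"] for ref in employee]
print(labels, fetched)
```

Only the photo triggers a round trip to the source; the name and office were already copied to the client.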
As mentioned in Section 1.1, self-describing models
have been used in many systems, including file systems,
Lotus Notes, by Teknekron Software Systems,
and for electronic mail. In many of these systems,
nesting of objects is not allowed, so OEM can be viewed
as a generalization of these models. OEM is simpler
than conventional object models, but it does support
the two key features required by object models:
object nesting and object identity.
Our primary reason for choosing a very simple model
is to facilitate integration. As pointed out in ,
simple data models have an advantage over complex
models when used for integration, since the operations
to transform and merge data will be correspondingly
simpler. Meanwhile, a simple model can still be very
powerful: advanced features can be "emulated" when
they are necessary. For example, if we wish to model
an employee class with subclasses active and retired, we
can add a subobject to each employee object with label
subclass and value "active" or "retired." Of course this
is not identical to having classes and subclasses, since
OEM does not force objects to conform to the rules for
a class. While some may view this as a weakness of
OEM, we view it as an advantage, since it lets us cope
with the heterogeneity we expect to find in real-world
information sources.
`
2.1 Query Language and Examples

To request OEM objects from an information source, a
client issues queries in a language we refer to as OEM-QL.
OEM-QL adapts existing SQL-like languages for
object-oriented models to OEM.
Here we will give two examples to illustrate the "flavor"
of OEM-QL; additional details and examples can be
found in .
For the examples, suppose that we are accessing a
bibliographic information source called Biblio with the
object structure shown in Figure 2. Note that we are
using mnemonic object references. Although much of
this object structure is regular (components have the
same labels and types), there are some irregularities.
For example, the call number format is different for each
document shown, and the nth document uses a different
structure for author information.
`
Example. Our first example retrieves the topic of
each document for which "Ullman" is one of the authors:

SELECT bib.doc.topic
FROM Biblio
WHERE bib.doc.authors.author-ln = "Ullman"

Intuitively, the query’s WHERE clause finds all paths
through the subobject structure with the sequence of
labels bib, doc, authors, author-ln such that the
object at the end of the path has value "Ullman." For
each such path, the SELECT clause specifies that one
component of the answer object is the object obtained
by traversing the same path, except ending with label
topic instead of labels authors, author-ln. Hence,
for the portion of the object structure shown in Figure
2 the query returns:

⟨answer, set, {o1, o2}⟩
  o1: ⟨topic, str, "Databases"⟩
  o2: ⟨topic, str, "Algorithms"⟩
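
The path-matching reading of this query can be sketched as follows. This is a simplified evaluator written for illustration, not the OEM-QL implementation: objects are (label, type, value) tuples, the helper names `follow` and `select_where` are hypothetical, call-number subobjects are omitted, and the answer is treated as a set, so a document contributes its topic once even if several paths match.

```python
from typing import Any, Iterator, List, Tuple

Obj = Tuple[str, str, Any]  # (label, type, value); a set value is a list of Obj


def follow(obj: Obj, labels: List[str]) -> Iterator[Obj]:
    """Yield every object reached from obj by following the label sequence."""
    if not labels:
        yield obj
        return
    _, typ, value = obj
    if typ == "set":
        for child in value:
            if child[0] == labels[0]:
                yield from follow(child, labels[1:])


def select_where(root: Obj, select: List[str], where: List[str], wanted: Any) -> List[Obj]:
    """Evaluate SELECT/WHERE paths that share a common label prefix."""
    k = 0
    while k < min(len(select), len(where)) and select[k] == where[k]:
        k += 1
    if k == 0 or root[0] != select[0]:
        return []
    answer: List[Obj] = []
    for anchor in follow(root, select[1:k]):            # e.g. each doc object
        if any(o[2] == wanted for o in follow(anchor, where[k:])):
            answer.extend(follow(anchor, select[k:]))   # e.g. its topic
    return answer


biblio: Obj = ("bib", "set", [
    ("doc", "set", [("authors", "set", [("author-ln", "str", "Ullman")]),
                    ("topic", "str", "Databases")]),
    ("doc", "set", [("authors", "set", [("author-ln", "str", "Aho"),
                                        ("author-ln", "str", "Hopcroft"),
                                        ("author-ln", "str", "Ullman")]),
                    ("topic", "str", "Algorithms")]),
    ("doc", "set", [("one-author", "str", "Michael Crichton"),
                    ("topic", "str", "Dinosaurs")]),
])
result = select_where(biblio, ["bib", "doc", "topic"],
                      ["bib", "doc", "authors", "author-ln"], "Ullman")
print(result)
```

The third document is skipped without any error: it simply has no path with labels authors, author-ln, which is how the irregular structure is tolerated.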
`
`
`
Note that some proposed interchange standards, e.g.,
CORBA’s Object Request Broker, tend to be significantly
more complex than OEM. We expect that if such standards are
adopted, OEM could be used to provide a simpler, more
"client-friendly" front end. Other proposed standards, such as ODMG’s
Object Database Standard, are directed towards interoperability
and portability of object-oriented database systems, rather
than towards facilitating object exchange in highly heterogeneous
environments.
`
⟨bib, set, {doc1, doc2, ..., docn}⟩
  doc1: ⟨doc, set, {au1, top1, cn1}⟩
    au1: ⟨authors, set, {au1_1}⟩
      au1_1: ⟨author-ln, str, "Ullman"⟩
    top1: ⟨topic, str, "Databases"⟩
    cn1: ⟨local-call, integer, ⟩
  doc2: ⟨doc, set, {au2, top2, cn2}⟩
    au2: ⟨authors, set, {au2_1, au2_2, au2_3}⟩
      au2_1: ⟨author-ln, str, "Aho"⟩
      au2_2: ⟨author-ln, str, "Hopcroft"⟩
      au2_3: ⟨author-ln, str, "Ullman"⟩
    top2: ⟨topic, str, "Algorithms"⟩
    cn2: ⟨dewey-decimal, str, "BR "⟩
  ...
  docn: ⟨doc, set, {aun, topn, cnn}⟩
    aun: ⟨one-author, str, "Michael Crichton"⟩
    topn: ⟨topic, str, "Dinosaurs"⟩
    cnn: ⟨fiction-call, int, ⟩

Figure 2: Object structure for example queries
`
Example. Our next example illustrates how variables
are used to specify different paths with the same
label sequence. This query retrieves each document for
which both "Aho" and "Hopcroft" are authors:

SELECT bib.doc
FROM Biblio
WHERE bib.doc.authors.author-ln(a1) = "Aho"
AND bib.doc.authors.author-ln(a2) = "Hopcroft"

Here, the query’s WHERE clause finds all paths through
the subobject structure with the sequence of labels bib,
doc, authors, and with two distinct path completions
with label author-ln and with values "Aho" and
"Hopcroft" respectively. The answer object contains
one doc component for each such path. Hence, for the
portion of the object structure shown in Figure 2 the
query returns:

⟨answer, set, {o}⟩
  o: ⟨doc, set, {au2, top2, cn2}⟩
    au2: ⟨authors, set, {au2_1, au2_2, au2_3}⟩
      au2_1: ⟨author-ln, str, "Aho"⟩
      au2_2: ⟨author-ln, str, "Hopcroft"⟩
      au2_3: ⟨author-ln, str, "Ullman"⟩
    top2: ⟨topic, str, "Algorithms"⟩
    cn2: ⟨dewey-decimal, str, "BR "⟩
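
The requirement that the two conditions bind distinct author-ln subobjects under the same authors object can be sketched as below. The helper names `children` and `docs_with_all_authors` are hypothetical, objects are (label, type, value) tuples, and call-number subobjects are omitted; this is an illustration of the semantics described above rather than the actual query processor.

```python
from itertools import permutations
from typing import Any, List, Tuple

Obj = Tuple[str, str, Any]  # (label, type, value); a set value is a list of Obj


def children(obj: Obj, label: str) -> List[Obj]:
    """Return the subobjects of a set object that carry the given label."""
    return [c for c in obj[2] if c[0] == label] if obj[1] == "set" else []


def docs_with_all_authors(bib: Obj, names: List[str]) -> List[Obj]:
    """Return doc objects whose authors set binds one distinct author-ln per name."""
    answer: List[Obj] = []
    for doc in children(bib, "doc"):
        for authors in children(doc, "authors"):
            candidates = children(authors, "author-ln")
            # Each permutation assigns distinct subobjects to the variables.
            if any(all(obj[2] == name for obj, name in zip(combo, names))
                   for combo in permutations(candidates, len(names))):
                answer.append(doc)
                break
    return answer


doc1: Obj = ("doc", "set", [("authors", "set", [("author-ln", "str", "Ullman")]),
                            ("topic", "str", "Databases")])
doc2: Obj = ("doc", "set", [("authors", "set", [("author-ln", "str", "Aho"),
                                                ("author-ln", "str", "Hopcroft"),
                                                ("author-ln", "str", "Ullman")]),
                            ("topic", "str", "Algorithms")])
biblio: Obj = ("bib", "set", [doc1, doc2])
print(docs_with_all_authors(biblio, ["Aho", "Hopcroft"]) == [doc2])
```

Using permutations of the candidate subobjects guarantees that two variables are never bound to the same author-ln object, which is what separates this query from a single-path condition.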
`
`
`
`Implementation
`.
`We have argued that OEM and its query language are
`designed to facilitate integrated access to heterogeneous
`data sources. To support this claim we have used the
`OEM model and language to integrate a variety of bibli-
`ographic information sources, including a conventional
`
`
`
`Microsoft Corp. Exhibit 1316
`
`
`
`library retrieval system, a relational database holding
`structured bibliographic records, and a le system with
`unstructured bibliographic entries. Using our OEM-
`based system, these sources are accessible through the
`Tsimmis browser Section , allowing evaluation of
`queries and object exploration.
As an example, consider one of our operational translators
that accesses the Stanford University Folio System.
Folio provides access to over repositories, including
a catalog of the holdings of Stanford’s libraries,
and several commercial sources such as INSPEC that
contain entries for Computer Science and other published
articles. Folio is the most difficult of our information
sources, partly because the translator must emulate
an interactive terminal. The translator initially
must establish a connection with Folio, giving the necessary
account and access information. When the translator
receives an OEM-QL query to evaluate, it converts
the query into Folio’s Boolean retrieval language. Then
it extracts the relevant information from the incoming
screens and exports the information as an OEM answer
object. The Folio translator is written in C and runs
as a server process on Unix BSD. systems. Translators
for the other bibliographic sources have involved
substantially less coding because the underlying sources
(e.g., a relational database) are much easier to use.
We also have implemented mediators that fuse information
from multiple bibliographic sources. For example,
one mediator provides a simple "union" of the
sources, making the information appear as if it all comes
from one source. Another mediator performs a "join"
of two sources, combining entries that refer to the same
document into a single entry that contains all information
on the document available from either source.
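
A join of this kind might look roughly like the sketch below. The matching rule (case- and whitespace-insensitive title comparison), the decision to keep entries found in only one source, and the names `normalize` and `join_sources` are all assumptions made for illustration; the paper does not describe how the mediator decides that two entries refer to the same document.

```python
from typing import Dict, List

Entry = Dict[str, str]  # stand-in for a bibliographic OEM object: label -> value


def normalize(title: str) -> str:
    """Hypothetical matching key: lower-cased title with collapsed whitespace."""
    return " ".join(title.lower().split())


def join_sources(left: List[Entry], right: List[Entry]) -> List[Entry]:
    """Merge entries with the same key; unmatched entries are kept as they are."""
    merged: Dict[str, Entry] = {}
    for entry in left + right:
        target = merged.setdefault(normalize(entry["title"]), {})
        for label, value in entry.items():
            target.setdefault(label, value)   # first source wins on conflicts
    return list(merged.values())


catalog = [{"title": "Compilers", "author": "Aho"}]
records = [{"title": "compilers  ", "call-number": "QA76"},
           {"title": "Other", "year": "1990"}]
print(join_sources(catalog, records))
```

Because entries are just labeled values, the merged entry can carry labels that only one of the sources knows about without any schema change.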
Finally, we also have implemented OEM Support Libraries
to facilitate the creation of future translators,
mediators, and end-user interfaces. These libraries contain
procedures that implement the exchange of OEM
objects between a server (either a translator or a mediator)
and a client (either a mediator, an application, or
an interactive end-user). The Support Libraries handle
all TCP/IP communications, transmission of large objects,
timeouts, and many other practical issues. A Unix
BSD. and a Windows version of the package have been
implemented and demonstrated. The Support Libraries
are described in .
`
`
`
3 Object Browsing

The goal of the object browsing component of Tsimmis
is to provide a platform-independent tool for displaying
and exploring the OEM objects that are returned as a
result of OEM-QL queries. Due to the nested structure
of OEM objects, it is necessary to provide mechanisms
that let end users navigate easily through the answer
space, much like they would navigate through a tree
structure. We have implemented MOBIE (MOsaic
Based Information Explorer), a graphical browsing
tool based on Mosaic and the World-Wide-Web,
for submitting Tsimmis queries and exploring the
results. MOBIE lets end users connect to mediators
or translators and specify queries using OEM-QL. An
important advantage of using Mosaic as the basis for
our user interface is its widespread use and popularity.
Mosaic currently operates on Unix workstations, on
Macintosh computers, and on many PCs. Hence,
ultimately anyone on the Internet should be able to use
Tsimmis and MOBIE to explore any information source
on the net, provided there is an appropriate translator
or mediator available for it.
We illustrate MOBIE’s operation by walking through
a particular interaction. The first step in accessing
information through MOBIE is to select a translator or
mediator (henceforth referred to as TM) and connect to
it. Figure 3 shows MOBIE’s home page with a list
of currently available TMs. The user may select any of
the TMs on the list, enter its name in the provided box,
and click on the CONNECT button. Information shown
below the CONNECT button is used to "fine-tune" the
communication between the source and the client, and
can generally be left in its default configuration.
After the connection is established, a Query Request
page (not shown) is displayed and the system is ready
to accept an OEM-QL query. In the current version of
MOBIE, queries must be entered by hand, meaning that
the user must fill in the boxes provided on the screen
(one box for the SELECT clause, one for the FROM
clause, and one for the WHERE clause). However,
future extensions will include the ability to select
parameterized "frequently asked queries" by clicking on
menus.
If a submitted query is valid and successfully executed
by the TM, the answer object is returned to MOBIE
and displayed on a Query Result page. Except for
very small objects, to see the complete result the
user will move through the structure of the answer
object using MOBIE’s navigational capabilities. This
is best understood by thinking of the answer object
as a tree (or a graph, in the most general case),
where the atom objects are the leaves, and the set
objects are the internal nodes. Initially, only the root
of the answer object and its immediate subobjects
are displayed on the Query Result page (not shown).
For our bibliographic data, the root is typically a set
containing a set of documents labelled doc. The
user can move from the current level in the object
structure to a lower level by clicking on the FETCH
Mosaic displays information through a series of text screens
or pages, the first of which is always called the home page.
`
`
`