DIRECT: A Query Facility for Multiple Databases

ULLA MERZ and ROGER KING
University of Colorado

The subject of this research project is the architecture and design of a multidatabase query facility. These databases contain structured data, typical for business applications. Problems addressed are: presenting a uniform interface for retrieving data from multiple databases, providing autonomy for the component databases, and defining an architecture for semantic services.
DIRECT is a query facility for heterogeneous databases. The databases and their definitions can differ in their data models, names, types, and encoded values. Instead of creating a global schema, descriptions of different databases are allowed to coexist. A multidatabase query language provides a uniform interface for retrieving data from different databases. DIRECT has been exercised with operational databases that are part of an automated business system.
`
Categories and Subject Descriptors: H.2.3 [Database Management]: Languages; H.2.7 [Database Management]: Database Administration

General Terms: Languages, Management

Additional Key Words and Phrases: Data models, design, heterogeneous databases, query languages
`
1. INTRODUCTION
`
This article presents the architecture and design of a multidatabase query facility known as DIRECT. DIRECT, an interactive software system for querying heterogeneous databases, has been the vehicle for exploring and realizing this design.

It is not unusual that a company maintains inventory information, order data, and customer credit ratings in different databases. Depending on how long ago these databases and their applications were developed and on the requirements that governed the choice of database management system, the inventory data may be stored in an IMS database and the customer credit ratings in a DB2 database. Generating a report that lists all items on order,
`
This work was funded by an IBM Shared University Research Grant.
Authors' addresses: U. Merz, Micro Decisionware, 3035 Center Green Drive, Boulder, CO 80301-5404; R. King, Department of Computer Science, University of Colorado, Campus Box 430, Boulder, CO 80309.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1994 ACM 1046-8188/94/1000-0339 $3.50

ACM Transactions on Information Systems, Vol. 12, No. 4, October 1994, Pages 339-359.
`
`Meta Platforms, Inc.
`Exhibit 1026
`Page 001
`
`
`
`
their shipping date, and the total charge given the customer's credit conditions requires retrieving and merging data from three databases.

A software system is needed that retrieves data from heterogeneous databases. Many services, such as network communication services, transaction-processing services, and semantic services, are needed to automate the retrieval of data stored in multiple databases. Each service has to resolve different aspects of heterogeneity.
The focus of this research effort is on the semantic services [Segev 1991], whose objective is to provide a uniform application and user interface in spite of heterogeneous database schemas and data manipulation languages. In addition, the semantic services must address the problem of identifying data with the same meaning duplicated and distributed across several databases.
It is possible to reconcile differences in data models and query languages by defining a global schema using a common data model. This solution favors centralized control. A second solution defines a uniform query language; it is characterized by providing autonomy and extensibility. Inevitably, new database technologies with new data models will be introduced as software systems adjust to changing business needs. Therefore, it is necessary that an architecture for semantic services provides autonomy and extensibility.
Independent of the solution chosen, semantically equivalent data elements must be identified and specified to make it possible to join and merge data from different databases. Two or more data elements containing atomic printable data values that represent the same real-world fact are semantically equivalent if their data values belong to the same domain. Also, data values in different domains are semantically equivalent when functions exist (not necessarily invertible) for mapping or converting them from one domain into the other.
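As a minimal sketch of this definition, the following Python fragment treats values from two domains as equivalent when a conversion function maps one domain into the other. The domains (weights in pounds and kilograms, credit rating codes and risk classes), the function names, and the helper `equivalent_values` are hypothetical illustrations and are not taken from DIRECT.

```python
from typing import Callable

LB_PER_KG = 2.20462  # assumed conversion factor for this illustration

def pounds_to_kilograms(pounds: float) -> float:
    """Map a value from the 'pounds' domain into the 'kilograms' domain."""
    return pounds / LB_PER_KG

def rating_to_risk_class(rating: str) -> str:
    """A mapping that is not invertible: several ratings collapse to one class."""
    return "low" if rating in {"AAA", "AA", "A"} else "high"

def equivalent_values(a, b, convert: Callable, tolerance: float = 0.0) -> bool:
    """Convert `a` into the domain of `b`, then compare the two values."""
    converted = convert(a)
    if isinstance(converted, float):
        return abs(converted - b) <= tolerance
    return converted == b
```

The second mapping shows why invertibility is not required: a risk class can be derived from a rating, but the rating cannot be recovered from the risk class.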
Users familiar with the application are in the best position to identify semantically equivalent data elements. Their understanding of the names and comments in the data definitions, and their knowledge about the applications using the data, are important for judging which data elements contain semantically equivalent or related data.

Given the considerations stated above, the proposed solution is based on the following design decisions:
`
(1) Allow the coexistence of different data models
—to guarantee extensibility and
—to provide autonomy for the individual databases.

(2) Unify heterogeneous databases through multidatabase queries
—to retrieve data as needed and
—to create alternative views of existing data definitions.
`
1 This definition does not consider semantic equivalence of database constructs, such as sets, tables, or relations.
`
`
(3) Define an architecture for semantic services
—to help a user to specify a multidatabase query and
—to provide a cooperative problem-solving environment for identifying semantically equivalent and related data elements.
`
DIRECT is a query facility for heterogeneous databases. Since coexistence is the basis for autonomy, allowing different database descriptions and data models to coexist is at the center of this solution. Coexistence also provides extensibility, because new data-modeling constructs do not have to be mapped into the semantically equivalent data-modeling constructs of a global schema. In contrast to previous solutions, DIRECT does not create a global schema, but maintains the syntax and semantics of the data definitions for the individual databases. Instead of defining static functions for mapping semantically equivalent data elements, DIRECT provides human-computer interaction techniques that assist users in identifying them.
The proposed architecture for semantic services focuses on the user, whose objective is to report data stored in multiple databases and whose judgment is needed to identify semantically equivalent data elements. The architecture is based on an architecture for cooperative problem-solving systems. Their goal is to assist users in tasks that are based on human judgment rather than on analytical rules.
DIRECT has been developed using an iterative design process [Gould 1988]. Three prototypes have been developed, two of which were evaluated for their usability in informal experiments. The purpose of the usability tests was to understand the user's task of specifying a multidatabase query. The usability tests played an invaluable role in defining the solution. Understanding the user's task of specifying a multidatabase query led to identifying a suitable user interface architecture. In turn, this architecture served as the basis for the proposed architecture for semantic services.
The following section describes related research efforts. Section 3 provides an overview of the proposed architecture for semantic services and its particular realization in DIRECT. Section 4 describes a sample scenario using DIRECT, and Section 5 provides an assessment and summary of the research results.
`
2. DIRECT IN PERSPECTIVE

This section summarizes solutions offered by research in database management systems. Differences to the proposed solution are outlined.
`
2.1 Schema Integration

The principles of removing redundancy and imposing centralized control have influenced the research in schema integration. The global schema trades autonomy of the individual databases for centralized control. Schema integration methodologies [Batini et al. 1986] achieve uniformity by creating a virtual, global schema based on a semantic data model. The individual database schemas are mapped into the constructs and notation of this semantic data model and are then merged into a global schema. The result is a single description in a uniform notation that assumes a single representation will meet all data needs.

Extensibility, the capability to change existing database schemas or to add databases with new data-modeling constructs, is not easily achieved by schema integration methodologies. Because global schemas are static, schema integration methodologies do not easily accommodate changes to existing database schemas or the addition of new schemas. Also, there is no guarantee that all future data-modeling constructs can be mapped into the constructs of a particular semantic data model.

Allowing different data models and database descriptions to coexist has several advantages. The autonomy of the individual databases is preserved by retaining the original data names and constructs. Additionally, new data-modeling constructs are more easily accommodated, because existing data definitions do not have to be mapped into semantically equivalent constructs of a particular data model.
`
2.2 Multidatabase Query Languages

Because database autonomy is important, Litwin and Abdellatif [1987] propose to dispense with the global schema. Instead of creating a uniform description of the different database schemas, a multidatabase query language provides a uniform interface for joining and merging data values from different databases. Multidatabase query languages refer to data names as defined in the individual databases rather than the data names of a global schema.

Current proposals [Krishnamurthy et al. 1991; Litwin and Abdellatif 1987] extend the relational query model with new language features to specify semantically equivalent data elements, to resolve schematic differences, and to convert data values. In particular, the multidatabase query language MDSL [Litwin and Abdellatif 1987] uses multiple identifiers and semantic variables to define semantically equivalent data elements. The design of the multidatabase query language IDL [Krishnamurthy et al. 1991] focuses on creating different views for resolving schematic differences.

These proposals do not address the problem users have identifying semantically equivalent data elements and creating multidatabase queries from English statements. They also do not discuss the issues involved in extending these languages to data models other than the relational model.

This research project defines a multidatabase query language based on relational algebra. Also, it is shown that semantically equivalent data elements can be defined implicitly through query functions such as join and union. Instead of providing statements for specifying semantically equivalent data elements, human-computer interaction techniques are provided that help users to identify them.
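To illustrate how a join or a union can define equivalence implicitly, the sketch below pairs attributes from two autonomous databases without any separate mapping declaration. The relations, the attribute names (`CUSTNO`, `customer_id`), and the helper functions are hypothetical; the paper does not give DIRECT's concrete syntax for these operations.

```python
# Hypothetical rows from two autonomous databases; attribute names differ.
orders_db = [
    {"CUSTNO": 17, "ITEM": "bolt"},
    {"CUSTNO": 42, "ITEM": "gear"},
]
credit_db = [
    {"customer_id": 17, "rating": "A"},
    {"customer_id": 99, "rating": "C"},
]

def join_on(left, right, left_attr, right_attr):
    """Join rows whose named attributes hold equal values.
    Naming the two attributes in the join is what declares them equivalent."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[left_attr] == r[right_attr]
    ]

def union_of(left, left_attr, right, right_attr):
    """Union of the values of two attributes; pairing them in the union
    treats them as drawn from the same domain."""
    return {row[left_attr] for row in left} | {row[right_attr] for row in right}

joined = join_on(orders_db, credit_db, "CUSTNO", "customer_id")
all_customers = union_of(orders_db, "CUSTNO", credit_db, "customer_id")
```

Each database keeps its own names; the equivalence exists only in the query that relates them.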
`
3. THE ARCHITECTURE OF DIRECT

Part of the research effort involved implementing a prototype to validate the feasibility of the design decisions stated above. The prototype DIRECT was used to explore an architecture for semantic services. An architecture for semantic services addresses the following problems:

—differences in query languages,
—differences in data models, and
—semantically equivalent data elements duplicated and distributed across several databases.
A number of research projects and commercial products define approaches to resolve the semantic heterogeneity of different databases (see Figure 1). Schema integration methodologies, surveyed by Batini et al. [1986], define a global schema with a uniform data model and a methodology for defining semantically equivalent data elements. Multidatabase query languages, such as MDSL and IDL, define a common high-level query language with extensions for specifying semantically equivalent data elements. The commercial product Ingres/Star [Ingres Corp. 1991] provides a relational data dictionary that creates a uniform name space for all data elements. A software system, called Carnot [Rasmus 1991], unifies all database descriptions in a knowledge-based system that includes functions to match and map semantically equivalent data elements automatically; database-specific query languages are mapped into a canonical representation. This research effort investigates yet another alternative by keeping the database schemas and their data models separate and by making it a joint effort of the software system and the user to identify interactively the semantically equivalent data elements.

The following sections propose an architecture for the semantic services. Particular components of this architecture, as implemented in DIRECT, are discussed also.
`
3.1 An Architecture for Semantic Services

The functional components of an architecture for the semantic services (see Figure 2) are similar to those of Design Environments as described in Lemke [1989]. The architecture of Design Environments defines interaction techniques and visual representations that help users to solve problems and make decisions; it allows users to bring their application domain knowledge into the solution process. A Design Environment consists of the following functional components:
`
Parser. The parser makes previously created design specifications available by transforming them and importing them into the Design Environment.

Work Area. The work area provides a visual representation of the design specification under construction.

Palette. The palette provides a visual representation of the parts or tools that can be used to create a design specification.

Catalog. The catalog stores a collection of design specifications that can be reused and modified.

Critic. The critic alerts users to invalid descriptions and suggests alternative solutions during the construction process.

Code Generator. The code generator transforms design specifications created within the Design Environment into an external representation for use outside the system.

Fig. 1. Matrix of solutions. [Figure not reproduced. The matrix classifies solutions by schema organization (separate database schemas with different data models; separate database schemas with a single data model; global schema with a single data model), by how compatible data elements are matched (user; data description), and by query style (single query in a global language; single query in a local language). Legible entries include DIRECT, MDSL/IDL, Ingres, Metro, and Carnot.]
`
The architecture for semantic services (see Figure 2) as realized in DIRECT consists of the following components:

—a component for importing data descriptions,
—a catalog for storing data element definitions,
—a component for defining multidatabase queries,
—a component for processing multidatabase queries, and
—a component for exporting data values and descriptions.
`
As with a Design Environment, the architecture for semantic services has a catalog component. It contains

—Descriptions of the different data models, query languages, database schemas, and data types.
—Procedures for parsing data definitions, extracting database catalog information, transforming query languages, and converting data values from one data type to another.

Fig. 2. Architecture for semantic services. [Figure not reproduced. Legible components: IMPORT (DDL Parser, Data Declaration Parser, DB Catalog Procedures); CATALOG (Data Model Description, Schema Description, Query Description, Data Type Description, Result Format Descriptions); EXPORT (Create DDL Statements, Create Report); GATEWAYS (Retrieve Data from Remote Databases); DATA MANAGER (Process Global Query, Database for Sub-query Results); Remote Databases.]

In both architectures, the purpose of the catalog is to guarantee the validity of the specification. DIRECT uses the information stored in the catalog to assure the correctness of the data names and the multidatabase query specification.
`
`
Similar to a parser for Design Environments, the architecture for semantic services contains a component for importing the data definitions of the different database management systems. The import component calls upon parsers and database catalog procedures for acquiring the database schemas of the different databases. The objective of the semantic services and a design environment is to make data stored in different formats and media available for processing. Automating this task relieves the user of this clerical task.

The functions of the work area, palette, and critic are realized in the query editor component. The query editor component defines the representation and functions of the multidatabase query language. Its purpose is to provide guidance to users for specifying a multidatabase query. Design environments are used to create static specifications, such as the layout of a kitchen. In comparison, the query editor component of the semantic services is used to specify functional behavior by selecting query operators.

Parallel to the generator in a Design Environment, the architecture for the semantic services contains an export component. It relies on the descriptions in the catalog for the data types, data definitions, and query languages to create either data definition statements or a report of the query result.

Unlike a design environment, the architecture for the semantic services provides a component for executing the query specification. This component decomposes the multidatabase query into subqueries for the component databases and a global query for merging the subquery results. Global queries consist of database definition statements to store the subquery results and requests for joining and merging these results. Subqueries and global queries are specified in a plan that is used for coordinating the query execution. This component uses a database manager for storing the results of the subqueries, for processing the global query, and for creating the final result. It also calls on the services of database gateways [Hackathorn 1993], which provide location transparency (hardware, operating system, and communication networks) for remote databases.

In summary, the proposed architecture for semantic services relies on four technologies: (1) data dictionary technology for storing and transforming definitions, (2) human-computer interaction techniques for assisting users in specifying a multidatabase query, (3) database technology for executing a global query, and (4) transaction management for managing the distributed execution of subqueries.
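The execution component described above can be sketched as a plan holding one subquery per component database plus a global query over the stored subquery results. The example below is only an illustration under stated assumptions: in-memory SQLite databases stand in for the remote component databases and for the local database manager, the table and column names are hypothetical, and gateways and transaction management are omitted.

```python
import sqlite3

def make_db(ddl, insert, rows):
    """Create an in-memory database standing in for a remote component database."""
    con = sqlite3.connect(":memory:")
    con.execute(ddl)
    con.executemany(insert, rows)
    return con

# Hypothetical component databases (both SQLite here for simplicity).
components = {
    "orders": make_db("CREATE TABLE item_order (custno INTEGER, item TEXT)",
                      "INSERT INTO item_order VALUES (?, ?)",
                      [(17, "bolt"), (42, "gear")]),
    "credit": make_db("CREATE TABLE rating (customer_id INTEGER, grade TEXT)",
                      "INSERT INTO rating VALUES (?, ?)",
                      [(17, "A"), (99, "C")]),
}

# The plan: one subquery per component database, plus a global query.
plan = {
    "subqueries": [
        {"db": "orders",
         "sql": "SELECT custno, item FROM item_order",
         "store": "CREATE TABLE sub_orders (custno INTEGER, item TEXT)",
         "load": "INSERT INTO sub_orders VALUES (?, ?)"},
        {"db": "credit",
         "sql": "SELECT customer_id, grade FROM rating",
         "store": "CREATE TABLE sub_credit (customer_id INTEGER, grade TEXT)",
         "load": "INSERT INTO sub_credit VALUES (?, ?)"},
    ],
    "global_query": ("SELECT o.item, c.grade FROM sub_orders o "
                     "JOIN sub_credit c ON o.custno = c.customer_id "
                     "ORDER BY o.item"),
}

def execute_plan(plan, components):
    """Run each subquery, store its result locally, then run the global query."""
    manager = sqlite3.connect(":memory:")        # local database manager
    for sub in plan["subqueries"]:
        rows = components[sub["db"]].execute(sub["sql"]).fetchall()
        manager.execute(sub["store"])            # definition statement for the result
        manager.executemany(sub["load"], rows)   # store the subquery result
    return manager.execute(plan["global_query"]).fetchall()

result = execute_plan(plan, components)
```

Keeping the definition statements inside the plan mirrors the description of global queries as consisting of both database definition statements and requests for joining and merging.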
`
3.2 The Data Catalog in DIRECT

The data catalog in DIRECT stores the different database schemas and the semantics of the different data-modeling constructs. It also stores the definitions for the different query specifications. The data definitions are encoded as a network of data structures with data values describing the properties of the different data-modeling constructs and data elements. DIRECT can accommodate a variety of data models with data structures and types similar to those found in general-purpose programming languages.

The data catalog relies on properties common to a large number of data models. Most data models provide simple and constructed data elements as well as relationships between them. Relationships may be defined as an equivalence relation between simple data elements, a distinct data-modeling construct, or a reference to a constructed data element. Simple data elements have a name, a type, and a single printable value. Simple data elements have the same semantics, independent of a particular data model. Depending on the data model, constructed data elements can be tuples, sets, lists, classes, or segments. They have different semantics. Constructed data elements have a name and consist of either simple or constructed data elements. They define static, structural relationships. Data relationships between constructed data elements differ with different data models. They can be parent/child, IS-A, object references, or foreign key references. These relationships are dynamic, because they are based on data values.

Simple and constructed data elements are represented as data structures. Links (pointers) are used to reflect the structural composition of these data elements, their data relationships, and the category (simple, constructed) they are part of. Data values describe information, such as the type of data relationship being a referential integrity constraint or a simple data element being a primary key.

The prototype DIRECT contains parsers for the data definition languages of IMS and DB2. They were implemented using Lex and Yacc, which are well-known Unix development tools. The parsers transform the definition for each data element into an internal representation consisting of properties common to different data models and data-model-specific properties such as foreign keys or logical parents.
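One possible way to picture such a catalog is a small set of linked records: simple elements, constructed elements that contain other elements, and relationships between constructed elements. The class and field names below are assumptions made for illustration; the paper does not specify DIRECT's internal data structures at this level of detail.

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class SimpleElement:
    name: str
    type: str
    properties: dict = field(default_factory=dict)  # e.g. {"primary_key": True}

@dataclass
class ConstructedElement:
    name: str
    construct: str  # model-specific kind: "table", "segment", "tuple", ...
    members: List[Union["SimpleElement", "ConstructedElement"]] = field(default_factory=list)

@dataclass
class Relationship:
    kind: str  # "foreign_key", "parent_child", "is_a", ...
    source: ConstructedElement
    target: ConstructedElement
    properties: dict = field(default_factory=dict)

def simple_names(element):
    """Collect the names of all simple elements reachable through structural links."""
    if isinstance(element, SimpleElement):
        return [element.name]
    names = []
    for member in element.members:
        names.extend(simple_names(member))
    return names

# A relational table and two hierarchical segments described with the same structures.
parts = ConstructedElement("parts", "table", [
    SimpleElement("part_no", "CHAR(8)", {"primary_key": True}),
    SimpleElement("descr", "VARCHAR(40)"),
])
invoice = ConstructedElement("INVOICE", "segment", [SimpleElement("INVNO", "CHAR(6)")])
line_item = ConstructedElement("LINEITEM", "segment", [SimpleElement("PARTNO", "CHAR(8)")])
parent_link = Relationship("parent_child", invoice, line_item)
```

Properties shared across data models live in the common fields, while model-specific details (such as a primary key flag or the kind of relationship) are carried as data values in `properties` and `kind`.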
`
3.3 The Query Language in DIRECT

The design of a query language involves choosing the query functions and their syntactical representations. The selection of the query functions was guided by two considerations: (1) to retrieve semantically equivalent data and their properties duplicated and distributed across several databases and (2) to share query processing with existing database management systems. Thus, a multidatabase query has the following properties:

—operators for merging and joining data,
—decomposability into subqueries,
—query operators provided by most query languages.

Relational algebra, the basis of many query languages and therefore widely known, is a good candidate to achieve these considerations. Set operations, such as union, difference, and intersection, can be used to retrieve semantically equivalent data duplicated in several databases. Attribute comparison operations can be used to join related data that are distributed across several databases.
`
`
If the query functions are based on relational algebra, a multidatabase query can be decomposed by grouping functions pertinent to each database into a separate subquery. These subqueries can be submitted to the individual database management systems for processing. The results of these subqueries are merged and combined into the final query result. The set of query functions provided by DIRECT is relationally complete [Codd 1977]. The following query functions are supported by DIRECT:

—union
—intersection
—difference
—projection
—restriction
—equi-join (based on predefined relationships)
—theta-join (general comparison of data values).
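The seven query functions listed above can be sketched over relations modeled as sets of tuples. This is a minimal illustration, not DIRECT's implementation: each tuple is represented as a frozenset of (name, value) pairs, the function names are chosen here for readability, and the join functions assume that the two relations use disjoint attribute names.

```python
def relation(rows):
    """Build a relation: a set of tuples, each a frozenset of (name, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def union(r, s):
    return r | s

def intersection(r, s):
    return r & s

def difference(r, s):
    return r - s

def projection(r, names):
    """Keep only the named attributes of every tuple."""
    return {frozenset((n, v) for n, v in t if n in names) for t in r}

def restriction(r, predicate):
    """Keep the tuples for which the predicate holds."""
    return {t for t in r if predicate(dict(t))}

def theta_join(r, s, condition):
    """Combine tuple pairs that satisfy a general comparison of data values."""
    return {t | u for t in r for u in s if condition(dict(t), dict(u))}

def equi_join(r, s, r_name, s_name):
    """Theta-join specialized to equality on two named attributes."""
    return theta_join(r, s, lambda a, b: a[r_name] == b[s_name])

# Hypothetical data: part numbers duplicated in two databases, and related data.
parts_a = relation([{"part_no": 1}, {"part_no": 2}])
parts_b = relation([{"part_no": 2}, {"part_no": 3}])
invoice_items = relation([{"inv_part": 2, "qty": 5}, {"inv_part": 9, "qty": 1}])
catalog = relation([{"cat_part": 2, "descr": "gear"}])

matched = equi_join(invoice_items, catalog, "inv_part", "cat_part")
```

Because relations are sets, duplicates produced by a union or a projection disappear automatically, which matches the set semantics of relational algebra.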
`
The query results produced by DIRECT are relations, which are sets of tuples consisting of simple data elements. DIRECT expects that the different database management systems support operators to reference and extract simple data elements (projection, selection) and operators to traverse the static structural and dynamic data relationships provided by their data model (equi-join). DIRECT provides operators for joining and merging the subquery results by supporting the union, intersection, difference, and general join operators; therefore, it requires the services of a relational-database manager.

DIRECT represents a query graphically as a data definition diagram. This is similar to a query representation proposed by Trimble [1989]. Because a graphical representation is more abstract than a keyword-based query statement, similar data-modeling constructs can be represented in a uniform way. Constructed data elements are represented as boxes containing lists of simple data elements that define either the query result, restrictions, or data relationships between them. If the constructed data elements belong to the same database, they are connected by lines representing predefined relationships such as foreign key references defined in a relational database or hierarchical paths for hierarchical databases. If they belong to different databases, they are connected by lines labeled with the data element names from the different databases and an operator to specify the comparison between them. Additional lines connect the box representing the query result with the boxes for the constructed data elements that contribute to its result. (See Figure 6 for a query graph representing a complete query.)

DIRECT contains a direct-manipulation query facility. A user specifies a query by selecting icons representing the query operators, such as selection or equi-join, followed by selecting data elements that are part of the operation. The query facility guides the users in their selection; the objective is to create a connected query graph as the graphical representation of a complete query.
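The completeness criterion, a connected query graph, lends itself to a simple check. The sketch below assumes that boxes are identified by name and that each line is a pair of box names; it is a hypothetical illustration of the criterion, not a description of how DIRECT's query facility tests it.

```python
from collections import defaultdict

def is_connected(boxes, lines):
    """Return True when every box can be reached from every other box."""
    if not boxes:
        return False
    neighbors = defaultdict(set)
    for a, b in lines:
        neighbors[a].add(b)
        neighbors[b].add(a)
    start = next(iter(boxes))
    seen, stack = {start}, [start]
    while stack:
        # Depth-first traversal over the lines drawn so far.
        for nxt in neighbors[stack.pop()]:
            if nxt in boxes and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == set(boxes)

# Hypothetical query graph: a result box plus one box per constructed data element.
boxes = {"result", "invoice", "parts"}
complete = [("result", "invoice"), ("result", "parts"), ("invoice", "parts")]
incomplete = [("result", "invoice")]  # the parts box is not yet related to anything
```

A query editor could run such a check after each user action and prompt for the missing relationship while the graph is still disconnected.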
`
`
4. SAMPLE SCENARIO

It is not uncommon that data supporting the automation of an entire business process are stored in multiple databases. New databases are created because of changes in the structure of an organization, changes in technology, and changes in the way a business operates.
`
`4.1 A Sample Application
`
`of manu-
`and export
`the import
`automates
`system that
`software
`An existing
`of view of customs,
`factured
`parts
`has evolved
`in this way. From the point
`imports
`and exports
`are governed
`by different
`rules.
`This may
`explain
`why
`data
`about
`invoicing,
`order
`tracking,
`and
`tariffs
`are
`stored
`in
`separate
`databases
`the import
`and export
`business.
`The same products
`are manu-
`for
`factured
`in several
`countries.
`Furthermore,
`the components
`of a particular
`product
`are manufactured
`in one country
`and
`shipped
`to another
`for
`final
`assembly.
`Additional
`databases
`store
`data
`about
`parts
`that
`can be imported
`and exported
`and record manufacturing
`costs
`related
`to customs
`and ship-
`ping.
`To complicate
`matters,
`the
`original
`applications
`use
`a hierarchical
`database management
`system,
`and the newer
`applications
`take
`advantage
`relational
`technology.
`As a result,
`the same and related
`data
`are duplicated
`and distributed
`throughout
`several
`databases.
`Creating
`summary
`reports
`and
`the
`analyzing
`data
`require
`retrieval
`of data
`from several
`databases.
`The
`company
`needs
`a uniform,
`easy-to-use
`interface
`for
`retrieving
`data
`from
`heterogeneous
`databases.
`multi-
`to formulate
`of DIRECT
`the implementation
`The scenario
`describes
`from the existing
`operational
`database
`queries.
`The example
`uses definitions
`databases
`that
`are
`described
`above.
`The
`entire
`system consists
`of 20-25
`databases.
`The
`invoice
`database,
`a hierarchical
`IMS database,
`stores
`data
`concerning
`the
`origin
`and
`destination
`of
`the
`goods,
`currency,
`weight,
`and
`shipping
`conditions.
`A second
`database
`is a relational
`DB2
`database
`called
`It
`stores
`properties
`of
`the manufactured
`parts
`being
`imported
`and
`parts.
`exported.
DIRECT is used to generate a report that lists part numbers on an invoice as well as descriptions of parts being imported. This report is used for verifying the correctness of the invoice on arrival at its destination. The data needed to generate this report are stored in the invoice database and the parts database, which are maintained by different database management systems. Current technology requires the writing of application programs that extract and merge data from different databases.

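To make concrete what such a hand-written extract-and-merge program involves, here is a minimal sketch. It is not DIRECT and not the authors' code: the nested dictionary merely stands in for the hierarchical invoice database, an in-memory SQLite table stands in for the relational parts database, and every name and value (`INVOICES`, `case_no`, `part_no`, the sample parts) is an assumption made for illustration.

```python
import sqlite3

# Hypothetical stand-in for the hierarchical invoice database: an invoice
# owns cases, and each case owns items (all names and values are invented).
INVOICES = {
    "123xyz": {
        "cases": [
            {"case_no": "C1", "items": [{"part_no": "P-10"}, {"part_no": "P-20"}]},
            {"case_no": "C2", "items": [{"part_no": "P-10"}]},
        ]
    }
}


def build_parts_db():
    """Create an in-memory relational stand-in for the parts database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE parts (part_no TEXT PRIMARY KEY, description TEXT)")
    conn.executemany(
        "INSERT INTO parts VALUES (?, ?)",
        [("P-10", "gear housing"), ("P-20", "drive shaft")],
    )
    return conn


def invoice_report(invoice_no, parts_conn):
    """Walk the invoice hierarchy, then look up each part description."""
    rows = []
    for case in INVOICES[invoice_no]["cases"]:
        for item in case["items"]:
            found = parts_conn.execute(
                "SELECT description FROM parts WHERE part_no = ?",
                (item["part_no"],),
            ).fetchone()
            # A part missing from the parts database is reported without text.
            description = found[0] if found else None
            rows.append((case["case_no"], item["part_no"], description))
    return rows


report = invoice_report("123xyz", build_parts_db())
for row in report:
    print(row)
```

Even in this toy form, the program has to know both access styles (tree traversal and SQL) and which field links the two sources. That knowledge is what a multidatabase query facility aims to take out of application code.
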
The scenario illustrates visual representations and interaction techniques for finding data elements in different databases and for identifying semantically equivalent data elements. It also illustrates a strategy for creating a multidatabase query by creating a connected query graph as the visual representation of a complete query.

ACM Transactions on Information Systems, Vol. 12, No. 4, October 1994.
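The idea of a connected query graph can be illustrated with a small connectivity check. This is only a sketch under assumptions: it treats constructed data elements as nodes and relationships as undirected edges, and it reads "complete" as "every node reachable from every other". The text states only that a connected graph is the visual representation of a complete query. The node label `parts.PART` is hypothetical.

```python
from collections import deque


def is_connected(nodes, edges):
    """Return True if every node is reachable from the first one."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbors = {n: set() for n in nodes}
    for a, b in edges:
        # Relationships are treated as undirected for this check.
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        current = queue.popleft()
        for nxt in neighbors[current]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == len(nodes)


# IVCASE and IVITEM are segment names from the invoice database;
# "parts.PART" is an invented label for a relation in the parts database.
nodes = ["invoice.IVCASE", "invoice.IVITEM", "parts.PART"]
predefined = [("invoice.IVCASE", "invoice.IVITEM")]
user_defined = [("invoice.IVITEM", "parts.PART")]

print(is_connected(nodes, predefined))
print(is_connected(nodes, predefined + user_defined))
```

With only the predefined relationship inside the invoice database, the parts node stays isolated and the check fails. Adding a relationship across the two databases connects the graph.
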
`
4.2 Specifying a Multidatabase Query
`
The following scenario describes how a report listing all part numbers and their matching descriptions for invoice 123xyz is obtained using DIRECT. To specify a multidatabase query, users name and select data elements from several databases. They identify data elements with the same meaning to select and relate data instances from the different databases. Given these tasks, the user completes the following steps in order to specify a multidatabase query:
`
—Select the databases that contain the needed data.
—Name the data elements to be reported and those restricting the values to be included in the result.
—Define how the selected instances from the same or different databases are related.
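The three steps above can be pictured as filling in a small specification record. The sketch below is an illustration only, not DIRECT's internal representation: the class `QuerySpec`, its field names, and the rule that relations are needed once more than one database is selected are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class QuerySpec:
    """Hypothetical container for the three specification steps."""
    databases: list = field(default_factory=list)     # step 1
    reported: list = field(default_factory=list)      # step 2: elements to report
    restrictions: dict = field(default_factory=dict)  # step 2: element -> value
    relations: list = field(default_factory=list)     # step 3: pairs of elements

    def missing_steps(self):
        """List the steps that still have nothing recorded."""
        missing = []
        if not self.databases:
            missing.append("select databases")
        if not self.reported:
            missing.append("name data elements")
        # Assumption: several databases need at least one relation.
        if len(self.databases) > 1 and not self.relations:
            missing.append("define relations")
        return missing


spec = QuerySpec()
spec.databases += ["invoice", "parts"]
spec.reported += ["part number", "part description"]
spec.restrictions["invoice number"] = "123xyz"
before = spec.missing_steps()

spec.relations.append(("invoice.part number", "parts.part number"))
after = spec.missing_steps()

print(before)
print(after)
```

Separating reported elements from restricting elements mirrors the second step, where some named elements appear in the result and others only limit which values are included.
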
`
4.2.1 Selecting the Databases. Knowing that the invoice and parts databases contain the needed data, the user starts by importing their data definitions. The data definitions will be stored in the data catalog internal to DIRECT. The user will refer to them later while specifying the multidatabase query.
`
4.2.2 Naming Data Elements. A browser for the data definitions provides interaction techniques that assist a user in locating and identifying the data elements needed for the query. With the help of the browser, the user can locate the case number, order number, part number, and part description for the invoice number 123xyz.

The browser provides a function that gives users an overview of each database, showing a diagram of its named constructed data elements with their predefined relationships. For example, the overview of the invoice database shows the item (IVITEM) segment as a child segment of the case (IVCASE) segment, just as parts on an invoice are grouped by the cases in which they are shipped (see Figure 3).

The browser organizes the data definitions in an inclusion hierarchy. Simple data elements are grouped by the constructed data elements they are part of. In turn, the constructed data elements are grouped by the database they belong to. Users can navigate through this hierarchy by selecting the data element names; this is similar to browsing in a hypertext system.

For example, a user can inspect the detailed description of the case number by first selecting the IVCASE segment in the invoice database, followed by selecting the IVCACAS#, a field in the case segment.

The browser also contains a search facility that allows users to locate data elements based on keyword descriptions. A user can locate the data element for the part description by searching for the keywords “part” and “description” in the dictionary description of the data catalog.
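A keyword search of this kind can be sketched as a scan over dictionary descriptions. The text does not say how keywords are matched, so this sketch assumes whole-word, case-insensitive matching in which every keyword must occur. The element names `PART_DESC` and `PART_NO` and all description strings are invented for illustration.

```python
# Hypothetical dictionary descriptions keyed by (database, element).
DESCRIPTIONS = {
    ("parts", "PART_DESC"): "Description of the manufactured part",
    ("parts", "PART_NO"): "Part number",
    ("invoice", "IVCACAS#"): "Case number on the invoice",
}


def search(descriptions, *keywords):
    """Return elements whose description contains every keyword as a word."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for element, text in descriptions.items():
        words = set(text.lower().split())
        if all(k in words for k in wanted):
            hits.append(element)
    return hits


matches = search(DESCRIPTIONS, "part", "description")
print(matches)
```

Requiring all keywords keeps the result short for a query such as “part” and “description”; a real facility might instead rank partial matches or handle plurals and punctuation, which this sketch ignores.
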
`
Fig. 3. Overview of the invoice database showing the hierarchy of its segments.
`
Bobrow [1981] identified the above-described features: Overview, Path, Multiple Views, and Alternative Access, as requirements for browsers of object-oriented programming languages. Multiple views provide an overview and a detailed description of the data elements. Users navigate along a path
offers an alternative
`definin