throbber
Proceedings of the1989 ACM SIGMOD
`International Conference on the
`Management of Data
`Portland, Oregon
`
`~
`I
`
`SIGMOD RECORD
`Volume 18, Number 2, June 1989
`
`Editors: James Clifford, Bruce Lindsay
`and David Maier
`
`

`

`) '
`QA 7,. ~
`.03
`A3><
`1~!7
`
`The Association for Computing Machinery
`11 West 42nd Street
`New York, New York 10036
`
`c 1989 by the Association for Computing Machinery,
`Inc. Copying without fee is permitted provided that the
`copies are not made or distributed for direct commercial
`advantage, and credit to the source is given. Abstract(cid:173)
`ing with credit is permitted. For other copying of
`articles that carry a code at the bottom of the first
`page, copying is permitted provided that the per - copy
`fee indicated in the code is paid through the Copyright
`Clearance Center , 27 Congress Street , Salem, MA 01970.
`For p~.~ ssion to republish write
`to: Director of
`Pul:tli cations, Association for Computing Machinery. To
`c,o
`otherwise, or republish, requires a fee and/or
`.~,.,._.~;. .·.s 'P.!'l «?,ific permission .
`··~'tl
`'-r ·•
`b·:
`r
`
`.. ; ··. "
`. .
`
`'to\
`
`ISBN 0-89791-317-5
`
`Additional copies may be ordered prepaid from:
`
`ACM Order Department
`P.O. Box 64145
`Baltimore, MD 21264
`
`Price:
`Members .... . .. .
`All others
`
`$25.00
`$33.00
`
`ACM Order Number:
`
`472890
`
`i i
`
`

`

`Extending a Relational Database with
`Deferred Referential Integrity Checking and Intelligent Joins1
`
`Stephanie Cammarata (steph@rand.org)
`Prasadram Ramachandra (ram@rand.org)
`Darrell Shane (shane@rand.org)
`
`The RAND Corporation
`1700 Main Street
`Santa Monica, CA 90406-2138
`(213) 393-0411
`
`ABSTRACT
`
`1. Introduction
`
`Interactive use of relational database management
`systems (DBMS) requires a user to be knowledgeable about the
`semantics of the application represented in the database. In
`many cases, however, users are not trained in the application
`field and are not DBMS experts. Two categories of
`functionality are problematic for such users: (1) updating a
`database without violating integrity constraints imposed by the
`domain and (2) using join operations to retrieve data from more
`than one relation. We have been conducting research to help an
`uninformed or casual user interact with a relational DBMS.
`
`This paper describes two capabilities to aid an
`interactive database user who
`is neither an application
`specialist nor a DBMS expert. We have developed deferred
`Referential Integrity Checking (RIC) and Intelligent Join (IJ)
`which extend the operations of a relational DBMS. These
`facilities are made possible by explicit representation of
`database semantics combined with a relational schema.
`Deferred RIC is a static validation procedure that checks
`uniqueness of tuples, non-null keys, uniqueness of keys, and
`inclusion dependencies. IJ allows a user to identify only the
`"target" data which is to be retrieved without the need to
`additionally specify "join clauses". In this paper we present the
`motivation for these facilities, describe the features of each, and
`present examples of their use.
`
`' This :macarch was sponsored by the Defense Advanced Research Projects
`
`Agency under the auspices of RAND's National Defense Research Instiwte, a
`
`Federally Funded Research and Development Center sponsored by the Office of
`
`the Secretary of Defense. Views and concluSions contained in this document are
`those of the authors and should not be inte2preted as rcpm<enting the official
`opinion of DARPA, the U.S. Government, or any person or agency connected
`with them.
`
`Permission to copy without fee all or part of this material is granted provided that
`the copies are not made or distributed for direct commercial advantage, the ACM
`copyright notice and the title of the publication and its date appear, and notice is
`given that copying is by permission of the Association for Computing Machinery.
`To copy otherwise, or to republish, requires a fee and/ or specific permission.
`© 1989 ACM 0-89791-317-5/ 89/0005/0088$1.50
`
`88
`
`the advent of workstation environments,
`With
`interactive software, and public domain databases, the use of
`DBMS is no longer limited to database administrators (DBAs),
`operations managers, and application programmers. Personnel
`in many different facets of a workplace are experimenting with
`DBMS for organizing, maintaining, and sharing information
`[McCa82). In many cases, little or no database design is
`undertaken before a database is generated. Concerns for update
`anomalies and consistency maintenance, studied in theoretical
`discussions of relational database management, are rarely
`addressed in the practical data management activities of many
`organizations. Often, a novice user · simply "relationalizes" a
`flat file into an intuitive set of tables. The resulting first normal
`form database
`implicitly relates
`tables
`through common
`attributes among the relations.
`
`A complete representation of a user's application
`database should attempt to encode 1) the schema for every
`relation, 2) the data stored in the relations, and 3) the semantic
`relationships among relations. The first two categories are
`captured in every relational system. However, the semantics of
`the application is seldom expressed explicitly and is usually left
`to an individual user to interpret. Unfortunately, few DBMS
`tools and languages have facilities to store semantics and aid
`users
`in
`interpreting
`these
`semantics
`[Tou82,Blum87,Neuh88, Jone87].
`Although
`common
`attributes between tables are based on underlying semantic
`relationships between relations, the relational model and its
`various implementations place no restrictions on the naming of
`attributes. Experienced users may establish
`their own
`conventions for relation and attribute names but no relational
`DBMS represents or enforces such conventions. Therefore,
`without an explicit conceptual model, it is difficult for users to
`access and validate the information they need [Curt81).
`
`Developing an information model during database
`design is one means of expressing this information [Nava86).
`However, commitment to such a formal effort is infrequent and
`the resulting model is usually a paper documentation aid,
`unavailable to interactive users. Another DBMS support tool,
`the data dictionary,
`interfaces a database
`to external
`applications by defining
`interface ent!I.Jes,
`application
`transactions, and generating reports, but is not suitable for a
`casual
`interactive user
`(Alle82, Dolk87). Our approach
`recommends generating a knowledge base or information
`dictionary to capture previously implicit semantics of an
`existing relational schema and database. Research efforts
`toward integrating DBMS with expert systems have also
`
`

`

`adopted similar techniques [Reha85, Al-Z87, Schu88]. The
`universal
`relation model
`indirectly
`represents
`relational
`to achieve complete access path
`metadata by aiming
`independence [Maie82].
`
`two capabilities, deferred
`We have developed
`Referential Integrity Checking (RIC) and Intelligent Join (IJ),
`which extend
`the operations of a relational database
`management system by utilizing explicit semantics and
`supplemental metadata combined with a relational schema.
`Deferred RIC is a database validation process that checks
`uniqueness of tuples, non-null keys, uniqueness of keys, and
`inclusion dependencies. This procedure can be invoked by the
`user or the system at specified intervals. U allows a user to
`identify only the "target" data which is to be retrieved without
`the need to additionally specify "join clauses". IJ subsequently
`navigates through the relations to generate the necessary join
`operations. These capabilities are supported by a metadata
`network constructed from both the relational schema and
`extended mctadata stored in an information dictionary.
`
`In the next section we present our previous work
`focussing on relational metadata and introduce an example
`database that we refer to throughout the remainder of the paper.
`Section 3 discusses a network representation for relational
`metadata which facilitates both Referential Integrity Checking
`and Intelligent Join algorithms. Discussion and examples of
`RIC are presented in section 4, and section 5 details IJ. We
`conclude with a discussion of the limitations, and suggestions
`for future work.
`
`1. liD: An extended lnformatlob dictionary
`
`In previous work we have developed an Intelligent
`Information Dictionary (liD) to address the issues described
`above. no serves as an interface between an interactive user
`and the query language of a relational DBMS [Camm88]. no
`is
`implemented in Franzlisp Flavors running on a Sun
`Microsystems workstation. The dictionary communicates
`directly with the Ingres relational DBMS (also resident on a
`Sun machine) through the Lingres system, a Lisp to lngres
`interface we previously implemented. Lingres provides the full
`functionality of the Quel query language, accessed from Lisp
`and Flavors.
`
`1.1IID functionality
`no aids a user in understanding the organization of a
`relational database by representing both the constructs of a
`relational database, and domain specific knowledge acquired
`to augment a
`from an application specialist. liD serves
`relational schema with supplemental metadata. By combining
`domain knowledge with knowledge of relational database
`concepts, no supports
`interactive
`tools
`for browsing,
`customized data manipulation, and interactive value checking.
`Figure 1 shows the schema and extensional data of our test
`database, atlas. This geography database consists of seven
`relations, each with a primary key (which is underlined,
`"===", in figure 1). In many relational database applications,
`users are supplied with only the information shown in figure 1.
`The atlas database includes many of the typical anomalies
`found in first normal form databases.
`
`In figure 2 we present portions of liD metadata,
`supplied by a domain expert, for the relations animal and
`country including a metadata description for the atlas
`database. In addition to metadata entries for relations, liD also
`represents column metadata such as value constraints, units
`
`information and conversion, and default values. no is intended
`to serve the users' need for extended database capabilities, and
`not as a data model. However, because liD represents and uses
`the semantics of the database, considerable overlap exists
`between the capabilities of liD and ER (Entity-Relationship) or
`semantic modeling [Hull87]. In the future, we plan to develop
`a more complete modeling environment based on the existing
`no framework.
`
`One important component of an no knowledge base
`information about "interlinks". Interlinks represent the
`is
`semantic information prescribing exactly how two or more
`relations are implicitly related. This information is generally
`referred
`to as
`integrity constraints, expressing structural
`conditions of a relational database. Knowledge of semantic
`is essential for
`interlink
`information and key attributes
`interactive users. However, without a facility like liD, there is
`no repository for this information. In figure 3, we show
`interlink metadata describing the relationships between the
`relations weather,
`and
`country,
`vegetation.
`Interlink identifiers correspond to the attribute "interlink-list" in
`figure 2. Specification of common columns or "join fields" are
`indicated
`in
`the "from-column-list" and "to-column-list".
`Information contained in interlinks combined with the attribute
`"key-list" found in the relation metadata make explicit the
`information needed by users for checking the referential
`consistency of a relational database and for manipulating the
`underlying data.
`
`Until now, interlink information in liD was strictly
`passive. Facilities were available for a user to browse through
`an TID knowledge base to learn about the database. However,
`information about
`interlinks and keys did not actively
`of
`the DBMS
`contribute
`during
`the manipulation
`[Koss87, Gray88]. Our objective for the work described in this
`paper, was to build an operational extension to the capabilities
`of a relational DBMS which would automatically use this
`information to aid the user in retrieving data and checking for
`structural integrity.
`
`2.2. Extended capabilities supported by liD
`
`One important reason for representing relational
`metadata is to enable referential integrity checking (RIC).
`Validating referential integrity involves two categories of
`constraints: key constraints and referential constraints. A key
`constraint is implied by the existence of candidate keys and
`requires unique and non-null key values. Referential
`constraints are entailed by the relationship between a key in
`one relation and a foreign key in another. At any given time,
`the value of a foreign key in the first relation must be either
`null, or must be a key value in some tuple of the other relation.
`
`Much research has addressed referential integrity,
`however, with the exception of Sybase, we know of no other
`(excluding PC-based DBMS) which
`commercial DBMS
`enforces referential integrity in real time during database
`manipulation [Casa88]. Our philosophy, however, is not to
`provide "immediate" referential checking. Instead, we are
`promoting "deferred" referential checking. In many cases, the
`benefits of immediate checking do not justify the excessive
`overhead [Lafu82, Hato88]. Deferred RIC can be initiated by
`the user or by the system at specified points in time. For
`example, many application databases arc not designed using
`theoretical principles of relational normalization. Therefore, the
`consistency of such databses is questionable. Furthermore,
`users are not prevented from changing the value of a key
`attribute or violating inclusion dependencies. Deferred RIC
`
`89
`
`

`

`•,
`
`fauna table
`
`1
`I distfeature
`Jlono
`tlat
`I animal
`I country
`1---------- ----- - - ------ - -J ••••••••.......••••••••••••.. • .••• /
`I india
`1 tioer
`1
`601
`751 burnino eyes
`1
`I
`I
`tussr
`Jpenouin
`701
`100Jmajestic
`I usa
`I doo
`I
`40 I
`100 I up-to-tricks
`I
`I canada
`I penouin
`I
`60 I
`100 I sloppy
`I
`I china
`I doo
`I
`40 I
`125/listless
`1
`I australia
`130 I jumps
`1 kanoaroo
`1
`301
`1
`!elephant
`I
`601
`75/trunked
`I
`I
`!india
`I
`I
`601
`751
`I
`I
`I
`I
`60 I
`751
`I
`I india
`I doo
`I
`60 I
`751 hooded
`I
`I japan
`I tioer
`I
`7 51
`10 I burnino-eyes
`I
`I
`Jindia
`Jtioer
`601
`75Jbody-stripes
`1
`1--- ---------------------------------------------------------J
`
`country table
`
`I country
`
`Jlatnorth
`
`tlatsouth
`
`Jlonoeast
`
`Jlonowest
`
`1
`
`I ------------1 ... .................................................... I
`
`90 I
`60 I
`8 I
`78 I
`I
`!india
`I
`1901
`201
`401
`751
`tussr
`65/
`125/
`30/
`50/
`1
`tusa
`I
`601
`125/
`501
`751
`!canada
`I
`1351
`75/
`251
`55/
`!china
`I
`1551
`115/
`401
`101
`!australia
`I
`I
`150/
`1001
`30/
`50/
`0 I
`0 I
`0 I
`0 I
`I
`/india
`J--------------------------------------------------------------------1
`
`weather table
`
`Jlatnorth
`
`tlatsouth
`
`tlonoeast
`
`Jlonowest
`
`1 zone
`
`1 avorain
`
`1
`
`I -------------------------------------------------------1 ............................. I
`
`I
`I
`781
`81
`60 I
`90 I tropic
`60 I
`I
`I
`751
`401
`20\
`190Jtundra
`201
`I
`I
`50 I
`30 I
`50 I
`1251
`651 temperate
`I
`1
`751
`501
`1251
`60Jtundra
`401
`I
`1
`551
`251
`751
`135Jtropic
`601
`I
`I
`101
`401
`1151
`155jtemperate
`201
`I
`I
`40 I
`451
`351
`201
`0 !mediterranean
`I
`I
`751
`OJ
`501
`301
`501
`1-------------------------------------------------------------------------------------l
`natwildlife table
`economy table
`
`jcountry
`
`Jnatanimal
`
`jnatbird
`
`I
`
`I ------------1 ......................... I
`
`I
`I india
`1 peacock
`1 tioer
`I ussr
`I
`I
`I penouin
`I
`I eaole
`I
`I usa
`I
`Jkiwi
`Jkanoaroo
`!australia
`I
`I
`I
`I china
`I
`I
`I
`I canada
`J--------------------------------------1
`
`animal table
`
`I
`1 percapita
`Jonp
`1 country
`I ==--=------- 1 •...••...•...••.••...•..••. I
`210 I
`300 I
`1 india
`1
`I
`tussr
`24001
`83701
`jusa
`1
`42001
`172201
`!canada
`1
`4501
`174301
`260 I
`1 china
`1
`2751
`12000 I
`1 australia
`1
`2311
`I
`jusa
`42001
`172201
`1---------------- ----------------------- -J
`
`1 animal
`
`J distfeat ure
`
`J country found
`
`Jmaxspeed
`
`1 anmtype
`
`I avoheioht
`
`I avoweioht
`
`I avo life
`
`I
`
`I ------------1 ...................................................... ............................................. · · · · I
`
`I
`/tiger
`Jburnino-eyes
`!india
`1
`40/carnivorous
`361
`1501
`201
`I
`!elephant
`Jtrunk
`tchina
`1
`10Jherbivorous
`1001
`5001
`301
`/kangaroo
`Jjumps
`/australia
`1
`25/pouched
`841
`J
`1801
`151
`I
`100 I
`30 I
`1 pe nouin
`1 sloppy
`1 ussr
`1
`21 polar
`361
`I doo
`1 listless
`I usa
`I
`20 I omnivorous
`361
`I
`80 I
`121
`1--------------------------------------------------------------------------------------------------------------------l
`vegetation table
`
`I zone
`
`I avorain
`
`I maxtemp
`
`I mintemp
`
`I tree type
`
`I main crop
`
`I
`
`I ---------------1 ... ................................................................ · · · · .. I
`
`I
`I corn
`-30 I deciduous
`1001
`30 I
`I
`I temperate
`I tropic
`I
`I rice
`0 I everoreen
`llO I
`60 I
`I
`I
`I none
`I tundra
`-601 coniferous
`801
`201
`1
`I
`I bamboo
`40 I evergreen
`110 1
`751
`I
`I equatorial
`J---------- ----------------- ------------------------------ ---------- - ----- ----------------1
`Figure 1: Atlas database with anomalies
`
`90
`
`

`

`relation-list:
`interlink-list:
`
`Database-supplement: "ATLAS"
`name:
`11 ATLAS"
`11 atlas database contains an example of information about
`description:
`different countries. This is used to illustrate the procedures
`developed for Referential Inteority checkino and Intellioent
`Join operation."
`(WEATHER COUNTRY ECONOMY NATWILDLIFE AN I MAL VEGETATION FAUNA)
`(exhibits/is-exhibited-by is-inhabited-by/lives-in
`has/is-associated-with has-inhabitants/lives-at
`supports/is-supported-by is-conducive-to/orows-in
`is-represented-by/represents
`has-characteristics/is-distinou ished-by
`has-same-distinouishino-features-as/
`has-same-distinouishino-features-as)
`(POSITION-VECTOR COUNTRY-NAME)
`ATLAS
`
`oroup-li st:
`ldatabase:
`
`description:
`
`Relation-supplement: "COUNTRY"
`name:
`11 COUNTRY"
`name-explanation: "This relation 'country• contains the position variables
`for every country • "
`"Each record describes a country's approx. boundaries on
`the four sides."
`(COUNTRY LONGWE.ST LONGE.AST LATSOUTH LATNORTH)
`( (COUNTRY) )
`(POSITION-VECTOR COUNTRY-NAME)
`COUNTRY
`ATLAS
`(supports/is-supported-by has-inhabitants/lives-at
`has/is-associated-with is-inhabited-by/lives-in
`exhibits/is-exhibited-by)
`
`column-list:
`key-list:
`oroup-1 i st :
`lrelation:
`mydatabase:
`interlink-list:
`
`Relation-supplement "ANIMAL"
`name:
`name-explanation:
`description:
`column-list:
`
`key-list:
`oroup-1 i st :
`lrelation:
`mydatabase:
`interlink-list:
`
`11 ANIMAL 11
`•animal is a relation containinQ all animals"
`"Each record details an animal's characteristics"
`(AVGLIFE AVGWEIGHT AVGHEIGHT ANMTYPE MAXSPEED COUNTRYFOUND
`DISTFEATURE ANIMAL)
`((ANIMAL))
`nil
`ANIMAL
`ATLAS
`(has-same-distinouishino-features-as/has-same-distinouishinQ
`-features-as has-characteristics/is-distinouished-by
`is-represented-by/represents is-inhabited-by/lives-in)
`
`Figure 2: liD metadata
`
`does not prevent these anomalies, however, it will subsequently
`detect the errors and notify the user of inconsistencies.
`Detailed discussion of RIC procedures and examples applying
`RIC to the atlas database are found in section 4.
`
`Another compelling reason for needing to know
`interlink information focuses on the "select" or "retrieve"
`command. In both SQL and Que!, the user composes the
`desired selection using a qualification clause to effect an
`equijoin operation. In many cases, more than one equijoin over
`the relations is necessary to retrieve the desired information.
`For example, in the atlas database, suppose a user wishes to
`retrieve the treetype found in India. Ideally, the user would
`like to submit a query such as:
`
`retrieve (country.country, vegetation.treetype)
`where country.country = "india"
`
`However, this request requires a join across an intermediate
`relation, namely, weather. Therefore, to retrieve the desired
`data, the following query is necessary:
`
`retrieve (country.country, vegetation.treetype)
`where country.country = "india" and
`country.latnorth weather.latnorth and
`country.latsouth weather.latsouth and
`country.longeast weather.longeast and
`country.longwest weather.longwes t and
`weather.zone = vegetation.zone
`
`In this example, the user must know implicit "interlink"
`and
`information
`relating
`the
`relations
`country
`vegetation and the user additionally needs to compose the
`query to join the relations country, weather, and
`vegetation. (In the remainder of this paper, use of the term
`"join" implies "equijoin".)
`
`In effect, the user must navigate through the relations
`to establish the desired correspondences. It is interesting to note
`that when the relational model was introduced, declarative
`query languages were cited as an important benefit. Indeed, we
`have come a long way from tracing record pointers and
`maintaining currency indicators. Nevertheless, a notion of
`"navigation" still remains. Navigation in a relational context
`implies navigating through a "conceptual" model rather than a
`"physical" model. The Intelligent Join capability we have
`developed uses explicit metadata in liD to perform the
`
`91
`
`

`

`Interlink: •exhibits/is-exhibited-by"
`from-relation:
`COUNTRY
`from-column-list:
`(LATNORTH LATSOUTH
`LONGEAST LONGWEST )
`WEATHER
`(LATNORTH LATSOUTH
`LONGEAST LONGWEST)
`cardinality: :ONE to 1.
`
`to-relation:
`to- column-list:
`
`from-relation:
`from-column-list:
`
`WEATHER
`(LATNORTH LATSOUTH
`LONGEAST LONGWEST)
`COUNTRY
`(LATNORTH LATSOUTH
`LONGEAST LONGWEST)
`cardinality: :ZERO-MANY to 1.
`
`to-relation:
`to-column-list:
`
`Interlink: "is-conduclve-to/orows-in"
`from-relation:
`WEATHER
`from-column-list:
`(ZONE)
`to-relation:
`VEGETATION
`to-column-list:
`(ZONE)
`cardinality: :ONE to 1.
`
`VEGETATION
`from-relation:
`(ZONE)
`from-column-list:
`to-relation:
`WEATHER
`to-column-1 ist: (ZONE)
`cardinality: :ZERO-MANY to 1.
`
`Figure 3: llD interlinks
`
`navigation and compose join clauses automatically for the user.
`In section 5 we provide a more detailed explanation and
`examples of IJ queries.
`
`3. Metadata as a network structure
`
`RIC and IJ capabilities require explicit representation
`?f s~~antic metadata, which is traditionally recognized as
`1mphc1t. user knowledge. Metadata in llD, such as figures 2
`and 3, 1S represented as object-oriented instances of classes
`such as "Relation-supplement", "Column-supplement", and
`''lnt~rlink". However, to better facilitate navigation through a
`relat10nal database, we augment our object-oriented llD
`representation with data structures corresponding
`to
`the
`network-like organization of a relational database. A network
`llD
`representation
`is generated directly
`from selective
`metadata. The network is constant throughout the life of the
`database (unless schema modifications are allowed); therefore,
`the overhead incurred by building the network is a one-time
`cost.
`
`Given a relational schema, exactly one network can
`be produced which represents the database schema. The
`the network helps contribute
`to
`the
`uniqueness of
`computational efficiency when traversing and processing the
`network by eliminating the problem of recognizing and
`manipulating isomorphic networks. Although each relational
`schema maps to a single network, it is possible for more than
`one schema to map to a common network. This situation occurs
`when the databases have the same "meta-schema", that is, the
`same number of interlinks and the same kinds of constraints
`over those interlinks. In the remainder of this paper, references
`to a "relational schema" include supplemental metadata, such
`as keys and interlinks, and references to a "network" denote an
`llD metadata network.
`
`3.1. Building the metadata network
`
`Nodes in the network represent relations, and links
`capture interlink information between the nodes. If n interlinks
`are defined between two relations, then the corresponding
`nodes are connected by n links. For example, in figure 2 there
`are nine interlinks declared in the interlink-list of the
`atlas database-supplement; therefore, nine links are
`represented in the corresponding network. Links not only
`represent an abstract relationship but also encode attributes
`which the two relations have in common. The network supports
`three different kinds of links depending on the characteristics of
`the common attributes in the relations denoted by the connected
`nodes. These characteristics distinguish between common
`attributes which are a key in one, both, or neither of the
`relations. We identify the corresponding link types as singly
`directed, doubly directed, or undirected links. Below we detail
`the specifications of each link type.
`
`In a database with k relations, R l' R 2, ... , Rk, the
`corresponding metadata network has nodes n 1, n 2, ... , nk where
`the ith relation, R;o is represented by n1• An undirected link
`between two nodes, n1 and n i' indicates that there exists a set of
`attributes common to both relations, R1 and R ., which is a key
`in neither relation. A doubly directed link ~tween n. and n .
`signifies that there exists a common set of attributes whlch is ~
`key in both R1 and Rr A singly directed link emanating from n1
`indicates that there exists a set of common attributes between
`R1 and Ri that is a key in Ri. A singly directed link also
`implies that the common attributes are a foreign key in R ..
`While there can be at most one undirected link between ty)o
`nodes, there can exist zero or more singly and doubly directed
`links. Multiple singly directed links denote multiple foreign
`key/key relationships between two relations. More than one
`doubly directed link between two relations occurs only when
`there are multiple candidate keys in both relations.
`
`The network corresponding to the atlas database
`is shown in figure 4, indicating nodes, links, and common
`attribute sets prescribed by the above rules. We have also
`underlined key attributes participating in each link. In this
`example, only one doubly directed link is found between any
`pair of nodes because we have defined only a single primary
`key for each relation in the atlas database. For some links
`the common attributes have the same name, for instance,
`country identifies
`the common attribute
`in both
`the
`country and natwildlife relations. However, attribute
`names are assigned for convenience and there is no restriction
`on naming conventions. For example, the common attributes
`between the relations animal and natwildlife are
`named animal and natanimal respectively. Therefore,
`the network must explicitly name the common attributes for
`both relations, and if there is more than one link between two
`nodes, proper correspondences must be maintained. It is also
`important to note that the contents of the database are not
`reflected in the network. Instead, the network captures the
`structure of the database among abstract entities such as
`relations and attributes. When it is necessary to refer to the
`contents of the database, for example, during RIC checking, we
`access the relations directly through Ingres.
`
`So far, we have discussed only the foundations for
`the RIC and IJ capabilities. The network representation and
`encapsulation of a relational schema has proven to be an
`expressive and powerful formalism for explicit representation
`of relational metadata. In the subsequent sections, we present
`our methodology for irttegrity checking and intelligent joins
`within the framework of a metadata network.
`
`92
`
`

`

`Figure 4: Metadata network with common attribute sets
`
`4. Deferred referential Integrity checking
`
`This section describes work addressing one category
`of integrity constraints, namely, referential integrity. These
`constraints help describe the structural integrity of a relational
`schema and database. Unlike "value" constraints which encode
`information about allowable values and are enforced with
`predicates, referential integrity constraints focus on allowable
`mappings between relations. Although the particular values of
`an attribute arc irrelevant, the use of those values for mapping
`across relations reflects the structural requirements of the
`database schema.
`
`Two different approaches can be applied to enforce
`referential integrity. One method prevents integrity violations
`through the use of triggers and demons, and is intimately
`integrated into the data manipulation routines of the database
`[Ston85]. While this ensures a consistent database at all times,
`the associated overhead is usually high. The second option,
`which we have adopted, supports "deferred" referential
`integrity checking (RIC). In this approach, update anomalies
`are not blocked or prevented. Rather, deferred RIC performs a
`complete "sweep" of the database at regular intervals notifying
`the user of inconsistencies.
`
`4.1. Background
`
`We have observed the use of first normal form
`databases for simulation and modeling applications. In these
`domains, global databases are acquired as flat files from outside
`agencies. When corresponding lngres databases are generated,
`
`they are already inconsistent and erroneous. Furthermore, users
`select and combine subsets of the global data to create their
`own local databases for their application models. Once their
`local databases are satisfactorily derived, few modifications are
`made to the data. In these simulation laboratories, deferred
`RIC is being used to validate entire global databases and also
`local derived databases. In addition, users can modify the
`integrity constraints of their local data by changing liD
`metadata. In this context, deferred RIC is an ideal solution for
`"scrubbing"
`inconsistent global databases and enforcing
`referential integrity within local versions.
`
`Opponents of deferred validation argue
`that
`processing an entire database is expensive. Indeed, deferred
`checking is a dedicated, time-intensive process. However, in
`the scenario described above, validating the entire database is a
`task performed overnight when the database is idle. Local
`databases are usually small enough that they can be validated
`during a lunch hour! Although we have not yet performed any
`systematic studies or timings, our intuition suggests that many
`interactive database applications resemble the situations we
`have observed.
`
`4.2. RIC algorithms
`
`The current implementation to validate referential
`integrity entails three rules: The first rule requires non-null key
`values. The second restriction, also regarding key attributes,
`insists that key values be unique. The third rule addresses
`inclusion dependencies between foreign key and key attributes.
`However, before applying these rules to a database, deferred
`RIC first checks for duplicate tuples in each relation. Although
`
`93
`
`

`

`relational databases
`theory underlying
`set-oriented
`the
`precludes duplicate "tuples", relational DBMS implementations
`do not enforce their uniqueness. The need for this type of
`validation check was motivated directly by the simulation and
`modeling applications discussed above, where duplicate tuples
`were responsible for extreme errors during selections which
`involved aggregation functions, such as, count and sum.
`
`We have implemented RIC algorithms corresponding
`the above
`four criteria. These procedures utilize
`to
`information, such as keys and interlinks, encoded in the
`metadata network described in section 3. Because interlink
`mappings are identified by syntactic pattern matching across
`common attributes of relations, the RIC facility automatically
`to validate referential
`generates the procedures necessary
`integrity for any given relational schema.
`
`In the following four subsections, we explain the
`method we have adopted for implementing each RIC rule. We
`also provide examples in each subsection demonstrating an
`application of the rule's procedure to the atlas database.
`The entire RIC process produces four error flles corresponding
`to each RIC rule. The examples shown below are extracted
`from these output error files.
`
`4.2..1. Unique tuples
`
`The algorithm for validating uniqueness of tuples
`uses the Lingres (and corresponding Ingres) functions "count"
`and "countu[nique]". Both coun

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket