throbber
Intensional Associations Between Data and Metadata
`
`Divesh Srivastava
`AT&T Labs–Research
`divesh@research.att.com
`
`Yannis Velegrakis
`University of Trento
`velgias@dit.unitn.it
`
`[12]
`
`[5]
`
`[16]
`
`[24]
`[14]
`[29]
`
`[3]
`
`Ref. Method Used
`Atomic value annotations attached to a block of
`values within a tuple. They accompany the values
`as retrieved. Relational algebra query language.
`Atomic data values carry their provenance, which
`propagates with them as they are retrieved. Query
`language supports predicates on provenance.
`Data values are associated with quality parame-
`ters (accuracy, freshness, etc.). SQL is extended
`to retrieve data using these parameters.
`Explicit modeling of schema and mapping infor-
`mation, and associations of it with portions of the
`data. SQL extension to retrieve data and meta-
`data that satisfy certain metadata properties.
`Credential-based access control. System reads
`complex security profiles and returns data results
`accordingly.
`Loosely-coupled model of information elements,
`marks and links used to represent superimposed
`information. It has no specific schema, but is in
`relational model and can be queried using SQL.
`Creation and modification time is recorded with
`the data and used in query answering. Query lan-
`guage supports predicates on time values.
`
`[15]
`
`[6]
`
`Kind
`
`Annotations
`
`Provenance
`
`Quality
`Parameters
`
`Schema &
`Mapping
`Information
`
`Security
`
`Super-
`imposed
`Information
`
`Time
`
`ABSTRACT
`There is a growing need to associate a variety of metadata with
`the underlying data, but a simple, elegant approach to uniformly
`model and query both the data and the metadata has been elusive.
`In this paper, we argue that (1) the relational model augmented with
`queries as data values is a natural way to uniformly model data, ar-
`bitrary metadata and their associations, and (2) relational queries
`with a join mechanism augmented to permit matching of query re-
`sult relations, instead of only atomic values, is an elegant way to
`uniformly query across data and metadata. We describe the archi-
`tecture of a system we have prototyped for this purpose, demon-
`strate the generality of our approach and evaluate the performance
`of the system, in comparison with previous proposals for metadata
`management.
`Categories and Subject Descriptors:
`H.2.1 [Database Management]: Logical Design – Data Models
`General Terms: Management.
`Keywords: Queries as data, metadata management, intensional as-
`sociations, annotations.
`
`1.
`
`INTRODUCTION
`In recent years we have witnessed a tremendous proliferation of
`databases in many fields of endeavor, ranging from corporate en-
`vironments and scientific domains to supporting a diverse set of
`applications on the web. These databases are becoming increas-
`ingly complex, both in their internal structure (e.g., thousands of
`tables) and in their interactions with other databases and applica-
`tions (e.g., mediators and workflows). There is a consequent need
`for understanding, maintaining, querying, integrating and evolving
`these databases. In successfully performing these tasks, metadata
`plays an important role. Metadata is data about data, a secondary
`piece of information that is separate in some way from the primary
`piece of information to which it refers. Metadata examples include
`schema, integrity constraints, comments about the data [4], ontolo-
`gies [1], quality parameters [28, 16], annotations [5, 12], prove-
`nance [24], and security policies [3].
`Metadata is used in many different fields. In corporate environ-
`ments, databases deployed at the core of important business oper-
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`SIGMOD’07,June12–14,2007,Beijing,China.
`Copyright 2007 ACM 978-1-59593-686-8/07/0006 ...$5.00.
`
`Table 1: Metadata management proposals
`
`ations may contain erroneous, inaccurate, out-of-date, or incom-
`plete data with a significant impact on query results [8]. In such
`applications, data can be tagged with quality parameters to com-
`municate suitability, accuracy, freshness or redundancy [27, 28],
`and schema structures can be annotated with textual description
`to communicate their semantics. In scientific domains where data
`may be collected from various sources, cleansed, integrated and
`processed to produce new forms of data enhanced with new analy-
`sis results [19], provenance can be provided as metadata in the form
`of annotations [4] and schema and mapping information can be
`stored in special structures to allow users to apply their own judg-
`ment to assess credibility of query results [24]. In heterogeneous
`environments where different sources may use different structures
`to represent the same real world entity, or the same representation to
`model different concepts, metadata can clarify semantics, prevent
`misinterpretations or misuses of data, achieve interoperability [10]
`and allow the retrieval of the data that is best suited to the task at
`hand [21]. In the Internet domain, superimposed information, i.e.,
`data “placed over” existing information sources, is used to help or-
`ganize, access, connect and reuse information elements in those
`sources [15]. Similarly, security related policies can be associated
`to the data to control access in various environments [3].
`
`401
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2213
`Page 1 of 12
`
`

`
`1.1 Metadata Management Approaches
`Over the years, numerous proposals have been made by re-
`searchers for augmenting the data model and the query capabilities
`of a database in order to facilitate metadata management. Table 1
`provides a list of such proposals. While the list is by no means ex-
`haustive, it provides a good sample of the kinds of metadata that
`have been considered of interest and have been studied.
`It also
`characterizes the way the problem has been approached. Based on
`these approaches one can observe the great variety of metadata, the
`different structures, the kind of data associated with metadata, and
`the way metadata has been used in queries. Some of this metadata
`is described by single atomic values, e.g., the creation time of an
`element in the database [6]. Others have a more complex structure,
`like the schema mapping information [24] or security [3]. Further-
`more, metadata can be associated either to individual data values
`[5, 16, 6] or to a group of values, such as a subset of the attributes
`(i.e., a block) of a tuple [12] or a complex XML element [3].
`A common denominator of the approaches in Table 1 is the use
`of metadata in querying. Some use it to restrict query results, which
`may [6, 16] or may not [5] include metadata alongside the actual
`data results. Others query and retrieve metadata independently of
`the data to which it is associated [15].
`It is also worth observing that each entry in Table 1 is tailored
`to specific kinds of metadata, and is not directly applicable to other
`kinds, at least not without some major modifications. Past attempts
`at building generic metadata stores (e.g., [13, 2]) have employed
`complex modeling tools for this purpose:
`[13] explicitly repre-
`sented the various artifacts using Telos [17], and [2] employed
`data repositories (intended for shared databases of engineering ar-
`tifacts). A simple, elegant approach to uniformly model and query
`data, arbitrary metadata and their association has been elusive.
`Another observation from Table 1 is that metadata has to be ex-
`plicitly associated with each data item it refers to. There are, how-
`ever, practical scenarios in which an intensional association may
`be more appropriate. For example, in an application that contains
`reviews about restaurants [4], one may need to describe a property
`that holds for all the restaurants in New York. Using previous pro-
`posals this information would have had to be associated explicitly
`to every New York restaurant in the database. Furthermore, future
`tuples cannot be accommodated without explicit association. For
`instance, if ten new restaurants are opened in New York, the prop-
`erty that holds for all the restaurants in New York should also hold
`for them. However, this cannot be achieved unless the property
`gets explicitly associated to these ten restaurants. An alternative
`approach (which we advocate) is to use an intensional description
`of the restaurants (data) that have that property (metadata), in the
`same way virtual views use queries over base tables to describe
`their instances, and future tuples that are inserted or deleted from
`the base tables, are automatically included or removed from the
`instances of the views.
`1.2 Our Approach
`In this work, we describe a study that aims to accommodate in
`one simple framework the different kinds of metadata, the differ-
`ent structures, the different ways that metadata is associated with
`data and the different ways in which it is used in queries. We argue
`that in order to achieve this we need a framework that is simple
`and abstracted from the specifics of each kind of metadata. Having
`observed that the operations one needs to perform on metadata are
`similar to those people do with data, we propose the use of standard
`data management techniques for metadata, so that both data and
`metadata can be managed in one single framework. We advocate
`that the relational model is adequate for such a purpose. Metadata
`
`with complex structures can easily be modeled through relations
`and attributes. These relations have no special semantics, thus, the
`same piece of information can be viewed either as data or as meta-
`data. It can also be queried using a relational query language, even
`independently of whether or not it is associated to some data.
`Our philosophy is that although, at the conceptual level, there
`may be a distinction between data and metadata, at the database
`level, for the purpose of management, everything is represented as
`a relation. This approach is not different from the one followed by
`the relational model, where at the conceptual level there may be a
`distinction between entities and relationships, but at the database
`level, everything is represented through relations.
`The main mechanism used in the relational model to associate
`data in different relations is the join on one or more attributes.
`If data and metadata have been modeled as relations, then the
`same mechanism can be used to describe the association between
`data and metadata. Unfortunately, the relational join operation has
`two main limitations that make it inadequate for this intended use:
`(i) The association is always at the tuple level, i.e., it is not possible
`to associate a metadata tuple with only a subset of attributes of an-
`other tuple in a different table, since there is no way to specify the
`attributes that the metadata refers to; (ii) It requires explicit associ-
`ation between the tuples through the join attributes, as previously
`mentioned in the New York restaurants example. To cope with
`these issues, we propose the use of queries as values in the rela-
`tional tables. In particular, we show how attributes of type “query”,
`can achieve the required functionality.
`Using queries as data values is not entirely new. Relational
`DBMSs already store in the catalog tables the definition queries of
`their views. In this case, however, queries are considered schema
`information and despite the fact that they can be queried using SQL,
`they are not considered part of the instance data. Our proposal
`raises such metadata to the level of data, and provides a unified
`mechanism for modeling and querying across data and metadata.
`Apart from the system catalog tables, queries as values have been
`proposed in INGRES [23], with a similar functionality adopted by
`Oracle [11]. They have also been studied in the context of relational
`algebra [18], and the Meta-SQL system [9]. Here we show how this
`idea can be used in the service of metadata management. A key dif-
`ference is that previous approaches require the existence of an eval
`operator that evaluates the queries stored as values at run time, pos-
`sibly resulting in nested relations, or in some cases the computation
`of the outer-union of these results. In contrast, we only need a new
`kind of predicate that makes use of eval, leading to a more efficient
`implementation. Our use of queries as values is similar to the role
`of RDF resource descriptions [26], but our approach is much more
`generic, since we can use as a resource description the full power
`of SQL queries.
`Our contributions can be summarized as follows:
`1. We elevate metadata to first class citizens of the database
`and the query language, without requiring any special se-
`mantics. The approach allows metadata management with-
`out any modification of the semantics of the relational model
`and SQL, and without having to alter existing tables, since
`we use stored queries to refer to the data which the metadata
`entries describe.
`2. We extend the traditional join mechanism of the relational
`model to support joins that are based not on single values,
`but on a relation specified by a query stored in one of the
`attributes. This allows intensional specification of the data
`to which the metadata is associated. Furthermore, it allows
`metadata tuples to be associated to not just whole tuples, but
`also to portions of them.
`
`402
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2213
`Page 2 of 12
`
`

`
`Customers
`Name Type Loc PhoneLine CircuitID
`AFLAC bus NJ
`4078417332 245-6983
`J. Lu
`res
`NY 2019394460 245-7363
`H. Ford res
`NJ
`2159537607 245-7564
`AMEX bus NY 3178763540 343-5002
`NJC
`bus NJ
`9730918327 981-5002
`BCT
`bus NJ
`9734858504 273-6019
`...
`...
`...
`...
`...
`
`Provenance
`Protocol
`Rf1 Source IP
`http
`q1 NJDB 147.52.7.8
`q2
`3State
`148.62.1.11 ftp
`...
`...
`...
`...
`
`Permissions
`Rf2 Users
`q11 Administrators
`q12 Guests
`...
`...
`
`q1: select Name,Type,PhoneLine
`from Customers where Loc=’NJ’
`q2: select Loc,PhoneLine,CircuitID
`from Customers where Type=’business’
`
`q11: select * from Provenance
`where IP LIKE ’147.%’
`q12: select Name from Customers
`where Loc=’NY’
`
`XJ
`Qc Qt
`qC1 qT 1
`qC2 qT 2
`qC3 qT 3
`...
`...
`
`Technicians
`Company
`Name
`Contact
`W. Farkas 4804978353 AT&T
`S. Gilbert 3178757627 Verizon
`M. Henry 8187167852 AT&T
`C. Urs
`7739735713 AT&T
`Y. James
`7344676191 CISCO
`
`qC1. select CircuitID from Customers where Type=’residence’
`qT 1. select * from Technicians where Company=’CISCO’
`qC2. select PhoneLine,CircuitID from Customers where Type=’business’
`qT 2. select * from Technicians where Company=’Verizon’
`qC3. select PhoneLine from Customers
`qT 3. select * from Technicians where Company=’AT&T’
`
`Figure 1: A database with metadata information stored in regular tables as data.
`
`3. We explore alternative implementation mechanisms that al-
`low the use of queries as data values in modern relational
`databases and also allow joins based on such values. We
`present pure rewriting-based strategies, as well as techniques
`that can effectively use and update indexes for this purpose.
`
`4. We describe the architecture of the Metadata Management
`System (MMS) we have prototyped. We experimentally
`evaluate the performance of MMS, and compare it with pre-
`vious proposals for metadata management. Our results vali-
`date the generality and practicality of our uniform approach
`to metadata management.
`
`The structure of the paper is as follows. Section 2 provides a run-
`ning example that identifies the issues and illustrates our solution.
`Section 3 defines the semantics of query expressions as data values
`in relational tables, and their use in query conditions. Section 4
`describes how attributes of this type can be implemented using ex-
`isting relational database technology. Finally, Section 5 presents
`experimental results to validate our methodology.
`
`2.
`
`ILLUSTRATIVE EXAMPLE
`We describe in this section a realistic example that illustrates the
`need for a uniform way of managing different kinds of metadata
`and their associations to data, and also illustrates our solution.
`
`EXAMPLE 2.1. Consider a communications company database
`with the table Customers shown in Figure 1. The table contains
`information about the phone lines (PhoneLine) of the customers
`(Name), their location (Loc), whether a customer is a business or
`a residence (Type) and the circuit (CircuitID) used by the phone
`line. The contents of the table are generated by integrating data
`from a number of physically distributed sources. When a mistake
`is detected in the table, it is important to know its origin in or-
`der to correct it. To make this information available to the user,
`the data in the Customers table needs to be annotated with its
`provenance information. This includes the origin database name
`(Source), its IP address (IP), and the communication protocol
`used to access it (Protocol). One way to achieve this is to al-
`ter the table Customers by adding three new columns for each of
`its attributes [4]. Such a solution may affect the way existing appli-
`cations use the table, may degrade performance, or may not even
`be implementable due to lack of authorization for such a change.
`An alternative solution is to to store the provenance informa-
`tion in a separate table (Provenance) as illustrated in Figure 1.
`
`Column Rf1 can be used to specify the relationship between the
`specific tuple and the data it annotates. It may contain Name values
`assuming that Name is the key in Customers. For instance, tuple
`[BCT, NJDB, 147.52.7.8, http] in Provenance would indicate that
`the BCT customer data tuple was obtained from the NJDB source.
`This modeling approach has two main drawbacks. First, it has
`a lot of information repetition. Assume that it has been asserted
`that all the New Jersey customers originate from the same data
`source. To record that, a tuple like the one just mentioned will
`have to be repeated in table Provenance for every New Jersey
`customer. The second drawback is that this mechanism cannot be
`used to model the fact that a Provenance tuple may not refer to
`the whole Customers tuple but only to a subset of its attributes.
`
`What we propose is to allow queries to be used as values in the
`table columns. In particular, to have some columns recording query
`expressions used to intensionally describe data a metadata tuple is
`associated to. To find whether a particular data value is associated
`with a given (metadata) tuple, one only needs to check if the data
`value is part of the relation described by the query expression.
`
`EXAMPLE 2.2. In the example database of Figure 1, column
`Rf1 of table Provenance contains queries instead of atomic val-
`ues. The first tuple with query q1 in column Rf1 intensionally
`describes that the provenance of all the customers with location
`’NJ’ is the NJDB data source. Furthermore, through the attributes
`of its select clause, it specifies that this is true only for attributes
`Name, Type and PhoneLine. It states nothing about attributes Loc
`and CircuitID. In a similar fashion, the second tuple specifies
`that data source ’3State’ is the origin of the Loc, PhoneLine and
`CircuitID values of all the business Customers.
`
`EXAMPLE 2.3. The data in table Customers often needs to be
`verified for their consistency. This is common practice in large
`database applications where errors appear frequently [5]. In the
`current application, this is done by a number of technicians from
`various companies. Not all technicians are qualified to verify the
`correctness of every data element in the Customers table. There
`are certain rules that govern this qualification. For example, any
`CISCO technician can verify that the circuit id recorded in the data-
`base for any residential customer is correct, any Verizon techni-
`cian can verify the correctness of the recorded phone number or
`circuit id (or their association) of any business customer, and any
`AT&T technician can verify the correctness of the phone number of
`
`403
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2213
`Page 3 of 12
`
`

`
`t R2
`
`R2
`
`...
`
`...
`
`...
`
`
`
`
`
`
`
`
`
`
`
`
`
`qQ
`
`
`
`
`
`
`
`X
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`An
`
`S
`A
`
`1
`
`R1
`
`Figure 2: Q-type joins
`
`non-first normal form relation. The second disadvantage is that it
`would have created results with an unspecified schema. Different
`Q-values have query expressions that can return different numbers
`and types of attributes. Evaluating the Q-type column would have
`resulted in multiple non union-compatible relations.
`
`EXAMPLE 3.1. Consider the query select * from Provenance
`applied on the database instance of Figure 1.
`If the Query col-
`umn Rf1 is evaluated prior to retrieval, the result will be a relation
`with the last 3 attributes of Provenance table and the attributes re-
`turned by the execution of the query in column Rf1. Queries q1 and
`q2 are not union-compatible (they have different select clauses).
`Due to this, in the execution of query select * from Provenance,
`if an unnest operation is applied on the returned result relation of
`queries q1 and q2, the final result will have tuples with different
`number and kind of attributes, hence, it would not be a relation.
`
`The only additional functionality that we need is a new kind of
`predicate (i.e., conditional expression) that makes use of the eval
`function. Such a predicate can be a parameter of the standard op-
`erators of relational algebra, including selection and join, yielding
`functionality that can effectively use Q-values for associating meta-
`data tables with data (or other metadata) tables, as described next.
`The functionality described here can also be achieved in the
`nested relational model, where an entire relation is stored as a value
`of an attribute. That model is much more powerful but, unfortu-
`nately, its richer functionality comes with a much higher cost. Our
`goal is to achieve what we want with minimal cost. We chose to
`implement our solution to the relational model that currently dom-
`inates the commercial DBMS world.
`3.2 Selections on Q-values
`Although for storage and retrieval purposes Q-type column con-
`tents are viewed as atomic types, for comparison purposes, we
`would like to view Q-values as relations. Since a Q-value is noth-
`ing more than an intensional description of a virtual relation, the
`functionality needed is the one that allows checking whether cer-
`tain values exist in the relation described by a Q-value. If they do,
`it is said that the Q-value references these values.
`
`DEFINITION 3.2. For a relation R with a Q-type column Q,
`let t be a tuple of that relation and q the value of that tuple on
`column Q. Assume that A is the set of attributes of the result rela-
`tion eval(q). If v1, . . . , vn are atomic values and A1, . . . , An are
`attribute names, expression R.Q[A1, . . . , An]
`[v1, . . . , vn] eval-
`uates to true for R.Q=q if the following conditions are satisfied.
`1. ∀i = 1..n Ai∈A, and
`2. ∃t(cid:2)∈eval(q) such that t(cid:2)[Ai] = vi, ∀i = 1..n.
`
`.=
`
`EXAMPLE 3.3. Assume that a data administrator would like to
`know what data sources are related to customer ’AFLAC’. This
`
`any customer. The owner of the database would like to annotate the
`data in the Customers table with the technicians that are eligible
`for performing the verification task, so that given some data val-
`ues, it is easy to find who can be called to perform the verification.
`To do that, the technician information is recorded in a new data-
`base table Technicians. The relationship between technicians
`and customers is modeled through a new table XJ with columns
`Qc and Qt, both containing queries as values. Their contents are
`presented in Figure 1. The tuples in the table XJ model these rules.
`
`With the proposed mechanism, one can easily introduce new
`metadata on top of other existing metadata. It is only a matter of
`creation of a new table and of specifying the right queries as val-
`ues in one of its columns. Also, different metadata tuples in the
`same metadata table can refer to different data or metadata tables.
`These are important features since the distinction between data and
`metadata is usually blurred. The same piece of information may be
`viewed as data by one application and as metadata by another.
`
`EXAMPLE 2.4. Suppose that a set of security policies need to
`be specified for some of the data. For simplicity, assume that these
`policies include only the group of users who can access the relevant
`data. The system administrator would like to annotate both the
`Customers and Provenance tables with the access permissions
`information. To achieve it, she creates a new Permissions table
`as illustrated in Figure 1. The first tuple of that table, through query
`q11 stored as a value in column Rf2, indicates that records in the
`Provenance table whose IP is in the 147.* domain can be accessed
`only by an administrator. The second tuple, through query q12,
`indicates that the Name field in the Customers table with location
`’NY’ can be accessed by a guest user.
`
`It is also important to note that through the proposed modeling
`of metadata information as data, and of the associations between
`tables through attributes with queries as values, metadata can have
`any complex structure. In particular, a piece of metadata informa-
`tion may have multiple columns with different types.
`
`3. QUERIES AS DATA VALUES
`This section formally defines the semantics and the use of query
`expressions as data values and the operators on them.
`3.1 Query-Types
`The adopted type system is the one of the relational model ex-
`tended with a new user defined atomic type called Q-type. Q-types
`provide the means to store queries as values in relational tables,
`in a fashion similar to INGRES [23] or Meta-SQL [9]. User de-
`fined types are used the same way any other primitive atomic type
`is used. The ability to define and use such types is part of the SQL
`Standard and is currently supported by most commercial database
`management systems. A value of type Q-type, referred to as Q-
`value, is a relational query expression.
`To be able to dynamically execute queries stored as values, we
`assume the existence of a function eval whose role is to evaluate
`a query expression that is provided to it as argument. However,
`in contrast to other approaches that use query expressions as data
`values [23, 18], we do not propose to extend relational algebra to
`include this function as an operator. Such an extension would have
`two major implications. The first is that it would have required the
`use of a nested relational model, instead of the simpler flat (first
`normal form) relational model. If, for instance, eval was part of
`the extended relational algebra, applying it on a Q-type column
`would have returned a relation in which the contents of that col-
`umn would have been relations, i.e., the result would have been a
`
`404
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2213
`Page 4 of 12
`
`

`
`information is done through a many-to-many relationship imple-
`mented by table XJ. Note that in contrast to traditional join ap-
`proaches, there are no common attributes between XJ, Customers,
`and Technicians. The join is achieved through the intensional
`description of the queries stored in the two Q-type columns of table
`XJ. First, a query is constructed to retrieve the New Jersey business
`customers.
`select * from Customers c
`where Loc=’NJ’ and c.Type=’business’
`
`It is then enhanced to also retrieve the tuples of table XJ
`that contain some intensional reference to the circuit ids of these
`Customers. This is achieved through a join between the Q-values
`in attribute Qc and the attribute CircuitID of Customers. The
`query becomes:
`select * from Customers c, XJ j
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.CircuitID]
`j.Qc[CircuitID]
`
`.=
`
`From the tuples that appear in XJ, the first is for residential cus-
`tomers, so it cannot satisfy the query specifications. The third tuple
`does not mention CircuitID in its select clause, so it cannot sat-
`The second tuple of XJ will
`isfy the query specifications either.
`form a join pair with every Customers tuple that agrees on the
`CircuitID attribute value with at least one tuple in the evaluation
`of the query qC2.
`The result of the join between Customers and XJ will also have
`to be associated with the Technicians tuples. This is done in a
`similar way through a join on column Qt of XJ and the attributes
`of Technicians. The final query is:
`select t.Name from Customers c, XJ j, Technicians t
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.CircuitID] and
`j.Qc[CircuitID]
`[t.Name]
`j.Qt[Name]
`
`.=
`
`.=
`
`If the administrator was interested in the names of the techni-
`cians that can verify either the phone and the circuit id of business
`customers, then the final query would have been:
`select t.Name from Customers c, XJ j, Technicians t
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.PhoneLine] or
`(j.Qc[PhoneLine]
`j.Qc[CircuitID]
`[c.CircuitID])
`[t.Name]
`and j.Qt[Name]
`
`.=
`
`.=
`
`.=
`
`In this case, both the second and the third tuples of XJ would
`have been relevant. If the administrator was interested in techni-
`cians that can verify both the phone and the circuit id, then the final
`query would have been:
`select t.Name from Customers c, XJ j, Technicians t
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.PhoneLine, c.CircuitID]
`j.Qc[PhoneLine, CircuitID]
`[t.Name]
`and j.Qt[Name]
`
`.=
`
`.=
`
`which is not equivalent to the query:
`select t.Name from Customers c, XJ j, Technicians t
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.PhoneLine] and
`j.Qc[PhoneLine]
`j.Qc[CircuitID]
`[c.CircuitID]
`[t.Name]
`and j.Qt[Name]
`
`.=
`
`.=
`
`.=
`
`This is because, in principle, different tuples in the evaluation of
`a Q-value query expression can satisfy the two conditions linked by
`the and, while in the former query the same tuple in the evaluation
`of the Q-value query expression has to satisfy the joint condition on
`phone and circuit id.
`
`translates to selecting from the Provenance table those tuples hav-
`ing a Q-value in attribute Rf1 that references Name and that name
`is ’AFLAC’.
`This can be expressed as:
`select distinct p.Source from Provenance p
`[(cid:2)AF LAC(cid:2)].
`where p.Rf1[Name]
`
`.=
`
`Note that the semantics of the select operation have not changed.
`The only new part is the introduction of the “
`” conditional ex-
`pression (predicate) for values of type Q. The way this condition
`is evaluated is a topic of a subsequent section. The symbol “
`” is
`used instead of “=” to emphasize the different predicate.
`Another point worth clarifying is that Q-values can specify mul-
`tiple conditions on multiple attributes. Our examples are kept sim-
`ple for expository purposes.
`3.3 Joins Using a Q-value
`The values v1, . . . , vn in Definition 3.2 can be either constant
`values as is the case in Example 3.3, or relational atomic expres-
`sions that take values during query evaluation. The ability to use
`such expressions provides the means to form joins that are based
`on Q-type columns, allowing tables representing metadata to be as-
`sociated to the data tables.
`
`.=
`
`.=
`
`EXAMPLE 3.4. A data administrator has discovered that in the
`database in Figure 1 the customer names starting with “A” violate
`the format policy and would like to know the source from where they
`originate. She knows that the provenance information is stored in
`the Provenance table where attribute Rf1 specifies the association
`between the data and the metadata. She issues the following query:
`select distinct p.Source from Customers c, Provenance p
`where p.Rf1[Name]
`[c.Name] and c.Name LIKE ’A%’
`
`.=
`
`What the query does is to select all the customers whose name starts
`with letter “A”. For every such tuple c it checks if there is a tu-
`ple p in Provenance with a Q-value q in p.Rf1 such that relation
`eval(q) has an attribute Name and there is at least one tuple in it
`with the value in column Name equal to the value of c.Name. If yes,
`then tuples c and p pair up. The answer of the above query on the
`instance of Figure 1 is the tuple [(cid:2)NJDB(cid:2)].
`Note again that this has the same semantics as the regular SQL
`join operator (only the join condition is different since it is based
`on a Q-type column). The result of its execution if the select clause
`was select “*”, would consist of the attributes of Customers con-
`catenated to the attributes of Provenance. The Q-type attribute
`values will be the query expression presented as a string.
`
`Figure 2 provides a visual explanation of when a tuple in one
`relation R2 can form a join with tuples in a relation R1 when
`the join is based on its Q-type column Q and th

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket