`
`Divesh Srivastava
`AT&T Labs–Research
`divesh@research.att.com
`
`Yannis Velegrakis
`University of Trento
`velgias@dit.unitn.it
`
`[12]
`
`[5]
`
`[16]
`
`[24]
`[14]
`[29]
`
`[3]
`
`Ref. Method Used
`Atomic value annotations attached to a block of
`values within a tuple. They accompany the values
`as retrieved. Relational algebra query language.
`Atomic data values carry their provenance, which
`propagates with them as they are retrieved. Query
`language supports predicates on provenance.
`Data values are associated with quality parame-
`ters (accuracy, freshness, etc.). SQL is extended
`to retrieve data using these parameters.
`Explicit modeling of schema and mapping infor-
`mation, and associations of it with portions of the
`data. SQL extension to retrieve data and meta-
`data that satisfy certain metadata properties.
`Credential-based access control. System reads
`complex security profiles and returns data results
`accordingly.
`Loosely-coupled model of information elements,
`marks and links used to represent superimposed
`information. It has no specific schema, but is in
`relational model and can be queried using SQL.
`Creation and modification time is recorded with
`the data and used in query answering. Query lan-
`guage supports predicates on time values.
`
`[15]
`
`[6]
`
`Kind
`
`Annotations
`
`Provenance
`
`Quality
`Parameters
`
`Schema &
`Mapping
`Information
`
`Security
`
`Super-
`imposed
`Information
`
`Time
`
`ABSTRACT
`There is a growing need to associate a variety of metadata with
`the underlying data, but a simple, elegant approach to uniformly
`model and query both the data and the metadata has been elusive.
`In this paper, we argue that (1) the relational model augmented with
`queries as data values is a natural way to uniformly model data, ar-
`bitrary metadata and their associations, and (2) relational queries
`with a join mechanism augmented to permit matching of query re-
`sult relations, instead of only atomic values, is an elegant way to
`uniformly query across data and metadata. We describe the archi-
`tecture of a system we have prototyped for this purpose, demon-
`strate the generality of our approach and evaluate the performance
`of the system, in comparison with previous proposals for metadata
`management.
`Categories and Subject Descriptors:
`H.2.1 [Database Management]: Logical Design – Data Models
`General Terms: Management.
`Keywords: Queries as data, metadata management, intensional as-
`sociations, annotations.
`
`1.
`
`INTRODUCTION
`In recent years we have witnessed a tremendous proliferation of
`databases in many fields of endeavor, ranging from corporate en-
`vironments and scientific domains to supporting a diverse set of
`applications on the web. These databases are becoming increas-
`ingly complex, both in their internal structure (e.g., thousands of
`tables) and in their interactions with other databases and applica-
`tions (e.g., mediators and workflows). There is a consequent need
`for understanding, maintaining, querying, integrating and evolving
`these databases. In successfully performing these tasks, metadata
`plays an important role. Metadata is data about data, a secondary
`piece of information that is separate in some way from the primary
`piece of information to which it refers. Metadata examples include
`schema, integrity constraints, comments about the data [4], ontolo-
`gies [1], quality parameters [28, 16], annotations [5, 12], prove-
`nance [24], and security policies [3].
`Metadata is used in many different fields. In corporate environ-
`ments, databases deployed at the core of important business oper-
`
`Permission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that copies are
`not made or distributed for profit or commercial advantage and that copies
`bear this notice and the full citation on the first page. To copy otherwise, to
`republish, to post on servers or to redistribute to lists, requires prior specific
`permission and/or a fee.
`SIGMOD’07,June12–14,2007,Beijing,China.
`Copyright 2007 ACM 978-1-59593-686-8/07/0006 ...$5.00.
`
`Table 1: Metadata management proposals
`
`ations may contain erroneous, inaccurate, out-of-date, or incom-
`plete data with a significant impact on query results [8]. In such
`applications, data can be tagged with quality parameters to com-
`municate suitability, accuracy, freshness or redundancy [27, 28],
`and schema structures can be annotated with textual description
`to communicate their semantics. In scientific domains where data
`may be collected from various sources, cleansed, integrated and
`processed to produce new forms of data enhanced with new analy-
`sis results [19], provenance can be provided as metadata in the form
`of annotations [4] and schema and mapping information can be
`stored in special structures to allow users to apply their own judg-
`ment to assess credibility of query results [24]. In heterogeneous
`environments where different sources may use different structures
`to represent the same real world entity, or the same representation to
`model different concepts, metadata can clarify semantics, prevent
`misinterpretations or misuses of data, achieve interoperability [10]
`and allow the retrieval of the data that is best suited to the task at
`hand [21]. In the Internet domain, superimposed information, i.e.,
`data “placed over” existing information sources, is used to help or-
`ganize, access, connect and reuse information elements in those
`sources [15]. Similarly, security related policies can be associated
`to the data to control access in various environments [3].
`
`401
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2213
`Page 1 of 12
`
`
`
`1.1 Metadata Management Approaches
`Over the years, numerous proposals have been made by re-
`searchers for augmenting the data model and the query capabilities
`of a database in order to facilitate metadata management. Table 1
`provides a list of such proposals. While the list is by no means ex-
`haustive, it provides a good sample of the kinds of metadata that
`have been considered of interest and have been studied.
`It also
`characterizes the way the problem has been approached. Based on
`these approaches one can observe the great variety of metadata, the
`different structures, the kind of data associated with metadata, and
`the way metadata has been used in queries. Some of this metadata
`is described by single atomic values, e.g., the creation time of an
`element in the database [6]. Others have a more complex structure,
`like the schema mapping information [24] or security [3]. Further-
`more, metadata can be associated either to individual data values
`[5, 16, 6] or to a group of values, such as a subset of the attributes
`(i.e., a block) of a tuple [12] or a complex XML element [3].
`A common denominator of the approaches in Table 1 is the use
`of metadata in querying. Some use it to restrict query results, which
`may [6, 16] or may not [5] include metadata alongside the actual
`data results. Others query and retrieve metadata independently of
`the data to which it is associated [15].
`It is also worth observing that each entry in Table 1 is tailored
`to specific kinds of metadata, and is not directly applicable to other
`kinds, at least not without some major modifications. Past attempts
`at building generic metadata stores (e.g., [13, 2]) have employed
`complex modeling tools for this purpose:
`[13] explicitly repre-
`sented the various artifacts using Telos [17], and [2] employed
`data repositories (intended for shared databases of engineering ar-
`tifacts). A simple, elegant approach to uniformly model and query
`data, arbitrary metadata and their association has been elusive.
`Another observation from Table 1 is that metadata has to be ex-
`plicitly associated with each data item it refers to. There are, how-
`ever, practical scenarios in which an intensional association may
`be more appropriate. For example, in an application that contains
`reviews about restaurants [4], one may need to describe a property
`that holds for all the restaurants in New York. Using previous pro-
`posals this information would have had to be associated explicitly
`to every New York restaurant in the database. Furthermore, future
`tuples cannot be accommodated without explicit association. For
`instance, if ten new restaurants are opened in New York, the prop-
`erty that holds for all the restaurants in New York should also hold
`for them. However, this cannot be achieved unless the property
`gets explicitly associated to these ten restaurants. An alternative
`approach (which we advocate) is to use an intensional description
`of the restaurants (data) that have that property (metadata), in the
`same way virtual views use queries over base tables to describe
`their instances, and future tuples that are inserted or deleted from
`the base tables, are automatically included or removed from the
`instances of the views.
`1.2 Our Approach
`In this work, we describe a study that aims to accommodate in
`one simple framework the different kinds of metadata, the differ-
`ent structures, the different ways that metadata is associated with
`data and the different ways in which it is used in queries. We argue
`that in order to achieve this we need a framework that is simple
`and abstracted from the specifics of each kind of metadata. Having
`observed that the operations one needs to perform on metadata are
`similar to those people do with data, we propose the use of standard
`data management techniques for metadata, so that both data and
`metadata can be managed in one single framework. We advocate
`that the relational model is adequate for such a purpose. Metadata
`
`with complex structures can easily be modeled through relations
`and attributes. These relations have no special semantics, thus, the
`same piece of information can be viewed either as data or as meta-
`data. It can also be queried using a relational query language, even
`independently of whether or not it is associated to some data.
`Our philosophy is that although, at the conceptual level, there
`may be a distinction between data and metadata, at the database
`level, for the purpose of management, everything is represented as
`a relation. This approach is not different from the one followed by
`the relational model, where at the conceptual level there may be a
`distinction between entities and relationships, but at the database
`level, everything is represented through relations.
`The main mechanism used in the relational model to associate
`data in different relations is the join on one or more attributes.
`If data and metadata have been modeled as relations, then the
`same mechanism can be used to describe the association between
`data and metadata. Unfortunately, the relational join operation has
`two main limitations that make it inadequate for this intended use:
`(i) The association is always at the tuple level, i.e., it is not possible
`to associate a metadata tuple with only a subset of attributes of an-
`other tuple in a different table, since there is no way to specify the
`attributes that the metadata refers to; (ii) It requires explicit associ-
`ation between the tuples through the join attributes, as previously
`mentioned in the New York restaurants example. To cope with
`these issues, we propose the use of queries as values in the rela-
`tional tables. In particular, we show how attributes of type “query”,
`can achieve the required functionality.
`Using queries as data values is not entirely new. Relational
`DBMSs already store in the catalog tables the definition queries of
`their views. In this case, however, queries are considered schema
`information and despite the fact that they can be queried using SQL,
`they are not considered part of the instance data. Our proposal
`raises such metadata to the level of data, and provides a unified
`mechanism for modeling and querying across data and metadata.
`Apart from the system catalog tables, queries as values have been
`proposed in INGRES [23], with a similar functionality adopted by
`Oracle [11]. They have also been studied in the context of relational
`algebra [18], and the Meta-SQL system [9]. Here we show how this
`idea can be used in the service of metadata management. A key dif-
`ference is that previous approaches require the existence of an eval
`operator that evaluates the queries stored as values at run time, pos-
`sibly resulting in nested relations, or in some cases the computation
`of the outer-union of these results. In contrast, we only need a new
`kind of predicate that makes use of eval, leading to a more efficient
`implementation. Our use of queries as values is similar to the role
`of RDF resource descriptions [26], but our approach is much more
`generic, since we can use as a resource description the full power
`of SQL queries.
`Our contributions can be summarized as follows:
`1. We elevate metadata to first class citizens of the database
`and the query language, without requiring any special se-
`mantics. The approach allows metadata management with-
`out any modification of the semantics of the relational model
`and SQL, and without having to alter existing tables, since
`we use stored queries to refer to the data which the metadata
`entries describe.
`2. We extend the traditional join mechanism of the relational
`model to support joins that are based not on single values,
`but on a relation specified by a query stored in one of the
`attributes. This allows intensional specification of the data
`to which the metadata is associated. Furthermore, it allows
`metadata tuples to be associated to not just whole tuples, but
`also to portions of them.
`
`402
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2213
`Page 2 of 12
`
`
`
`Customers
`Name Type Loc PhoneLine CircuitID
`AFLAC bus NJ
`4078417332 245-6983
`J. Lu
`res
`NY 2019394460 245-7363
`H. Ford res
`NJ
`2159537607 245-7564
`AMEX bus NY 3178763540 343-5002
`NJC
`bus NJ
`9730918327 981-5002
`BCT
`bus NJ
`9734858504 273-6019
`...
`...
`...
`...
`...
`
`Provenance
`Protocol
`Rf1 Source IP
`http
`q1 NJDB 147.52.7.8
`q2
`3State
`148.62.1.11 ftp
`...
`...
`...
`...
`
`Permissions
`Rf2 Users
`q11 Administrators
`q12 Guests
`...
`...
`
`q1: select Name,Type,PhoneLine
`from Customers where Loc=’NJ’
`q2: select Loc,PhoneLine,CircuitID
`from Customers where Type=’business’
`
`q11: select * from Provenance
`where IP LIKE ’147.%’
`q12: select Name from Customers
`where Loc=’NY’
`
`XJ
`Qc Qt
`qC1 qT 1
`qC2 qT 2
`qC3 qT 3
`...
`...
`
`Technicians
`Company
`Name
`Contact
`W. Farkas 4804978353 AT&T
`S. Gilbert 3178757627 Verizon
`M. Henry 8187167852 AT&T
`C. Urs
`7739735713 AT&T
`Y. James
`7344676191 CISCO
`
`qC1. select CircuitID from Customers where Type=’residence’
`qT 1. select * from Technicians where Company=’CISCO’
`qC2. select PhoneLine,CircuitID from Customers where Type=’business’
`qT 2. select * from Technicians where Company=’Verizon’
`qC3. select PhoneLine from Customers
`qT 3. select * from Technicians where Company=’AT&T’
`
`Figure 1: A database with metadata information stored in regular tables as data.
`
`3. We explore alternative implementation mechanisms that al-
`low the use of queries as data values in modern relational
`databases and also allow joins based on such values. We
`present pure rewriting-based strategies, as well as techniques
`that can effectively use and update indexes for this purpose.
`
`4. We describe the architecture of the Metadata Management
`System (MMS) we have prototyped. We experimentally
`evaluate the performance of MMS, and compare it with pre-
`vious proposals for metadata management. Our results vali-
`date the generality and practicality of our uniform approach
`to metadata management.
`
`The structure of the paper is as follows. Section 2 provides a run-
`ning example that identifies the issues and illustrates our solution.
`Section 3 defines the semantics of query expressions as data values
`in relational tables, and their use in query conditions. Section 4
`describes how attributes of this type can be implemented using ex-
`isting relational database technology. Finally, Section 5 presents
`experimental results to validate our methodology.
`
`2.
`
`ILLUSTRATIVE EXAMPLE
`We describe in this section a realistic example that illustrates the
`need for a uniform way of managing different kinds of metadata
`and their associations to data, and also illustrates our solution.
`
`EXAMPLE 2.1. Consider a communications company database
`with the table Customers shown in Figure 1. The table contains
`information about the phone lines (PhoneLine) of the customers
`(Name), their location (Loc), whether a customer is a business or
`a residence (Type) and the circuit (CircuitID) used by the phone
`line. The contents of the table are generated by integrating data
`from a number of physically distributed sources. When a mistake
`is detected in the table, it is important to know its origin in or-
`der to correct it. To make this information available to the user,
`the data in the Customers table needs to be annotated with its
`provenance information. This includes the origin database name
`(Source), its IP address (IP), and the communication protocol
`used to access it (Protocol). One way to achieve this is to al-
`ter the table Customers by adding three new columns for each of
`its attributes [4]. Such a solution may affect the way existing appli-
`cations use the table, may degrade performance, or may not even
`be implementable due to lack of authorization for such a change.
`An alternative solution is to to store the provenance informa-
`tion in a separate table (Provenance) as illustrated in Figure 1.
`
`Column Rf1 can be used to specify the relationship between the
`specific tuple and the data it annotates. It may contain Name values
`assuming that Name is the key in Customers. For instance, tuple
`[BCT, NJDB, 147.52.7.8, http] in Provenance would indicate that
`the BCT customer data tuple was obtained from the NJDB source.
`This modeling approach has two main drawbacks. First, it has
`a lot of information repetition. Assume that it has been asserted
`that all the New Jersey customers originate from the same data
`source. To record that, a tuple like the one just mentioned will
`have to be repeated in table Provenance for every New Jersey
`customer. The second drawback is that this mechanism cannot be
`used to model the fact that a Provenance tuple may not refer to
`the whole Customers tuple but only to a subset of its attributes.
`
`What we propose is to allow queries to be used as values in the
`table columns. In particular, to have some columns recording query
`expressions used to intensionally describe data a metadata tuple is
`associated to. To find whether a particular data value is associated
`with a given (metadata) tuple, one only needs to check if the data
`value is part of the relation described by the query expression.
`
`EXAMPLE 2.2. In the example database of Figure 1, column
`Rf1 of table Provenance contains queries instead of atomic val-
`ues. The first tuple with query q1 in column Rf1 intensionally
`describes that the provenance of all the customers with location
`’NJ’ is the NJDB data source. Furthermore, through the attributes
`of its select clause, it specifies that this is true only for attributes
`Name, Type and PhoneLine. It states nothing about attributes Loc
`and CircuitID. In a similar fashion, the second tuple specifies
`that data source ’3State’ is the origin of the Loc, PhoneLine and
`CircuitID values of all the business Customers.
`
`EXAMPLE 2.3. The data in table Customers often needs to be
`verified for their consistency. This is common practice in large
`database applications where errors appear frequently [5]. In the
`current application, this is done by a number of technicians from
`various companies. Not all technicians are qualified to verify the
`correctness of every data element in the Customers table. There
`are certain rules that govern this qualification. For example, any
`CISCO technician can verify that the circuit id recorded in the data-
`base for any residential customer is correct, any Verizon techni-
`cian can verify the correctness of the recorded phone number or
`circuit id (or their association) of any business customer, and any
`AT&T technician can verify the correctness of the phone number of
`
`403
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2213
`Page 3 of 12
`
`
`
`t R2
`
`R2
`
`...
`
`...
`
`...
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`X
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`An
`
`S
`A
`
`1
`
`R1
`
`Figure 2: Q-type joins
`
`non-first normal form relation. The second disadvantage is that it
`would have created results with an unspecified schema. Different
`Q-values have query expressions that can return different numbers
`and types of attributes. Evaluating the Q-type column would have
`resulted in multiple non union-compatible relations.
`
`EXAMPLE 3.1. Consider the query select * from Provenance
`applied on the database instance of Figure 1.
`If the Query col-
`umn Rf1 is evaluated prior to retrieval, the result will be a relation
`with the last 3 attributes of Provenance table and the attributes re-
`turned by the execution of the query in column Rf1. Queries q1 and
`q2 are not union-compatible (they have different select clauses).
`Due to this, in the execution of query select * from Provenance,
`if an unnest operation is applied on the returned result relation of
`queries q1 and q2, the final result will have tuples with different
`number and kind of attributes, hence, it would not be a relation.
`
`The only additional functionality that we need is a new kind of
`predicate (i.e., conditional expression) that makes use of the eval
`function. Such a predicate can be a parameter of the standard op-
`erators of relational algebra, including selection and join, yielding
`functionality that can effectively use Q-values for associating meta-
`data tables with data (or other metadata) tables, as described next.
`The functionality described here can also be achieved in the
`nested relational model, where an entire relation is stored as a value
`of an attribute. That model is much more powerful but, unfortu-
`nately, its richer functionality comes with a much higher cost. Our
`goal is to achieve what we want with minimal cost. We chose to
`implement our solution to the relational model that currently dom-
`inates the commercial DBMS world.
`3.2 Selections on Q-values
`Although for storage and retrieval purposes Q-type column con-
`tents are viewed as atomic types, for comparison purposes, we
`would like to view Q-values as relations. Since a Q-value is noth-
`ing more than an intensional description of a virtual relation, the
`functionality needed is the one that allows checking whether cer-
`tain values exist in the relation described by a Q-value. If they do,
`it is said that the Q-value references these values.
`
`DEFINITION 3.2. For a relation R with a Q-type column Q,
`let t be a tuple of that relation and q the value of that tuple on
`column Q. Assume that A is the set of attributes of the result rela-
`tion eval(q). If v1, . . . , vn are atomic values and A1, . . . , An are
`attribute names, expression R.Q[A1, . . . , An]
`[v1, . . . , vn] eval-
`uates to true for R.Q=q if the following conditions are satisfied.
`1. ∀i = 1..n Ai∈A, and
`2. ∃t(cid:2)∈eval(q) such that t(cid:2)[Ai] = vi, ∀i = 1..n.
`
`.=
`
`EXAMPLE 3.3. Assume that a data administrator would like to
`know what data sources are related to customer ’AFLAC’. This
`
`any customer. The owner of the database would like to annotate the
`data in the Customers table with the technicians that are eligible
`for performing the verification task, so that given some data val-
`ues, it is easy to find who can be called to perform the verification.
`To do that, the technician information is recorded in a new data-
`base table Technicians. The relationship between technicians
`and customers is modeled through a new table XJ with columns
`Qc and Qt, both containing queries as values. Their contents are
`presented in Figure 1. The tuples in the table XJ model these rules.
`
`With the proposed mechanism, one can easily introduce new
`metadata on top of other existing metadata. It is only a matter of
`creation of a new table and of specifying the right queries as val-
`ues in one of its columns. Also, different metadata tuples in the
`same metadata table can refer to different data or metadata tables.
`These are important features since the distinction between data and
`metadata is usually blurred. The same piece of information may be
`viewed as data by one application and as metadata by another.
`
`EXAMPLE 2.4. Suppose that a set of security policies need to
`be specified for some of the data. For simplicity, assume that these
`policies include only the group of users who can access the relevant
`data. The system administrator would like to annotate both the
`Customers and Provenance tables with the access permissions
`information. To achieve it, she creates a new Permissions table
`as illustrated in Figure 1. The first tuple of that table, through query
`q11 stored as a value in column Rf2, indicates that records in the
`Provenance table whose IP is in the 147.* domain can be accessed
`only by an administrator. The second tuple, through query q12,
`indicates that the Name field in the Customers table with location
`’NY’ can be accessed by a guest user.
`
`It is also important to note that through the proposed modeling
`of metadata information as data, and of the associations between
`tables through attributes with queries as values, metadata can have
`any complex structure. In particular, a piece of metadata informa-
`tion may have multiple columns with different types.
`
`3. QUERIES AS DATA VALUES
`This section formally defines the semantics and the use of query
`expressions as data values and the operators on them.
`3.1 Query-Types
`The adopted type system is the one of the relational model ex-
`tended with a new user defined atomic type called Q-type. Q-types
`provide the means to store queries as values in relational tables,
`in a fashion similar to INGRES [23] or Meta-SQL [9]. User de-
`fined types are used the same way any other primitive atomic type
`is used. The ability to define and use such types is part of the SQL
`Standard and is currently supported by most commercial database
`management systems. A value of type Q-type, referred to as Q-
`value, is a relational query expression.
`To be able to dynamically execute queries stored as values, we
`assume the existence of a function eval whose role is to evaluate
`a query expression that is provided to it as argument. However,
`in contrast to other approaches that use query expressions as data
`values [23, 18], we do not propose to extend relational algebra to
`include this function as an operator. Such an extension would have
`two major implications. The first is that it would have required the
`use of a nested relational model, instead of the simpler flat (first
`normal form) relational model. If, for instance, eval was part of
`the extended relational algebra, applying it on a Q-type column
`would have returned a relation in which the contents of that col-
`umn would have been relations, i.e., the result would have been a
`
`404
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2213
`Page 4 of 12
`
`
`
`information is done through a many-to-many relationship imple-
`mented by table XJ. Note that in contrast to traditional join ap-
`proaches, there are no common attributes between XJ, Customers,
`and Technicians. The join is achieved through the intensional
`description of the queries stored in the two Q-type columns of table
`XJ. First, a query is constructed to retrieve the New Jersey business
`customers.
`select * from Customers c
`where Loc=’NJ’ and c.Type=’business’
`
`It is then enhanced to also retrieve the tuples of table XJ
`that contain some intensional reference to the circuit ids of these
`Customers. This is achieved through a join between the Q-values
`in attribute Qc and the attribute CircuitID of Customers. The
`query becomes:
`select * from Customers c, XJ j
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.CircuitID]
`j.Qc[CircuitID]
`
`.=
`
`From the tuples that appear in XJ, the first is for residential cus-
`tomers, so it cannot satisfy the query specifications. The third tuple
`does not mention CircuitID in its select clause, so it cannot sat-
`The second tuple of XJ will
`isfy the query specifications either.
`form a join pair with every Customers tuple that agrees on the
`CircuitID attribute value with at least one tuple in the evaluation
`of the query qC2.
`The result of the join between Customers and XJ will also have
`to be associated with the Technicians tuples. This is done in a
`similar way through a join on column Qt of XJ and the attributes
`of Technicians. The final query is:
`select t.Name from Customers c, XJ j, Technicians t
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.CircuitID] and
`j.Qc[CircuitID]
`[t.Name]
`j.Qt[Name]
`
`.=
`
`.=
`
`If the administrator was interested in the names of the techni-
`cians that can verify either the phone and the circuit id of business
`customers, then the final query would have been:
`select t.Name from Customers c, XJ j, Technicians t
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.PhoneLine] or
`(j.Qc[PhoneLine]
`j.Qc[CircuitID]
`[c.CircuitID])
`[t.Name]
`and j.Qt[Name]
`
`.=
`
`.=
`
`.=
`
`In this case, both the second and the third tuples of XJ would
`have been relevant. If the administrator was interested in techni-
`cians that can verify both the phone and the circuit id, then the final
`query would have been:
`select t.Name from Customers c, XJ j, Technicians t
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.PhoneLine, c.CircuitID]
`j.Qc[PhoneLine, CircuitID]
`[t.Name]
`and j.Qt[Name]
`
`.=
`
`.=
`
`which is not equivalent to the query:
`select t.Name from Customers c, XJ j, Technicians t
`where c.Loc=’NJ’ and c.Type=’business’ and
`[c.PhoneLine] and
`j.Qc[PhoneLine]
`j.Qc[CircuitID]
`[c.CircuitID]
`[t.Name]
`and j.Qt[Name]
`
`.=
`
`.=
`
`.=
`
`This is because, in principle, different tuples in the evaluation of
`a Q-value query expression can satisfy the two conditions linked by
`the and, while in the former query the same tuple in the evaluation
`of the Q-value query expression has to satisfy the joint condition on
`phone and circuit id.
`
`translates to selecting from the Provenance table those tuples hav-
`ing a Q-value in attribute Rf1 that references Name and that name
`is ’AFLAC’.
`This can be expressed as:
`select distinct p.Source from Provenance p
`[(cid:2)AF LAC(cid:2)].
`where p.Rf1[Name]
`
`.=
`
`Note that the semantics of the select operation have not changed.
`The only new part is the introduction of the “
`” conditional ex-
`pression (predicate) for values of type Q. The way this condition
`is evaluated is a topic of a subsequent section. The symbol “
`” is
`used instead of “=” to emphasize the different predicate.
`Another point worth clarifying is that Q-values can specify mul-
`tiple conditions on multiple attributes. Our examples are kept sim-
`ple for expository purposes.
`3.3 Joins Using a Q-value
`The values v1, . . . , vn in Definition 3.2 can be either constant
`values as is the case in Example 3.3, or relational atomic expres-
`sions that take values during query evaluation. The ability to use
`such expressions provides the means to form joins that are based
`on Q-type columns, allowing tables representing metadata to be as-
`sociated to the data tables.
`
`.=
`
`.=
`
`EXAMPLE 3.4. A data administrator has discovered that in the
`database in Figure 1 the customer names starting with “A” violate
`the format policy and would like to know the source from where they
`originate. She knows that the provenance information is stored in
`the Provenance table where attribute Rf1 specifies the association
`between the data and the metadata. She issues the following query:
`select distinct p.Source from Customers c, Provenance p
`where p.Rf1[Name]
`[c.Name] and c.Name LIKE ’A%’
`
`.=
`
`What the query does is to select all the customers whose name starts
`with letter “A”. For every such tuple c it checks if there is a tu-
`ple p in Provenance with a Q-value q in p.Rf1 such that relation
`eval(q) has an attribute Name and there is at least one tuple in it
`with the value in column Name equal to the value of c.Name. If yes,
`then tuples c and p pair up. The answer of the above query on the
`instance of Figure 1 is the tuple [(cid:2)NJDB(cid:2)].
`Note again that this has the same semantics as the regular SQL
`join operator (only the join condition is different since it is based
`on a Q-type column). The result of its execution if the select clause
`was select “*”, would consist of the attributes of Customers con-
`catenated to the attributes of Provenance. The Q-type attribute
`values will be the query expression presented as a string.
`
`Figure 2 provides a visual explanation of when a tuple in one
`relation R2 can form a join with tuples in a relation R1 when
`the join is based on its Q-type column Q and th