`Integration
`
`CATHARINE M. WYSS and EDWARD L. ROBERTSON
`Indiana University
`
`In this article, we develop a relational algebra for metadata integration, Federated Interoperable
`Relational Algebra (FIRA). FIRA has many desirable properties such as compositionality, closure, a
`deterministic semantics, a modest complexity, support for nested queries, a subalgebra equivalent
`to canonical Relational Algebra (RA), and robustness under certain classes of schema evolution.
`Beyond this, FIRA queries are capable of producing fully dynamic output schemas, where the
`number of relations and/or the number of columns in relations of the output varies dynamically
`with the input instance. Among existing query languages for relational metadata integration, only
`FIRA provides generalized dynamic output schemas, where the values in any (fixed) number of
`input columns can determine output schemas.
`Further contributions of this article include development of an extended relational model for
`metadata integration, the Federated Relational Data Model, which is strictly downward compatible
`with the relational model. Additionally, we define the notion of Transformational Completeness
`for relational query languages and postulate FIRA as a canonical transformationally complete
`language. We also give a declarative, SQL-like query language that is equivalent to FIRA, called
`Federated Interoperable Structured Query Language (FISQL).
`While our main contributions are conceptual, the federated model, FISQL/FIRA, and the notion
`of transformational completeness nevertheless have important applications to data integration and
`OLAP. In addition to summarizing these applications, we illustrate the use of FIRA to optimize
`FISQL queries using rule-based transformations that directly parallel their canonical relational
`counterparts. We conclude the article with an extended discussion of related work as well as an
`indication of current and future work on FISQL/FIRA.
`Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design—Data mod-
`els; H.2.3 [Database Management]: Languages—Query languages
`General Terms: Languages
`Additional Key Words and Phrases: Data integration, federated databases, federated data model,
`interoperability, metadata integration, metadata querying, multidatabases, relational query alge-
`bra, schema integration, transformational completeness
`
`The work of E. L. Robertson was supported in part by National Science foundation (NSF) grant
`IIS-82407.
`Authors’ address: Indiana University, Computer Science Department, Bloomington, IN 47405;
`email: {cmw,edrbtsn}@cs.indiana.edu.
`Permission to make digital or hard copies of part or all of this work for personal or classroom use is
`granted without fee provided that copies are not made or distributed for profit or direct commercial
`advantage and that copies show this notice on the first page or initial screen of a display along
`with the full citation. Copyrights for components of this work owned by others than ACM must be
`honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers,
`to redistribute to lists, or to use any component of this work in other works requires prior specific
`permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 1515
`Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or permissions@acm.org.
`C(cid:1) 2005 ACM 0362-5915/05/0600-0624 $5.00
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005, Pages 624–660.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 1 of 37
`
`
`
`Relational Languages for Metadata Integration
`
`•
`
`625
`
`1. INTRODUCTION
`In this article, we present a formal paradigm for transformationally com-
`plete relational metadata integration. Our paradigm consists of an extended
`relational algebra, Federated Interoperable Relational Algebra (FIRA). FIRA
`augments canonical Relational Algebra (RA) with the ability to query and re-
`structure metadata along with data directly within the relational model. Fur-
`thermore, FIRA has an equivalent SQL-like counterpart, called Federated In-
`teroperable Structured Query Language (FISQL). While our main contributions
`are conceptual, the FISQL/FIRA framework nevertheless has important appli-
`cations to data integration and OLAP, which we now summarize.
`The field of data integration has interested database researchers for decades.
`The dominant architectural model today is federated, where sources export
`(import) views to (from) a mediated schema. This requires both wrappers en-
`capsulating source data repositories as well as mapping functions giving the
`translation between source data and/or schemas and the mediated schema.
`Wrappers, mapping functions, and mediated schemas are crucial concepts in
`data integration, whether the overall network architecture is centralized, peer-
`to-peer, or hybrid [Halevy 2004].
`Our language, FIRA, can assist with the creation and maintenance of wrap-
`pers, mediating functions, and mediated schemas, especially in the case of
`relational data sources. FIRA is more robust under schema evolution than tra-
`ditional relational languages, as several examples throughout the paper illus-
`trate. Furthermore, one open problem that FIRA addresses in data integration
`is that of dynamically varying schemas, where target schemas may depend
`dynamically on source data. For instance, relational “pivot”-type operations re-
`quire certain data values to become attribute names. An example of this is the
`translation between Indianapolis and Chicago stores in our sales federation
`(Figure 1). FIRA extends contemporary solutions to this problem with more
`generality. For example, dynamic transformations are no longer restricted to
`a single column per query as in SchemaSQL [Gingras and Lakshmanan 1998;
`Lakshmanan et al. 2001]. Also, the federated data model underpinning FIRA
`provides for a clean extension of RA to federations without recourse to a two-
`tiered language as in MD-SQL [Rood et al. 1999] or MQL/MA [Wyss and Van
`Gucht 2001; Wyss et al. 2001]. A deeper discussion of related work appears in
`Section 6.
`Another important application area for FIRA is OnLine Analytical Process-
`ing (OLAP). Many relational transformations required in OLAP contain a “rela-
`tional transpose” similar to our τ operator (Section 3.8). However, the semantics
`of this operation within the constraints of the relational model have not been
`well understood. FIRA leads to important theoretical results concerning rela-
`tional transposition, which we report elsewhere [Robertson and Wyss 2004].
`Furthermore, we are working to extend FISQL with aggregation capabilities
`to better facilitate OLAP applications within the FISQL/FIRA framework.
`
`1.1 Motivating Example
`As an example motivating our languages, consider the relational tables shown
`in Figure 1. In Figure 1, sales data for Amart stores in three midwestern
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 2 of 37
`
`
`
`626
`
`•
`
`C. M. Wyss and E. L. Robertson
`
`Fig. 1. Average sales for Amart stores in three midwestern cities.
`
`Fig. 2. Using SQL to compare Amart data.
`
`cities is shown. Each city uses a different relational representation of their
`sales data. The goal is to facilitate posing queries comparing data in the three
`cities. An SQL query comparing the Indianapolis and Chicago data is shown
`in Figure 2. This query returns pairs of Indianapolis departments and stores
`that make less than at least one comparable department in some Chicago
`store.
`The SQL for this comparison has several problems, centering around the fact
`that the compared departments are “hard-coded” into the query. Thus, for ex-
`ample, if any Chicago store begins offering clothing for toddlers, this query will
`no longer perform the correct comparisons. Similarly, if any Indianapolis stores
`change their department structure, the query is broken. Furthermore, this ap-
`proach does not scale well as the number of departments increases. For 50 or
`100 departments, the query would be daunting to write. In contrast, compare
`the concise representation in FISQL which is independent of the specific de-
`partments (Figure 10, Section 4.1). Previous work has coined the term Schema
`Independence for queries that are robust under this type of schema evolution
`[Lakshmanan et al. 2001; Masermann and Vossen 2000].
`Another class of problems with the SQL query centers around the problem
`of missing or incomplete information. If Chicago stores decide to discontinue
`men’s clothing, the query in Figure 2 will break because “C.Men” nolonger refers
`to a legitimate column in Chicago.AvgSales. In contrast, FISQL assumes a
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 3 of 37
`
`
`
`Relational Languages for Metadata Integration
`
`•
`
`627
`
`federated data model which differs from the relational model in its approach to
`missing information, and behaves gracefully under such losses of information.
`At the same time, the federated model is strictly downward compatible with
`the relational model, in that every relation has a unique federated counterpart
`(see Definition 3.5, Section 3.2). Furthermore, this counterpart can be generated
`purely syntactically without recourse to semantic information about the data
`(unlike the transformation to the tabular model in Lakshmanan et al. [1999],
`for example).
`
`1.2 Main Contributions
`In summary, the main contributions of the current article are as follows:
`
`—We introduce the Federated Relational Model in Section 2.1 as a straightfor-
`ward syntactic extension of the relational model. The federated model treats
`relation and attribute names as first class domain elements, and behaves
`well under incomplete information.
`—We introduce the notion of Transformational Completeness for query lan-
`guages in Section 2.2. This notion captures our intuition concerning
`the completeness of a query language for relational data ↔ metadata
`transformations.
`—Our foremost contribution in this article is to introduce the algebraic query
`language FIRA, which is our paradigm for transformationally complete re-
`lational metadata integration. FIRA has several advantageous properties
`beyond current relational metadata integration languages, including:
`—Compositionality—A FIRA query maps a set of federated databases to a
`single federated database. This closure is a natural parallel of RA closure,
`since RA accepts a finite number of relations and returns a single relation
`as output.
`—Downward Compatibility—FIRA contains a subalgebra isomorphic to the
`canonical RA. This entails that RA equivalences translate directly to FIRA
`equivalences. Thus (for example), FIRA Cartesian Product is commutative
`and associative, unlike previous algebras [Grant et al. 1993].
`—Generality—FIRA allows dynamic schema transformations in more gen-
`erality than previous languages such as SchemaSQL [Lakshmanan et al.
`2001] where only a single column of data could be promoted to metadata
`per query.
`—Nested Queries—FIRA includes support for nested queries, and FIRA
`queries can be arbitrarily chained together in finite sequences. This is in
`contrast to previous languages that have used view mechanisms to inte-
`grate data and metadata, such as SchemaSQL [Lakshmanan et al. 2001].
`—In Section 4, we present FISQL and show subsequently that it is equivalent
`to FIRA (Section 4.4). The FISQL-FIRA equivalence parallels the SQL-RA
`equivalence, which means rule-based query optimization techniques devel-
`oped for SQL immediately carry over to FISQL, as we show in Section 5.
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 4 of 37
`
`
`
`628
`
`•
`
`C. M. Wyss and E. L. Robertson
`
`—Finally, in Section 6, we give a targeted overview of related work, com-
`paring properties of previous relational metadata integration languages to
`FISQL/FIRA. A tabular summary of the survey is given in Table I.
`In the next section, we begin by detailing formal notation and concepts as-
`sumed throughout the presentation.
`
`2. FORMAL PRELIMINARIES
`For the remainder of the article, we assume familiarity with the standard re-
`lational model and Relational Algebra (RA). To fix these notions, we briefly
`recapitulate what we understand to be accepted definitions of these concepts.
`We assume a domain of atomic elements, denoted dom, which contains al-
`phanumeric strings, such as 123, abc or SomeElement. Weuse teletype font to
`denote specific elements of dom. In addition, we use a special element, ⊥, called
`the null element. We assume ⊥ (cid:4)∈ dom, sothat ⊥ can be used to distinguish
`noninformation within the model.
`For metadata, we also assume a domain of atomic elements, denoted M0,
`which is usually considered to contain alphanumeric strings beginning with a
`letter. We assume M0 ⊂ dom.1
`Definition 2.1
`(1) A (Canonical) Tuple, t is a mapping from a finite set S ⊂ M0 to dom ∪{⊥}.
`Elements of S are termed attributes. The square-bracket notation t[A] is
`used to signify the element t(A) for A ∈ S.
`(2) A (Canonical) Relation has a name N ∈ M0 and a finite schema S ⊂ M0. The
`relation body consists of a finite set of (canonical) tuples t : S → dom ∪{⊥}.
`(3) A (Canonical) Database consists of a finite set of (canonical) relations.
`Definition 2.2
`(1) The Relational Algebra (RA) is understood to be the query language gener-
`ated by applying the six relational operations ρ (Renaming), σ (Selection),
`π (Projection), × (Cartesian Product), ∪ (Set Union), and—(Set Difference).
`A query in the RA maps a set of input relations (i.e., a database) to a single
`output relation.
`(2) A query language that can express all queries of the RA is said to be Rela-
`tionally Complete [Codd 1970].
`The notion of Relational Completeness is often used to gauge expressivity
`within the relational model. Beyond this, the data complexity of the RA is
`LOGSPACE. This is often used as a second gauge for relational query languages,
`since efficient query processing is of paramount importance in large data sets.
`The LOGSPACE class is considered to contain enough power for most common
`query tasks, while remaining efficient in terms of execution times. The RA itself
`is particularly suited to optimized query execution, since several nice algebraic
`
`1Classically, relational metadata and data are not compared within the model. However, relational
`metadata are represented as alphanumeric strings beginning with a letter, so in principle there is
`no problem with comparing metadata and data literals (as strings).
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 5 of 37
`
`
`
`Relational Languages for Metadata Integration
`
`•
`
`629
`
`rules give rise to simple but effective rule-based query optimization strategies
`[Garcia-Molina et al. 2000].
`
`2.1 The Federated Relational Data Model
`One of the main contributions of our work is to extend the relational model to
`incorporate metadata. The resulting data model is termed the Federated Re-
`lational Data Model, orsimply the Federated Model. Inthe federated model,
`relational metadata can be compared and transformed along with relational
`data. However, the meta-ness of metadata in the ordinary relational model
`must be recapitulated in the federated model, only at the “next-higher” level.
`Thus, we assume a countable domain of meta-metadata, denoted M1, contain-
`ing elements distinct from any in dom and M0; that is, M1 ∩ dom = ∅ and
`M1 ∩ M0 = ∅. In the federated model, relational data and metadata are al-
`lowed to occupy both data and schema positions.2 However, federated metadata
`(elements of M1) are not allowed as data.
`Definition 2.3
`(1) A (Federated) Tuple is a mapping from a finite set S ⊂ dom∪ M1 to dom
`∪{⊥}. S is known as the Schema of the tuple, that is, S = schema(t).
`(2) A (Federated) Relation has a name N ∈ dom. The relation body consists of
`a finite set of (federated) tuples.
`(3) A (Federated) Database has a name D ∈ M0. The database body consists of
`a finite set of (federated) relations.
`(4) A Federation consists of a finite set of (federated) databases.
`
`Definition 2.4
`(1) A federated relation can be seen as a pair (cid:11)N, B(cid:12), where N ∈ dom and
`B is a finite set of federated tuples. Given federated relation R = (cid:11)N, B(cid:12),
`we use the notation name(R) and body(R) torefer abstractly to N and B
`(respectively).
`(2) Given a federated relation R, wedefine its Schema to be
`∪
`schema(R) =
`schema(t).
`t∈body(R)
`Note that a federated relation is allowed to contain tuples with (possibly)
`differing schemas. The schema of the relation is the union of the schemas of
`tuples it contains, so that the schema of a federated relation depends critically
`on the relation instance. This is an important design decision in the creation
`of the federated model, and facilitates dynamic schema transformations, such
`as relational transposition (Section 3.8) or attribute dereference (Section 3.5).
`One consequence of this decision is that algebraic operations that remove or
`change tuples may modify the relation schema. As per Definition 2.4, we denote
`
`2It may seem odd to allow numbers as attributes and relation names. However, there is no reason
`in principle to rule this out; in fact, dates or ages (for example) may be quite useful column or
`relation names. In any case, the formalism for disallowing these cases is unwieldy, so for clarity we
`defer such considerations to implementation.
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 6 of 37
`
`
`
`630
`
`•
`
`C. M. Wyss and E. L. Robertson
`
`Fig. 3. Data-metadata transformations.
`the schema of a federated relation abstractly as schema(R), even though the
`extension of this set varies with the relation instance. This view is notationally
`convenient, and we use it throughout the remainder of this article.
`
`2.2 Transformational Completeness
`So far, we have seen federated analogs of the relational notions of tuple, relation,
`database, and schema. Another analogous concept is that of Transformational
`Completeness. We say that a query language is Transformationally Complete if
`(i) it is relationally complete and (ii) it can express schema-independent queries
`encompassing all six data-metadata transformations depicted in Figure 3. Note
`that canonical RA is not transformationally complete in this sense, as previous
`work has indicated [Lakshmanan et al. 2001].
`Several languages have been proposed that capture some or all of the trans-
`formations depicted in Figure 3. These languages are discussed in more detail
`in Section 6 after we present our transformationally complete query languages
`FIRA and FISQL (Sections 3 and 4, respectively). In fact, one of our contri-
`butions is to formalize the notion of “transformational completeness” by pos-
`tulating FIRA as a paradigmatic transformationally complete language. This
`is not to say FIRA is the final word in relational metadata integration; rather
`we use FIRA as a formal archetype for what it means to be transformationally
`complete (in the same way as RA is a formal archetype for what it means to be
`relationally complete). Thus, transformational completeness and FIRA are the
`federated analog of relational completeness and RA.
`Note that the transformations of Figure 3 must be carried out in a man-
`ner that preserves the semantics of the underlying data. For the relational
`model, one formalization of this type of symmetry is encoded in the notion of
`BP-Completeness [Abiteboul et al. 1995]. To fully capture the notion of sym-
`metry inherent in federated transformations, ongoing work is extending BP-
`Completeness and related notions to the federated data model.
`
`2.3 Higher-Order Federated Data Models
`In principle, there is no reason to stop with federations and M1. For k ≥ 2
`we can assume a “higher-order” metadata domain Mk that is disjoint from
`dom and from M j where j < k. We can then define the concepts of tuple,
`relation, database, federation, set of federations, set of sets of federations, etc.,
`and associated notions. The query languages for these “higher-order” relational
`models will have affinities with higher-order logics, just as FIRA and FISQL
`(and similar languages) have affinities with second-order logic. Ongoing work is
`characterizing the resulting relational hierarchy and languages. In particular,
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 7 of 37
`
`
`
`Relational Languages for Metadata Integration
`
`•
`
`631
`
`extended federated models provide a segue to tree-structured databases and
`XML.
`In the present federated model, elements of M1 cannot be demoted to data
`within the federated model; thus the “chain of removal” ends with M1.
`
`3. FEDERATED INTEROPERABLE RA
`In this section, we present the query algebra called Federated Interoperable Re-
`lational Algebra. Each operator in FIRA accepts a fixed number of databases as
`input (i.e., a federation) and returns a single database. This closure is a natural
`parallel of RA closure, since RA accepts a fixed number of relations and returns
`a single relation as output. As with RA, some operators are parameterized by
`elements of the input schema.
`
`3.1 Basic Terms
`Basic terms in FIRA are database names or database variables. For example,
`“Indy” is abasic term in the FIRA. We use variables of the form D1, D2, . . . to
`denote databases.
`
`3.2 The Relational Core of FIRA
`Each of the six RA operators (ρ, σ, π, ×, ∪, −) has a federated counterpart within
`FIRA. We use hatted notation for these FIRA counterparts ( ˆρ, ˆσ , etc.). The
`definitions of the FIRA relational algebra operators include applications of
`canonical RA operators to federated relation bodies. Given federated relation R,
`body(R) isnot, strictly speaking, a canonical relation. However, these notions
`are so close that we ignore the difference to facilitate concise definitions.3
`One twist in the definition of the federated counterparts for unary RA op-
`erators is that we need to allow relation renaming in addition to attribute re-
`naming; this is done in Definition 3.1(1). Selection and Projection are straight-
`forward (Definitions, 3.1(2) and 3.1(3), respectively).
`Definition 3.1 (Federated Unary Relational Operators)
`(1) (Renaming) Let D be a federated database. There are two cases.
`(a) (General Renaming) Let Ai, Bi ∈ dom ∪ M1 for 1 ≤ i ≤ n. Then
`ˆρ A1→B1,..., An→Bn(D) = {(cid:11)name(R), ρA1→B1,..., An→Bn(body(R))(cid:12)|R ∈ D}.
`(b) (Relation Specific Renaming) Let Ai, Bi ∈ dom ∪ M1 for 1 ≤ i ≤ n and
`N, M ∈ dom. Then
`A1→B1,..., An→Bn(D) =
`ˆρ N→M
`{(cid:11)M, ρA1→B1,..., An→Bn(body(R))(cid:12)|R ∈ D, name(R) = N}
`∪ {R ∈ D|name(R) (cid:4)= N}.
`3Formally, for federated relationR, weconsider schema(R) asthe set “ S” inDefinition 2.1, extending
`tuples in body(R) with null values where necessary. Since RA operators do not demote relational
`metadata to data, we can view elements of dom and M1 in schema(R) ascanonical metadata
`constants so that the action of RA operators on columns headed by these elements is the same as
`for columns headed by elements of M0.
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 8 of 37
`
`
`
`•
`632
`C. M. Wyss and E. L. Robertson
`(2) (Selection) Let D be a federated database and C be a well-formed Boolean
`selection condition. Then
`ˆσ C(D) = {(cid:11)name(R), σC(body(R))(cid:12)|R ∈ D}.
`(3) (Projection) Let D be a federated database and Ak ∈ dom ∪ M1 for 1 ≤ k ≤ n.
`Then
`ˆπ A1,..., An(D) = {(cid:11)name(R), πA1,..., An(body(R))|R ∈ D}.
`For the binary RA operators, the fact that relation names are now a part
`of the formalism means that we must be careful when defining the federated
`counterparts of these operations. When operating on federated databases, the
`FIRA counterparts of the binary RA operators behave consistently with the
`RA by acting non-trivially only on like named relations. Furthermore, since
`schemas vary dynamically in FIRA, we assume outer versions of the canonical
`RA operators×,∪, and−, sothat mismatched schemas will still provide sensible
`answers.
`Definition 3.2 (Federated Cartesian Product). Let D1 and D2 denote feder-
`ated databases. Their Federated Cartesian Product is defined as
`D1 ˆ× D2 = {(cid:11)name(R1), body(R1) × body(R2)(cid:12)
`| R1 ∈ D1, R2 ∈ D2 and name(R1) = name(R2)}.
`Note that unmatched relations in either database are dropped in the product,
`so the cardinality of D1 ˆ× D2 is equal to or less than that of D1. This decision
`may seem odd, but has two important consequences. First, ˆ× is commutative
`and associative, unlike in previous algebras that define the Cartesian Product
`of databases as all pairwise combinations of products [Grant et al. 1993]. Also,
`there exists an embedding of canonical relations into federated relations that
`preserves RA as a subalgebra of FIRA (Theorem 3.6, below). For similar reasons,
`federated set union and set difference are defined analogously, as follows.
`Definition 3.3 (Federated Set Union). Let D1 and D2 denote federated
`databases. Their Federated Set Union is defined as
`D1 ˆ∪D2 = {(cid:11)name(R1), body(R1) ∪ body(R2)(cid:12)
`(cid:1)
`| R1 ∈ D1, R2 ∈ D2 and name(R1) = nameR2}
`(cid:1)
`{R1 ∈ D1| there is no R2 ∈ D2 such that name(R1) = nameR2}
`{R2 ∈ D2| there is no R1 ∈ D1 such that name(R2) = nameR1}.
`Definition 3.4 (Federated Set Difference). Let D1 and D2 denote federated
`databases. The Federated Set Difference of D1 and D2 is defined as
`D1 ˆ−D2 = {(cid:11)name(R1), body(R1) − body(R2)(cid:12)
`(cid:1)
`| R1 ∈ D1, R2 ∈ D2 and name(R1) = nameR2}
`{R1 ∈ D1| there is no R2 ∈ D2 such that name(R1) = nameR2}.
`The six FIRA operators ˆρ, ˆσ , ˆπ, ˆ×, ˆ∪, and ˆ− are in and of themselves a
`compositionally closed algebra on federated databases. This algebra is thus
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 9 of 37
`
`
`
`Relational Languages for Metadata Integration
`
`•
`
`633
`
`a subalgebra of full FIRA (which we have yet to define), and furthermore is
`equivalent to canonical RA. To see this, we require a well-defined embedding of
`canonical relations into federated databases. Note that since dom ⊂ dom ∪ M1,
`we can view a canonical relation as the body of a federated relation without
`issue. Let ε be a distinguished symbol in dom, called the “empty name”. We
`federate canonical relations simply by assigning them the name ε.
`
`Definition 3.5. Let R be a canonical relation. The Federated Counterpart
`of R, denoted federate(R), is the federated database {(cid:11)ε, R(cid:12)}.
`THEOREM 3.6. RA is isomorphic to a subalgebra of FIRA.
`
`PROOF. Definition 3.5 defines a mapping, federate, from canonical relations
`into federated databases given by R (cid:15)→ federate(R). It is easy to see that federate
`is injective. Furthermore, federate is a homomorphism with respect to the RA
`operators and their FIRA counterparts so that, for example, federate(σC(R)) =
`ˆσC(federate(R)), federate(R ∪ S) = federate(R) ˆ∪ federate(S), etc.
`On the other hand, consider the class D that consists of exactly the federated
`databases that are the image of some canonical relation under federate. Clearly,
`federate is surjective for this class. It is relatively easy to see that we have
`closure under the FIRA operations ˆρ, ˆσ , ˆπ, ˆ×, ˆ∪, and ˆ− within D.
`Thus, federate induces a subalgebra of FIRA that is isomorphic to RA. We
`term this subalgebra the Relational Core of FIRA. (cid:16)(cid:17)
`Theorem 3.6 is important because it precisely characterizes how FIRA se-
`mantics properly extends RA semantics. Previous algebras have not shown such
`a result.
`The next subsections introduce the operators of full FIRA that move beyond
`the relational core.
`
`3.3 Drop Projection
`FIRA includes a projection operator that allows one to remove elements from
`relation schemas, denoted π.
`Definition 3.7
`(1) (Drop Projection for Federated Relations) Let R be a federated relation and
`A ∈ dom ∪ M1. Then
`πA(R) = {(cid:11)name(R), πschema(R)−A(body(R)(cid:12)}.
`(2) (Drop Projection for Federated Databases) Let D be a federated database
`and A ∈ dom ∪ M1. Then πA(D) = { πA(R)|R ∈ D}. In addition, we use the
`shorthand notation ˆπA1,... , An(D) for Ak ∈ dom ∪ M1 (1 ≤ k ≤ n) tomean
`ˆπA1(···( ˆπAn(D))···).
`Note that since federated schemas vary dynamically with instances, the drop
`projection cannot uniformly be imitated with canonical projection. In fact, re-
`cent work shows that drop projection is not superfluous for polymorphic versions
`of the canonical RA [Van den Bussche and Waller 2002].
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 10 of 37
`
`
`
`634
`
`•
`
`C. M. Wyss and E. L. Robertson
`
`Fig. 4. The relation ↓3 (Milw.FreePkSales).
`
`3.4 The Down Operator
`FIRA contains a family of operators which explicitly demote relational meta-
`data to federated data. Notationally, we use fixed constants from M1 to signify
`columns created to hold relational metadata. We will assume two families of
`constants in M1 which we represent using lowercase, subscripted versions of the
`“Fraktur” font, such as r3, and a21 (for relation, and attribute names, respec-
`tively). Note that these constants can appear in federated relation schemas,
`however they cannot be demoted to federated data. This is a consequence of
`“stopping” the relational hierarchy at federations (as discussed in Section 2.3).
`Definition 3.8 (Down Operators)
`(1) (Down Operators for Federated Relations) Let R be a federated relation and
`i ∈ N be a fixed natural number.
`(a) Let name(R) = N ∈ dom and schema(R)∩ dom = {A1, . . ., An}. We
`define metadatai(R) to bethe following set of federated tuples:
`ri
`ai
`N A1
`N A2
`...
`...
`N An
`(b) The down of R with respect to i, denoted ↓i (R) isthe federated relation
`↓i (R) = (cid:11)name(R), metadatai(R) × πri,ai(body(R))(cid:12).
`(2) (Down Operators for Federated Databases) Let D be a federated database.
`Then
`↓i (D) = {↓i (R)|R ∈ D}.
`Example 3.9. The federated relation ↓3(Milw.FreePkSales) isshown in
`Figure 4.
`Note that i only affects the choice of constants from M1 labeling the out-
`put columns of metadatai(R). The drop projection in the body of ↓i (R) en-
`sures that we only add meta-metadata subscripted by i once. Inother words,
`
`ACM Transactions on Database Systems, Vol. 30, No. 2, June 2005.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2212
`Page 11 of 37
`
`
`
`Relational Languages for Metadata Integration
`
`•
`↓i(↓i(R)) = ↓i(R). This plus the fact that we do not include meta-metadata as
`data in the relation metadatai(R) means that ↓i (↓ j (R)) = ↓i (R) 1↓ j (R) for
`i, j ∈ N.
`
`635
`
`3.5 Attribute Dereference
`FIRA includes an operator for projecting a relation R on attributes determined
`by cell values in another attribute. For each tuple t ∈ R, instead of projecting
`that tuple on field A, weproject on the field named by t[A]. This operation is
`termed Attribute Dereference and is denoted .
`Definition 3.10 (Attribute Dereference)
`(1) (Attribute Dereference for Federated Relations) Let R be a federated relation
`A(R) = (cid:11)name(R), R(cid:19)(cid:12) where R(cid:19) is obtained
`and A, B ∈ dom ∪ M1. Then B
`from body(R) tuple-by-tuple as follows: For t ∈ body(R), we obtain s ∈ R(cid:19)
`(cid:2)
`as:
`t[t[A]] iff X = B;
`s[X ] =
`t[X ] otherwise.
`(2) (Attribute Dereference for Federated Databases) Let D be a database and
`A, B ∈ dom ∪ M1. Then B
`
`A(D) = { BA(R)|R ∈ D}.
`Example 3.11. Consider dereferencing Chi.AvgSales by the Indy depart-
`ments. We will first augment Chi.AvgSales with the column of Indy department
`names (Figure 5(a)); then we can dereference the resulting relation to obtain
`the desired information (Figure 5(b)).
`
`Dereferencing is common when comparing data from relations having “trans-
`posed” structure, such as the Chi.AvgSales and the Indy.Sales relations. This
`type of structural mismatch is commonly found when comparing spreadsheet
`data to native relational data. In these cases, dereference terms will most often
`be included within selections and projections. For these situations, we provide
`convenient notational shorthands.
`Definition 3.12
`(1) Let R be a federated relation and A, B ∈ dom ∪ M1. Let op ∈ {<, >, =, (cid:4)=,
`≤, ≥}. Then σA op B(R) isshorthand notation for
`A (R))).
`πX (σX op B( X
`where X ∈ (dom ∪ M1) − schema(R) isarbitrary (but fixed). Similarly, we
`allow selection terms of the form σA op B(R) orσ A op B(R).
`(2) Let R be a federated relation and A, B ∈ dom ∪ M1. Then π B
`A (R) isshort-
`A(R)). When such projection terms are included in a list, it is
`hand for πB( B
`understood that the dereferencing will take place first, then the projections.
`
`In general, the