throbber
From Spreadsheets to Relational Databases
`and Back?
`
`J´acome Cunha1, Jo˜ao Saraiva1, and Joost Visser2
`
`1 Departamento de Inform´atica, Universidade do Minho, Portugal
`2 Software Improvement Group, The Netherlands
`
`Abstract. This paper presents techniques and tools to transform spread-
`sheets into relational databases and back. A set of data refinement rules is
`introduced to map a tabular datatype into a relational database schema.
`Having expressed the transformation of the two data models as data re-
`finements, we obtain for free the functions that migrate the data. We
`use well-known relational database techniques to optimize and query the
`data. Because data refinements define bidirectional transformations we
`can map such database back to an optimized spreadsheet.
`We have implemented the data refinement rules and we have constructed
`tools to manipulate, optimize and refactor Excel-like spreadsheets.
`
`1 Introduction
`
`Spreadsheet tools can be viewed as programming environments for non-profes-
`sional programmers. These so-called “end-user” programmers vastly outnumber
`professional programmers [28].
`As a programming language, spreadsheets lack support for abstraction, test-
`ing, encapsulation, or structured programming. As a result, they are error-prone.
`In fact, numerous studies have shown that existing spreadsheets contain redun-
`dancy and errors at an alarmingly high rate [24, 27, 29, 30].
`Spreadsheets are applications created by single end-users, without planning
`ahead of time for maintainability or scalability. Still, after their initial creation,
`many spreadsheets turn out to be used for storing and processing increasing
`amounts of data and supporting increasing numbers of users over long periods
`of time. To turn such spreadsheets into database-backed multi-user applications
`with high maintainability is not a smooth transition, but requires substantial
`time and effort.
`In this paper, we develop techniques for smooth transitions between spread-
`sheets and relational databases. The basis of these techniques is the fundamental
`insight that spreadsheets and relational databases are formally connected by a
`data refinement relation. To find this relation we discover functional dependen-
`cies in spreadsheet data by data mining techniques. These functional dependen-
`cies can be exploited to derive a relational database schema. We then apply data
`
`? This work is partially funded by the Portuguese Science Foundation (FCT) under
`grants SFRH/BD/30231/2006 and SFRH/BSAB/782/2008.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 1 of 21
`
`

`

`calculation laws to the derived schema in order to reconstruct a sequence of re-
`finement steps that connects the relational database schema back to the tabular
`spreadsheet. Each refinement step at the schema level is witnessed by bidirec-
`tional conversion steps at the data level, allowing data to be converted from
`spreadsheet to database and vice versa. Our approach is to employ techniques
`for bidirectional transformation of types, values, functions, and constraints [32],
`based on data refinement theory [23].
`We have implemented data refinement rules for converting between tabular
`and relational datatypes as a library in the functional programming language
`Haskell [25]. On this library, frontends were fitted for the exchange formats
`used by the spreadsheet systems Excel and Gnumeric. We have constructed two
`tools (a batch and an interactive version) to read, optimize, refactor and query
`spreadsheets. The tools get as argument a spreadsheet in the Excel or Gnumeric
`format and they have two different code generators: the SQL code generator,
`that produces SQL code to create and populate the corresponding relational
`database, and an Excel/Gnumeric code generator that produces a (transformed)
`spreadsheet.
`This paper is organized as follows: Section 2 presents a motivating example
`that is used throughout the paper. Section 3 briefly discusses relational databases
`and functional dependencies. In Section 4 we define data refinements and frame-
`work for constraint-aware two-level transformation. In Section 5 we present the
`refinement rules to map databases into spreadsheets. In Section 6 we describe the
`libraries and tools constructed to transform and refactor spreadsheets. Section 7
`discusses related work and Section 8 contains the conclusions. In Appendix we
`show the API of our library.
`
`2 Motivating Example
`
`Throughout the paper we will use a well-known example spreadsheet taken
`from [8] and reproduced in Figure 1. This sheet stores information about a
`housing renting system, gathering information about clients, owners and rents.
`It also stores prices and dates of renting. The name of each column gives a clear
`idea of the information it represents.
`For the sake of argument, we extend this example with two additional columns,
`named totalDays (that computes the days of renting by subtracting the column
`
`Fig. 1. A spreadsheet representing a property renting system.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 2 of 21
`
`

`

`rentFinish to rentStart) and total rent (that multiplies the total number of days
`of renting by the rent-per-day value, rentPerDay). As usual in spreadsheets,
`these columns are expressed by formulas.
`This spreadsheet defines a valid model to represent the information of the
`renting system. However, it contains redundant information. For example, the
`displayed data specifies the house renting of two clients (and owners) only, but
`their names are included 5 times. This kind of redundancy makes the main-
`tenance and update of the spreadsheet complex and error-prone. A mistake is
`easily made, for example by mistyping a name and thus corrupting the data.
`The same information can be stored without redundancy. In fact, in the
`database community, techniques for database normalization are commonly used
`to minimize duplication of information and improve data integrity [31, 11]. Data-
`base normalization is based on the detecting and exploiting functional depen-
`dencies inherent in the data [5]. Can we leverage these database techniques for
`spreadsheets? Based on the data in our example spreadsheet, we would like to
`discover the following functional dependencies:
`
`clientNo * cName
`ownerNo * oName
`propertyNo * pAddress, rentPerDay, ownerNo, oName
`clientNo, propertyNo * rentStart, rentFinish, total rent, totalDays
`
`We say that an attribute b (the consequent) is functionally dependent on at-
`tribute a (the antecedent), if a uniquely determines b (notation: a * b). For
`instance, the client number functionally determines his/her name, since no two
`clients have the same number.
`After discovering these dependencies we would like to infer a relational data-
`base schema which is optimized to eliminate data redundancy. This schema can
`then be used, either to store the data in a relational database management sys-
`tem, or to create an improved spreadsheet. Figure 2 presents such an optimized
`spreadsheet for our example. This new spreadsheet consists of four tables (bold
`boxes) and the redundancy present in the original spreadsheet has been elimi-
`nated. As expected, the names of the two clients (and owners) only occur once.
`As we will demonstrate in the remaining sections of this paper, the process of
`detecting functional dependencies, deriving a normalized database schema, and
`converting the data to the new format can be formalized and automated.
`
`Fig. 2. The spreadsheet after applying the third normal form refactoring.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 3 of 21
`
`

`

`After establishing a mapping between the original spreadsheet and a rela-
`tional database schema, we may want to use SQL to query the spreadsheet.
`Regarding the house renting information, one may want to know who are the
`clients of the properties that where rented between January, 2000 and January
`2002? Such queries are difficult to formulate in the spreadsheet environment. In
`SQL, the above question can be formulated as follows:
`
`select clientNo from rent
`where rentStart between ’1/01/00’ and ’1/01/02’
`
`Below we will demonstrate that the automatically derived mapping can be ex-
`ploited to fire such SQL queries at the original or the optimized spreadsheet.
`In the next sections, we will formalize the correspondence between spread-
`sheets and relational schemas using data refinement rules. We will present formal
`proofs that guarantee their correctness. Moreover, we will present a framework
`that implements the transformation rules and includes frontends to well-known
`spreadsheet systems. In fact, the example presented in this section was processed
`by our framework.
`
`3 From Functional Dependencies to RDB Schemas
`
`This section explains how to extract functional dependencies from the spread-
`sheet data and how to construct the relational schema. We start by introducing
`some concepts related to relational databases. Then, we present an algorithm to
`extract functional dependencies from data. Finally, we describe how to use the
`FDs to create a relational schema.
`
`Relational Databases and Functional Dependencies A relational database DB =
`{R1, ..., Rn} is a collection of named relations (or tables), Ri ⊆ d1 × ...× dk, de-
`fined over data sets not necessarily distinct. Each relation’s Ri element (d1, ..., dk)
`is called a tuple (or row) and each di is called an attribute. Each tuple is uniquely
`identified by a minimum nonempty set of attributes called primary key (PK).
`It could be the case of existing more then one set suitable for becoming the
`primary key. They are designated candidate keys and only one is chosen to be-
`comes primary key. A foreign key (FK) is a set of attributes within one relation
`that matches the primary key of some relation. A relational database schema
`(RDB) is a set of relation schemas each of which being a set of attributes. These
`concepts are illustrated in Figure 3.
`Another important concept is functional dependency (FD) between sets of
`attributes within a relation [5]. A set B is functionally dependent on a set A,
`denoted A * B, if each value of A is associated with exactly one value of B.
`The normalisation of a database is important to prevent data redundancy.
`Although, there are more normal forms, in general, a RDB is considered nor-
`malised if it respects the third normal form (3NF) [8], that is, if it respects the
`second normal formal (2NF) and all the non-key attributes are only dependent
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 4 of 21
`
`

`

`Fig. 3. An example of a relation that represents part of our example.
`
`on the key attributes. A relation respects the 2NF if it is in the first normal form
`(1NF) and its non-key attributes are not functionally dependent on part of the
`key. Finally, the 1NF is respected if each element of each row contains only one
`element.
`In order to define the RDB schema, we use the data mining algorithm Fun
`[18] to compute the FD given a spreadsheet, and then database techniques,
`namely Maier’s algorithm [16], to compute the RDB schema in the 3NF.
`We have expressed Fun as the Haskell f un function. Next, we execute
`this function with our running example (the arguments propSch and propRel
`correspond to the first and remaining lines of the spreadsheet, respectively).
`∗ ghciifun propSch propRel
`ownerNo * oName
`clientNo * cName
`totalDays * clientNo, cName
`propertyNo * pAddress, rentPerDay, ownerNo, oName
`pAddress * propertyNo, rentPerDay, ownerNo, oName
`...
`
`The FDs derived by the Fun algorithm depend heavily on the quantity and
`quality of the data. Thus, for small samples of data, or data that exhibits
`too many or too few dependencies, the Fun algorithm may not produce the
`desired FDs. For instance, in our running example and considering only the
`data shown on Figure 1, the Fun algorithm does not induce the following FD
`clientNo, propertyNo * rentStart, rentFinish, total rent, totalDays.
`
`3.1 Spreadsheet Formulas
`
`Functional dependencies are the basis for defining the RDB schema. The Fun
`algorithm, however, may compute redundant FDs which may have a negative
`impact on the design of the RDB schema. In this section, we discuss character-
`istics of spreadsheets that can be used to define a more precise set of functional
`dependencies.
`Spreadsheets use formulas to define the values of some elements in terms
`of other elements. For example, in the house renting spreadsheet, the column
`totalDays is computed by subtracting the column rentFinish to rentStart, and
`it is usually written as follows G3 = F3 - E3. This formula states that the
`
`Attributes
`
`Relation
`
`{
`
`Tuples
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 5 of 21
`
`

`

`values of F3 and E3 determine the value of G3, thus inducing the following
`functional dependency: rentStart, rentF inish * totalDays. Note also that
`totalDays is the primary key of a FD produced by the Fun algorithm, namely
`totalDays * clientNo, cName. Primary keys, however, must be constants rather
`than formulas. Thus, such FDs should be eliminated.
`Formulas can have references to other formulas. Consider, for example, the
`second formula of the running example I3 = G3 * H3, which defines the total
`rent by multiplying the number of days by the value of the rent. Because G3 is
`defined by another formula, the values that determine G3 also determine I3. As
`a result, the two formulas induce the following FDs:
`rentStart, rentFinish, rentPerDay * total rent
`rentStart, rentFinish * totalDays
`Functional dependencies induced by formulas are added to the ones computed
`by the Fun algorithm. In genereal a spreadsheet formula of the form X0 =
`f(X1, . . . , Xn) induces the following functional dependency: X1, . . . , Xn * X0.
`In spreadsheet systems, formulas are usually introduced by copying them through
`all the elements in a column, thus making the FD explicit in all the elements.
`This may not always be the case and some elements can be defined otherwise
`(e.g. by using a constant value or a different formula). In this case, no functional
`dependency is induced.
`
`3.2 Computing the RDB Schema
`
`Having computed the functional dependencies, we can now construct the schema
`of the RDB. Maier in [16] defined an algorithm called synthesize that receives
`a set of FDs and returns a relational database schema respecting the 3NF.
`
`begin synthesize :
`Input a set of FDs F
`Output a complete database schema for F
`1. find a reduced, minimum annular cover G for F ;
`2. for each CFD (X1, X2, ..., Xn) * Y in G, construct a relational schema
`R = X1X2...XnY with designated keys K = {X1, X2, ..., Xn};
`3. return the set of relational schemas constructed in step 2.
`end synthesize
`
`This concise, but complex algorithm works as follows: To find a minimum
`annular cover G for F we start by compute a minimum cover G0 for F . G0
`is minimum if it has as few FDs as any equivalent set of FDs. Transform G0
`into G is simple: just combine the FDs with equivalent left sides into compound
`functional dependencies (CFDs) having the form (X1, X2, ..., Xn) * Y where
`X1, ..., Xn, Y are sets of FDs and the left sets are equivalent.
`Now we need to reduce the set of CFDs and this is achieved when all the CFDs
`into the set are reduced. A CFD is reduced if no left set contains any shiftable
`attributes and the right side contains no extraneous attributes. An attribute is
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 6 of 21
`
`

`

`shiftable if it can be removed from the left side of the FD and inserted into the
`right side without changing the equivalence of the set of FDs. An attribute is
`extraneous if it can be removed from the FD without changing the equivalence
`of the set of FDs.
`We have implemented this algorithm in Haskell as the synthesize function.
`It gets as argument the functional dependencies (produced by the Fun) and
`returns a set of CFD. Next, we execute synthesize with the FDs induced by our
`running example (sets are represented by the predefined Haskell lists).
`∗ ghciisynthesize ◦ fun propSch propRel
`([ownerNo ], [oName ]) * [ ]
`([clientNo ], [cName ]) * [ ]
`([totalDays ]) * [cName ]
`([propertyNo ], [pAddress ], [rentPerDay ]) * [oName ]
`([rentStart, rentFinish, rentPerDay ]) * [total rent ]
`([rentStart, rentFinish ]) * [totalDays ]
`...
`Each CFD defines several candidate keys for each table. However, to fully
`characterise the RDB schema we need to chose the primary key from those can-
`didates. To find such key we use a simple algorithm: first, we produce all the
`possible tables using each candidate key as a primary key. For example, the sec-
`ond CFD above expands to two possible tables with the same attributes: one has
`clientNo as primary key and in the other is cName the primary key. Next, we
`choose the table which has the smallest PK, since in general a ’good’ table has the
`smallest key as possible. The more attributes the key determines the best. A final
`cleanup is necessary: we remove FD that all their attributes are already repre-
`sented in other FDs. We also merge two FDs whenever antecedents/consequents
`are subsets. The final result is listed bellow.
`ownerNo * oName
`clientNo * cName
`propertyNo * pAddress, rentPerDay, ownerNo
`clientNo, propertyNo * rentStart, rentFinish, total rent, totalDays
`As a final step, the set of foreign keys has to be computed by detecting which
`primary keys are referenced in the consequent of the FD.
`Next, we show the the RDB schema derived for the house renting system
`example. The RDB is represented as a tuple of tables. A table is a map between
`the PK and the remaining attributes. This datatype is constrained by an in-
`variant, defining the foreign keys, which will be explained in detail in the next
`section.
`(clientNo × propertyNo * rentStart × rentFinish × total rent × totalDays ×
`clientNo * cName ×
`(propertyNo * pAddress × rentPerDay × ownerNo ×
`ownerNo * oName)inv1 )inv2
`where
`inv1 = πownerNo ◦ ρ ◦ π1 ⊆ δ ◦ π2
`inv2 = πclientNo ◦ δ ◦ π1 ⊆ δ ◦ π1 ◦ π2 ∧ πpropertyNo ◦ δ ◦ π1 ⊆ δ ◦ π1 ◦ π2 ◦ π2
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 7 of 21
`
`

`

`The tables are indexed by a binary function that represents the foreign keys
`restriction. This notation and operators are introduced in the next section.
`
`4 Constraint-aware Rewriting
`
`As we have explained before, the mapping between the spreadsheet and the
`RDB models is performed through data refinements using the 2LT system. Thus,
`before we present the data refinement rules to map spreadsheets into databases,
`let us briefly describe data refinements and the 2LT system3.
`
`4.1 Datatype Refinement
`
`Data refinement theory provides an algebraic framework for calculating with
`datatypes. Refining a datatype A to a datatype B can be captured by the fol-
`lowing diagram:
`
`A
`
`to
`
`6
`
`from
`
`B
`
`where
`
` to : A → B is an injective and total relation;
`
`from : B → A a surjective function;
`from · to = idA (identity function on A);
`
`We will use A 6to
`from B as a short notation to the above diagram.
`Refinements can be composed, that is, if A 6tofrom B and B 6to0from0 C then
`
`
`A 6to0·to
`from·from0 C. Also, transformation in specific parts of a datatype must
`be propagated to the global datatype in which they are embedded, that is, if
`from B then F A 6F to
`A 6to
`F from F B where F is a functor that models the context
`of the transformation. A functor captures i) the embedding of local datatypes
`inside global datatypes and ii) the lifting of value-level functions to and f rom
`on the local datatypes to value-level transformations on the global datatypes.
`In the particular case where the refinement works in both directions we have an
`isomorphism A ∼= B.
`
`A common example is that maps are the implementation for lists [9] –
`A? 6seq2index
`N * A – where seq2index creates a map (or finite function,
`list
`here represented as M ap) where the keys are the indexes of the elements of the
`list. list just collects the elements in the map. For more details about data re-
`finements the reader is referred to [17, 19, 23].
`
`Consider now a RDB with two tables, A * B and A * C. Suppose that
`the key of the first table is a foreign key to the key of the second one. This is
`represented with the datatype constraint δ ◦ π1 ⊆ δ ◦ π2, where π1 and π2 repre-
`sent left and right projection, respectively, and δ denotes the domain of a map.
`
`3 2LT stands
`for Two-Level Transformations. The
`http://code.google.com/p/2lt/.
`
`tool
`
`is
`
`available
`
`at
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 8 of 21
`
`%%
`ee
`

`

`Constraints on data types are modelled as boolean functions which distinguishes
`between legal values and values that violate the constraint. A data type A with
`a constraint φ is represented as Aφ where φ : A → Bool is a total function So,
`our example becomes as follows: ((A * B)×(A * C))δ◦π1⊆δ◦π2. Further details
`about constraint data types can be found in [4, 20].
`
`4.2 Two-Level Transformations with Constraints
`
`The data refinement theory presented in the previous section was implemented as
`a rewriting system named 2LT in Haskell [4, 9]. We will now briefly introduce
`this system.
`A type-safe representation of types and functions is constructed using gen-
`eralised algebraic datatypes (GADTs) [26]. To represent types, the following
`GADT is used:
`
`data Type t where
`String :: Type String
`[·]
`:: Type a → Type [a ]
`· * ·
`:: Type a → Type b → Type a * b
`· × · ::Type a → Type b → Type (a, b)
`Maybe :: Type a → Type (Maybe a)
`:: Type a → PF (a → Bool) → Type a
`··
`...
`
`Each refinement rule is encoded as a two-level rewriting rule:
`type Rule = ∀ a . Type a → Maybe (View (Type a))
`data View a where View :: Rep a b → Type b → View (Type a)
`data Rep a b = Rep{to = PF (a → b), from = PF (b → a)}
`
`Although the refinement are from a type a to a type b, this can not be directed
`encoded since the type b is only known when the transformation completes, so
`the type b is represented as a view of the type a. A view means that given a
`function to which transforms a type a into a type b and a vice versa function
`from it is possible to construct b from a.
`These functions are represented in a point-free style, that is, without any
`variables. Its representation is accomplished by the following GADT:
`data PF a where
`:: PF ((a, b) → a)
`π1
`:: PF ((a, b) → b)
`π2
`:: PF ([a ] → Set a)
`list2set
`:: PF ((a * b) → Set b)

`:: PF ((a * b) → Set a)

`:: PF ([(a, b)] → [(b, b)])
`CompList
`:: PF ([(a, b)] → [(b, b)])
`ListId
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 9 of 21
`
`

`

`:: PF (a → b) → PF ([a ] → [b ])
`·?
`:: PF (a → b) → PF (Set a → Set b)
`·?
`:: PF (Pred a) → PF (Pred a) → PF (Pred a)
`· ∧ ·
`:: PF (b → c) → PF (a → b) → PF (a → c)
`· ◦ ·
`:: PF (a → b) → PF (a → c) → PF (a → (b, c))
`· 4 ·
`:: PF (a → b) → PF (c → d) → PF ((a, c) → (b, d))
`· × ·
`:: PF (a → Set b) → PF (a → Set b) → PF (Pred a)
`· ⊆ ·
`Tables2table :: PF ((a * b, c * d) → (a * b, Maybe d))
`Table2tables :: PF ((a * b, Maybe d) → (a * b, c * d))
`Table2sstable :: PF ((a * b) → [(a, b)])
`Sstable2table :: PF ([(a, b)] → (a * b))
`...
`To represent datatypes with constraints the following new Type constructor
`is used:
`·· :: Type a → PF (a → Bool) → Type a
`
`Its first argument is the type to constraint and the second one is the PF function
`representing the constraint.
`Each refinement rule can only be applied to a specific datatype, for instance,
`a refinement on the type A can not be applied to the type A × B. To allow this
`some rule-based combinators were created:
`
`nop :: Rule
`. ::Rule → Rule → Rule
`(cid:11) ::Rule → Rule → Rule
`many :: Rule → Rule
`once :: Rule → Rule
`
`-- identity
`-- sequential composition
`-- left-biased choice
`-- repetition
`-- arbitrary depth rule application
`
`Using combinators, rules can be combined in order to create a full rewrite sys-
`tems.
`
`5 From a Database to a Spreadsheet: The Rules
`
`In our approach spreadsheets are represented by a product of spreadsheet tables
`(from now designated as sstables). A sstable is a product of rows and each row is
`itself a product of values. Although, we wish to transform a spreadsheet into a
`relational database, we will introduce rules to map databases into spreadsheets
`since the former are a refinement of the later. Thus, we will use the RDB schema
`constructed, as explained in Section 3.2, and refine it to a spreadsheet. Because
`we do this data type migration within the 2LT system, we obtain for free the
`function that we will use to migrate the data of the spreadsheet into the database.
`Next, we describe several date refinements needed to transform a relational
`database into a spreadsheet. We also present a strategy to apply these refine-
`ments in order to obtain the desirable result.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 10 of 21
`
`

`

`5.1 Refining a Table to a sstable
`
`It is possible to transform a table (a map with key of type A and values of type
`B) into a sstable (a list of tuples) and vice-versa as long as there is a constraint
`imposing that there exists a FD between the elements in the column of type A
`and the column of type B:
`
`T able2sstable
`
`A * B
`
`6
`
`(A × B)?
`list2set◦compList⊆list2set◦listId
`
`Sstable2table
`
`Here, list2set transforms a list into a set, compList sees a list as a relation
`and compose it with its inverse. listId is the list resulting from transforming the
`id relation into a list. This definition of FD is based on the definition of FD
`presented in [21]. From now on this invariant will be designated fd. The proof
`of this data refinement can be found in [22].
`The rule is implemented in Haskell as follows.
`table2sstable :: Rule
`table2sstable (a * b)i = return $ View rep [a × b]inv0
`where
`inv0 = trySimplify (i ◦ Sstable2table ∧ fd)
`rep = Rep{to = Table2sstable, from = Sstable2table}
`where trySimplify is a rule that simplifies the invariant.
`If the relational table has already an invariant, then it is composed with
`the invariant of the new type concerning with the FD. The resulting invariant
`is then simplified. This is just part of the rule’s implementation since another
`function is necessary to encode the rule when the initial argument does not have
`an associated invariant.
`Let us use this rule to map the table with information about clients of our
`running example.
`∗ ghciitable2sstable (clientNo * cName)
`Just (View (Rephtoihfromi) [clientNo × cName]fd)
`The result of this rule is a datatype modelling a spreadsheet that constains
`the attribute of the database. The invariant fd guarantees that the functional
`dependency is now present in the datatype. Moreoever, the returned to and from
`functions are the migration functions needed to map the data between the two
`models.
`
`5.2 Refining Tables with Foreign Keys on Primary Keys
`
`A pair of tables where the primary key of the first table is a foreign key to the
`primary key of the second table, can be refined to a pair of sstables using the
`following law:
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 11 of 21
`
`++
`kk
`

`

`((A * B) × (C * D))πA◦δ◦π1⊆πC◦δ◦π2
`
`Sstables2tables
`
`6
`
`T ables2sstable
`
`
`
`1◦π1⊆πC◦list2set◦π?1◦π2
`
`
`
`((A × B)?f d × (C × D)?f d)πA◦list2set◦π?
`
`
`The invariant guarantees that exists a FK from the PK of the first table
`to the PK of the second table. The πA projection has type πA : A → E and
`πC : C → E. A must be a tuple of the form A1 × . . .× E × . . .× An and C of the
`form C1 × . . . × E × . . . × Cn. This allows that just part of the PK of each table
`is used in the FK definition. The proof of this refinement corresponds to apply
`twice the refinement presented in Section 5.1. The invariant must be updated
`to work on the new type. The Haskell function table2sstable implements the
`rule.
`table2sstable :: Rule
`table2sstable ((a * b) × (c * d))πA◦δ◦π1⊆πC◦δ◦π2 =
`return $ View rep ([a × b]fd × [c × d]fd)inv
`where
`inv = πA ◦ list2set ◦ π?1 ◦ π1 ⊆ πC ◦ list2set ◦ π?1 ◦ π2
`
`
`rep = Rep{to = Table2sstable × Table2sstable,
`from = Sstable2table × Sstable2table}
`A particular instance of this refinement occurs when πA is the identity func-
`tion. In this case all the attributes of the PK of the first table, are FKs to part
`of the attributes of the PK of the second table. Another instance of this refine-
`ment is when πC = id, that is, part of the attributes of the PK of the first table
`reference all the attributes of the PK of the second table. Both cases are very
`similar to the general one and they are not shown here. A final instance of this
`rule presents πA and πC has the identity function meaning that all the attributes
`of the PK of the first table are FKs to all the attributes of the PK of the second
`table. In this case the refinement changes as we show next.
`
`((A * B) × (A * C))δ◦π1⊆δ◦π2
`
`∼=
`
`A * (C × B?)
`
`T ables2table
`
`T able2tables
`
`The values of the second table have a ?4 because it can be the case that
`some keys in the second table are not referenced in the first one. This happens
`because it is not always true that all the keys have a foreign usage. The proof of
`such an isomorphism is as follows.
`
`4 This symbol means optional. it is also representable as A + 1. In Haskell it is
`represented by the datatype Maybe x = Just x | Nothing.
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 12 of 21
`
`
`GG
`++
`kk
`

`

`∼=
`
`∼=
`
`∼=
`
`A * (C × B?)
`{ A? ∼= 1 * A}
`A * (C × (1 * B))
`1 ◦δ◦π2⊆δ◦π1}
`{ A * (D × (B * C)) ∼= ((A * D) × (A × B * C))π?
`((A * C) × (A × 1 * B))π?
`1◦δ◦π2⊆δ◦π1
`{ A ∼= A × 1}
`((A * C) × (A * B))δ◦π2⊆δ◦π1
`
`(1)
`
`(2)
`
`(3)
`
`Proofs of rules 1, 2, and 3 can be found in [9], [4], and [23]. A function in
`Haskell was created to define this isomorphism.
`
`tables2table :: Rule
`tables2table ((a * b) × (a * c))δ◦π1⊆δ◦π2 =
`return $ View rep (a * c × Maybe b)
`where
`rep = Rep{to = Tables2table, from = Table2tables}
`Note that each of these rules has a dual one, that is, for each rule refining a
`pair A× B there exists another one refining the pair B× A, with the appropriate
`invariant.
`
`5.3 Refining Tables with Foreign Keys on Non Primary Keys
`
`In the previous section we have introduced refinement rules to manipulate tables
`with FKs to PKs. In this section we present another set of rules to deal with FKs
`in the non-key attributes. The diagram of the general rule is presented below.
`
`((A * B) × (C * D))πB◦ρ◦π1⊆πC◦δ◦π2
`
`Sstables2tables0
`
`6
`
`T ables2sstables0
`
`
`
`2◦π1⊆πC◦list2set◦π?1◦π2
`
`
`
`((A × B)?f d × (C × D)?f d)πB◦list2set◦π?
`
`
`The proof of this refinement corresponds again to apply twice the refinement
`presented in Section 5.1. The invariant must be updated to work on the new
`type too. The function tables2sstables implements this refinement.
`tables2sstables0 ((a * b) × (c * d))πB◦ρ◦π1⊆πC◦δ◦π2 =
`return $ View rep ([a × b]fd × [c × d]fd)πB◦list2set◦π?
`
`
`2◦π1⊆πC◦list2set◦π?1◦π2
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page 13 of 21
`
`
`GG
`

`

`where
`rep = Rep{to = Table2sstable × Table2sstable,
`from = Sstable2table × Sstable2table}
`This refinement has three other particular cases. One where πB = id, another
`where πC = id, and finally when πB = πC = id. In these cases the refinement is
`the same, only the invariant changes. The proof and implementation of this rule
`are very similar to the general case and so not shown here.
`Let us consider again our running example. In order to use this rule we
`consider two FDs where all the PK of second table is referenced by part of the
`non-key attributes of the first one.
`∗ ghciilet prop = propertyNo * pAddress × rentPerDay × ownerNo
`∗ ghciilet owner = ownerNo * oName
`∗ ghciimaps2tables (prop × owner)πownerNo◦ρ◦π1⊆δ◦π2
`Just (View (Rephtoihfromi)
`(prop0 × owner0)πownerNo◦list2set◦π?
`1◦π1⊆list2set◦π?1◦π2)
`
`where
`prop0 = [propertyNo × pAddress × rentPerDay × ownerNo]
`owner0 = [ownerNo × oName]
`In this case the two initial pairs are transformed into two lists. The constraint
`is updated to work with such structures.
`
`5.4 Data Refinements as a Strategic Rewrite System
`
`The individual refinement rules can be combined into a compound rules and full
`transformation systems using the strategy combinators shown in Section 4.2. In
`particular, we define a compound rule to map a RDB to a spreadsheet:
`rdb2ss :: Rule
`rdb2ss = simplify B
`(many ((aux tables2sstables) (cid:11) (aux tables2sstables0))) B
`(many (aux table2sstable))
`where
`aux r = ((once r) B simplify) B many ((once r) B simplify)
`This function starts by simplifying the invariants. Then the tables2sstables rule
`(defined in Section 5.2) is applied exhaustively (with aux) to transform tables
`into sstables. After that the tables2sstables0 rule (Section 5.3) is applied. In a final
`step, the remaining maps are transformed using the table2sstable (Section 5.1)
`rule. After each rule has been applied a simplification step is executed. This
`strategy requires the simplification of the invariants because pattern matching
`is performed not only on the type representations, but also on the invariants.
`The simplify rule is defined as follows:
`
`simplify :: Rule
`simplify = many (prodsrdb (cid:11) mapsrdb (cid:11) others (cid:11) myRules) B compr
`
`Enfish, LLC; IPR2014-00574
`Exhibit 2239
`Page

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket