Introduction to a System for Distributed Databases (SDD-1)

J. B. ROTHNIE, JR., P. A. BERNSTEIN, S. FOX, N. GOODMAN, M. HAMMER, T. A. LANDERS, C. REEVE, D. W. SHIPMAN, and E. WONG
Computer Corporation of America
The declining cost of computer hardware and the increasing data processing needs of geographically dispersed organizations have led to substantial interest in distributed data management. SDD-1 is a distributed database management system currently being developed by Computer Corporation of America. Users interact with SDD-1 precisely as if it were a nondistributed database system because SDD-1 handles all issues arising from the distribution of data. These issues include distributed concurrency control, distributed query processing, resiliency to component failure, and distributed directory management. This paper presents an overview of the SDD-1 design and its solutions to the above problems. This paper is the first of a series of companion papers on SDD-1 (Bernstein and Shipman [2], Bernstein et al. [4], and Hammer and Shipman [14]).
Key Words and Phrases: distributed database system, relational data model, concurrency control, distributed query processing, database reliability
CR Categories: 3.5, 4.33
1. INTRODUCTION

SDD-1 is a distributed database management system under development by Computer Corporation of America. SDD-1 is a system for managing databases whose storage is distributed over a network of computers. Functionally, SDD-1 provides the same capabilities that one expects of any modern database management system (DBMS), and users interact with it precisely as if it were not distributed.
`
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
This research was supported by the Advanced Research Projects Agency of the Department of Defense under Contract N00039-77-C-0674, ARPA Order No. 3175-6. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the Advanced Research Projects Agency or the U.S. Government.
Authors' present addresses: J. B. Rothnie, Jr., S. Fox, T. A. Landers, C. Reeve, and D. W. Shipman, Computer Corporation of America, 575 Technology Square, Cambridge, MA 02139; P. A. Bernstein and N. Goodman, Center for Research in Computing Technology, Aiken Computation Laboratory, Harvard University, Cambridge, MA 02138; M. Hammer, Massachusetts Institute of Technology, Laboratory for Computing Science, 545 Technology Square, Cambridge, MA 02139; E. Wong, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA 94720.
© 1980 ACM 0362-5915/80/0300-0001 $00.75
ACM Transactions on Database Systems, Vol. 5, No. 1, March 1980, Pages 1-17.
`
`AMERICAN EXPRESS v. METASEARCH
`CBM2014-00001 EXHIBIT 2008-1
`
Systems like SDD-1 are appropriate for applications which exhibit two characteristics: First, the activity requires an integrated database. That is, the activity entails access to a single pool of information by multiple persons, organizations, or programs. And second, either the users of the information or its sources are distributed geographically. Many data processing applications have these characteristics, including

(1) inventory control, accounting, personnel information, and similar data processing systems in large companies;
(2) point-of-sale accounting systems, electronic banking, and other consumer-oriented on-line processing systems;
(3) large-scale data resources, e.g., census, climatology, toxicology, and similar databases;
(4) military intelligence databases, and command and control systems;
(5) report-generating systems for businesses with multiple data processing centers; and so forth.
Decentralized processing is desirable in these applications for reasons of performance, reliability, and flexibility of function. Centralized control is needed to ensure operation in accordance with overall policy and goals. By meeting both these goals in one system, distributed database management offers unique benefits.
However, distributed database systems pose new technical challenges owing to their inherent requirements for data communication and their inherent potential for parallel processing. The principal bottleneck in these systems is data communication. All economically feasible long-distance communication media incur lengthy delays and/or low bandwidth. Moreover, the cost of moving data through a network is comparable to the cost of storing the data locally for many days. Parallel processing is also an inherent aspect of distributed systems and mitigates to some extent the communication factor. However, it is often difficult to construct algorithms that can exploit parallelism.
For these reasons, the techniques used to implement centralized DBMSs must be reexamined in the distributed DBMS context. We have done this in developing SDD-1, and this paper outlines our main results.
Section 2 describes SDD-1's overall architecture and the flow of events in processing transactions. Sections 3 through 5 then introduce the techniques used by SDD-1 for solving the most difficult problems in distributed data management: concurrency control, query processing, and reliability. Detailed discussions of these techniques are presented in [2-4, 12, 14, 25]. Section 6 explains how these techniques are used to handle the management of system directories. The paper concludes with a brief history of SDD-1 and a summary of its principal contributions to the field.
`
2. SYSTEM ORGANIZATION

2.1 Data Model

SDD-1 supports a relational data model [8]. Users interact with SDD-1 in a high-level language called Datalanguage [9] which is illustrated in Figure 1. Datalanguage differs from relational languages such as QUEL [15] or SEQUEL [7]
`
Relation: CUSTOMER (Name, Branch, Acct#, SavBal, ChkBal, LoanBal)

Command: Update C in CUSTOMER with C.Name = "Adams"
Begin
    C.SavBal = C.SavBal - 100
    C.ChkBal = C.ChkBal + 100
End;

Fig. 1. A Datalanguage command.
`
primarily in its use of "declared" variables. This construct and related control structures expand the power of Datalanguage to that of a general-purpose programming language. For purposes of this paper, the differences between Datalanguage and QUEL or SEQUEL are not important, and for pedagogic ease, we adopt QUEL terminology.
Datalanguage may be used as a query language for end-users but is more typically invoked by host programs. Datalanguage is embedded in host programs in essentially the same manner as QUEL or SEQUEL. That is, the host program issues self-contained Datalanguage commands to SDD-1, which processes these commands exactly as if entered by an end-user.
A single Datalanguage command is called a transaction (e.g., the command shown in Figure 1 is a transaction). Transactions are the units of atomic interaction between SDD-1 and the external world. This concept of transaction is similar to that of INGRES [15] and System R [1].
An SDD-1 database consists of (logical) relations. Each SDD-1 relation is partitioned into subrelations called logical fragments, which are the units of data distribution. Logical fragments are defined in two steps. First, the relation is partitioned horizontally into subsets defined by "simple" restrictions.¹ Then each horizontal subset is partitioned into subrelations defined by projections (see Figures 2 and 3). To reconstruct the logical relation from its fragments, a unique tuple identifier is appended to each tuple and included in every fragment [10, 21].
Logical fragments are the units of data distribution, meaning that each may be stored at any one or several sites in the system. Logical fragments are defined and the assignment of fragments to sites is made when the database is designed. A stored copy of a logical fragment is called a stored fragment.
Note that user transactions are unaware of data distribution or redundancy. They reference only relations, not fragments. It is SDD-1's responsibility to translate from relations to logical fragments, and then to select the stored fragments to access in processing any given transaction.
`
2.2 General Architecture

SDD-1 is a collection of three types of virtual machines [16]: Transaction Modules (TMs), Data Modules (DMs), and a Reliable Network (RelNet), configured as in Figure 4.

¹ A simple restriction is a Boolean expression whose clauses are of the form (attribute) (rel-op) (constant), where (rel-op) is =, ≠, >, <, etc.
`
CUSTOMER (Name, Branch, Acct#, SavBal, ChkBal, LoanBal) is partitioned into:

CUST-1  = CUSTOMER where Branch = 1
CUST-2  = CUSTOMER where Branch = 2
CUST-3a = CUSTOMER where Branch = 3 and LoanBal ≠ 0
CUST-3b = CUSTOMER where Branch = 3 and LoanBal = 0

Fig. 2. Horizontal partitioning.
`
CUSTOMER (Name, Branch, Acct#, SavBal, ChkBal, LoanBal); each horizontal subset, e.g., CUST-1, is split into vertical fragments:

CUST-1.1 = CUST-1 [Name, Branch]
CUST-1.2 = CUST-1 [Acct#, SavBal, ChkBal, LoanBal]
etc.

Fig. 3. Vertical partitioning. In order to reconstruct CUSTOMER from its fragments, a unique tuple identifier is appended to each tuple and included in every fragment [21].
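The two-step fragmentation of Figures 2 and 3, and reconstruction via tuple identifiers, can be sketched in modern code (a minimal Python illustration of the idea only; SDD-1 defines fragments declaratively at database design time, and the relation values, function names, and the `tid` field here are ours, not SDD-1's):

```python
# Sketch of SDD-1 style fragmentation: horizontal restriction, then
# vertical projection, with a tuple identifier in every fragment.

CUSTOMER = [
    {"Name": "Adams", "Branch": 3, "SavBal": 1000, "ChkBal": 100, "LoanBal": 0},
    {"Name": "Monroe", "Branch": 3, "SavBal": 500, "ChkBal": 50, "LoanBal": 200},
]

# Append a unique tuple identifier so the relation can be reconstructed.
for tid, row in enumerate(CUSTOMER):
    row["tid"] = tid

# Step 1: horizontal partitioning by a "simple" restriction (Figure 2).
CUST_3a = [r for r in CUSTOMER if r["Branch"] == 3 and r["LoanBal"] != 0]
CUST_3b = [r for r in CUSTOMER if r["Branch"] == 3 and r["LoanBal"] == 0]

# Step 2: vertical partitioning by projection (Figure 3); the tuple
# identifier is included in every fragment.
def project(fragment, attrs):
    return [{a: r[a] for a in attrs + ["tid"]} for r in fragment]

CUST_3a_1 = project(CUST_3a, ["Name", "Branch"])
CUST_3a_2 = project(CUST_3a, ["SavBal", "ChkBal", "LoanBal"])

# Reconstruction: join the vertical fragments on the tuple identifier.
def reconstruct(frag1, frag2):
    by_tid = {r["tid"]: r for r in frag2}
    return [{**r, **by_tid[r["tid"]]} for r in frag1]

rebuilt = reconstruct(CUST_3a_1, CUST_3a_2)
```

User transactions never see these fragments; the TM performs this translation on their behalf.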
`
Fig. 4. SDD-1 configuration.
`
`
All data managed by SDD-1 are stored by Data Modules (DMs). DMs are, in effect, back-end DBMSs that respond to commands from Transaction Modules. DMs respond to four types of commands: (1) read part of the DM's database into a local workspace at that DM; (2) move part of a local workspace from this DM to another DM; (3) manipulate data in a local workspace at the DM; (4) write part of the local workspace into the permanent database stored at the DM.
Transaction Modules (TMs) plan and control the distributed execution of transactions. Each transaction processed by SDD-1 is supervised by some TM which performs the following tasks:
`
(1) Fragmentation. The TM translates queries on relations into queries on logical fragments and decides which instances of stored fragments to access.
(2) Concurrency control. The TM synchronizes the transaction with all other active transactions in the system.
(3) Access planning. The TM compiles the transaction into a parallel program which can be executed cooperatively by several DMs.
(4) Distributed query execution. The TM coordinates execution of the compiled access plan, exploiting parallelism whenever possible.
`
The third SDD-1 virtual machine is the Reliable Network (RelNet) which interconnects TMs and DMs in a robust fashion. The RelNet provides four services:

(1) guaranteed delivery, allowing messages to be delivered even if the recipient is down at the time the message is sent, and even if the sender and receiver are never up simultaneously;
(2) transaction control, a mechanism for posting updates at multiple DMs, guaranteeing that either all DMs post the update or none do;
(3) site monitoring, to keep track of which sites have failed, and to inform sites impacted by failures;
(4) network clock, a virtual clock kept approximately synchronized at all sites.
`
This architecture divides the distributed DBMS problem into three pieces: database management, management of distributed transactions, and distributed DBMS reliability. By implementing each of these pieces as a self-contained virtual machine, the overall SDD-1 design is substantially simplified.
`
2.3 Run-Time Structure

Among the functions required to execute a transaction in a distributed DBMS, three are especially difficult: concurrency control, distributed query processing, and reliable posting of updates. SDD-1 handles each of these problems in a distinct processing phase, so that each can be solved independently. Consider transaction T of Figure 1. When T is submitted to SDD-1 for processing, the system invokes a three-phase processing procedure. The phases are called Read, Execute, and Write.
The first phase is the Read phase and exists for purposes of concurrency control. The TM that is supervising T analyzes it to determine which portions of the (logical) database it reads, called its read-set. In this case the TM would
`
Fig. 5. Version to storage mapping (page maps for T and T', each mapping page numbers to storage locations). T and T' share each page until one transaction or the other modifies the page.
`
determine that T's read-set is

{C.SavBal, C.ChkBal | C.Name = "Adams"}.
`
In addition, the TM decides which stored fragments to access to obtain that data. Then the TM issues Read commands to the DMs that house those fragments, instructing each DM to set aside a private copy of that fragment for use during subsequent processing phases.
The private copies obtained by the Read phase are guaranteed to be consistent even though the copies reside at distributed sites. The techniques for guaranteeing consistency are described in Section 3. Since the data are consistent when read, and since the copies are private, subsequent phases can operate freely on these data without fear of interference from other transactions.
We emphasize that no data are actually transferred between sites during the Read phase. Each DM simply sets aside the specified data in a workspace at the DM. Moreover, in each DM, the private workspace is implemented using a differential file mechanism [22], so data are not actually copied. This mechanism operates as follows. The primary organization of a stored fragment is a paged file, much like a UNIX [20] or TENEX [6] file. A page is a unit of logical storage; a page map is a function that associates a physical storage location with each page (see Figure 5). The "private copy" set aside by a Read command is in reality a page map. Page maps behave like private copies because pages are never updated in place; if page P is modified on behalf of transaction T', say, a new block of secondary storage is allocated, and the modified page is written there. T' is able to access the modified page because its page map is also modified to reflect
`
`
P's new storage location. Other transactions are unaffected because their page maps remain unchanged. A similar mechanism is described in [18].
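The copy-on-write behavior of page maps can be sketched as follows (a toy model in Python, not SDD-1's implementation, which follows [22]; the class and method names are ours):

```python
# Toy copy-on-write page map: a transaction's "private copy" is just a
# map from page number to storage location; pages are never updated in
# place, so snapshots stay consistent without copying any data.

class PageFile:
    def __init__(self, pages):
        # storage: location -> page contents
        self.storage = dict(enumerate(pages))
        self.next_loc = len(pages)
        # current page map: page number -> storage location
        self.page_map = {i: i for i in range(len(pages))}

    def snapshot(self):
        # A Read command sets aside a page map, not the data itself.
        return dict(self.page_map)

    def read(self, page_map, page_no):
        return self.storage[page_map[page_no]]

    def write(self, page_map, page_no, contents):
        # Allocate a new block and redirect only this map's entry;
        # other transactions' maps still point at the old block.
        loc = self.next_loc
        self.next_loc += 1
        self.storage[loc] = contents
        page_map[page_no] = loc

f = PageFile(["page0-v0", "page1-v0"])
t_map = f.snapshot()   # transaction T's private copy
t2_map = f.snapshot()  # transaction T''s private copy
f.write(t2_map, 0, "page0-v1")  # T' modifies page 0
# T still sees the old version of page 0; T' sees its own new version.
```

Installing a modified map as the new current map would correspond to the Write phase; that step is omitted here.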
The second phase, called the Execute phase, implements distributed query processing. At this time, the TM compiles T into a distributed program that takes as input the distributed workspace created by the Read phase. This compilation procedure is described in Section 4. The compiled program consists of Move and Manipulate commands which cause the DMs to perform the intent of T in a distributed fashion. The compiled program is supervised by the TM to ensure that commands are sent to DMs in the correct order and to handle run-time errors.
The output of the program is a list of data items to be written into the database (in the case of update transactions) or displayed to the user (in the case of retrievals). In our example, this output list would contain a unique tuple identifier for Adams' tuple, identifiers for the field names SavBal and ChkBal, and the new values for these fields. This output list is produced in a workspace (i.e., temporary file) at one DM, and is not yet installed into the permanent database. Consequently, problems of concurrency control and reliable writing are irrelevant during this phase.
The final phase, called the Write phase, installs data modified by T into the permanent database and/or displays data retrieved by T to the user. For each entry in the output list, the TM determines which DM(s) contain copies of that data item. The TM orders the final DM that holds the output list to send the appropriate entries of the output list to each DM; it then issues Write commands to each of these DMs, thereby causing the new values to be installed into the database. Techniques described in Section 5 are used during the Write phase to ensure that partial results are not installed or displayed even if multiple sites or communication links fail in midstream. This is the most difficult aspect of distributed DBMS reliability, and by separating it into a distinct phase, we simplify both it and the other phases.
The three-phase processing of transactions in SDD-1 neatly partitions the key technical problems of distributed database management. The next sections of this paper explain how SDD-1 solves each of these independent problems.
`
3. CONCURRENCY CONTROL

The problems that arise when multiple users access a shared database are well known. Generically there are two types of problems: (1) If transaction T1 is reading a portion of the database while transaction T2 is updating it, T1 might read inconsistent data (see Figure 6). (2) If transactions T3 and T4 are both updating the database, race conditions can produce erroneous results (see Figure 7). These problems arise in all shared databases, centralized or distributed, and are conventionally solved using database locking. However, we have developed a new technique for SDD-1.
`
3.1 Methodology

SDD-1, like most other DBMSs, adopts serializability as its criterion for concurrent correctness. Serializability requires that whenever transactions execute concurrently, their effect must be identical to some serial (i.e., noninterleaved)
`
Consider the database of Figures 2 and 3, and assume fragments CUST-3a.2 and CUST-3a.3 are stored at different DMs.
Let transaction T1 be
    Range of C is CUSTOMER;
    Retrieve C (SavBal + ChkBal) where C.Name = "Adams";
Let transaction T2 be
    Range of C is CUSTOMER;
    Replace C (SavBal = SavBal - $100, ChkBal = ChkBal + $100) where C.Name = "Adams";
And suppose T1 and T2 execute in the following concurrent order:
    T1 reads Adams' SavBal (= $1000) from fragment CUST-3a.2
    T2 writes Adams' SavBal (= $900) into fragment CUST-3a.2
    T2 writes Adams' ChkBal (= $100) into fragment CUST-3a.3
    T1 reads Adams' ChkBal (= $100) from fragment CUST-3a.3
T1's output will be $1000 + $100 = $1100, which is incorrect.

Fig. 6. Reading inconsistent data.
`
Given the database of Figures 2 and 3.
Let transaction T3 be
    Range of C is CUSTOMER;
    Replace C (ChkBal = ChkBal + $100) where C.Name = "Monroe";
Let transaction T4 be
    Range of C is CUSTOMER;
    Replace C (ChkBal = ChkBal - $50) where C.Name = "Monroe";
And suppose T3 and T4 execute in the following concurrent order:
    T3 reads Monroe's ChkBal (= $50)
    T4 reads Monroe's ChkBal (= $50)
    T4 writes Monroe's ChkBal (= $0)
    T3 writes Monroe's ChkBal (= $50 + $100 = $150)
The value of ChkBal left in the database is $150, which is incorrect. The final balance should be $50 - $50 + $100 = $100.

Fig. 7. Race condition producing erroneous update.
`
execution of those same transactions. This criterion is based on the assumption that each transaction maps a consistent database state into another consistent state. Given this assumption, every serial execution preserves consistency. Since a serializable execution is equivalent to a serial one, it too preserves database consistency.
Most DBMSs ensure serializability through database locking. By locking, we mean a synchronization method in which transactions dynamically reserve data before accessing them [11].
SDD-1 uses two synchronization mechanisms that are distinctly different from locking [5]. The first mechanism, called conflict graph analysis, is a technique for analyzing "classes" of transactions to detect those transactions that require little or no synchronization. The second mechanism consists of a set of synchronization protocols based on "timestamps," which synchronize those transactions that need it.
`
Define transactions T1 and T2 as in Figure 6:
    read-set(T1) = {C.SavBal, C.ChkBal where C.Name = "Adams"}
    write-set(T1) = {}
    read-set(T2) = read-set(T1)
    write-set(T2) = read-set(T1)
[Conflict graph for T1 and T2: nodes r1, w1, r2, w2, where ri and wi denote the read-set and write-set of transaction Ti.]

Define transactions T3 and T4 as in Figure 7:
    read-set(T3) = {C.ChkBal where C.Name = "Monroe"}
    write-set(T3) = read-set(T3)
    read-set(T4) = read-set(T3)
    write-set(T4) = read-set(T3)
[Conflict graph for T3 and T4: nodes r3, w3, r4, w4.]

Fig. 8. Conflict graphs.
`
3.2 Conflict Graph Analysis

The read-set of a transaction is the portion of the database it reads, and its write-set is the portion of the database it updates. Two transactions conflict if the read-set or write-set of one intersects the write-set of the other. In a system that uses locking, each transaction locks data before accessing them, so conflicting transactions never run concurrently. However, not all conflicts violate serializability; that is, some conflicting transactions can safely be run concurrently. More concurrency can be attained by checking whether or not a given conflict is troublesome, and only synchronizing those that are. Conflict graph analysis is a technique for doing this.
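The conflict test itself (though not the cycle analysis of [2]) can be sketched directly from the definition, modeling read- and write-sets as sets of data items (the transactions shown are those of Figures 6 and 7; the function name and item names are ours):

```python
# Two transactions conflict if the read-set or write-set of one
# intersects the write-set of the other.

def conflicts(rs1, ws1, rs2, ws2):
    return bool(ws1 & (rs2 | ws2)) or bool(ws2 & (rs1 | ws1))

# T1 and T2 of Figure 6: T1 is read-only, T2 updates what T1 reads.
adams = {"Adams.SavBal", "Adams.ChkBal"}
# T3 and T4 of Figure 7: both update the same balance.
monroe = {"Monroe.ChkBal"}

t1 = (adams, set())   # (read-set, write-set)
t2 = (adams, adams)
t3 = (monroe, monroe)
t4 = (monroe, monroe)
```

Both pairs conflict by this test, but as Figure 8 shows, only the T3/T4 conflict participates in a cycle of the conflict graph and therefore needs locking-strength synchronization.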
The nodes of a conflict graph represent the read-sets and write-sets of transactions, and edges represent conflicts among these sets. (There is also an edge between the read-set and write-set of each transaction.) Figure 8 shows sample conflict graphs. The important property is that different kinds of edges require different levels of synchronization, and that synchronization as strong as locking is required only for edges that participate in cycles [2]. In Figure 8, for example, transactions T1 and T2 do not require synchronization as strong as locking, whereas T3 and T4 do.
It is impractical to use conflict graph analysis at run-time because too much intersite communication would be required to exchange information about conflicts. Instead, during database design we apply the technique off-line as follows: The database administrator defines transaction classes, which are named groups of commonly executed transactions. Each class is defined by its name, a read-set, a write-set, and the TM at which it runs; a transaction is a member of a class if
`
the transaction's read-set and write-set are contained in the class's read-set and write-set, respectively. Conflict graph analysis is actually performed on these transaction classes, not on individual transactions as in Figure 8. (Notice that transactions from different classes can conflict only if their classes conflict.) The output of the analysis is a table which tells (a) for each class, which other classes it conflicts with, and (b) for each such conflict, how much synchronization (if any) is required to ensure serializability.
It is convenient to assume that each TM is only permitted to supervise transactions from one class, and vice versa.² At run-time, when transaction T is submitted, the system determines which class(es) T is a member of, and sends T to the TM that supervises one of these classes. The TM synchronizes T against other transactions in its class using a local mechanism similar to locking. To synchronize T against transactions in other classes, the TM uses the synchronization method(s) specified by the conflict graph analysis. These methods are called "protocols" and are described below.
`
3.3 Timestamp-Based Protocols

To synchronize two transactions that conflict dangerously, one must be run first, and the other delayed until it can safely proceed. In locking systems, the execution order is determined by the order in which transactions request conflicting locks. In SDD-1, the order is determined by a total ordering of transactions induced by timestamps. Each transaction submitted to SDD-1 is assigned a globally unique timestamp by its TM. Timestamps are generated by concatenating a TM identifier to the right of the network clock time, so that timestamps from different TMs always differ in their low-order bits. This means of generating unique timestamps was proposed in [23].
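This timestamp construction can be sketched as a simple bit-level encoding (an illustrative rendering; the field width and function name are ours, not SDD-1's):

```python
# Globally unique timestamps: network clock time in the high-order
# bits, TM identifier in the low-order bits.

TM_ID_BITS = 8  # assumed width; supports up to 256 TMs

def make_timestamp(clock_time, tm_id):
    return (clock_time << TM_ID_BITS) | tm_id

# Two TMs reading the same network clock still produce distinct,
# totally ordered timestamps.
ts_a = make_timestamp(1000, 1)
ts_b = make_timestamp(1000, 2)
```

Because the clock occupies the high-order bits, the induced total order agrees with the network clock except when two TMs read the same clock value, in which case the TM identifier breaks the tie.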
The timestamp of a transaction is attached to all Read and Write commands sent to DMs on behalf of that transaction. In addition, each Read command contains a list of classes that conflict dangerously with the transaction issuing the Read (this list was determined by the conflict graph analysis). When a DM receives a Read command, it defers the command until it has processed all earlier Write commands (i.e., those with smaller timestamps) and no later Write commands (i.e., those with larger ones) from the TMs for the specified classes. The DM can determine how long to wait because of a DM-TM communication discipline called piping.
Piping requires that each TM send its Write commands to DMs in timestamp order. In addition, the Reliable Network guarantees that messages are received in the order sent. Thus when a DM receives a Write from (say) TMx timestamped (say) TSx, the DM knows it has received all Write commands from TMx with timestamps less than TSx. So, to process a Read command with timestamp TSR, the DM proceeds as follows:
`
For each class specified in the Read command, the DM processes all Write commands from that class's TM up to (but not beyond) TSR. If, however,

² This assumption engenders no loss of generality since several TMs can be multiprogrammed at one site, and several classes can be defined with identical read-sets and write-sets.
`
the DM has already processed a Write command with timestamp beyond TSR from one of these TMs, the Read is rejected.
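Under piping, the DM's decision rule can be sketched as follows (a simplified model of our own, not SDD-1's literal implementation; it tracks, per conflicting TM, the last Write received, null or not, and the last non-null Write actually applied):

```python
# Simplified DM logic: because each TM sends Writes in timestamp order
# and the network preserves order, the last Write seen from a TM bounds
# every Write that TM will ever send with a smaller timestamp.

def process_read(ts_read, tm_state, conflicting_tms):
    """tm_state maps TM -> (timestamp of last Write received,
    timestamp of last non-null Write applied).
    Returns 'ok', 'wait', or 'reject'."""
    for tm in conflicting_tms:
        last_received, last_applied = tm_state[tm]
        if last_applied > ts_read:
            return "reject"  # a later Write is already in the database
        if last_received < ts_read:
            return "wait"    # Writes below ts_read may still arrive;
                             # the DM may request a null Write
    return "ok"
```

For example, a Read timestamped 10 proceeds once the conflicting TM has sent something timestamped at or beyond 10, provided no real Write beyond 10 has already been applied.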
`
To avoid excessive delays in waiting for Write commands, idle TMs periodically send null (empty) timestamped Write commands; also, an impatient DM can explicitly request a null Write from a TM that is slow in sending them.
The synchronization protocol we have just described roughly corresponds to locking and is designed to avoid "race conditions" [5]. However, there are several variations of this protocol, depending on the type of timestamp attached to Read commands and the interpretation of the timestamps by DMs. For example, read-only transactions can use a less expensive protocol in which the DM selects the timestamp, thereby avoiding the possibility of rejection and reducing delay. The variety of available synchronization protocols is an important feature of SDD-1's concurrency control.
The SDD-1 concurrency control mechanism is described in greater detail in [3, 4]; its correctness is formally proved in [2].
When all Read commands have been processed, consistent private copies of the read-set have been set aside at all necessary DMs. At this point, the Read phase is complete.
`
4. DISTRIBUTED QUERY PROCESSING

Having obtained a consistent copy of a transaction's read-set, the next step is to compile the transaction into a parallel program and execute it. The key part of the compilation is access planning, an optimization procedure that minimizes the object program's intersite communication needs while maximizing its parallelism. Access planning is discussed in Section 4.1, and execution of compiled transactions is explained in Section 4.2.
`
4.1 Access Planning

Perhaps the simplest way to execute a distributed transaction T is to move all of T's read-set to a single DM, and then execute T at that DM (see Figure 9). This approach works but suffers two drawbacks: (1) T's read-set might be very large, and moving it between sites could be exorbitantly expensive; (2) little use is made of parallel processing. Access planning overcomes these drawbacks.
The access planner produces object programs with two phases, called reduction and final processing. The reduction phase eliminates from T's read-set as much data as is economically feasible without changing T's answer. Then, during final processing, the reduced read-set is moved to a designated "final" DM where T is executed. This structure mirrors the simple approach described above but lowers communication cost and increases parallelism via reduction.
Reduction employs the familiar restriction and projection operators, plus an operator called semi-join, defined as follows: let R(A, B) and S(C, D) be relations; the semi-join of R by S on a qualification q (e.g., R.B = S.C) equals the join of R and S on q, projected back onto the attributes of R (see Figure 10). If R and S are stored at different DMs, this semi-join is computed by projecting S onto the attributes of q (i.e., S.C), and moving the result to R's DM.
We define the cost of an operation to be the amount of data (e.g., number of
`
Given the database of Figures 2 and 3.

Let transaction T5 be
    Range of C is CUSTOMER;
    Replace C (ChkBal = ChkBal + LoanBal) where LoanBal < 0;
(The effect of T5 is to credit loan overpayments to customers' checking accounts.)

Simple strategy:
    Move every fragment that could potentially contribute to T5's result to a designated site. Process T5 locally at that site.

Fig. 9. Simple execution strategy.
`
Given:

CUST (Name, ChkBal, LoanBal)        AUTO-PAY (Name, Amount)
Jeff.      $300   $30000            Jeff.      $300
Adams      $100   $20000            Adams      $200
Polk       $250   $20000            Polk       $200
Tyler      $100   $15000            Tyler      $150
Buchanan   $700   $40000            Buchanan   $400
Johnson    $200   $20000            Johnson    $200

Example (i)
The semi-join of CUST by AUTO-PAY on CUST.ChkBal = AUTO-PAY.Amount
equals the join of CUST and AUTO-PAY on CUST.ChkBal = AUTO-PAY.Amount:
(Name, ChkBal, LoanBal, Name, Amount)
Jeff.     $300   $30000   Jeff.     $300
Johnson   $200   $20000   Adams     $200
Johnson   $200   $20000   Polk      $200
Johnson   $200   $20000   Johnson   $200
projected onto (Name, ChkBal, LoanBal):
Jeff.     $300   $30000
Johnson   $200   $20000

Example (ii)
The semi-join of CUST by AUTO-PAY on CUST.ChkBal < AUTO-PAY.Amount and CUST.Name = AUTO-PAY.Name equals:
(Name, ChkBal, LoanBal)
Adams   $100   $20000
Tyler   $100   $15000
(These are customers whose balances are insufficient for their automatic loan payments.)

Fig. 10. Semi-join examples.
`
bytes) that must be sent between sites to execute it, while its benefit is the amount by which it reduces the size of its operand.
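The distributed semi-join and its communication cost can be sketched directly from the definition (a toy Python rendering of Example (i), using illustrative data modeled on Figure 10; the cost here counts values moved rather than bytes):

```python
# Semi-join of R by S on qualification q: join R and S on q, projected
# back onto the attributes of R. When R and S live at different DMs,
# only S's qualification column is shipped to R's site.

CUST = [("Jeff.", 300, 30000), ("Adams", 100, 20000), ("Polk", 250, 20000),
        ("Tyler", 100, 15000), ("Buchanan", 700, 40000), ("Johnson", 200, 20000)]
AUTO_PAY = [("Jeff.", 300), ("Adams", 200), ("Polk", 200),
            ("Tyler", 150), ("Buchanan", 400), ("Johnson", 200)]

# Ship only the projection of AUTO_PAY on Amount (the attribute of q).
shipped = {amount for (_, amount) in AUTO_PAY}
cost = len(shipped)  # data actually moved between sites

# Semi-join on CUST.ChkBal = AUTO_PAY.Amount, computed at CUST's DM.
semi = [row for row in CUST if row[1] in shipped]
```

Note that the full CUST rows never leave their site; the benefit is the difference between the size of CUST and the size of `semi`, which the access planner weighs against `cost`.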
