MODERN DATABASE MANAGEMENT
Fifth Edition

Fred R. McFadden
Jeffrey A. Hoffer
Mary B. Prescott

• In memory of my valued colleague Daniel Conger. — F. R. M.

• To Patty, for her sacrifices, encouragement, and support. To my students, for being receptive and critical, and challenging me to be a better teacher. — J. A. H.

• To Larry, Mike, and Ivan. Their love and support provide a foundation to my life which makes efforts such as writing this book possible. And to Jeff and Fred, who gave me the opportunity to write with them and patiently provided invaluable guidance along the way. — M. B. P.

Executive Editor: Michael Roche
Developmental Editor: Maureen Allaire Spada
Assistant Editor: Ruth Berry
Editorial Assistant: Adam Hamel
Senior Marketing Manager: Tom Ziolkowski
Senior Marketing Coordinator: Deanna Storey
Senior Production Supervisor: Patty Mahtani
Manufacturing Buyer: Sheila Spinney
Design Director: Regina Hagen
Cover Designer: Linda Manly Wade
Text Designer: Sandra Rigney
Composition and Project Coordination: Elm Street Publishing Services, Inc.
Cover Image: © Mick Tarel for Artville

Library of Congress Cataloging-in-Publication Data

McFadden, Fred R., 1933-
  Modern database management / Fred R. McFadden, Jeffrey A. Hoffer,
    Mary B. Prescott. -- 5th ed.
      p. cm.
    Includes bibliographical references and index.
    ISBN 0-8053-6054-9
    1. Database management.  I. Hoffer, Jeffrey A.  II. Prescott, Mary B.
  III. Title.
  QA76.9.D3M395 1999
  005.74—dc21                                                      98-4120
                                                                       CIP

Reprinted with corrections, May 1999
Copyright © 1999 by Addison-Wesley Educational Publishers, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America.

5 6 7 8 9 10 WCT 02 01 00

Distributed Databases

LEARNING OBJECTIVES

• Explain four strategies for the design of distributed databases, options within each strategy, and the factors to consider in selection among these strategies.

• State the relative advantages of synchronous and asynchronous data replication and partitioning as three major approaches for distributed database design.

• Outline the steps involved in processing a query in a distributed database and several approaches used to optimize distributed query processing.

• Explain the salient features of several distributed database management systems.

INTRODUCTION

When an organization is geographically dispersed, it may choose to store its databases on a central computer or to distribute them to local computers (or a combination of both). A distributed database is a single logical database that is spread physically across computers in multiple locations that are connected by a data communications network. We emphasize that a distributed database is truly a database, not a loose collection of files. The distributed database is still centrally administered as a corporate resource while providing local flexibility and customization. The network must allow the users to share the data; thus a user (or program) at location A must be able to access (and perhaps update) data at location B. The sites of a distributed system may be spread over a large area (such as the United States or the world), or over a small area (such as a building or campus). The computers may range from microcomputers to large-scale computers or even supercomputers.

Distributed database: A single logical database that is spread physically across computers in multiple locations that are connected by a data communications link.

A distributed database requires multiple database management systems, running at each remote site. The degree to which these different DBMSs cooperate, or work in partnership, and whether there is a master site that coordinates requests involving data from multiple sites distinguish different types of distributed database environments.

It is important to distinguish between distributed and decentralized databases. A decentralized database is also stored on computers at multiple locations; however, the computers are not interconnected by a network, so that users at the various sites cannot share data. Thus a decentralized database is best regarded as a collection of independent databases, rather than having the geographical distribution of a single database.

Decentralized database: A database that is stored on computers at multiple locations; however, the computers are not interconnected by a network, so that users at the various sites cannot share data.

Various business conditions encourage the use of distributed databases:

• Distribution and autonomy of business units  Divisions, departments, and facilities in modern organizations are often geographically (and possibly internationally) distributed. Often each unit has the authority to create its own information systems, and often these units want local data over which they can have control.

• Data sharing  Even moderately complex business decisions require sharing data across business units, so it must be convenient to consolidate data across local databases on demand.

• Data communications costs and reliability  The cost to ship large quantities of data across a communications network or to handle a large volume of transactions from remote sources can be high. It is often more economical to locate data and applications close to where they are needed. Also, dependence on data communications can be risky, so keeping local copies or fragments of data can be a reliable way to support the need for rapid access to data across the organization.

The ability to create a distributed database has existed for over a decade. As you might expect, a variety of distributed database options exist (Bell and Grimson, 1992). Figure 11-1 outlines the range of distributed database environments. These environments are briefly explained by the following:

I. Homogeneous  The same DBMS is used at each node.
   A. Autonomous  Each DBMS works independently, passing messages back and forth to share data updates.
   B. Non-autonomous  A central, or master, DBMS coordinates database access and update across the nodes.
II. Heterogeneous  Potentially different DBMSs are used at each node.
   A. Systems  Supports some or all of the functionality of one logical database.
      1. Full DBMS Functionality  Supports all of the functionality of a distributed database, as discussed in the remainder of this chapter.
      2. Partial-Multidatabase  Supports some features of a distributed database, as discussed in the remainder of this chapter.
         a. Federated  Supports local databases for unique data requests.
            i. Loose Integration  Many schemas exist, for each local database, and each local DBMS must communicate with all local schemas.
            ii. Tight Integration  One global schema exists that defines all the data across all local databases.
         b. Unfederated  Requires all access to go through a central coordinating module.
   B. Gateways  Simple paths are created to other databases, without the benefits of one logical database.

[Figure 11-1  Distributed database environments (adapted from Bell and Grimson, 1992): a tree of the options listed above, from Homogeneous (Autonomous, Non-Autonomous) to Heterogeneous (Systems with full DBMS functionality or partial-multidatabase, federated with loose or tight integration or unfederated; Gateways).]

A homogeneous distributed database environment is depicted in Figure 11-2. This environment is typically defined by the following characteristics (related to the non-autonomous category described above):

• Data are distributed across all the nodes.
• The same DBMS is used at each location.
• All data are managed by the distributed DBMS (so there are no exclusively local data).
• All users access the database through one global schema or database definition.
• The global schema is simply the union of all the local database schemas.

It is difficult in most organizations to force a homogeneous environment, yet heterogeneous environments are much more difficult to manage.

[Figure 11-2  Homogeneous distributed database environment (adapted from Bell and Grimson, 1992): global users access one global schema through a single distributed DBMS that manages the data at every node.]

As listed above, there are many variations of heterogeneous distributed database environments. In the remainder of the chapter, however, a heterogeneous environment will be defined by the following characteristics (as depicted in Figure 11-3):

• Data are distributed across all the nodes.
• Different DBMSs may be used at each node.
• Some users require only local access to databases, which can be accomplished by using only the local DBMS and schema.
• A global schema exists, which allows local users to access remote data.

[Figure 11-3  Heterogeneous distributed database environment (adapted from Bell and Grimson, 1992): both global users and local users; a global schema spans nodes that may run different local DBMSs.]

Objectives and Trade-offs

A major objective of distributed databases is to provide ease of access to data for users at many different locations. To meet this objective, the distributed database system must provide what is called location transparency, which means that a user (or user program) using data for querying or updating need not know the location of the data. Any request to retrieve or update data from any site is automatically forwarded by the system to the site or sites related to the processing request. Ideally, the user is unaware of the distribution of data, and all data in the network appear as a single logical database stored at one site. In this ideal case, a single query can join data from tables in multiple sites as if the data were all in one site.

Location transparency: A design goal for a distributed database, which says that a user (or user program) using data need not know the location of the data.

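To make the idea concrete, the following minimal sketch (hypothetical Python, not part of the original text) shows a tiny routing layer standing in for the distributed DBMS: the caller asks for an account balance by key alone, and the router consults a catalog to decide which site's database actually answers.

    import sqlite3

    # Two in-memory databases stand in for the Lakeview and Valley sites.
    sites = {"Lakeview": sqlite3.connect(":memory:"),
             "Valley": sqlite3.connect(":memory:")}
    for conn in sites.values():
        conn.execute("CREATE TABLE customer (acct_number INTEGER PRIMARY KEY, "
                     "customer_name TEXT, branch_name TEXT, balance INTEGER)")
    sites["Lakeview"].execute("INSERT INTO customer VALUES (200, 'Jones', 'Lakeview', 1000)")
    sites["Valley"].execute("INSERT INTO customer VALUES (324, 'Smith', 'Valley', 250)")

    # A (hypothetical) catalog records which site holds which account; it plays
    # the role of the global directory kept by the distributed DBMS.
    catalog = {200: "Lakeview", 324: "Valley"}

    def get_balance(acct_number):
        # The user supplies only the key; the location is resolved here.
        site = catalog[acct_number]
        cur = sites[site].execute(
            "SELECT balance FROM customer WHERE acct_number = ?", (acct_number,))
        return cur.fetchone()[0]

    print(get_balance(200))   # 1000, served from Lakeview without the caller knowing
    print(get_balance(324))   # 250, served from Valley

A real distributed DBMS performs this routing, and any cross-site joins, inside its query processor rather than in application code.
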
A second objective of distributed databases is local autonomy, which is the capability to administer a local database and to operate independently when connections to other nodes have failed (Date, 1995). With local autonomy, each site has the capability to control local data, administer security, log transactions, and recover when local failures occur, and provide full access to local data to local users when any central or coordinating site cannot operate. In this case, data are locally owned and managed, even though they are accessible from remote sites. This implies that there is no reliance on a central site.

Local autonomy: A design goal for a distributed database, which says that a site can independently administer and operate its database when connections to other nodes have failed.

A significant trade-off in designing a distributed database environment is whether to use synchronous or asynchronous distributed technology. With synchronous distributed database technology, all data across the network are continuously kept up-to-date so that a user at any site can access data anywhere on the network at any time and get the same answer. With synchronous technology, if any copy of a data item is updated anywhere on the network, the same update is immediately applied to all other copies or aborted. Synchronous technology ensures data integrity and minimizes the complexity of knowing where the most recent copy of data is located. Synchronous technology can result in unsatisfactorily slow response time since the distributed DBMS is spending considerable time checking that an update is accurately and completely propagated across the network.

Synchronous distributed database: A form of distributed database technology in which all data across the network are continuously kept up-to-date so that a user at any site can access data anywhere on the network at any time and get the same answer.

Asynchronous distributed database technology keeps copies of replicated data at different nodes so that local servers can access data without reaching out across the network. With asynchronous technology, there is usually some delay in propagating data updates across the remote databases, so some degree of at least temporary inconsistency is tolerated. Asynchronous technology tends to have acceptable response time since updates happen locally and data replicas are synchronized in batches at predetermined intervals, but it may be more complex to plan and design to ensure exactly the right level of data integrity and consistency across the nodes.

Asynchronous distributed database: A form of distributed database technology in which copies of replicated data are kept at different nodes so that local servers can access data without reaching out across the network.

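The trade-off can be compressed into a small sketch (hypothetical Python, not from the text): the synchronous routine refuses to report success unless every copy can be changed, while the asynchronous routine changes only the local copy and leaves the remaining copies to a later batch, accepting temporary inconsistency.

    # Each replica is represented as a simple dict of account -> balance.
    replicas = {"Lakeview": {200: 1000}, "Valley": {200: 1000}}
    available = {"Lakeview": True, "Valley": True}
    pending = []   # update messages awaiting asynchronous propagation

    def synchronous_update(acct, new_balance):
        # Every copy must be reachable; otherwise abort so all sites stay identical.
        if not all(available.values()):
            raise RuntimeError("update aborted: a replica is unreachable")
        for copy in replicas.values():
            copy[acct] = new_balance

    def asynchronous_update(local_site, acct, new_balance):
        # Change the local copy now; other copies catch up when the queue is drained.
        replicas[local_site][acct] = new_balance
        pending.append((acct, new_balance))

    def drain_queue():
        # Batch synchronization at a predetermined interval.
        while pending:
            acct, new_balance = pending.pop(0)
            for copy in replicas.values():
                copy[acct] = new_balance

    asynchronous_update("Lakeview", 200, 900)
    # Until drain_queue() runs, Valley still shows 1000: a temporary inconsistency.
    drain_queue()
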
Compared to centralized databases, there are numerous advantages to either form of a distributed database. The most important are the following:

• Increased reliability and availability  When a centralized system fails, the database is unavailable to all users. A distributed system will continue to function at some reduced level, however, even when a component fails. The reliability and availability will depend (among other things) on how the data are distributed (discussed in the following sections).

• Local control  Distributing the data encourages local groups to exercise greater control over "their" data, which promotes improved data integrity and administration. At the same time, users can access nonlocal data when necessary. Hardware can be chosen for the local site to match the local, not global, data processing work.

• Modular growth  Suppose that an organization expands to a new location or adds a new work group. It is often easier and more economical to add a local computer and its associated data to the distributed network than to expand a large central computer. Also, there is less chance of disruption to existing users than is the case when a central computer system is modified or expanded.

• Lower communication costs  With a distributed system, data can be located closer to their point of use. This can reduce communication costs, compared to a central system.

• Faster response  Depending on how data are distributed, most requests for data by users at a particular site can be satisfied by data stored at that site. This speeds up query processing since communication and central computer delays are minimized. It may also be possible to split complex queries into subqueries that can be processed in parallel at several sites, providing even faster response.

A distributed database system also faces certain costs and disadvantages:

• Software cost and complexity  More complex software (especially the DBMS) is required for a distributed database environment. We discuss this software later in the chapter.

• Processing overhead  The various sites must exchange messages and perform additional calculations to ensure proper coordination among data at the different sites.

• Data integrity  A by-product of the increased complexity and need for coordination is the additional exposure to improper updating and other problems of data integrity.

• Slow response  If the data are not distributed properly according to their usage, or if queries are not formulated correctly, response to requests for data can be extremely slow. These issues are discussed later in the chapter.

OPTIONS FOR DISTRIBUTING A DATABASE

How should a database be distributed among the sites (or nodes) of a network? We discussed this important issue of physical database design in Chapter 7, which introduced an analytical procedure for evaluating alternative distribution strategies. In that chapter we noted that there are four basic strategies for distributing databases:

1. Data replication
2. Horizontal partitioning
3. Vertical partitioning
4. Combinations of the above

We will explain and illustrate each of these approaches using relational databases. The same concepts apply (with some variations) for other data models, such as hierarchical and network.

Suppose that a bank has numerous branches located throughout a state. One of the base relations in the bank's database is the Customer relation. Figure 11-4 shows the format for an abbreviated version of this relation. For simplicity, the sample data in the relation apply to only two of the branches (Lakeview and Valley). The primary key in this relation is account number (Acct_Number). Branch_Name is the name of the branch where customers have opened their accounts (and therefore where they presumably perform most of their transactions).

Figure 11-4  Customer relation for a bank

Acct_Number   Customer_Name   Branch_Name   Balance
200           Jones           Lakeview         1000
324           Smith           Valley            250
153           Gray            Valley             38
426           Dorman          Lakeview          796
500           Green           Valley            168
683           McIntyre        Lakeview         1500
252           Elmore          Lakeview          330

Data Replication

An increasingly popular option for data distribution is to store a separate copy of the database at each of two or more sites. Replication may allow an IS organization to move a database off a centralized mainframe onto less expensive, departmental or location-specific servers, close to end users (Koop, 1995). Replication may use either synchronous or asynchronous distributed database technologies, although asynchronous technologies are more typical in a replicated environment. The Customer relation in Figure 11-4 could be stored at Lakeview or Valley, for example. If a copy is stored at every site, we have the case of full replication (which may be impractical except for only relatively small databases).

There are five advantages to data replication:

1. Reliability  If one of the sites containing the relation (or database) fails, a copy can always be found at another site without network traffic delays. Also, available copies can all be updated as soon as possible as transactions occur, and unavailable nodes will be updated once they return to service.

2. Fast response  Each site that has a full copy can process queries locally, so queries can be processed rapidly.

3. Possible avoidance of complicated distributed transaction integrity routines  Replicated databases are usually refreshed at scheduled intervals, so most forms of replication are used when some relaxing of synchronization across database copies is acceptable.

4. Node decoupling  Each transaction may proceed without coordination across the network. Thus, if nodes are down, busy, or disconnected (e.g., in the case of mobile personal computers), a transaction is handled when the user desires. In the place of real-time synchronization of updates, a behind-the-scenes process coordinates all data copies.

5. Reduced network traffic at prime time  Often updating data happens during prime business hours, when network traffic is highest and the demands for rapid response greatest. Replication, with delayed updating of copies of data, moves network traffic for sending updates to other nodes to non-prime time hours.

Replication has two primary disadvantages:

1. Storage requirements  Each site that has a full copy must have the same storage capacity that would be required if the data were stored centrally. Each copy requires storage space (the cost for which is constantly decreasing), and processing time is required to update each copy on each node.

2. Complexity and cost of updating  Whenever a relation is updated, it must (eventually) be updated at each site that holds a copy. Synchronizing updating near real time can require careful coordination, as will be clear later under the topic of commit protocol.

For these reasons, data replication is favored where most process requests are read-only and where the data are relatively static, as in catalogs, telephone directories, train schedules, and so on. CD-ROM and DVD storage technology has promise as an economical medium for replicated databases. Replication is used for "noncollaborative data," where one location does not need a real-time update of data maintained by other locations (The, 1994). In these applications, data eventually need to be synchronized, as quickly as is practical. Replication is not a viable approach for online applications such as airline reservations, automated teller machine transactions, and other financial activities—applications for which each user wants data about the same, nonsharable resource.

Snapshot Replication  Different schemes exist for updating data copies. Some applications, like those for decision support and data warehousing or mining—which do not require current data—are supported by simple table copying or periodic snapshots. This might work as follows, assuming multiple sites are updating the same data. First, updates from all replicated sites are periodically collected at a master or primary site, where all the updates are made to form a consolidated record of all changes. With some distributed DBMSs, this list of changes is collected in a snapshot log, which is a table of row identifiers for the records to go into the snapshot. Then a read-only snapshot of the replicated portion of the database is taken at the master site. Finally, the snapshot is sent to each site where there is a copy (it is often said that these other sites "subscribe" to the data owned at the primary site). This is called a full refresh of the database (Edelstein, 1995a). Alternatively, only those pages that have changed since the last snapshot can be sent, which is called a differential refresh. In this case, a snapshot log for each replicated table is joined with the associated base table to form the set of changed rows to be sent to the replicated sites.

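A rough sketch of a differential refresh (hypothetical Python and table names, not a particular vendor's mechanism): the snapshot log holds only the row identifiers of changed rows, and joining it to the base table yields the set of rows to ship to the subscribing sites.

    import sqlite3

    master = sqlite3.connect(":memory:")
    master.executescript("""
        CREATE TABLE customer (acct_number INTEGER PRIMARY KEY,
                               customer_name TEXT, branch_name TEXT, balance INTEGER);
        CREATE TABLE snapshot_log (acct_number INTEGER);   -- row identifiers only
        INSERT INTO customer VALUES (200, 'Jones', 'Lakeview', 1000);
        INSERT INTO customer VALUES (324, 'Smith', 'Valley', 250);
    """)

    # An update at the master site is also recorded in the snapshot log.
    master.execute("UPDATE customer SET balance = 900 WHERE acct_number = 200")
    master.execute("INSERT INTO snapshot_log VALUES (200)")

    # Differential refresh: only rows named in the log are sent to subscribers.
    changed_rows = master.execute("""
        SELECT c.* FROM customer c
        JOIN snapshot_log s ON s.acct_number = c.acct_number
    """).fetchall()

    for subscriber in ("lakeview_copy", "valley_copy"):   # hypothetical subscriber sites
        print("send to", subscriber, ":", changed_rows)

    master.execute("DELETE FROM snapshot_log")   # the log is cleared after the refresh

A full refresh would simply send the entire table instead of the joined subset.
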
Some forms of replication management allow dynamic ownership of data, in which the right to update replicated data moves from site to site, but at any point in time, only one site owns the right. Dynamic ownership would be appropriate as business activities move across time zones, or where the processing of data follows a work flow across business units supported by different database servers.

A final form of replication management allows shared ownership of data. Shared updates introduce significant issues for managing update conflicts across sites. For example, what if tellers at two bank branches try to update a customer's address at the same time? Asynchronous technology will allow conflicts to exist temporarily. This may be fine as long as the updates are not critical to business operations, and such conflicts can be detected and resolved before real business problems arise.

The cost to perform a snapshot refresh may depend on whether the snapshot is simple or complex. A simple snapshot is one that references all or a portion of only one table. A complex snapshot involves multiple tables, usually from transactions that involve joins (such as the entry of a customer order and associated line items). With some distributed DBMSs, a simple snapshot can be handled by a differential refresh whereas complex snapshots require more time-consuming full refreshes. Some distributed DBMSs support only simple snapshots.

Near Real-Time Replication  For near real-time requirements, store-and-forward messages for each completed transaction can be broadcast across the network informing all nodes to update data as soon as is possible, without forcing a confirmation to the originating node (as is the case with a coordinated commit protocol, discussed below) before the database at the originating node is updated (Schussel, 1994). One way to generate such messages is by using triggers (discussed in Chapter 9). A trigger can be stored at each local database so that when a piece of replicated data is updated, the trigger executes corresponding update commands against remote database replicas (Edelstein, 1993). With the use of triggers, each database update event can be handled individually and transparently to programs and users. If network connections to a node are down or the node is busy, these messages informing the node to update its database are held in a queue to be processed when possible.

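A minimal sketch of the trigger idea (hypothetical, written in SQLite's trigger syntax inside Python rather than any specific distributed DBMS): an AFTER UPDATE trigger records an update message in a queue table, and a separate routine forwards queued messages to the remote replica whenever the connection is available.

    import sqlite3

    local = sqlite3.connect(":memory:")
    local.executescript("""
        CREATE TABLE customer (acct_number INTEGER PRIMARY KEY, balance INTEGER);
        CREATE TABLE update_queue (acct_number INTEGER, new_balance INTEGER);

        -- When replicated data change, queue a message for the remote replicas.
        CREATE TRIGGER customer_after_update
        AFTER UPDATE OF balance ON customer
        BEGIN
            INSERT INTO update_queue VALUES (NEW.acct_number, NEW.balance);
        END;

        INSERT INTO customer VALUES (200, 1000);
    """)
    local.execute("UPDATE customer SET balance = 900 WHERE acct_number = 200")

    def forward_queued_updates(remote_is_reachable):
        # Drain the queue only when the remote node can be reached.
        if not remote_is_reachable:
            return                     # messages simply wait in the queue
        for acct, bal in local.execute("SELECT * FROM update_queue").fetchall():
            print(f"apply at remote replica: set balance of account {acct} to {bal}")
        local.execute("DELETE FROM update_queue")

    forward_queued_updates(remote_is_reachable=True)
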
Pull Replication  The schemes explained above for synchronizing replicas are all examples of push strategies. Pull strategies also exist. In a pull strategy, the target, not the source node, controls when a local database is updated. With pull strategies, the local database determines when it needs to be refreshed, and requests a snapshot or the emptying of an update message queue. Pull strategies have the advantage that the local site controls when it needs and can handle updates. Thus, synchronization is less disruptive and occurs only when needed by each site, not when a central master site thinks it is best to update.

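The contrast with push can be seen in a few lines (again a hypothetical sketch): here the subscribing site, not the master, decides when the queued changes are fetched and applied.

    # Changes queued at the master site, waiting to be pulled (hypothetical values).
    changes_at_master = [(200, 900), (324, 275)]
    local_copy = {200: 1000, 324: 250}

    def pull_refresh():
        # The local (target) site initiates the refresh when it chooses to,
        # for example overnight or whenever its own workload is light.
        while changes_at_master:
            acct, new_balance = changes_at_master.pop(0)
            local_copy[acct] = new_balance

    pull_refresh()   # called on the local site's schedule, not the master's
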
Database Integrity with Replication  For both periodic and near real-time replication, consistency across the distributed, replicated database is compromised. Whether delayed or near real-time, the DBMS managing replicated databases still must ensure the integrity of the database. Decision support applications permit synchronization on a table-by-table basis, whereas near real-time applications require transaction-by-transaction synchronization. But in both cases, the DBMS must ensure that copies are synchronized per application requirements.

The difficulty of handling updates with a replicated database also depends on the number of nodes at which updates may occur (Froemming, 1996). In a single-updater environment, updates will usually be handled by periodically sending read-only database snapshots of updated database segments to the nonupdater nodes. In this case the effects of multiple updates are effectively batched for the read-only sites. This would be the situation for product catalogs, price lists, and other reference data for a mobile sales force. In a multiple-updater environment, the most obvious issue is data collisions. Data collisions arise when the independently operating updating nodes are each attempting to update the same data at the same time. In this case, the DBMS must include mechanisms to detect and handle data collisions. For example, the DBMS must decide if processing at nodes in conflict should be suspended until the collision is resolved.

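One common way to detect such collisions (a hypothetical sketch; the chapter does not prescribe a specific mechanism) is to carry a version number with each replicated row: an incoming update is applied only if it was based on the version the replica currently holds, and is otherwise set aside as a collision to be resolved.

    # A replicated row carries a version number along with its value.
    row = {"address": "12 Oak St", "version": 3}
    collisions = []   # conflicting updates held for later resolution

    def apply_update(new_address, based_on_version):
        if based_on_version != row["version"]:
            # Another site changed the row first: record a collision.
            collisions.append((new_address, based_on_version))
            return False
        row["address"] = new_address
        row["version"] += 1
        return True

    apply_update("99 Elm Ave", based_on_version=3)   # first teller's change succeeds
    apply_update("7 Pine Rd", based_on_version=3)    # second teller's change collides
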
When to Use Replication  Whether replication is a viable alternative design for a distributed database depends on several factors (Froemming, 1996):

1. Data timeliness  Applications that can tolerate out-of-date data (whether this be for a few seconds or a few hours) are better candidates for replication.

2. DBMS capabilities  An important DBMS capability is whether it will support a query that references data from more than one node. If not, then replication is a better candidate than the partitioning schemes, which are discussed in the following sections.

3. Performance implications  Replication means that each node is periodically refreshed. When this refreshing occurs, the distributed node may be very busy handling a large volume of updates. If the refreshing occurs by event triggers (for example, when a certain volume of changes accumulate), refreshing could occur at a time when the remote node is busy doing local work.

4. Heterogeneity in the network  Replication can be complicated if different nodes use different operating systems and DBMSs, or, more commonly, use different database designs. Mapping changes from one site to n other sites could mean n different routines to translate the changes from the originating node into the scheme for processing at the other nodes.

5. Communications network capabilities  Transmission speeds and capacity in a data communications network may prohibit frequent, complete refreshing of very large tables. Replication does not require a dedicated communications connection, however, so less expensive, shared networks could be used for database snapshot transmissions.

Horizontal Partitioning

With horizontal partitioning (see Chapter 7 for a description of different forms of table partitioning), some of the rows of a table (or relation) are put into a base relation at one site, and other rows are put into a base relation at another site. More generally, the rows of a relation are distributed to many sites.

Figure 11-5  Horizontal partitions

(a) Lakeview Branch

Acct_Number   Customer_Name   Branch_Name   Balance
200           Jones           Lakeview         1000
426           Dorman          Lakeview          796
683           McIntyre        Lakeview         1500
252           Elmore          Lakeview          330

(b) Valley Branch

Acct_Number   Customer_Name   Branch_Name   Balance
324           Smith           Valley            250
153           Gray            Valley             38
500           Green           Valley            168

Figure 11-5 shows the result of taking horizontal partitions of the Customer relation. Each row is now located at its home branch. If customers actually conduct most of their transactions at the home branch, the transactions are processed locally and response times are minimized. When a customer initiates a transaction at another branch, the transaction must be transmitted to the home branch for processing and the response transmitted back to the initiating branch (this is the normal pattern for persons using automated teller machines, or ATMs). If a customer's usage pattern changes (perhaps because of a move), the system may be able to detect this change and dynamically move the record to the location where most transactions are being initiated. In summary, horizontal partitions for a distributed database have four major advantages:

1. Efficiency  Data are stored close to where they are used and separate from other data used by other users or applications.

2. Local optimization  Data can be stored to optimize performance for local access.

3. Security  Data not relevant to usage at a particular site are not made available.

4. Ease of querying  Combining data across horizontal partitions is easy since rows are simply merged by unions across the partitions.

Thus, horizontal partitions are usually used when an organizational function is distributed, but each site is concerned with only a subset of the entity instances (frequently based on geography).

Horizontal partitions also have two primary disadvantages:

1. Inconsistent access speed  When data from several partitions are required, the access time can be significantly different from local-only data access.

2. Backup vulnerability  Since data are not replicated, when data at one site become inaccessible or damaged, usage cannot switch to another site where a copy exists; data may be lost if proper backup is not performed at each site.

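The "ease of querying" advantage can be pictured with a small sketch (hypothetical Python using SQLite; the file names and data layout are illustrative): each branch database holds only its own rows, and a query over all customers is simply the union of the partitions.

    import os
    import sqlite3

    partitions = {
        "lakeview.db": [(200, "Jones", "Lakeview", 1000), (426, "Dorman", "Lakeview", 796)],
        "valley.db":   [(324, "Smith", "Valley", 250), (153, "Gray", "Valley", 38)],
    }

    # Build one database file per branch, each holding only that branch's rows.
    for filename, rows in partitions.items():
        if os.path.exists(filename):
            os.remove(filename)
        conn = sqlite3.connect(filename)
        conn.execute("CREATE TABLE customer (acct_number INTEGER PRIMARY KEY, "
                     "customer_name TEXT, branch_name TEXT, balance INTEGER)")
        conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", rows)
        conn.commit()
        conn.close()

    # A query spanning the partitions is just a union of the base relations.
    conn = sqlite3.connect("lakeview.db")
    conn.execute("ATTACH DATABASE 'valley.db' AS valley")
    all_customers = conn.execute(
        "SELECT * FROM customer UNION ALL SELECT * FROM valley.customer").fetchall()
    print(all_customers)
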
Vertical Partitioning

With the vertical partitioning approach (again, see Chapter 7), some of the columns of a relation are projected into a base relation at one of the sites, and other columns are projected into a base relation at another site (more generally, columns may be projected to several sites). The relations at each of the sites must share a common domain, so that the original table can be reconstructed.

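A compact sketch of the idea (hypothetical Python with illustrative column names; the actual Part relation appears in Figure 11-7): each site stores the primary key plus its own columns, and a full row is reconstructed by joining the partitions on that shared key.

    # Vertical partitions of a Part relation; both carry the primary key Part_Number.
    # The non-key column names below are illustrative, not those of Figure 11-7.
    engineering = {          # columns used mostly by engineering
        100: {"drawing_number": "D-17"},
        101: {"drawing_number": "D-18"},
    }
    manufacturing = {        # columns used mostly by manufacturing
        100: {"qty_on_hand": 42},
        101: {"qty_on_hand": 7},
    }

    def reconstruct_part(part_number):
        # Rebuild the original row by joining the partitions on the shared key.
        row = {"part_number": part_number}
        row.update(engineering[part_number])
        row.update(manufacturing[part_number])
        return row

    print(reconstruct_part(100))   # the full row, as if the relation had never been split
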
To illustrate vertical partitioning, we use an application for the manufacturing company shown in Figure 11-6. Figure 11-7 shows the Part relation with Part_Number as the primary key. Some of these data are used primarily by manufacturing, while others are used mostly by engineering. The data are distributed to the respective departmental computers using vertical partitioning.

