MODERN
DATABASE
MANAGEMENT

Fifth Edition

Fred R. McFadden
Jeffrey A. Hoffer
Mary B. Prescott
`
Petitioners' Exhibit 1026, Page 1
`
`
`
• In memory of my valued colleague Daniel Couger. - F. R. M.

• To Patty, for her sacrifices, encouragement, and support. To my students,
for being receptive and critical, and challenging me to be a better
teacher. - J. A. H.

• To Larry, Mike, and Ivan. Their love and support provide a foundation
to my life which makes efforts such as writing this book possible. And
to Jeff and Fred, who gave me the opportunity to write with them and
patiently provided invaluable guidance along the way. - M. B. P.
`
`Executive Editor: Michael Roche
`Developmental Editor: Maureen Allaire Spada
`Assistant Editor: Ruth Berry
Editorial Assistant: Adam Hamel
`Senior Marketing Manager: Tom Ziolkowski
`Senior Marketing Coordinator: Deanna Storey
`Senior Production Supervisor: Patty Mahtani
`Manufacturing Buyer: Sheila Spinney
`Design Director: Regina Hagen
`Cover Designer: Linda Manly Wade
`Text Designer: Sandra Rigney
`Composition and Project Coordination: Elm Street Publishing Services, Inc.
Cover Image: © Mick Tarel for Artville
`
Library of Congress Cataloging-in-Publication Data

McFadden, Fred R., 1933-
  Modern database management / Fred R. McFadden, Jeffrey A. Hoffer,
Mary B. Prescott. - 5th ed.
    p. cm.
  Includes bibliographical references and index.
  ISBN 0-8053-6054-9
  1. Database management.  I. Hoffer, Jeffrey A.  II. Prescott, Mary B.
III. Title.
  QA76.9.D3M395 1999
  005.74-dc21                                                        98-4120
                                                                         CIP

Reprinted with corrections, May 1999

Copyright © 1999 by Addison-Wesley Educational Publishers, Inc.
`
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or
otherwise, without the prior written permission of the publisher. Printed in the United States of
America.
`
`5 6 7 8 9 10 WCT 02 01 00
`
`
`
`
CHAPTER 11

Distributed Databases

• Describe asynchronous data replication and partitioning as three major
approaches for distributed database design.
`
`• Outline the steps involved in processing a query in a distributed database
`and several approaches used to optimize distributed query processing.
`
• Explain the salient features of several distributed database management
systems.
`
`INTRODUCTION
`
Distributed database: A single logical database that is spread physically across
computers in multiple locations that are connected by a data communications
link.

Decentralized database: A database that is stored on computers at multiple
locations; however, the computers are not interconnected by a network, so that
users at the various sites cannot share data.

When an organization is geographically dispersed, it may choose to store its
databases on a central computer or to distribute them to local computers (or a
combination of both). A distributed database is a single logical database that is
spread physically across computers in multiple locations that are connected by a
data communications network. We emphasize that a distributed database is truly a
database, not a loose collection of files. The distributed database is still centrally
administered as a corporate resource while providing local flexibility and
customization. The network must allow the users to share the data; thus a user (or
program) at location A must be able to access (and perhaps update) data at
location B. The sites of a distributed system may be spread over a large area (such
as the United States or the world) or over a small area (such as a building or
campus). The computers may range from microcomputers to large-scale
computers or even supercomputers.

A distributed database requires multiple database management systems, running
at each remote site. The degree to which these different DBMSs cooperate, or work
in partnership, and whether there is a master site that coordinates requests
involving data from multiple sites distinguish different types of distributed
database environments.

It is important to distinguish between distributed and decentralized databases. A
decentralized database is also stored on computers at multiple locations; however,
the computers are not interconnected by a network, so users at the various sites
cannot share data. Thus a decentralized database is best regarded as a collection of
independent databases, rather than having the geographical distribution of a
single database.
`Various business conditions encourage the use of distributed databases:
`
• Distribution and autonomy of business units  Divisions, departments, and facilities
in modern organizations are often geographically (and possibly internationally)
distributed. Often each unit has the authority to create its own information
systems, and often these units want local data over which they can have control.
`
`• Data sharing Even moderately complex business decisions require sharing data
`across business units, so it must be convenient to consolidate data across local
`databases on demand.
`
• Data communications costs and reliability  The cost to ship large quantities of
data across a communications network or to handle a large volume of
transactions from remote sources can be high. It is often more economical to
locate data and applications close to where they are needed. Also, dependence
on data communications can be risky, so keeping local copies or fragments of
data can be a reliable way to support the need for rapid access to data across the
organization.
`
The ability to create a distributed database has existed for over a decade. As
you might expect, a variety of distributed database options exist (Bell and Grimson,
1992). Figure 11-1 outlines the range of distributed database environments. These
environments are briefly explained by the following:
`
`I. Homogeneous The same DBMS is used at each node.
`
`A. Autonomous Each DBMS works independently, passing messages back and
`forth to share data updates.
`
`B. Non-autonomous A central, or master, DBMS coordinates database access
`and update across the nodes.
`
`II. Heterogeneous Potentially different DBMSs are used at each node.
`
`A. Systems Supports some or all of the functionality of one logical database.
`
`
`
`
Figure 11-1
Distributed database environments (adapted from Bell and Grimson, 1992)

[Tree diagram:
Distributed database environments
├─ Homogeneous
│   ├─ Autonomous
│   └─ Non-Autonomous
└─ Heterogeneous
    ├─ Systems
    │   ├─ Full DBMS functionality
    │   └─ Partial-Multidatabase
    │       ├─ Federated
    │       │   ├─ Loose integration
    │       │   └─ Tight integration
    │       └─ Unfederated
    └─ Gateways]
`
1. Full DBMS Functionality  Supports all of the functionality of a distributed
database, as discussed in the remainder of this chapter.

2. Partial-Multidatabase  Supports some features of a distributed database,
as discussed in the remainder of this chapter.
`
`a. Federated Supports local databases for unique data requests.
`
`i. Loose Integration Many schemas exist, for each local database, and
`each local DBMS must communicate with all local schemas.
`
`ii. Tight Integration One global schema exists that defines all the data
`across all local databases.
`
`b. Unfederated Requires all access to go through a central coordinating
`module.
`
`B. Gateways Simple paths are created to other databases, without the benefits
`of one logical database.
`
`A homogeneous distributed database environment is depicted in Figure 11-2.
`This environment is typically defined by the following characteristics (related to the
`non-autonomous category described above):
`
`• Data are distributed across all the nodes
`
`• The same DBMS is used at each location
`
`• All data are managed by the distributed DBMS (so there are no exclusively
`local data)
`
`• All users access the database through one global schema or database definition
`
`• The global schema is simply the union of all the local database schemas
`
It is difficult in most organizations to force a homogeneous environment, yet
heterogeneous environments are much more difficult to manage.
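The "global schema as the union of local schemas" characteristic can be sketched concretely. The following is a minimal Python illustration, not a DBMS feature; the node and relation names are hypothetical.

```python
# Sketch: in a homogeneous, non-autonomous environment the global schema
# is the union of the relation definitions exposed by each node.

def build_global_schema(local_schemas):
    """Union the relation definitions from each node's local schema."""
    global_schema = {}
    for node, relations in local_schemas.items():
        for relation, columns in relations.items():
            # A relation replicated at several nodes appears once in the
            # global schema; record every node that stores it.
            entry = global_schema.setdefault(
                relation, {"columns": columns, "nodes": []})
            entry["nodes"].append(node)
    return global_schema

# Hypothetical local schemas at two nodes:
local_schemas = {
    "node_1": {"Customer": ["Acct_Number", "Customer_Name",
                            "Branch_Name", "Balance"]},
    "node_2": {"Customer": ["Acct_Number", "Customer_Name",
                            "Branch_Name", "Balance"],
               "Branch":   ["Branch_Name", "City"]},
}

schema = build_global_schema(local_schemas)
```

A global user sees only `schema`; which node actually stores each relation is hidden inside the catalog entry.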
`
`
Figure 11-2
Homogeneous distributed database environment (adapted from Bell and
Grimson, 1992)

[Diagram: global users access a single Global Schema; identical DBMS software
runs at each of nodes 1 through n.]
`
As listed above, there are many variations of heterogeneous distributed database
environments. In the remainder of the chapter, however, a heterogeneous
environment will be defined by the following characteristics (as depicted in
Figure 11-3):
`
`• Data are distributed across all the nodes.
`
`• Different DBMSs may be used at each node.
`
Figure 11-3
Heterogeneous distributed database environment (adapted from Bell and
Grimson, 1992)

[Diagram: local users access DBMS-1 and DBMS-2 directly, while global users
access both through a distributed DBMS and a global schema.]
`
`
`
`
• Some users require only local access to databases, which can be accomplished
by using only the local DBMS and schema.

• A global schema exists, which allows local users to access remote data.

Objectives and Trade-offs

Location transparency: A design goal for a distributed database, which says that
a user (or user program) using data need not know the location of the data.

Local autonomy: A design goal for a distributed database, which says that a site
can independently administer and operate its database when connections to
other nodes have failed.

Synchronous distributed database: A form of distributed database technology in
which all data across the network are continuously kept up-to-date so that a
user at any site can access data anywhere on the network at any time and get
the same answer.

Asynchronous distributed database: A form of distributed database technology
in which copies of replicated data are kept at different nodes so that local
servers can access data without reaching out across the network.
`
A major objective of distributed databases is to provide ease of access to data for
users at many different locations. To meet this objective, the distributed database
system must provide what is called location transparency, which means that a user
(or user program) using data for querying or updating need not know the location
of the data. Any request to retrieve or update data from any site is automatically
forwarded by the system to the site or sites related to the processing request.
Ideally, the user is unaware of the distribution of data, and all data in the network
appear as a single logical database stored at one site. In this ideal case, a single
query can join data from tables in multiple sites as if the data were all in one site.
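Location transparency can be sketched as a catalog lookup that the system, rather than the user, performs. The catalog contents and site names below are hypothetical.

```python
# Sketch of location transparency: a catalog maps each relation to the site
# that stores it, so a user's request never names a site.

CATALOG = {
    "Customer": "lakeview_server",   # hypothetical site names
    "Branch": "valley_server",
}

def route_query(relation, predicate):
    """Forward a retrieval request to whichever site holds the relation."""
    site = CATALOG[relation]  # the user never sees this lookup
    return f"sent to {site}: SELECT * FROM {relation} WHERE {predicate}"

msg = route_query("Customer", "Balance > 500")
```

The user writes the same query regardless of where `Customer` is stored; only the routing layer consults the catalog.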
A second objective of distributed databases is local autonomy, which is the
capability to administer a local database and to operate independently when
connections to other nodes have failed (Date, 1995). With local autonomy, each site
has the capability to control local data, administer security, log transactions, and
recover when local failures occur, and to provide full access to local data to local
users when any central or coordinating site cannot operate. In this case, data are
locally owned and managed, even though they are accessible from remote sites.
This implies that there is no reliance on a central site.
A significant trade-off in designing a distributed database environment is
whether to use synchronous or asynchronous distributed technology. With
synchronous distributed database technology, all data across the network are
continuously kept up-to-date so that a user at any site can access data anywhere
on the network at any time and get the same answer. With synchronous
technology, if any copy of a data item is updated anywhere on the network, the
same update is immediately applied to all other copies or aborted. Synchronous
technology ensures data integrity and minimizes the complexity of knowing
where the most recent copy of data is located. Synchronous technology can result
in unsatisfactorily slow response time, since the distributed DBMS spends
considerable time checking that an update is accurately and completely
propagated across the network.

Asynchronous distributed database technology keeps copies of replicated data
at different nodes so that local servers can access data without reaching out across
the network. With asynchronous technology, there is usually some delay in
propagating data updates across the remote databases, so some degree of at least
temporary inconsistency is tolerated. Asynchronous technology tends to have
acceptable response time, since updates happen locally and data replicas are
synchronized in batches at predetermined intervals, but it may be more complex
to plan and design to ensure exactly the right level of data integrity and
consistency across the nodes.
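The asynchronous approach can be sketched as a local commit plus a deferred propagation queue. This is an illustrative Python model of the idea, not an actual DBMS mechanism; all names are assumptions.

```python
# Sketch of asynchronous replication: a local update commits immediately,
# and a change record is queued for later batch propagation to a replica.
# Until the queue is drained, the replica may briefly disagree with the
# primary (the tolerated temporary inconsistency described in the text).

from collections import deque

class AsyncReplica:
    def __init__(self):
        self.primary = {}       # local copy, updated immediately
        self.replica = {}       # remote copy, refreshed in batches
        self.pending = deque()  # updates awaiting propagation

    def update(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))  # commit locally, propagate later

    def synchronize(self):
        """Drain the queue, e.g., at a predetermined interval."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value

db = AsyncReplica()
db.update("acct_200", 1000)
stale = db.replica.get("acct_200")  # replica not yet refreshed
db.synchronize()                    # now both copies agree
```

Between `update` and `synchronize`, a reader at the replica sees stale data; a synchronous design would instead block the update until every copy confirmed it.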
`Compared to centralized databases, there are numerous advantages to either
`form of a distributed database. The most important are the following:
`
`• Increased reliability and availability When a centralized system fails, the database
`is unavailable to all users. A distributed system will continue to function at some
`reduced level, however, even when a component fails. The reliability and
`availability will depend (among other things) on how the data are distributed
`(discussed in the following sections).
`
`
`
`
`
• Local control  Distributing the data encourages local groups to exercise greater
control over "their" data, which promotes improved data integrity and
administration. At the same time, users can access nonlocal data when
necessary. Hardware can be chosen for the local site to match the local, not
global, data processing work.
`
`• Modular growth Suppose that an organization expands to a new location or
`adds a new work group. It is often easier and more economical to add a local
`computer and its associated data to the distributed network than to expand a
`large central computer. Also, there is less chance of disruption to existing users
`than is the case when a central computer system is modified or expanded.
`
`• Lower communication costs With a distributed system, data can be located closer
`to their point of use. This can reduce communication costs, compared to a central
`system.
• Faster response  Depending on how data are distributed, most requests for data
by users at a particular site can be satisfied by data stored at that site. This
speeds up query processing since communication and central computer delays
are minimized. It may also be possible to split complex queries into subqueries
that can be processed in parallel at several sites, providing even faster response.
`
`A distributed database system also faces certain costs and disadvantages:
`
`• Software cost and complexity More complex software (especially the DBMS) is
`required for a distributed database environment. We discuss this software later
`in the chapter.
`
• Processing overhead  The various sites must exchange messages and perform
additional calculations to ensure proper coordination among data at the
different sites.
`
• Data integrity  A by-product of the increased complexity and need for
coordination is the additional exposure to improper updating and other
problems of data integrity.
`
• Slow response  If the data are not distributed properly according to their usage,
or if queries are not formulated correctly, response to requests for data can be
extremely slow. These issues are discussed later in the chapter.
`
`OPTIONS FOR DISTRIBUTING A DATABASE
`
How should a database be distributed among the sites (or nodes) of a network? We
discussed this important issue of physical database design in Chapter 7, which
introduced an analytical procedure for evaluating alternative distribution
strategies. In that chapter we noted that there are four basic strategies for
distributing databases:
`
`1. Data replication
`
`2. Horizontal partitioning
`
`3. Vertical partitioning
`
`4. Combinations of the above
`
`
`
`
`
Figure 11-4
Customer relation for a bank

Acct_Number   Customer_Name   Branch_Name   Balance
200           Jones           Lakeview      1000
324           Smith           Valley        250
153           Gray            Valley        38
426           Dorman          Lakeview      796
500           Green           Valley        168
683           McIntyre        Lakeview      1500
252           Elmore          Lakeview      330
`
We will explain and illustrate each of these approaches using relational databases.
The same concepts apply (with some variations) for other data models, such as
hierarchical and network.

Suppose that a bank has numerous branches located throughout a state. One of
the base relations in the bank's database is the Customer relation. Figure 11-4
shows the format for an abbreviated version of this relation. For simplicity, the
sample data in the relation apply to only two of the branches (Lakeview and
Valley). The primary key in this relation is account number (Acct_Number).
Branch_Name is the name of the branch where customers have opened their
accounts (and therefore where they presumably perform most of their
transactions).
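For readers who want to experiment with the distribution strategies, the sample rows of Figure 11-4 can be written out directly. Python is used here only as a convenient notation for the relation.

```python
# The abbreviated Customer relation of Figure 11-4, with Acct_Number as
# the primary key. Rows are taken from the sample data in the text.

CUSTOMER = [
    {"Acct_Number": 200, "Customer_Name": "Jones",    "Branch_Name": "Lakeview", "Balance": 1000},
    {"Acct_Number": 324, "Customer_Name": "Smith",    "Branch_Name": "Valley",   "Balance": 250},
    {"Acct_Number": 153, "Customer_Name": "Gray",     "Branch_Name": "Valley",   "Balance": 38},
    {"Acct_Number": 426, "Customer_Name": "Dorman",   "Branch_Name": "Lakeview", "Balance": 796},
    {"Acct_Number": 500, "Customer_Name": "Green",    "Branch_Name": "Valley",   "Balance": 168},
    {"Acct_Number": 683, "Customer_Name": "McIntyre", "Branch_Name": "Lakeview", "Balance": 1500},
    {"Acct_Number": 252, "Customer_Name": "Elmore",   "Branch_Name": "Lakeview", "Balance": 330},
]

# Primary-key lookup, mirroring an index on Acct_Number:
by_account = {row["Acct_Number"]: row for row in CUSTOMER}
```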
`
`Data Replication
`
`An increasingly popular option for data distribution is to store a separate copy of the
`database at each of two or more sites. Replication may allow an IS organization to
`move a database off a centralized mainframe onto less expensive, departmental or
`location-specific servers, dose to end users (Koop, 1995). Replication may use either
`synchronous or asynchronous distributed database technologies, although asyn(cid:173)
`chronous technologies are more typical in a replicated environment. The Customer
`relation in Figure 11-4 could be stored at Lakeview or Valle~ for example. If a copy is
`stored at every site, we have the case of full replication (which may be impractical
`except for only relatively small databases).
`There are five advantages to data replication:
`
1. Reliability  If one of the sites containing the relation (or database) fails, a copy
can always be found at another site without network traffic delays. Also,
available copies can all be updated as soon as possible as transactions occur,
and unavailable nodes will be updated once they return to service.
`
`2. Fast response Each site that has a full copy can process queries locally, so
`queries can be processed rapidly.
`
3. Possible avoidance of complicated distributed transaction integrity routines
Replicated databases are usually refreshed at scheduled intervals, so most
forms of replication are used when some relaxing of synchronization across
database copies is acceptable.
`
4. Node decoupling  Each transaction may proceed without coordination across
the network. Thus, if nodes are down, busy, or disconnected (e.g., in the case of
mobile personal computers), a transaction is handled when the user desires. In
`
`
`
`
`
the place of real-time synchronization of updates, a behind-the-scenes process
coordinates all data copies.

5. Reduced network traffic at prime time  Often updating data happens during
prime business hours, when network traffic is highest and the demands for
rapid response greatest. Replication, with delayed updating of copies of data,
moves network traffic for sending updates to other nodes to non-prime-time
hours.
`
`Replication has two primary disadvantages:
`
`1. Storage requirements Each site that has a full copy must have the same storage
`capacity that would be required if the data were stored centrally. Each copy
`requires storage space (the cost for which is constantly decreasing), and
`processing time is required to update each copy on each node.
`
`2. Complexity and cost of updating Whenever a relation is updated, it must
`(eventually) be updated at each site that holds a copy. Synchronizing updating
`near real time can require careful coordination, as will be clear later under the
`topic of commit protocol.
`
For these reasons, data replication is favored where most process requests are
read-only and where the data are relatively static, as in catalogs, telephone
directories, train schedules, and so on. CD-ROM and DVD storage technology has
promise as an economical medium for replicated databases. Replication is used for
"noncollaborative data," where one location does not need a real-time update of
data maintained by other locations (The, 1994). In these applications, data
eventually need to be synchronized, as quickly as is practical. Replication is not a
viable approach for online applications such as airline reservations, automated
teller machine transactions, and other financial activities-applications for which
each user wants data about the same, nonsharable resource.
`
Snapshot Replication  Different schemes exist for updating data copies. Some
applications, like those for decision support and data warehousing or mining-
which do not require current data-are supported by simple table copying or
periodic snapshots. This might work as follows, assuming multiple sites are
updating the same data. First, updates from all replicated sites are periodically
collected at a master or primary site, where all the updates are made to form a
consolidated record of all changes. With some distributed DBMSs, this list of
changes is collected in a snapshot log, which is a table of row identifiers for the
records to go into the snapshot. Then a read-only snapshot of the replicated
portion of the database is taken at the master site. Finally, the snapshot is sent to
each site where there is a copy (it is often said that these other sites "subscribe" to
the data owned at the primary site). This is called a full refresh of the database
(Edelstein, 1995a). Alternatively, only those pages that have changed since the last
snapshot can be sent, which is called a differential refresh. In this case, a snapshot
log for each replicated table is joined with the associated base table to form the set
of changed rows to be sent to the replicated sites.
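The differential-refresh step can be sketched as a join of the snapshot log with the base table. The table contents below are illustrative, not taken from any particular DBMS.

```python
# Sketch of a differential refresh: the snapshot log holds the row
# identifiers changed since the last snapshot; joining it with the base
# table yields only the changed rows to send to subscribing sites.

base_table = {
    200: {"Customer_Name": "Jones",  "Balance": 1000},
    324: {"Customer_Name": "Smith",  "Balance": 275},  # changed since last snapshot
    426: {"Customer_Name": "Dorman", "Balance": 796},
}
snapshot_log = [324]  # row identifiers recorded by the distributed DBMS

def differential_refresh(base, log):
    """Join the snapshot log with the base table to get the changed rows."""
    return {rid: base[rid] for rid in log if rid in base}

changed = differential_refresh(base_table, snapshot_log)
# Only `changed`, not a full copy of base_table, is sent to each subscriber.
```

A full refresh would instead ship all of `base_table` to every subscriber, which is why differential refreshes matter for large, mostly static tables.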
`Some forms of replication management allow dynamic ownership of data, in
`which the right to update replicated data moves from site to site, but at any point in
`time, only one site owns the right. Dynamic ownership would be appropriate as
`business activities move across time zones, or where the processing of data follows a
`work flow across business units supported by different database servers.
A final form of replication management allows shared ownership of data. Shared
updates introduce significant issues for managing update conflicts across sites. For
`
`
`
`
`
`example, what if tellers at two bank branches try to update a customer's address at
`the same time? Asynchronous technology will allow conflicts to exist temporarily.
This may be fine as long as the updates are not critical to business operations, and
`such conflicts can be detected and resolved before real business problems arise.
`The cost to perform a snapshot refresh may depend on whether the snapshot is
`simple or complex. A simple snapshot is one that references all or a portion of only
`one table. A complex snapshot involves multiple tables, usually from transactions
`that involve joins (such as the entry of a customer order and associated line items).
`With some distributed DBMSs, a simple snapshot can be handled by a differential
`refresh whereas complex snapshots require more time-consuming full refreshes.
`Some distributed DBMSs support only simple snapshots.
`
Near Real-Time Replication  For near real-time requirements, store-and-forward
messages for each completed transaction can be broadcast across the network,
informing all nodes to update data as soon as possible, without forcing a
confirmation to the originating node (as is the case with a coordinated commit
protocol, discussed below) before the database at the originating node is updated
(Schussel, 1994). One way to generate such messages is by using triggers
(discussed in Chapter 9). A trigger can be stored at each local database so that
when a piece of replicated data is updated, the trigger executes corresponding
update commands against remote database replicas (Edelstein, 1993). With the use
of triggers, each database update event can be handled individually and
transparently to programs and users. If network connections to a node are down
or the node is busy, these messages informing the node to update its database are
held in a queue to be processed when possible.
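The trigger-and-queue behavior described above can be sketched as follows. The `Node` class and its reachability flag are simplifying assumptions for illustration, not features of any particular DBMS.

```python
# Sketch of trigger-style near real-time replication: an update at the
# originating node fires a handler that broadcasts the change to every
# replica, queuing the message when a node is unreachable.

class Node:
    def __init__(self, name, reachable=True):
        self.name, self.reachable = name, reachable
        self.data, self.queue = {}, []

    def receive(self, key, value):
        self.data[key] = value

def on_update(origin, replicas, key, value):
    """Plays the role of a trigger on the replicated table."""
    origin.receive(key, value)               # originating node commits first
    for node in replicas:
        if node.reachable:
            node.receive(key, value)         # propagate immediately
        else:
            node.queue.append((key, value))  # hold until the node returns

def drain(node):
    """Process a returning node's held messages."""
    node.reachable = True
    while node.queue:
        node.receive(*node.queue.pop(0))
```

Note that, unlike a coordinated commit protocol, the originating node never waits for confirmations; unreachable replicas simply catch up later.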
`
`Pull Replication The schemes explained above for synchronizing replicas are all
`examples of push strategies. Pull strategies also exist. In a pull strategy, the target, not
`the source node, controls when a local database is updated. With pull strategies, the
`local database determines when it needs to be refreshed, and requests a snapshot or
`the emptying of an update message queue. Pull strategies have the advantage that
`the local site controls when it needs and can handle updates. Thus, synchronization is
`less disruptive and occurs only when needed by each site, not when a central master
`site thinks it is best to update.
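A pull strategy might be sketched like this, with the target site checking its own readiness before emptying the source's update-message queue. The interval logic and all names are illustrative assumptions.

```python
# Sketch of pull replication: the target, not the source, decides when to
# refresh its local copy by emptying an update-message queue.

import time

class PullReplica:
    def __init__(self, source_queue, min_interval=60.0):
        self.source_queue = source_queue  # messages held at the source
        self.local = {}                   # this site's copy of the data
        self.min_interval = min_interval  # how often this site will refresh
        self.last_pull = 0.0

    def maybe_pull(self, now=None):
        """Refresh only when this site decides it is ready."""
        now = time.time() if now is None else now
        if now - self.last_pull < self.min_interval:
            return False                  # local site not ready; stay stale
        while self.source_queue:
            key, value = self.source_queue.pop(0)
            self.local[key] = value
        self.last_pull = now
        return True
```

Because the target controls `maybe_pull`, synchronization happens only when the local site can afford the disruption, which is exactly the advantage the text attributes to pull strategies.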
`
Database Integrity with Replication  For both periodic and near real-time
replication, consistency across the distributed, replicated database is
compromised. Whether delayed or near real-time, the DBMS managing replicated
databases still must ensure the integrity of the database. Decision support
applications permit synchronization on a table-by-table basis, whereas near
real-time applications require transaction-by-transaction synchronization. But in
both cases, the DBMS must ensure that copies are synchronized per application
requirements.
The difficulty of handling updates with a replicated database also depends on
the number of nodes at which updates may occur (Froemming, 1996). In a
single-updater environment, updates will usually be handled by periodically
sending read-only database snapshots of updated database segments to the
nonupdater nodes. In this case the effects of multiple updates are effectively
batched for the read-only sites. This would be the situation for product catalogs,
price lists, and other reference data for a mobile sales force. In a multiple-updater
environment, the most obvious issue is data collisions. Data collisions arise when
the independently operating updating nodes are each attempting to update the
same data at the same time. In this case, the DBMS must include mechanisms to
detect and handle data collisions. For example,
`
`
`
`
`the DBMS must decide if processing at nodes in conflict should be suspended until
`the collision is resolved.
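One common way to detect such collisions, shown here as an illustration rather than the only mechanism, is to version each replicated row and reject any update built against a stale version.

```python
# Sketch of data collision detection in a multiple-updater environment:
# each replicated row carries a version number; an update based on a stale
# version is flagged as a collision for the DBMS to resolve. The policy
# shown (reject the late update) is only one possible resolution.

class Collision(Exception):
    pass

class VersionedRow:
    def __init__(self, value):
        self.value, self.version = value, 0

    def update(self, new_value, based_on_version):
        if based_on_version != self.version:
            # Another node changed the row since this updater read it.
            raise Collision(
                f"stale version {based_on_version}, now {self.version}")
        self.value = new_value
        self.version += 1

# Two bank branches read the same customer address (version 0):
row = VersionedRow("12 Main St")
seen_by_branch_a = row.version
seen_by_branch_b = row.version

row.update("34 Oak Ave", seen_by_branch_a)  # branch A's update succeeds
try:
    row.update("56 Elm St", seen_by_branch_b)  # branch B's update collides
except Collision as err:
    conflict = str(err)
```

Detecting the collision is the easy part; as the text notes, the DBMS must still decide whether to suspend processing at the conflicting nodes until the collision is resolved.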
`
`When to Use Replication Whether replication is a viable alternative design for a
`distributed database depends on several factors (Froemming, 1996):
`
1. Data timeliness  Applications that can tolerate out-of-date data (whether this
be for a few seconds or a few hours) are better candidates for replication.
`
`2. DBMS capabilities An important DBMS capability is whether it will support a
`query that references data from more than one node. If not, then replication is a
`better candidate than the partitioning schemes, which are discussed in the
`following sections.
`
3. Performance implications  Replication means that each node is periodically
refreshed. When this refreshing occurs, the distributed node may be very busy
handling a large volume of updates. If the refreshing occurs by event triggers
(for example, when a certain volume of changes accumulate), refreshing could
occur at a time when the remote node is busy doing local work.
`
4. Heterogeneity in the network  Replication can be complicated if different nodes
use different operating systems and DBMSs, or, more commonly, use different
database designs. Mapping changes from one site to n other sites could mean n
different routines to translate the changes from the originating node into the
scheme for processing at the other nodes.
`
`5. Communications network capabilities Transmission speeds and capacity in a
`data communications network may prohibit frequent, complete refreshing of
`very large tables. Replication does not require a dedicated communications
`connection, however, so less expensive, shared networks could be used for
`database snapshot transmissions.
`
`Horizontal Partitioning
`
With horizontal partitioning (see Chapter 7 for a description of different forms of
table partitioning), some of the rows of a table (or relation) are put into a base
relation at one site, and other rows are put into a base relation at another site.
More generally, the rows of a relation are distributed to many sites.
Figure 11-5 shows the result of taking horizontal partitions of the Customer
relation. Each row is now located at its home branch. If customers actually
conduct most of their transactions at the home branch, the transactions are
processed locally and response times are minimized. When a customer initiates a
transaction at another branch, the transaction must be transmitted to the home
branch for processing and the response transmitted back to the initiating branch
(this is the normal pattern for persons using automated teller machines, or ATMs).
If a customer's usage pattern changes (perhaps because of a move), the system
may be able to detect this change and dynamically move the record to the location
where most transactions are being initiated. In summary, horizontal partitions for
a distributed database have four major advantages:
`
1. Efficiency  Data are stored close to where they are used and separate from
other data used by other users or applications.
`
`2. Local optimization Data can be stored to optimize performance for local access.
`
`
`
`
Figure 11-5
Horizontal partitions

(a) Lakeview Branch

Acct_Number   Customer_Name   Branch_Name   Balance
200           Jones           Lakeview      1000
426           Dorman          Lakeview      796
683           McIntyre        Lakeview      1500
252           Elmore          Lakeview      330

(b) Valley Branch

Acct_Number   Customer_Name   Branch_Name   Balance
324           Smith           Valley        250
153           Gray            Valley        38
500           Green           Valley        168
`
`3. Security Data not relevant to usage at a particular site are not made available.
`
4. Ease of querying  Combining data across horizontal partitions is easy since
rows are simply merged by unions across the partitions.
`
Thus, horizontal partitions are usually used when an organizational function is
distributed, but each site is concerned with only a subset of the entity instances
(frequently based on geography).
`Horizontal partitions also have two primary disadvantages:
`
`1. Inconsistent access speed When data from several partitions are required, the
`access time can be significantly different from local-only data access.
`
`2. Backup vulnerability Since data are not replicated, when data at one site
`become inaccessible or damaged, usage cannot switch to another site where a
`copy exists; data may be lost if proper backup is not performed at each site.
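The partition-and-union behavior described in this section can be sketched with the Figure 11-4 rows. The helper names are illustrative; only the data come from the text.

```python
# Sketch of horizontal partitioning: Customer rows are split by home
# branch, and a cross-partition query recombines them with a simple
# union, as the text describes. Rows follow Figure 11-4.

rows = [
    (200, "Jones", "Lakeview", 1000), (324, "Smith", "Valley", 250),
    (153, "Gray", "Valley", 38),      (426, "Dorman", "Lakeview", 796),
    (500, "Green", "Valley", 168),    (683, "McIntyre", "Lakeview", 1500),
    (252, "Elmore", "Lakeview", 330),
]

def horizontal_partition(table, branch):
    """Rows whose Branch_Name matches go to that branch's site."""
    return [r for r in table if r[2] == branch]

lakeview = horizontal_partition(rows, "Lakeview")
valley = horizontal_partition(rows, "Valley")

# A query spanning both sites recombines partitions with a union:
recombined = set(lakeview) | set(valley)
```

The union reproduces the full relation, which is why combining data across horizontal partitions is listed as an advantage; the backup-vulnerability disadvantage is also visible here, since each row exists at exactly one site.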
`
`Vertical Partitioning
`
`With the vertical partitioning approach (again, see Chapter 7), some of the columns of
`a relation are projected into a base relation at one of the sites, and other colunms are
`projected into a base relation at another site (more generally, columns m