throbber
[.ITANDEM
`
`An Approach to Decentralized
`Computer Systems
`
`Part Number: 87611
`
`Technical Report 85.4
`June 1985
`
`1
`
`LYFT 1033
`
`LYFT 1033
`
`1
`
`

`

`2
`
`

`

`”'ITANDEMCOMPUTERS
`
`Technical Report 85.4
`June 1985
`
`An Approach to Decentralized
`Computer Systems
`
`PN87611
`
`3
`
`

`

`4
`
`

`

`AN APPROACH TO DECENTRALIZED COMPUTER SYSTEMS
`
`Jim Gray
`
`June 1985
`
`Tandem Technical report 85.4
`
`5
`
`

`

`6
`
`

`

`Tandem TR 85.4
`
`AN APPROACH TO DECENTRALIZED COMPUTER SYSTEMS
`
`Jim Gray
`
`June 1985
`
`Revised January 1986
`
`ABSTRACT
`
`The technology
`
`for distributed computing
`
`is available.
`
`However,
`
`decentralized systems still
`
`pose design and management problems.
`
`Decentralized systems will
`
`always
`
`require more
`
`careful
`
`design,
`
`planning, and management than their centralized counterparts.
`
`This paper begins with the
`
`rational for and against decentralization.
`
`Then, a technical approach to decentralized systems is sketched. This
`
`approach contrasts with the popular
`
`concept
`
`of
`
`a
`
`distributed
`
`integrated database which transparently provides
`
`remote
`
`10
`
`against
`
`single system image. Rather, it proposes that function be distributed
`
`as "servers" which abstract data
`
`as high—level operations
`
`on objects
`
`and communicate with 'requestors' via
`
`a
`
`standard message protocol.
`
`The requestor-server approach has the
`
`advantages of modularity and
`
`performance.
`
`—-—-————.——.—u.—-—--—
`
`This paper has been submitted for publication to the IEEE Transactions
`on Software Engineering.
`
`7
`
`

`

`8
`
`

`

`
`TABLE g CONTENTS
`
`1.
`
`2.
`
`What
`
`is a decentralized system?
`
`1.1. Why build decentralized systems?
`
`1.2.
`
`Integrated system or integrated database?
`
`Technical aspects of distributed systems
`
`2.1. Computational model -- Objects-processes~messages
`
`2.2. Dictionary: naming, authorization and control
`
`2.3. Database
`
`2.3.1. Data definition
`
`2.3.2. Data manipulation
`
`2.3.3. Data independence
`
`2.3.4. Database design
`
`2.4. Networking
`
`2.4.1. Network protocols
`
`2.4.2. Terminal support -- terminal
`
`independence
`
`2.4.3. Network management
`
`2.5. Transaction management
`
`2.5.1. The transaction concept
`
`2.5.2. Direct and queued transaction processing
`
`An approach to designing and managing decentralization
`
`Summary
`
`Acknowledgments
`
`References
`
`9
`
`

`

`10
`
`10
`
`

`

`
`1.1. What is a decentralized system?
`
`A decentralized computer system, as opposed to a centralized one,
`
`is
`
`collection of autonomous computers which communicate with one
`
`another
`
`to perform a common service.
`
`A decentralized system might
`
`occupy
`
`a
`
`single room, but more typically decentralized systems have geographic
`
`and organizational diversity.
`
`The world telephone
`
`system is
`
`the biggest
`
`and best
`
`example of a
`
`decentralized system. It consists of
`
`thousands of
`
`computers,
`
`and
`
`almost a billion terminals.
`
`SOme of the nodes of the system are tiny
`
`local PBX's while others
`
`are quite large, able to handle many calls
`
`per second. Different parts of the system are operated by cooperating
`
`organizations with different
`
`hardware,
`
`different
`
`languages
`
`and
`
`different ideologies. They
`
`have
`
`agreed to protocols which
`
`allow
`
`direct dialing from anywhere to anywhere and the consequent automatic
`
`routing and billing.
`
`[Amos].
`
`Other examples of decentralized systems can
`
`be
`
`found
`
`in the world
`
`travel
`
`industry system,
`
`inter—connecting travel
`
`agents with hotels,
`
`airlines, and other reservation systems, and the
`
`emerging electronic
`
`financial system, connecting
`
`financial
`
`institutions,
`
`businesses and
`
`governments.
`
`11
`
`11
`
`

`

`These systems have the following common features:
`
`* diverse organizations and organizational procedures,
`
`* diverse computer architectures, both hardware and software,
`
`* diverse terminal types,
`
`* diverse system sizes,
`
`from tiny to large. and
`
`* diverse site environments.
`
`The "glue'I that holds
`
`each of
`
`these
`
`systems together is a message
`
`protocol. For each of these systems,
`
`the participating organizations
`
`agreed that:
`
`"When I send you this message,
`
`it means thus
`
`and so“.
`
`They also agreed to the bit—level format of each such message.
`
`The thesis of this article is that these systems could have been built
`
`as an “integrated database",
`
`but
`
`that
`
`such
`
`a
`
`system would
`
`be a
`
`management nightmare.
`
`In reality,
`
`the parts of these systems are not
`
`tightly integrated —- each part
`
`is generally quite different from the
`
`others.
`
`It is "integrated" by having the parts agree to a
`
`common but
`
`arms—length message protocol.
`
`But, before getting into the
`
`the
`
`technical details of distrubted
`
`computing, it is worthwhile
`
`to
`
`review why we distribute computer
`
`systems in the first place.
`
`12
`
`12
`
`

`

`1.1.
`
`Ehy decentralize s stems?
`
`There is no "best'I form of organization.
`
`Each form has its advantages
`
`and disadvantages.
`
`The structure of computing is just one
`
`asPeCt
`
`0f
`
`the structure of
`
`an organization.
`
`Centralized organizations will
`
`continue with centralized computing
`
`and decentralized organizations
`
`will adopt appropriate degrees of decentralized computing [March].
`
`For such decentralized organizations, distributed computer systems are
`
`likely to give
`
`the operational units control of their data.
`
`If done
`
`correctly, a decentralized system allows more flexibility for
`
`growth
`
`of capacity and function.
`
`In addition, a decentralized system can
`
`be
`
`more maintainable since
`
`each part
`
`can change anything so long as it
`
`continues to support its external interfaces.
`
`Expanding on
`
`this
`
`argument,
`
`there
`
`are both
`
`organizational
`
`and
`
`technical motives for decentralizing a system.
`
`The main organizational reasons for decentralized computing are:
`
`* Integration 93 Ere-existing system :
`
`As existing computer
`
`systems
`
`are connected to provide
`
`common
`
`services,
`
`one naturally gets
`
`a
`
`distributed computer
`
`system.
`
`The world phone
`
`system,
`
`travel
`
`reservation systems and finance systems give good examples
`
`of this
`
`decentralization via integration of pre—existing systems.
`
`* Organizational autonomy: Managers want administrative control of
`
`facilities critical to their effectiveness.
`
`To make
`
`this
`
`control
`
`13
`
`13
`
`

`

`real, most administrators also want operational
`
`control of
`
`their
`
`computers and their databases.
`
`Increasingly. computer
`
`systems
`
`are
`
`being designed to reflect the organization structure.
`
`There are several technological reasons for building a decentralized
`
`system.
`
`It is sometimes not feasible to build a centralized system
`
`with the required capacity, response-time, availability or seCurity of
`
`a distributed system. Typical
`
`issues are:
`
`* Capacity:
`
`Some applications have needs far in excess of the largest
`
`computer. They can't fit on a single computer or a single site.
`
`
`* Response-time:
`
`If requestors are distributed over a large area,
`
`the
`
`transit times for requests and replies may be prohibitive. Long haul
`
`commuications can add a second
`
`to response time. Putting processing
`
`and the needed data close to the requestor eliminates such delays.
`
`* Availability: Having geographically distributed autonOmous sites
`
`processing parts of the same application has
`
`the effect
`
`of
`
`error
`
`containment -- a failure or disaster is
`
`likely to be limited to a
`
`single site.
`
`In
`
`addition,
`
`the
`
`remaining
`
`sites may
`
`be
`
`able
`
`to
`
`substitute for a failed node.
`
`
`* Cost: Decentralization may reduce long-haul
`
`communications traffic
`
`and hence reduce communications costs.
`
`The
`
`communications
`
`savings
`
`may outweigh the extra cost of distributed facilites and staff.
`
`14
`
`14
`
`

`

`* Security:
`
`Some orgainzations
`
`feel more
`
`secure when
`
`they have
`
`physical control over the
`
`site that
`
`stores
`
`their data.
`
`(I
`
`am
`
`skeptical of this argument but it is frequently made).
`
`Modularity is another reason for building a distributed system —-
`
`the
`
`desire for modular
`
`growth and change of function and capacity as well
`
`as the desire for the manageability which derives
`
`from a modular
`
`structure.
`
`A decentralized system is necessarily modular:
`
`the autonomous nodes of
`
`the computer network
`
`communicate with one
`
`another via
`
`a message
`
`protocol.
`
`In such a system it is easy to add capacity by adding nodes
`
`or by growing a node.
`
`If done properly, such change is non-disruptive
`
`-- changing one
`
`service does not
`
`interrupt other services so long as
`
`the change is
`
`upward compatible.
`
`Centralized systems are much more
`
`limited in their ability to grow and change.
`
`15
`
`15
`
`

`

`1.2
`
`Integrated system 9; integrated database
`
`As a preview of the thesis of this paper, observe that a major
`
`thrust
`
`of the 19?0's was towards the centralization of data
`
`and application
`
`definition so that diverse parts of
`
`the organization could
`
`share
`
`information -- the goal was
`
`an “integrated-database“.
`
`In most cases
`
`this has resulted in a bureaucracy and
`
`a monolithic
`
`system.
`
`The
`
`corporate-wide Data Base Administrator (DBA)
`
`is a centralized function
`
`and has all
`
`the problems
`
`associated with centralized systems. This
`
`structure can be a source of delays and organizational friction.
`
`A more workable approach to decentralized systems
`
`is
`
`to let
`
`each
`
`department have its own DBA. Rather than exposing the detailed record
`
`formats of its database --
`
`the
`
`traditional
`
`view of
`
`an
`
`integrated
`
`database -- each function externalizes a protocol for manipulating the
`
`data it maintains.
`
`For
`
`example,
`
`the order
`
`entry function of
`
`a
`
`business might support messages to lookup, add, and alter orders.
`
`The
`
`database and accounting
`
`rules for managing orders would
`
`be
`
`hidden
`
`within the procedures supporting these messages.
`
`Such
`
`an
`
`interface
`
`allows a department to change its internal implementation and
`
`to add
`
`new function without
`
`impacting other departments, so long as
`
`the old
`
`message interface is still supported.
`
`In addition,
`
`the department can
`
`enforce integrity and authority constraints
`
`on the data
`
`by assuring
`
`that only it's programs alter the data. This gives modular change and
`
`modular growth to the whole
`
`system.
`
`Management of
`
`these protocol
`
`definitions becomes the responsibility of the network DBA.
`
`16
`
`16
`
`

`

`2.
`
`Technical aspects 9E decentralized systems
`
`So far,
`
`the
`
`rational for decentralized systems has
`
`been
`
`sketched.
`
`This section proposes a technical approach to decentralized systems.
`
`The main technical problem unique to decentralized systems is the lack
`
`of global (centralized) knowledge. It
`
`is difficult to know everything
`
`about the rest of
`
`the network.
`
`Yet global
`
`knowledge seems to be
`
`required to answer questions such as:
`
`“Where is file A?", or "What is
`
`the best way
`
`to reach node N?". Most other technical problems of
`
`decentralized systems are also present
`
`in centralized systems.
`
`It
`
`is
`
`simply a matter of degree. Messages in a distributed system travel
`
`more slowly,
`
`less reliably and
`
`at greater cost.
`
`In
`
`a centralized
`
`system, a data item is rarely more than a disc seek away (~30 ms),
`
`in
`
`a distributed system it may be several satellite hops away (~l sec).
`
`Moreover,
`
`long—haul communication systems sometimes lose messages
`
`and
`
`always charge a lot to deliver one. The lack of global
`
`knowledge adds
`
`an entirely new dimension
`
`to the problems of data placement
`
`and
`
`program execution -- where to keep the data and where to
`
`run
`
`the
`
`programs.
`
`The following model deals with decentralization of knowledge by making
`
`each component a module which may be connected to any other module.
`
`17
`
`17
`
`

`

`2.1
`
`
`Computational model: objects—processes—messages
`
`Much in the style of modern programming languages such
`
`as Smalltalk,
`
`Modula, and Ada,
`
`the system supports the concept of object "type" and
`
`object "instance". Primitive object
`
`types
`
`are Process, Terminal,
`
`File. and Node. Application developers implement new types by writing
`
`programs in a
`
`conventional
`
`language
`
`such as Cobol or PLI.
`
`The
`
`programs are executed as processes —* programs running on computers.
`
`Externally, each process
`
`appears
`
`to be
`
`an object
`
`instance which
`
`supports the verbs of that object
`
`type.
`
`Internally,
`
`processes are
`
`structured as a collection of subvroutines which implement that.type.
`
`A process may address other objects
`
`(processes,
`
`files,
`
`terminals,
`
`etc.)
`
`by OPENing them. The open step takes a
`
`symbolic name
`
`for the
`
`object,
`
`locates the
`
`named object.
`
`checks
`
`the authorization of
`
`the
`
`process to the object and returns a reference for
`
`a "session“
`
`to the
`
`object -- in operating systems terms,
`
`it returns a capability for the
`
`object. Thereafter the process may operate on the object
`
`by
`
`sending
`
`messages via the
`
`session (WRITE),
`
`and
`
`reading the
`
`replies
`
`(READ).
`
`Ultimately the session is terminated (CLOSE).
`
`This discussion is
`
`symmetric,
`
`a process may be OPENed.
`
`In that case
`
`it will receive an OPEN message. which it may accept or reject, and if
`
`accepted it will
`
`receive
`
`a
`
`sequence
`
`of requests_which it READs and
`
`generates a corresponding sequence of replies
`
`(WRITES). Ultimately,
`
`it may receive a CLOSE message or unilaterally close the session.
`
`18
`
`18
`
`

`

`Object creation is accomplished by opening
`
`a
`
`session to a process
`
`managing the object type. This session is then used to communicate the
`
`name and attributes of the new object.
`
`To give a simple example. One
`
`opens a session to a file to access the
`
`file. The session is opened with a process (file server)
`
`representing
`
`that file.
`
`Sending messages via
`
`the
`
`session instructs
`
`the
`
`server
`
`process to position within the file, and
`
`insert,
`
`update or
`
`extract
`
`records in the file.
`
`Because every object has a name, a process can
`
`address any object
`
`in.
`
`the network -- subject to authorization restrictions. This means that
`
`the process is unaware of the physical location of the object.
`
`It can
`
`write on any terminal, read or write any file,
`
`and address
`
`any other
`
`process. This is one key to structuring a decentralized system.
`
`In order to fit into the CALL structure of
`
`conventional
`
`programming
`
`languages,
`
`the concept
`
`of
`
`“remote procedure
`
`call“
`
`is
`
`introduced.
`
`Rather than the
`
`program having
`
`to OPEN, WRITE, READ and CLOSE the
`
`object, it may
`
`CALL
`
`an operation on the object.
`
`A remote procedure
`
`call has the format:
`
`<operation>(<ohj ect>,<p1>,<p2> , . . . , <pn>)RETURNS(<r1>, . . . ,<rm>)
`
`where the <pi) are by-value parameters and <rib are returned results.
`
`This request
`
`is translated as follows:
`
`(1) If the process managing the object
`
`is not OPEN, an open message is
`
`sent to it.
`
`(2) The following message is sent to the object manager:
`
`19
`
`19
`
`

`

`(<operation>,<object>,<pl),<p2>,...,<pn>)
`(3) The object manager
`performs
`the operation and
`
`replies with
`
`message:
`
`(<r1>,<r2>,...,<rm>).
`
`(4) The reply message is unpacked into <r1>,
`
`...,<rm>.
`
`Typically,
`
`the underlying software “saves“
`
`the
`
`open so
`
`that
`
`later
`
`operations on the same object will have low overhead.
`
`This structuring allows functions to be executed within a process,
`
`or
`
`in a different process perhaps at a remote node.
`
`It gives
`
`a uniform
`
`way for a
`
`node
`
`to export
`
`it's primitive types, and abstractions of
`
`these types. As will be argued later, nodes generally export
`
`high-
`
`level abstractions rather than primitive types.
`
`The message formats
`
`(2)
`
`and (3) above, describe the interface to the
`
`type. Defining a decentralized system consists
`
`of defining these
`
`message formats
`
`and
`
`their
`
`semantics.
`
`Thereafter,
`
`independent
`
`organizations can implement these protocols
`
`to provide services or to
`
`make requests of others.
`
`This model
`
`is
`
`the basis for the Tandem system [Tandem]. Argus System
`
`[Liskov] and the R*
`
`System [Lindsay].
`
`It also forms the basis for
`
`IBM's SNA Logical Unit Type 6, which defines the types DL/l Database,
`
`Queue, Process, etc.,
`
`and will
`
`grow to
`
`include many more
`
`types
`
`[Hoffman].
`
`The airlines
`
`reservation system to manage
`
`seats,
`
`the
`
`electronic funds transfer systems also use the message model.
`
`20
`
`10
`
`20
`
`

`

`2.2 Dictionary: naming. authorization and control
`
`The system dictionary plays a
`
`key
`
`role in object-oriented systems.
`
`The dictionary maps symbolic names to object descriptors, and performs
`
`authorization checks.
`
`In addition,
`
`the dictiOnary provides
`
`some
`
`reporting and control functions.
`
`In order to structure the name space, symbolic names are organized as
`
`a hierarchy. The
`
`name 'A.B.C"
`
`names an object but may also name a
`
`subtree of objects
`
`each with ”A.B.C"
`
`as
`
`a prefix of their names.
`
`Interior nodes of the hierarchy act as directory objects. Any object
`
`can act as a directory object. Allowing
`
`names to grow on
`
`the right
`
`allows objects to have sub-objects. Allowing names
`
`to grow on
`
`the
`
`left allows addressing of other
`
`name
`
`spaces
`
`for
`
`inter-networking.
`
`Prefixs and suffixs on telephone numbers exemplify this principal.
`
`The dictionary is partitioned among the nodes
`
`of
`
`the
`
`network,
`
`each
`
`node has a part of the dictionary.
`
`The dictionary stores the generic
`
`attributes of each object,
`
`its
`
`name.
`
`type, owner,
`
`and authorization,
`
`in the name
`
`space.
`
`The manager
`
`for
`
`that object type stores type-
`
`specific attributes in catalogs which parallel
`
`the name
`
`space.
`
`In
`
`this way,
`
`a
`
`new
`
`object
`
`type
`
`can
`
`be
`
`added
`
`by
`
`creating
`
`programs/processes which maintain the catalogs of that type and a
`
`new
`
`type instance can be added by requesting one of these processes to add
`
`records to the catalogs for that type as well as making
`
`an
`
`entry in
`
`the name Space.
`
`21
`
`11
`
`21
`
`

`

`When an object name is presented to OPEN, it is looked up in the name
`space.
`The prefix of
`the
`name designates the partition of the name
`
`space, for example Lookup 'A.B.C.D" might
`
`look for "B.C.D" in the name
`
`space at node
`
`“A".
`
`The alias object type allows renaming of objects
`
`or directories. Aliases are generally followed until
`
`a
`
`non—alias
`
`object or error is encountered.
`
`Lookup returns a descriptor
`
`for
`
`an
`
`object.
`
`The OPEN procedure uses the information in this descriptor to
`
`route the Open message to the appropriate object.
`
`Node autonomy implies
`
`that
`
`each
`
`node must
`
`be
`
`able
`
`to operate in
`
`isolation. Hence, each object or object
`
`fragment at
`
`a
`
`node
`
`is
`
`described by the
`
`name
`
`space
`
`at
`
`that
`
`node.
`
`In addition,
`
`the node
`
`dictionary may have descriptions
`
`of objects
`
`at other nodes.
`
`For
`
`example,
`
`if a
`
`file is partitioned among several nodes of the network,
`
`each node will certainly store the description of the local partition;
`
`but, each node may also store a
`
`replica of
`
`the description of the
`
`whole file.
`
`Authorization is one of the most difficult areas of decentralized
`
`systems. Of course, all network traffic is encrypted and
`
`low level
`
`protocols do node authentication.
`
`Each process
`
`executes under
`
`some
`
`authority (ID). Each object has
`
`an
`
`access control
`
`list associated
`
`with its descriptor in the dictionary.
`
`The entries of
`
`each
`
`access
`
`list are <who,what> pairs.
`
`The “who"
`
`is either an ID or the ID of a
`
`GROUP of
`
`IDs.
`
`The “what"
`
`is a
`
`list of
`
`authorities allowed
`
`to that
`
`object.
`
`22
`
`12
`
`22
`
`

`

`When a process tries to OPEN the object “Z",
`
`the name "Z“ is looked up
`
`in the dictionary to find its descriptor and access control list.
`
`The
`
`name server first checks the access list to see
`
`if the
`
`requestor has
`
`the authority to open
`
`the object.
`
`If not,
`
`the
`
`lookup signals a
`
`security violation.
`
`The issue of authenticating the requestor arises when the OPEN travels
`
`across the network.
`
`At best,
`
`the
`
`server knows that requestor R at
`
`node N made the request ——
`
`i.e. node N will
`
`vouch for R. Either of
`
`the following two approaches deal with this:
`
`either the
`
`access
`
`list
`
`can be structured as "ID at NUDE" elements for the "who" fields or
`
`a
`
`bidirectional password scheme can be required for remote requestors.
`
`As explained so far,
`
`the dictionary implements and exports objects
`
`of
`
`type directory, alias,
`
`type, access
`
`control
`
`list, and group.
`
`The
`
`support of types, allows others
`
`to
`
`implement basic
`
`types
`
`such as
`
`record, file,
`
`terminal,
`
`and application types
`
`such as
`
`invoice
`
`and
`
`customer .
`
`In addition to providing the basic naming,
`
`authorization and type
`
`extension facilities,
`
`the dictionary also provides
`
`a
`
`structure
`
`for
`
`reporting and control.
`
`The dictionary implements an object
`
`of type
`
`"dependency". Each dependency is a binary relation among objects. When
`
`a file definition is based on a record definition,
`
`the relationship
`
`< FILE-A , RECORD—R>
`
`is entered into the DEPENDS—ON relation. When an index is added
`
`to a
`
`file,
`
`the relationship
`
`23
`
`13
`
`23
`
`

`

`< FILE-A ,
`
`INDEX-I
`
`5'
`
`is added. When
`
`a
`
`program is compiled from source Sl,SZ,S3,.. SN,
`
`the
`
`relationships:
`
`< O , 51> and 4 O , $2 >,
`
`...
`
`, < 0 , Sn)
`
`are added.
`
`Centralized systems are able
`
`to recompute
`
`relationships
`
`on
`
`demand
`
`rather then maintain them as they are created. A decentralized system
`
`cannot economically
`
`search
`
`the
`
`entire
`
`system
`
`looking
`
`for
`
`relationships. Hence,
`
`the relationships must be explicitly asserted
`
`and maintained.
`
`Relationships can be used for reporting and for control:
`
`* Reporting: Relationships can tell what objects will be affected by a
`
`change and in turn can be used to automatically propagate the change
`
`in rebuilding an application —— recompile all affected programs. and
`
`reorganize all affected files and displays.
`
`* Control: Relationships
`
`can be used for change control -* to freeze
`
`them and all the objects they depend upon.
`
`24
`
`14
`
`24
`
`

`

`2.3
`
`
`Data Management
`
`
`2.3.1 Data definition
`
`Database systems
`
`implement
`
`the
`
`”record"
`
`type
`
`and operations
`
`on
`
`records.
`
`A record definition specifies
`
`the record fields, each field
`
`storing atoms of a designated type.
`
`In addition,
`
`integrity constraints
`
`may be associated with the
`
`record definition. Record
`
`instances must
`
`satisfy these constraints. Uniqueness, value
`
`range,
`
`and
`
`referential
`
`integrity are typical
`
`integrity constraints
`
`[Date].
`
`In
`
`addition,
`
`record definitions may have physical attributes
`
`such as
`
`location,
`
`recovery attributes, and block size.
`
`The instances of a
`
`record type
`
`are variously called a file, relation, set or table. We use
`
`the term
`
`file here.
`
`A file may be partitioned among nodes
`
`of the
`
`network based
`
`on some
`
`predicate-to-node map.
`
`The
`
`following
`
`is
`
`a
`
`sample
`
`partitioning
`
`criterion:
`
`ACCOUNT.NUMBER < 1000000 <2> WEST
`
`ACCOUNT.NUMBER 3 1000000 <=> EAST
`
`A record instance will
`
`be
`
`stored at
`
`the
`
`node
`
`(partition) which
`
`satisfies the predicate.
`
`Data is partitioned for several reasons:
`
`
`* Load sharing: distributing traffic on the data,
`
`* Locality: placing it closer to the data consumers thus improving
`
`local availability and response time and,
`
`25
`
`15
`
`25
`
`

`

`* Autonomy:
`
`allowing local control of each partition of the data.
`
`Partitioning has long been used to distribute data
`
`across multiple
`
`disk arms at a single node.
`
`Geographic distribution is
`
`a
`
`simple
`
`generalization of
`
`that
`
`concept.
`
`The
`
`implementation
`
`details
`
`are
`
`slightly different since the record definition must be
`
`replicated at
`
`each dictionary of each involved node.
`
`A record may
`
`be
`
`replicated at several nodes. Replication at a node
`
`has long been used for availability. If one copy
`
`is lost,
`
`the others
`
`can be used in its stead.
`
`In
`
`a geographically- distributed system
`
`replication can
`
`have
`
`the
`
`added benefit
`
`of
`
`improved
`
`response
`
`by
`
`eliminating long—haul message delays.
`
`But replication at multiple nodes connected via
`
`communications
`
`lines
`
`presents several novel problems if updates must be broadcast
`
`to all
`
`the replicas. This may cause
`
`substantial delays, and may
`
`inhibit
`
`update if some
`
`replicas
`
`are
`
`inaccessible.
`
`The presumption
`
`in
`
`a
`
`centralized system is that I'if a replica is unavailable
`
`to me,
`
`it is
`
`unavailable to everyone".
`
`This
`
`assumption
`
`is
`
`not valid
`
`in a
`
`decentralized system. Unavailability of a
`
`replica can be
`
`caused by
`
`replica failure or by communication failure.
`
`In the
`
`latter
`
`case,
`
`a
`
`replica may be available to one part of the network but not
`
`to others.
`
`The concepts of current and stale information helps to understand the
`
`algorithms designed to deal with the maintenance of
`
`replicas. Quite
`
`general algorithms to maintain data currency are known, but they
`
`have
`
`26
`
`16
`
`26
`
`

`

`substantial message delay cost.
`
`Other
`
`algorithms
`
`are
`
`known which
`
`tradeoff data
`
`currency in exchange
`
`for
`
`better
`
`performance
`
`or
`
`availability. Three kinds of replication are worth considering:
`
`* Current Global Replicas:
`
`In this scheme a majority of the replicas
`
`are updated as part of each transaction (the definition of majority
`
`differs from proposal
`
`to proposal).
`
`The best of
`
`these algorithms
`
`tolerate some node unavailability but they have the problem that for
`
`a small number nodes (two)
`
`they do not tolerate data unavailability
`
`and for a large number of nodes
`
`(more than
`
`2)
`
`they
`
`introduce long
`
`delays in transaction commit [Gifford],
`
`[Abadi].
`
`
`* éfi Soon gs Possible (ASAP) Updates: This scheme postulates a master
`
`node(per record type or instance) which is allowed
`
`to unilaterally
`
`update the record at his node. The updates are
`
`then asynchronously
`
`sent
`
`to the other replicas. This approach sacrifices consistency for
`
`availability and response time [Norman].
`
`If the system is Operating
`
`correctly, replicas will be only seconds out of date.
`
`* Snapshots: There is
`
`a master
`
`file somewhere
`
`and
`
`slave
`
`copies
`
`elsewhere.
`
`Each slave copy is
`
`a snapshot
`
`of the master as of a
`
`certain time. The
`
`slave copies
`
`are periodically updated.
`
`This is
`
`appropriate for
`
`files which
`
`change very slowly
`
`and
`
`for which
`
`currency is not critical [Adiba].
`
`In this design, replicas may be a
`
`day or a week out of date.
`
`27
`
`17
`
`27
`
`

`

`Each of these approaches to replication has its place. Notice that the
`
`dictionary must
`
`use
`
`a
`
`current
`
`data algorithm for managing
`
`the
`
`replication of object descriptions because its users may not
`
`be
`
`able
`
`to tolerate inconsistencies.
`
`The replication criterion and
`
`replication algorithms
`
`are
`
`specified
`
`when defining or redefining a record. When a program runs, it must be
`
`unaware that
`
`a
`
`record
`
`is partitioned or
`
`replicated because
`
`the
`
`partitioning and replication may change after the program is written.
`
`In order to deal with this,
`
`the program may specify that it is willing
`
`to accept stale data
`
`CURRENCY := FALSE
`
`or that it wants the most current copy
`
`CURRENCY := TRUE.
`
`This approach allows data independence
`
`and also allows performance
`
`improvements which make replicated data workable.
`
`28
`
`18
`
`28
`
`

`

`
`2.3.2 Data manipulation
`
`Record instances are accessed and manipulated by a data manipulation
`
`language (DML) which allows the requestor
`
`to add,
`
`read, alter
`
`and
`
`delete sets of records. The
`
`DML is
`
`really a
`
`compiler for
`
`a limited
`
`for: of
`
`set
`
`theory expressions.
`
`It must
`
`contend with
`
`records
`
`distributed over several nodes of the network.
`
`The DML compiler produces a plan to perform the desired operation. The
`
`compiler is aware of the data distribution criterion and based on that
`
`it picks a global plan which minimizes a cost
`
`function. It
`
`then sub—
`
`contracts the access
`
`to the
`
`local data to the remote nodes. These
`
`nodes develop their own plans for doing the data manipulation at their
`
`nodes.
`
`If the data organization at a node changes,
`
`it recompiles its
`
`plan without affecting the other nodes.
`
`Subcontracting setvoriented operations
`
`rather
`
`than record—at-a-time
`
`operations produces substantial savings in response time
`
`and message
`
`traffic. However,
`
`even greater
`
`savings
`
`can
`
`be
`
`achieved
`
`by
`
`subcontracting the application logic itself. For example, reserving an
`
`airline seat at
`
`a remote node will involve updates to multiple record
`
`types and several integrity checks. Application level protocols
`
`cast
`
`this as a single remote procedure call to a server process which then
`
`does local DML operations.
`
`This
`
`is more efficient
`
`than
`
`invoking
`
`several DML operations across the network.
`
`It also allows the
`
`server
`
`node better control of access to the data.
`
`29
`
`19
`
`29
`
`

`

`
`2.3.3 Data independence
`
`It must be possible to move, partition, replicate,
`
`and physically
`
`reorganize data without
`
`invalidating programs which
`
`access
`
`it.
`
`The
`
`DML plan is automatically reevaluated if the physical structure of the
`
`data changes. This is called data independence.
`
`In addition,
`
`it must be possible to alter the
`
`logical structure
`
`of a
`
`record ~- for
`
`example add
`
`fields to a record type or
`
`replace
`
`one
`
`record type with two others which can be joined together to form the
`
`original. There are limits to what can
`
`be done here, but
`
`the basic .
`
`trick is to define the "old“ interface as a view of the new interface.
`
`A view is
`
`a materialization rule which allows instances of one record
`
`type to be materialized from others.
`
`Unless the view is very simple,
`
`a
`
`one—to-one map
`
`from the base
`
`records, it may not be possible to reflect updates on the view down to
`
`the base files.
`
`For this reason, views are primarily used to subset a
`
`record type to provide access control to data.
`
`They allow one to hide
`
`data from programs unless they have a “need to know".
`
`To repeat
`
`the
`
`theme,
`
`having
`
`a procedural
`
`interface to the data (a
`
`server) allows the provider of the service much more
`
`flexability in
`
`redefining the data without
`
`impacting others.
`
`The view constructs of
`
`data management systems are very restrictive
`
`in
`
`the
`
`kinds
`
`of data
`
`independence they support.
`
`30
`
`20
`
`30
`
`

`

`2.3.4 Database design
`
`Logical database design specifies which
`
`records will be available and
`
`the integrity constraints applied to the records.
`
`Physical database
`
`design specifies
`
`how the
`
`records
`
`are
`
`stored:
`
`file organization,
`
`indices, partitioning criteria, replication criteria.
`
`Distribution has little affect on
`
`logical database design unless the
`
`desired design has poor performance.
`
`In that
`
`case it may have
`
`to be
`
`changed as a concession to performance.
`
`Physical database design is intimately related to data distribution.
`
`At present database design is a largely manual process but much of the
`
`design work could be mechanized.
`
`The first step is to instrument the
`
`system so
`
`that
`
`one
`
`can get
`
`meaningful measurements of
`
`record statistics_ and activity.
`
`Given
`
`these statistics and the response time requirements of the system,
`
`the
`
`database design problem can be mechanized as follows: Since
`
`the DML
`
`can predict the cost
`
`of
`
`a particular plan for a particular database
`
`design, one can evaluate the cost of a particular database design on a
`
`particular application workload —— it is the weighted sum of the costs
`
`of the individual queries.
`
`By
`
`evaluating all possible designs, one
`
`can find acceptable or
`
`least-cost designs.
`
`This
`
`search can
`
`be
`
`mechanized.
`
`31
`
`21
`
`31
`
`

`

`2.4
`
`Networking and terminal support
`
`2.4.1 Network protocols
`
`The discussion so far presumes
`
`a very powerful networking capability.
`
`It assumes that any process in any node can open a
`
`session with any
`
`process in any other
`
`node.
`
`Files
`
`and
`
`terminals are viewed as the
`
`processes which control
`
`them.
`
`Implicitly,
`
`it
`
`assumes
`
`that
`
`these
`
`sessions may be established with relatively little overhead.
`
`This view is implicit
`
`in IBM's System Network Architecture [Hoffman],
`
`Tandem's Guardian-Expand architecture [Tandem] and the
`
`ISO reference
`
`model for Open Systems Interconnection.
`
`These protocols go
`
`on
`
`to
`
`architect higher level objects such as processes, queues
`
`and files
`
`based on the session layer.
`
`In
`
`addition they specify many
`
`layers
`
`below the session layer.
`
`The point of this discussion is that it is reasonable to assume a very
`
`powerful networking capability in which any process in any
`
`node
`
`can
`
`open a session with any process in any other node.
`
`The only debatable
`
`point is the cost of such sessions. with present systems,
`
`the cost
`
`is
`
`high, but it
`
`seems
`
`to be acceptable given the benefits which accrue.
`
`Several commercial systems operate in this way today [CICS].
`
`[Tandem].
`
`32
`
`22
`
`32
`
`

`

`2.4.2 Terminal support -— terminal
`
`independence
`
`The real revolution in data management has
`
`been in the area
`
`of data
`
`display, not data storage.
`
`Information storage has not changed much in the last twenty years. We
`
`had random access memories, discs and tapes then and we have them now.
`
`The cost per byte has declined dramatically, but the lagical interface
`
`to storage has not changed much.
`
`In the same
`
`time, we
`
`have
`
`gone
`
`from card—readers to teletypes to
`
`alpha-numeric displays. We are about to go to bitmap displays with the
`
`heavy use of icons and graphics.
`
`When discs get bigger or
`
`faster,
`
`the
`
`files
`
`grow and the programs
`
`continue to work. When terminals
`
`change from card—readers
`
`to bitmap
`
`displays the programs
`
`need
`
`to be overhauled.
`
`This overhaul
`
`is
`
`expensive. Thus one
`
`frequently sees
`
`a
`
`desk

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket