CONCEPTS
HENRY F. KORTH
`
`ABRAHAM SILBERSCHATZ
`
`Facebook's Exhibit No. 1017
`
`
`
`
`McGraw-Hill Computer Science Series
`
`Ahuja: Design and Analysis of Computer Communication Networks
`Donovan: Systems Programming
Filman and Friedman: Coordinated Computing: Tools and Techniques for Distributed Software
`Hamacher, Vranesic, and Zaky: Computer Organization
`Hayes: Computer Architecture and Organization
`Hutchison and Just: Programming Using the C Language
`Keller: A First Course in Computer Programming Using Pascal
`Kohavi: Switching and Finite Automata Theory
`Korth and Silberschatz: Database System Concepts
`Levi and Agrawala: Real-Time System Design
`Liu: Elements of Discrete Mathematics
`Liu: Introduction to Combinatorial Mathematics
`Madnick and Donovan: Operating Systems
`Manna: Mathematical Theory of Computation
`Milenkovic: Operating Systems: Concepts and Design
`Newman and Sproull: Principles of Interactive Computer Graphics
Payne: Introduction to Simulation: Programming Techniques and Methods of Analysis
Rice: Matrix Computations and Mathematical Software
Salton and McGill: Introduction to Modern Information Retrieval
`Schalkoff: Artificial Intelligence: An Engineering Approach
`Shooman: Software Engineering: Design, Reliability, and Management
Tremblay and Bunt: An Introduction to Computer Science: An Algorithmic Approach
Tremblay, DeDourek, and Bunt: An Introduction to Computer Science: An Algorithmic Approach, Pascal Edition
`Tremblay and Sorenson: The Theory and Practice of Compiler Writing
`Tucker: Programming Languages
`Tucker: Computer Science: A Second Course Using Modula-2
`
`McGraw-Hill Series in Systems
`
`Consulting Editor
`Abraham Silberschatz
`
`Ceri and Pelagatti: Distributed Databases: Principles and Systems
`Korth and Silberschatz: Database System Concepts
`Levi and Agrawala: Real-Time System Design
`Su: Database Computers: Principles, Architecture, and Techniques
`Wiederhold: Database Design
Wiederhold: File Organization for Database Design
`
`
`
`DATABASE
`SYSTEM
`CONCEPTS
`
`SECOND EDITION
`
HENRY F. KORTH
`ABRAHAM SILBERSCHATZ
`University of Texas at Austin
`
`McGraw-Hill, Inc.
`New York St. Louis San Francisco Auckland Bogota
`Caracas Lisbon London Madrid Mexico Milan
`Montreal New Delhi Paris San Juan Singapore
`Sydney Tokyo Toronto
`
`
`
DATABASE SYSTEM CONCEPTS
`
`Copyright© 1991, 1986 by McGraw-Hill, Inc. All rights reserved.
`Printed in the United States of America. Except as permitted under the United
`States Copyright Act of 1976, no part of this publication may be reproduced or
`distributed in any form or by any means, or stored in a data base or retrieval
`system, without the prior written permission of the publisher.
`
`90 AGM AGM 998765
`
`ISBN 0-07-044754-3
`
`The editors were David M. Shapiro and Joseph F. Murphy;
`the production supervisor was Kathryn Porzio.
`Arcata Graphics/Martinsburg was the printer and binder.
`
`Library of Congress Cataloging-in-Publication Data
`Korth, Henry F.
`Database system concepts / Henry F. Korth, Abraham Silberschatz.-
`2nd ed.
`p. cm.
`Includes bibliographical references and index.
`ISBN 0-07-044754-3
`1. Data base management. I. Silberschatz, Abraham. II. Title.
`QA76.9.D3K67 1991
`005.74-dc20
`This book is printed on acid-free paper.
`
`90-20826
`
`
`
`1
`
`Introduction
`
`A database management system (DBMS) consists of a collection of interrelated
`data and a set of programs to access that data. The collection of data,
usually referred to as the database, contains information about one
particular enterprise. The primary goal of a DBMS is to provide an
`environment that is both convenient and efficient to use in retrieving and
`storing database information.
`Database systems are designed to manage large bodies of information.
The management of data involves both the definition of structures for the
storage of information and the provision of mechanisms for the
manipulation of information. In addition, the database system must
`provide for the safety of the information stored, despite system crashes or
`attempts at unauthorized access. If data is to be shared among several
`users, the system must avoid possible anomalous results.
`The importance of information in most organizations, and hence the
`value of the database, has led to the development of a large body of
`concepts and techniques for the efficient management of data. In this
`chapter, we present a brief introduction to the principles of database
`systems.
`
`1.1 Purpose of Database Systems
`Consider part of a savings bank enterprise that keeps information about all
customers and savings accounts in permanent system files at the bank. In
`addition, the system has a number of application programs that allow the
`user to manipulate the files, including:
`
`• A program to debit or credit an account.
• A program to add a new account.
• A program to find the balance of an account.
• A program to generate monthly statements.
`
`These application programs have been written by system programmers in
`response to the needs of the bank organization.
`
`New application programs are added to the system as the need arises.
`For example, suppose that new government regulations allow the savings
bank to offer checking accounts. As a result, new permanent files are
created that contain information about all the checking accounts
maintained in the bank, and new application programs may need to be
`written. Thus, as time goes by, more files and more application programs
`are added to the system.
`The typical file-processing system described above is supported by a
`conventional operating system. Permanent records are stored in various
`files, and a number of different application programs are written to extract
`records from and add records to the appropriate files. This scheme has a
number of major disadvantages:
`
`• Data redundancy and inconsistency. Since the files and application
`programs are created by different programmers over a long period of
`time, the files are likely to have different formats and the programs
`may be written in several programming languages. Moreover, the
`same piece of information may be duplicated in several places (files).
`For example, the address and phone number of a particular customer
`may appear in a file that consists of savings account records and in a
file that consists of checking account records. This redundancy leads to
higher storage and access costs. In addition, it may lead to data
inconsistency; that is, the various copies of the same data may no
longer agree. For example, a changed customer address may be
reflected in savings account records but not elsewhere in the system.
Data inconsistency results.
`• Difficulty in accessing data. Suppose that one of the bank officers
`needs to find out the names of all customers who live within the city's
`78733 zip code. The officer asks the data processing department to
`generate such a list. Since this request was not anticipated when the
`original system was designed, there is no application program on hand
`to meet it. There is, however, an application program to generate the
list of all customers. The bank officer now has two choices: either get
the list of customers and extract the needed information manually, or
ask the data processing department to have a system programmer write
the necessary application program. Both alternatives are
obviously unsatisfactory. Suppose that such a program is actually
`written and that, several days later, the same officer needs to trim that
`list to include only those customers with an account balance of $10,000
`or more. As expected, a program to generate such a list does not exist.
`Again, the officer has the preceding two options, neither of which is
`satisfactory.
`The point here is that conventional file-processing environments do
`not allow needed data to be retrieved in a convenient and efficient
`manner. Better data retrieval systems must be developed for general
`use.
`
• Data isolation. Since data is scattered in various files, and files may be
`in different formats, it is difficult to write new application programs to
`retrieve the appropriate data.
`
• Concurrent access anomalies. In order to improve the overall
performance of the system and obtain a faster response time, many
`systems allow multiple users to update the data simultaneously. In
`such an environment, interaction of concurrent updates may result in
`inconsistent data. Consider bank account A, with $500. If two
`customers withdraw funds (say $50 and $100 respectively) from
`account A at about the same time, the result of the concurrent
`executions may leave the account in an incorrect (or inconsistent) state.
`In particular, the account may contain either $450 or $400, rather than
`$350.
`In order to guard against
`this possibility, some form of
`supervision must be maintained in the system. Since data may be
`accessed by many different application programs which have not been
`previously coordinated, supervision is very difficult to provide.
`
`• Security problems. Not every user of the database system should be
`able to access all the data. For example, in a banking system, payroll
`personnel need only see that part of the database that has information
`about the various bank employees. They do not need access to
`information about customer accounts. Since application programs are
`added to the system in an ad hoc manner, it is difficult to enforce such
`security constraints.
`
`• Integrity problems. The data values stored in the database must satisfy
`certain types of consistency constraints. For example, the balance of a
`bank account may never fall below a prescribed amount (say, $25).
`These constraints are enforced in the system by adding appropriate
`code in the various application programs. However, when new
`constraints are added, it is difficult to change the programs to enforce
`them. The problem is compounded when constraints involve several
`data items from different files.
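
The concurrent-access anomaly described in the list above (account A holding $500, withdrawals of $50 and $100) can be reproduced with a small deterministic simulation. This is a minimal Python sketch under stated assumptions, not how any database system actually schedules work: each withdrawal is modeled as a generator with a read step and a write step, and the names `withdraw`, `run_interleaved`, and `run_serial` are hypothetical.

```python
from typing import Dict, Iterator, List


def withdraw(account: Dict[str, int], amount: int) -> Iterator[None]:
    """One withdrawal from account A, split into a read step and a write step."""
    balance = account["A"]           # read step: copy the current balance
    yield                            # pause: another withdrawal may run here
    account["A"] = balance - amount  # write step: uses the possibly stale copy


def run_interleaved(initial: int, amounts: List[int]) -> int:
    """All withdrawals read first, then all write: an unsupervised schedule."""
    account = {"A": initial}
    steps = [withdraw(account, amount) for amount in amounts]
    for step in steps:
        next(step)                   # every withdrawal reads the same balance
    for step in steps:
        next(step, None)             # each write overwrites the previous one
    return account["A"]


def run_serial(initial: int, amounts: List[int]) -> int:
    """Each withdrawal finishes before the next one starts."""
    account = {"A": initial}
    for amount in amounts:
        for _ in withdraw(account, amount):
            pass                     # run the read and write steps back to back
    return account["A"]


print(run_serial(500, [50, 100]))       # 350
print(run_interleaved(500, [50, 100]))  # 400
print(run_interleaved(500, [100, 50]))  # 450
```

Running the withdrawals one after the other leaves the correct $350; letting both read before either writes leaves $400 or $450, depending on which write lands last, which is exactly the pair of wrong answers given in the text. Ruling out such schedules is the job of the supervision the text calls for.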
`
`These difficulties, among others, have prompted the development of
`database management systems. In what follows, we shall see the concepts
`and algorithms that have been developed for database systems to solve the
`abovementioned problems. For most of this book, we use a bank
`enterprise as a running example of a typical data processing application
`found in a corporation.
In Chapters 12-14, we consider a different class of database
applications, interactive design applications. Most current interactive
`design applications are built as a collection of files and application
`programs. There is a substantial amount of research and development
`work underway to provide database systems that are both sufficiently
`powerful and sufficiently flexible to manage these applications. The
`concepts used in this work are based upon those we shall see in earlier
`chapters of the book.
`
1.2 Data Abstraction
`A database management system is a collection of interrelated files and a set
of programs that allow users to access and modify these files. A major
purpose of a database system is to provide users with an abstract view of
the data. That is, the system hides certain details of how the data is stored
and maintained. However, in order for the system to be usable, data must
be retrieved efficiently. This concern has led to the design of complex data
structures for the representation of data in the database. Since many
database-system users are not computer-trained, the complexity is hidden
`from them through several levels of abstraction in order to simplify their
`interaction with the system.
`
`• Physical level. The lowest level of abstraction describes how the data
are actually stored. At the physical level, complex low-level data
`structures are described in detail.
• Conceptual level. The next-higher level of abstraction describes what
`data are actually stored in the database, and the relationships that exist
`among the data. Here the entire database is described in terms of a
`small number of relatively simple structures. Although implementation
`of the simple structures at the conceptual level may involve complex
`physical-level structures, the user of the conceptual level need not be
`aware of this. The conceptual level of abstraction is used by database
`administrators, who must decide what information is to be kept in the
`database.
• View level. The highest level of abstraction describes only part of the
`entire database. Despite the use of simpler structures at the conceptual
`level, some complexity remains because of the large size of the
`database. Many users of the database system will not be concerned
`with all of this information. Instead, such users need only a part of the
`database. To simplify their interaction with the system, the view level
`of abstraction is defined. The system may provide many views for the
`same database.
`
`The interrelationship among these three levels of abstraction is illustrated
`in Figure 1.1.
`
Figure 1.1 The three levels of data abstraction (view level: view 1, view 2, ..., view n; conceptual level; physical level).
`
`An analogy to the concept of data types in programming languages
`may clarify the distinction among levels of abstraction. Most high-level
`programming languages support the notion of a record type. For example,
`in a Pascal-like language we may declare a record as follows:
`
type customer = record
    name : string;
    street : string;
    city : string;
end;
`
`This defines a new record called customer with three fields. Each field has
`a name and a type associated with it. A banking enterprise may have
`several such record types, including:
`
`• account, with fields number and balance.
`• employee, with fields name and salary.
`
`At the physical level, a customer, account, or employee record can be
`described as a block of consecutive storage locations (for example, words or
`bytes). At the conceptual level, each such record is described by a type
`definition, illustrated above, and the interrelationship among these record
`types is defined. Finally, at the view level, several views of the database
`are defined. For example, tellers in a bank see only that part of the
`database that has information on customer accounts. They cannot access
`information concerning salaries of employees.
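
The three levels of abstraction can be illustrated in Python, used here in place of the Pascal-like notation because it is easy to run. In this sketch the conceptual level is a pair of record types, the physical level packs an account into a block of consecutive bytes with the standard `struct` module, and the view level is a function that exposes accounts but not salaries. The 16-byte layout, the function names, and the employee record are assumptions made for illustration only.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List


# Conceptual level: record types, described by their fields only.
@dataclass
class Account:
    number: int
    balance: int


@dataclass
class Employee:
    name: str
    salary: int


# Physical level (assumed layout): an account is 16 consecutive bytes,
# two little-endian 8-byte signed integers.
ACCOUNT_LAYOUT = struct.Struct("<qq")


def store_account(acct: Account) -> bytes:
    return ACCOUNT_LAYOUT.pack(acct.number, acct.balance)


def load_account(block: bytes) -> Account:
    number, balance = ACCOUNT_LAYOUT.unpack(block)
    return Account(number, balance)


# View level: a teller sees accounts but has no path to employee salaries.
def teller_view(database: Dict[str, list]) -> List[Account]:
    return list(database["account"])


database = {
    "account": [Account(900, 55), Account(556, 100000)],
    "employee": [Employee("hypothetical employee", 30000)],  # made-up record
}
block = store_account(database["account"][0])
print(len(block), load_account(block))  # 16 Account(number=900, balance=55)
```

A user working through `teller_view` never sees the byte layout or the employee records, which is the point of stacking the levels.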
`
`1.3 Data Models
`Underlying the structure of a database is the concept of a data model, a
`collection of conceptual tools for describing data, data relationships, data
`semantics, and consistency constraints. The various data models that have
`been proposed fall into three different groups: object-based logical models,
`record-based logical models, and physical data models.
`
`1.3.1 Object-Based Logical Models
`Object-based logical models are used in describing data at the conceptual
`and view levels. They are characterized by the fact that they provide fairly
`flexible structuring capabilities and allow data constraints to be specified
`explicitly. There are many different models, and more are likely to come.
`Some of the more widely known ones are:
`
`• The entity-relationship model.
`
`• The object-oriented model.
`• The binary model.
`• The semantic data model.
`
`• The infological model.
`• The functional data model.
`
In this book, we examine the entity-relationship model and the
object-oriented model as representatives of the class of the object-based logical
models. The entity-relationship model, explored in Chapter 2, has gained
acceptance in database design and is widely used in practice. The
object-oriented model, examined in Chapter 13, includes many of the concepts of
the entity-relationship model, but represents executable code as well as
data. It is rapidly gaining acceptance in practice. Below are brief
descriptions of both models.
`
`The Entity-Relationship Model
`The entity-relationship (E-R) data model is based on a perception of a real
`world which consists of a collection of basic objects called entities, and
relationships among these objects. An entity is an object that is
distinguishable from other objects by a specific set of attributes. For
`example, the attributes number and balance describe one particular account
`in a bank. A relationship is an association among several entities. For
`example, a CustAcct relationship associates a customer with each account
that she or he has. The set of all entities of the same type and the set of all
relationships of the same type are termed an entity set and a relationship
set, respectively.
`In addition to entities and relationships, the E-R model represents
`certain constraints to which the contents of a database must conform. One
`important constraint is mapping cardinalities, which express the number of
`entities to which another entity can be associated via a relationship set.
`The overall logical structure of a database can be expressed graphically
`by an E-R diagram, which consists of the following components:
`
`• Rectangles, which represent entity sets.
`• Ellipses, which represent attributes.
`• Diamonds, which represent relationships among entity sets.
• Lines, which link attributes to entity sets and entity sets to
relationships.
`
`Each component is labeled with the entity or relationship it represents.
`To illustrate, consider part of a database banking system consisting of
`customers and the accounts that they have. The corresponding E-R diagram
`is shown in Figure 1.2. This example is extended in Chapter 2.
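
The entity sets and the CustAcct relationship set can be written down directly as data. The sketch below is a hypothetical Python encoding, not a notation from the text; it uses the customer and account values of Figure 1.3, represents CustAcct as a set of (customer name, account number) pairs, and classifies the mapping cardinality those pairs are consistent with. In the E-R model a mapping cardinality is a constraint declared on the relationship set; the function `observed_cardinality` only reports the tightest category that one particular collection of pairs satisfies.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class Customer:          # entity set: customer
    name: str
    street: str
    city: str


@dataclass(frozen=True)
class Account:           # entity set: account
    number: int
    balance: int


customers = {
    Customer("Lowery", "Maple", "Queens"),
    Customer("Shiver", "North", "Bronx"),
    Customer("Hodges", "Sidehill", "Brooklyn"),
}
accounts = {Account(900, 55), Account(556, 100000),
            Account(647, 105366), Account(801, 10533)}

# Relationship set CustAcct: (customer name, account number) pairs.
cust_acct: Set[Tuple[str, int]] = {
    ("Lowery", 900), ("Shiver", 556), ("Shiver", 647),
    ("Hodges", 801), ("Hodges", 647),
}


def observed_cardinality(pairs: Set[Tuple[str, int]]) -> str:
    """Tightest customer-to-account mapping cardinality these pairs allow."""
    per_customer = max(Counter(c for c, _ in pairs).values(), default=0)
    per_account = max(Counter(a for _, a in pairs).values(), default=0)
    if per_customer <= 1 and per_account <= 1:
        return "one-to-one"
    if per_account <= 1:
        return "one-to-many"   # a customer may have several accounts
    if per_customer <= 1:
        return "many-to-one"   # an account may have several customers
    return "many-to-many"


print(observed_cardinality(cust_acct))  # many-to-many
```

Because Shiver and Hodges each have two accounts and account 647 belongs to both, no tighter category fits this data.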
`
`The Object-Oriented Model
`Like the E-R model, the object-oriented model is based on a collection of
objects. An object contains values stored in instance variables within the
`object. Unlike the record-oriented models, these values are themselves
`objects. Thus, objects contain objects to an arbitrarily deep level of
`nesting. An object also contains bodies of code that operate on the object.
`These bodies of code are called methods.
`
Figure 1.2 A sample E-R diagram.
`
`Objects that contain the same types of values and the same methods
`are grouped together into classes. A class may be viewed as a type
`definition for objects. This combination of data and code into a type
`definition is similar to the programming language concept of abstract data
`types.
`The only way in which one object can access the data of another object
`is by invoking a method of that other object. This is called sending a
`message to the object. Thus, the call interface of the methods of an object
defines its externally visible part. The internal part of the object (the
instance variables and method code) is not visible externally. The
result is two levels of data abstraction.
`To illustrate the concept, consider an object representing a bank
`account. Such an object contains instance variables number and balance,
`representing the account number and account balance.
`It contains a
`method pay-interest, which adds interest to the balance. Assume that the
`bank had been paying 6 percent interest on all accounts but now is
`changing its policy to pay 5 percent if the balance is less than $1000 or 6
`percent if the balance is $1000 or greater. Under most data models, this
`would involve changing code in one or more application programs. Under
`the object-oriented model, the only change is made within the pay-interest
`method. The external interface to the object remains unchanged.
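
The pay-interest example can be sketched as a Python class. The assumptions here are that interest is credited once each time the method is called, that balances are kept as `Decimal` to avoid binary rounding, and that the method is spelled `pay_interest` because Python identifiers cannot contain hyphens. Python does not enforce encapsulation; the leading underscore on the instance variables only marks them as internal by convention. The account numbers and balances in the usage lines are hypothetical.

```python
from decimal import Decimal


class Account:
    """An object with instance variables number and balance."""

    def __init__(self, number: int, balance: Decimal) -> None:
        self._number = number      # internal by convention only
        self._balance = balance

    def balance(self) -> Decimal:
        return self._balance

    def pay_interest(self) -> None:
        # New policy: 5 percent below $1000, 6 percent at $1000 or more.
        # Under the old policy the body was simply: rate = Decimal("0.06")
        rate = Decimal("0.05") if self._balance < 1000 else Decimal("0.06")
        self._balance += self._balance * rate


small = Account(101, Decimal("500"))   # hypothetical accounts
large = Account(102, Decimal("2000"))
small.pay_interest()
large.pay_interest()
print(small.balance(), large.balance())  # 525.00 2120.00
```

Callers only ever send the messages `pay_interest()` and `balance()`, so switching from the flat 6 percent rule to the tiered rule touches the method body and nothing else.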
`Unlike entities in the E-R model, each object has its own unique
`identity independent of the values
`it contains. Thus,
`two objects
`containing the same values are nevertheless distinct. The distinction
`among individual objects is maintained in the physical level through the
`assignment of distinct object identifiers.
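
The difference between identity and value can be shown with two Python classes. In this hypothetical sketch, `AccountObject` receives a distinct object identifier from a counter when it is created and compares by identity, while `AccountRecord` compares by field values, as an E-R entity or a record would. The counter stands in for the system-assigned object identifiers mentioned above; it is not how any particular object-oriented database assigns them.

```python
import itertools
from dataclasses import dataclass, field

_oids = itertools.count(1)  # source of distinct object identifiers


@dataclass(eq=False)        # eq=False: == falls back to identity
class AccountObject:
    number: int
    balance: int
    oid: int = field(init=False, default_factory=lambda: next(_oids))


@dataclass(frozen=True)     # value-based: equal fields mean equal records
class AccountRecord:
    number: int
    balance: int


a, b = AccountObject(647, 105366), AccountObject(647, 105366)
r, s = AccountRecord(647, 105366), AccountRecord(647, 105366)
print(a == b, a.oid != b.oid)  # False True
print(r == s)                  # True
```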
`
`1.3.2 Record-Based Logical Models
`Record-based logical models are used in describing data at the conceptual
`and view levels. In contrast to object-based data models, they are used
`both to specify the overall logical structure of the database and to provide
`a higher-level description of the implementation.
`Record-based models are so named because the database is structured
`in fixed-format records of several types. Each record type defines a fixed
`number of fields, or attributes, and each field is usually of a fixed length.
`As we shall see in Chapter 7, the use of fixed-length records simplifies the
`physical-level implementation of the database. This is in contrast to many
of the object-based models in which objects may contain other objects to an
`arbitrary depth of nesting. The richer structure of these databases often
`leads to variable-length records at the physical level.
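
The claim that fixed-length records simplify the physical level comes down to arithmetic: if every record occupies the same number of bytes, record i starts at byte i times the record size, so it can be fetched with one seek. The sketch below assumes a hypothetical 12-byte layout for account records (a 4-byte number and an 8-byte balance) and uses an in-memory `io.BytesIO` buffer in place of a disk file.

```python
import io
import struct
from typing import BinaryIO, Iterable, Tuple

# Assumed layout: 4-byte account number + 8-byte balance, little-endian,
# no padding, so every record is exactly RECORD.size bytes.
RECORD = struct.Struct("<iq")


def write_records(f: BinaryIO, rows: Iterable[Tuple[int, int]]) -> None:
    for number, balance in rows:
        f.write(RECORD.pack(number, balance))


def read_record(f: BinaryIO, i: int) -> Tuple[int, int]:
    """Fetch record i directly: its offset is simply i * record size."""
    f.seek(i * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


f = io.BytesIO()  # in-memory stand-in for a disk file
write_records(f, [(900, 55), (556, 100000), (647, 105366), (801, 10533)])
print(RECORD.size, read_record(f, 2))  # 12 (647, 105366)
```

With variable-length records, such as nested objects, this offset formula no longer holds, and the system needs extra structure (for example, a table of record offsets) to locate a record.
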
Record-based data models do not include a mechanism for the direct
representation of code in the database. Instead, there are separate
languages that are associated with the model to express database queries
`and updates. Some object-based models (including the object-oriented
`model) include executable code as an integral part of the data model itself.
`The three most widely accepted data models are the relational,
`network, and hierarchical models. The relational model, which has gained
favor over the other two in recent years, is examined in detail in Chapters
3-6. The network and hierarchical models, still used in a large number of
`older databases, are described in the appendices. Below we present a brief
`overview of each model.
`
`Relational Model
`The relational model represents data and relationships among data by a
`collection of tables, each of which has a number of columns with unique
`names. Figure 1.3 is a sample relational database showing customers and
`the accounts they have. It shows, for example, that customer Hodges lives
`on Sidehill in Brooklyn, and has two accounts, one numbered 647 with a
`balance of $105,366, and the other numbered 801 with a balance of $10,533.
`Note that customers Shiver and Hodges share account number 647 (they
`may share a business venture).
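
Figure 1.3 can be loaded into a relational system and queried. This sketch uses Python's built-in `sqlite3` module and SQL, neither of which is introduced in this chapter; the table and column names follow the figure. The join relates customer rows to account rows purely by matching values in the number column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (name TEXT, street TEXT, city TEXT, number INTEGER);
    CREATE TABLE account  (number INTEGER PRIMARY KEY, balance INTEGER);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    ("Lowery", "Maple", "Queens", 900),
    ("Shiver", "North", "Bronx", 556),
    ("Shiver", "North", "Bronx", 647),
    ("Hodges", "Sidehill", "Brooklyn", 801),
    ("Hodges", "Sidehill", "Brooklyn", 647),
])
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(900, 55), (556, 100000), (647, 105366), (801, 10533)])

# Rows are related by matching values in the number column, not by pointers.
hodges = conn.execute("""
    SELECT a.number, a.balance
    FROM customer AS c JOIN account AS a ON c.number = a.number
    WHERE c.name = 'Hodges'
    ORDER BY a.number
""").fetchall()
print(hodges)  # [(647, 105366), (801, 10533)]

owners_647 = [n for (n,) in conn.execute(
    "SELECT name FROM customer WHERE number = 647 ORDER BY name")]
print(owners_647)  # ['Hodges', 'Shiver']
```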
`
`Network Model
`Data in the network model are represented by collections of records (in the
Pascal or PL/I sense) and relationships among data are represented by links,
`which can be viewed as pointers. The records in the database are
`organized as collections of arbitrary graphs. Figure 1.4 presents a sample
`network database using the same information as in Figure 1.3.
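
The same data in network form can be sketched with Python object references standing in for links. This is an illustrative assumption, not a real network-model system: links here run only from customer to account, which keeps the sketch short, and the class names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class AccountRec:
    number: int
    balance: int


@dataclass(eq=False)
class CustomerRec:
    name: str
    street: str
    city: str
    accounts: List[AccountRec] = field(default_factory=list)  # links


a900, a556 = AccountRec(900, 55), AccountRec(556, 100000)
a647, a801 = AccountRec(647, 105366), AccountRec(801, 10533)

lowery = CustomerRec("Lowery", "Maple", "Queens", [a900])
shiver = CustomerRec("Shiver", "North", "Bronx", [a556, a647])
hodges = CustomerRec("Hodges", "Sidehill", "Brooklyn", [a801, a647])

# Both customers link to the very same account record.
print(shiver.accounts[1] is hodges.accounts[1])  # True
print([a.balance for a in hodges.accounts])      # [10533, 105366]
```

Because account 647 is a single record reached from two customers, the structure is a general graph rather than a tree.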
`
name      street    city      number
Lowery    Maple     Queens    900
Shiver    North     Bronx     556
Shiver    North     Bronx     647
Hodges    Sidehill  Brooklyn  801
Hodges    Sidehill  Brooklyn  647

number    balance
900       55
556       100000
647       105366
801       10533
`Figure 1.3 A sample relational database.
`
Hierarchical Model
The hierarchical model is similar to the network model in the sense that
data and relationships among data are represented by records and links,
respectively. It differs from the network model in that the records are
organized as collections of trees rather than arbitrary graphs. Figure 1.5
presents a sample hierarchical database with the same information as in
Figure 1.4.
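
A tree can give each record only one parent, which is the key difference from a graph of links. In this hypothetical Python sketch each customer record is a child of an assumed dummy root, and each account is a child of its owner; as a consequence, account 647, which has two owners, has to appear as two separate copies. The preorder traversal visits a parent record before its children.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """A record in a tree: every node except the root has exactly one parent."""
    fields: Tuple
    children: List["Node"] = field(default_factory=list)


root = Node(("root",), [  # assumed dummy root
    Node(("Lowery", "Maple", "Queens"), [Node((900, 55))]),
    Node(("Shiver", "North", "Bronx"),
         [Node((556, 100000)), Node((647, 105366))]),
    Node(("Hodges", "Sidehill", "Brooklyn"),
         [Node((801, 10533)), Node((647, 105366))]),
])


def preorder(node: Node) -> Iterator[Tuple]:
    """Visit a parent record, then each of its child records, left to right."""
    yield node.fields
    for child in node.children:
        yield from preorder(child)


copies_of_647 = [f for f in preorder(root) if f[0] == 647]
print(len(copies_of_647))  # 2: one copy under each owner
```

Keeping two copies of one account reintroduces the redundancy problem from Section 1.1, since an update must reach both copies.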
`
`Differences Between the Models
`The relational model differs from the network and hierarchical models in
that it does not use pointers or links. Instead, the relational model relates
`records by the values they contain. This freedom from the use of pointers
`allows a formal mathematical foundation to be defined.
`
`1.3.3 Physical Data Models
Physical data models are used to describe data at the lowest level. In
contrast to logical data models, there are very few physical data models in
use. Two of the widely known ones are:
`
`• Unifying model.
`• Frame memory.
`
`Physical data models capture aspects of database system implementation
`that are not covered in this book.
`
Figure 1.4 A sample network database.
`
Figure 1.5 A sample hierarchical database.
`
`1.4 Instances and Schemes
`Databases change over time as information is inserted and deleted. The
collection of information stored in the database at a particular moment in
`time is called an instance of the database. The overall design of the
`database is called the database scheme. Schemes are changed infrequently,
`if at all.
`An analogy to the concepts of data types, variables, and values in
`programming languages is useful here. Returning to the customer record
`type definition in Section 1.2, note that in declaring the type customer, we
`have not declared any variables. To declare such variables in a Pascal-like
`language, we write:
`
var customer1 : customer;

Variable customer1 now corresponds to an area of storage containing a
`customer type record.
`The concept of a database scheme corresponds to the programming
language notion of type definition. A variable of a given type has a
`particular value at a given instant in time. Thus, the concept of the value
`of a variable in programming languages corresponds to the concept of an
`instance of a database scheme.
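
The scheme/instance distinction can be observed in SQLite through Python's `sqlite3` module, neither of which is part of the text: the `CREATE TABLE` statement fixes the scheme, while the rows present at any moment form the current instance. `PRAGMA table_info` is SQLite's way of reporting a table's columns; the helper names `scheme` and `instance` are illustrative.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (number INTEGER, balance INTEGER)")


def scheme(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """The scheme: column names and declared types, as SQLite reports them."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]


def instance(conn: sqlite3.Connection) -> List[Tuple[int, int]]:
    """The instance: the rows stored at this moment."""
    return conn.execute(
        "SELECT number, balance FROM account ORDER BY number").fetchall()


before = (scheme(conn), instance(conn))
conn.execute("INSERT INTO account VALUES (900, 55)")
conn.execute("INSERT INTO account VALUES (556, 100000)")
after = (scheme(conn), instance(conn))

print(before[0] == after[0])  # True: the scheme is unchanged
print(before[1], after[1])    # [] [(556, 100000), (900, 55)]
```
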
`Database systems have several schemes, partitioned according to the
`levels of abstraction discussed in Section 1.2. At the lowest level is the
`physical scheme; at the intermediate level, the conceptual scheme; at the
`highest level, a subscheme. In general, database systems support one
`physical scheme, one conceptual scheme, and several subschemes.
`
`1.5 Data Independence
`In Section 1.2, we defined three levels of abstraction at which the database
`may be viewed. The ability to modify a scheme definition in one level
`without affecting a scheme definition in the next higher level is called data
`independence. There are two levels of data independence:
`
• Physical data independence is the ability to modify the physical
scheme without causing application programs to be rewritten.
Modifications at the physical level are occasionally necessary in order
to improve performance.
• Logical data independence is the ability to modify the conceptual
scheme without causing application programs to be rewritten.
`Modifications at the conceptual level are necessary whenever the
`logical structure of the database is altered (for example, the addition of
`money-market accounts in a banking system).
`
`Logical data independence is more difficult to achieve than physical
`data independence since application programs are heavily dependent on
`the logical structure of the data they access.
`The concept of data independence is similar in many respects to the
concept of abstract data types in modern programming languages. Both hide
`implementation details from the users. This allows users to concentrate on
`the general structure rather than low-level implementation details.
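
Both kinds of data independence can be demonstrated in SQLite. In the sketch below, which assumes Python's `sqlite3` module and a hypothetical `teller_account` view acting as the subscheme, the application reads only through the view. Adding an index is a physical-level change, and adding a column (a hypothetical `kind` column, loosely echoing the money-market example) is a conceptual-level change; the application returns the same rows after each.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (number INTEGER, balance INTEGER);
    INSERT INTO account VALUES (900, 55), (556, 100000), (647, 105366);
    -- Subscheme used by the application program.
    CREATE VIEW teller_account AS SELECT number, balance FROM account;
""")


def application(conn: sqlite3.Connection) -> list:
    """An 'application program' that reads only through the view."""
    return conn.execute(
        "SELECT number, balance FROM teller_account WHERE balance >= 100000"
        " ORDER BY number").fetchall()


original = application(conn)

# Physical change: add an index (a new access path); the program is untouched.
conn.execute("CREATE INDEX account_balance ON account(balance)")
after_physical = application(conn)

# Conceptual change: add a column; the view still exposes the old shape.
conn.execute("ALTER TABLE account ADD COLUMN kind TEXT DEFAULT 'savings'")
after_logical = application(conn)

print(original == after_physical == after_logical)  # True
print(original)  # [(556, 100000), (647, 105366)]
```

Logical independence holds here only because the view lists its columns explicitly; a program that ran `SELECT *` against the base table would suddenly receive a third column, which illustrates why logical data independence is the harder of the two to achieve.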
`
`1.6 Data Definition Language
`A database scheme is specified by a set of definitions which are expressed
`by a special language called a data definition language (DDL). The result of
`compilation of DDL statements is a set of tables which are stored in a
special file called a data dictionary (or directory).
`A data directory is a file that contains metadata; that is, "data about
`data." This file is consulted before actual data is read or modified in the
`database system.
The storage structure and access methods used by the database system
`are specified by a set of definitions in a special type of DDL called a data
`storage and definition language. The result of compilation of these definitions
`is a set of instructions to specify the implementation details of the database
`schemes which are usually hidden from the users.
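
SQLite offers a concrete, if simplified, example of DDL and a data dictionary. Executing `CREATE TABLE` statements through Python's `sqlite3` module adds entries to SQLite's internal catalog table `sqlite_master`, which the system consults to interpret later statements. Unlike the compiled tables the text describes, SQLite keeps each definition largely as its original SQL text, so the analogy is approximate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL statements: they define structure, not data.
conn.executescript("""
    CREATE TABLE customer (name TEXT, street TEXT, city TEXT);
    CREATE TABLE account (number INTEGER PRIMARY KEY, balance INTEGER);
""")

# The catalog table sqlite_master plays the role of the data dictionary:
# metadata describing the tables, stored by the system itself.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(catalog)  # [('table', 'account'), ('table', 'customer')]
```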
`
1.7 Data Manipulation Language
`The levels of abstraction we discussed in Section 1.2 apply not only to the
`definition or structuring of data but also to the manipulation of data. By
`data manipulation we mean:
`
`• The retrieval of information stored in the database.
`• The insertion of new information into the database.
`• The deletion of information from the database.
`• The modification of data stored in the database.
`
`At the physical level, we must define algorithms that allow for efficient
`access to data. At higher levels of abstraction, an emphasis is placed on
`ease of use. The goal is to provide for efficient human interaction with the
`system.
`A data manipulation language (DML) is a language that enables users to
`access or manipulate data as organized by the appropriate data model.
`There are basically two types:
`
`• Procedural DMLs require a user to specify what data is needed and how
`to get it.
• Nonprocedural DMLs require a user to specify what data is needed
`without specifying how to get it.
`
Nonprocedural DMLs are usually easier to learn and use than
procedural DMLs. However, since a user does not have to specify how to
`get the data, these languages may generate code which is not as efficient
`as that produced by procedural languages. This difficulty can be remedied
`through various optimization techniques, some of which are discussed in
`Chapter 9.
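
The bank officer's request from Section 1.1 (customers with an account balance of $10,000 or more) makes a convenient comparison. The procedural version below spells out the scan and the lookup step by step; the nonprocedural version states the desired result in SQL and leaves the access strategy to SQLite. The data is that of Figure 1.3, and both versions are illustrative Python sketches, not a DML from the text.

```python
import sqlite3

customer = [("Lowery", 900), ("Shiver", 556), ("Shiver", 647),
            ("Hodges", 801), ("Hodges", 647)]
account = {900: 55, 556: 100000, 647: 105366, 801: 10533}


# Procedural: we say how to get the answer.
def rich_customers_procedural() -> list:
    result = set()
    for name, number in customer:          # scan every customer row
        if account[number] >= 10000:       # look up the account balance
            result.add(name)
    return sorted(result)


# Nonprocedural: we say what we want; the system picks the access plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, number INTEGER)")
conn.execute("CREATE TABLE account (number INTEGER, balance INTEGER)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", customer)
conn.executemany("INSERT INTO account VALUES (?, ?)", account.items())
rich_sql = [n for (n,) in conn.execute("""
    SELECT DISTINCT c.name
    FROM customer c JOIN account a ON c.number = a.number
    WHERE a.balance >= 10000
    ORDER BY c.name""")]

print(rich_customers_procedural())  # ['Hodges', 'Shiver']
print(rich_sql == rich_customers_procedural())  # True
```

Both produce the same answer; the difference is who decides how to find it, which is also where the optimization techniques mentioned above come in.
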
`A query is a statement requesting the retrieval of information. The
`portion of a DML that involves information retrieval is called a query
`language. Although technically incorrect, it is common practice to use the
`terms query language and data manipulation language synonymously.
`
`1.8 Database Manager
`Databases typically require a large amount of storage space. Corporate
`databases are usually measured in terms of gigabytes or, for the largest
`databases, terabytes of data. A gigabyte is 1000 megabytes (a billion bytes),
`and a terabyte is a million megabytes (a trillion bytes). Since the main
`memory of computers cannot store this information, it is stored on disks.
`Data is moved between disk storage and main memory as needed. Since
`the movement of data to and from disk is slow relative to the speed of the
`central processing unit, it is imperative that the database system structure
`the data so as to minimize the need to move data between disk and main
`memory.
`The goal of a database system is to simplify and facilitate access to
`data. High-level views help to achieve this. Users of the system should not
`be burdened unnecessarily with the physical details of the implementation
`of the system. Nevertheless, a major factor in a user's satisfaction or lack
`thereof with a database system is its performance. If the response time for
`a request is too long, the value of the system is diminished. The
`performance of a system depends on the efficiency of the data structures
`used to represent the data in the database and on how efficiently the
`system is able to operate on these data structures. As is the case elsewhere
`in computer systems, a tradeoff must be made not only between space and
`time but also between the efficiency of one kind of operation versus that of
`another.
`A database manager is a program module which provides the interface
`between the low-level data stored in the database and the application
`programs and queries submitted to the system. The database manager is
`responsible for the following tasks:
`
`• Interaction with the file manager. The raw data is stored on the disk
`using the file system which is usually provided by a conventional
`operating system. The database manager translates the various DML
`statements into low-level file system commands. Thus, the database
`manager is responsible for the actual storing, retrieving, and updating
`of data in the database.
`
`• Integrity enforcement. The data values stored in the database must
`satisfy certain types of consistency constraints. For example, the
`number of hours an employee may work in one week may not exceed
`some specific limit (say, 80 hours). Such a constraint must be specified
`explicitly by the database administrator (see Section 1.9). The database
`manager can then determine whether updates to the database result in
`the violation of the constraint; if so, appropriate action must be taken.
`
`• Security enforcement. As discussed above, not every database user
`needs to have access to the entire content of the database. It is the job
`of the database manager to enforce these security requirements.
`
`• Backup and recovery. A computer system