throbber
THE DESIGN
`OF THE
`
`OPERATING SYSTEM
`MAURICEJ. BACH
`
`In this timely new book, Maurice J. Bach traces the popularity of the
`UNIXGI> System throughout the computer Industry. The author describes the
`Internal algorithms and structures that form the basis of the operating
`system (the kernel) and their relationship to the programmer Interface.
`
`Among its key features, the book:
`
`• describes the outline of the kernel architecture
`• Introduces the system buffer cache mechanism
`• includes data structures and algorithms used Internally by the file system
`• covers the system calls that provide the user Interface to the file system
`• defines the context of a process and Investigates the Internal kernel
`primitives that manipulate the process context
`• presents the system calls that control the process context
`• describes process scheduling
`• discusses memory management, Including swapping and paging
`systems
`• outlines general driver interfaces, with specific discussion of disk drivers
`and terminal drivers
`• presents an overview of streams
`• Introduces Inter-process communication and networking, Including
`System V messages, shared memory, and semaphores
`• explains tightly coupled multiprocessor UNIXGI> systems
`• Investigates distributed UNIXGI> systems
`
`PRENTICE-HALL, INC., Englewood Cliffs, N.J. 07632
`
`' ~
`
`·ofa
`~o
`·~~
`.:z~
`o~
`~ ·
`
`3
`
`MAURICE
`J.BACH
`
`ISBN 0-13-201799-7
`
`PRENTICE-HALL SOFTWARE SERIES
`
`001
`
`

`
`ATs.T
`
`THE DESIGN OF THE UNIX®
`OPERATING SYSTEM
`
`Maurice J. Bach
`
`PRENTICE-HALL, INC., Englewood Cliffs, New Jersey 07632
`
`002
`
`

`
`Copyright © 1986 by Bell Telephone Laboratories, Incorporated.
`
`Published by Prentice-Hall, Inc.
`A division of Simon & Schuster
`Englew ood Cliffs, New Jersey 07632
`Prentice-Hall Software Series
`Brian W. Kernighan, Advisor
`Cover Design Consultant: Sarah Bach
`
`UNIX• is a registered trademark of AT&T.
`DEC, PDP, and VAX are trademarks of Digital Equipment Corp.
`Series 32000 is a t rademark of National Semiconductor Corp.
`llDAda is a registered trademark of the U.S. Government (Ada Joint Program Office).
`UNIVAC is a trademark of Sperry Corp.
`This document w as set on an AUTOLOGIC, Inc. APS-5 phototypesetter driven by the
`TROFF formatter operating under the UNIX system on an AT&T 3B20 computer.
`
`The Publisher offers discounts on this book when ordered in bulk
`quantities. For more information write:
`
`Special Sales/College Marketing
`Prentice-Hall, Inc.
`College Technical and Reference Division
`Englew ood Cliffs, New Jersey 07632
`
`The author and publisher of this book have used their best efforts in preparing
`this book. These efforts include the development, research, and testing of the
`theories and programs to determine their effectiveness. The author and
`publisher make no warranty of any kind, expressed or implied, with regard to
`these programs or the documentation contained in this book. The author and
`publisher shall not be liable in any event for incidental or consequential
`damages in connection with, or arising out of, the furnishing, performance, or
`use of these programs.
`
`All rights reserved. No part of this book may be
`reproduced, in any form or by any means,
`without permission in writing from the publisher.
`
`Printed in the United States of America
`
`10
`
`ISBN 0-13-201799-7 025
`
`Prentice-Hall International (UK) Limited, London
`Prentice-Hall of Australia Pty. Limited, Sydney
`Prentice-Hall Canada Inc., Toronto
`Prentice-Hall Hispanoamericana, S.A., Mexico
`Prentice-Hall of India Private Limited, New Delhi
`Prentice-Hall of Japan, Inc., Tokyo
`Prentice-Hall of Southeast Asia Pte. Ltd., Singapore
`Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
`
`To my parents, for their patience and devotion,
`to my daughters, Sarah and Rachel, for their laughter,
`to my son, Joseph, who arrived after the first printing,
`and to my wife, Debby, for her love and understanding.
`
`003
`
`

`
`CONTENTS
`CONTENTS
`
`O
`PREFACE •
`
`O
`
`O
`
`O
`
`O
`
`Q
`
`U
`
`0
`
`O
`
`0
`
`xi
`xi
`
`CHAPTER 1 GENERAL OVERVIEW OF THE SYSTEM
`CHAPTER 1 GENERAL OVERVIEW OF THE SYSTEM
`
`.
`
`.
`
`.
`
`•
`O
`
`.
`
`.
`
`.
`
`U
`
`.
`
`.
`
`.
`
`0
`
`.
`.
`.
`.
`1.1 HISTORY .
`• •
`•
`.
`1.1 HISTORY
`•
`1.2 SYSTEM STRUCTURE
`1.2 SYSTEM STRUCTURE
`1.3 USER PERSPECTIVE .
`1.3 USER PERSPECTIVE
`•
`1.4 OPERATING SYSTEM SERVICES
`1.4 OPERA TING SYSTEM SER VICES
`1.5 ASSUMPTIONS ABOUT HARDWARE
`1.5 ASSUMPTIONS ABOUT HARDWARE
`106
`I
`O
`O
`I
`O
`I
`I
`I
`I
`1.6 SUMMARY
`
`.
`
`.
`
`.
`
`.
`
`.
`
`.
`
`.
`
`.
`.
`• •
`.
`.
`
`.
`
`.
`
`.
`
`.
`
`.
`
`'
`
`004
`
`1
`
`D\-IL!-"-‘
`
`4
`
`6
`14
`14
`1-- U!
`15
`
`18
`
`004
`
`

`
`• • • • •
`CHAPTER 2 INTRODUCTION TO THE KERNEL
`2.1 ARCHITECTURE OF THE UNIX OPERATING
`SYSTEM
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`
`.
`
`.
`
`.
`
`2.2 INTRODUCTION TO SYSTEM CONCEPTS
`2.3 KERNEL DATA STRUCTURES
`
`2.4 SYSTEM ADMINISTRATION
`2.5 SUMMARY AND PREVIEW
`2.6 EXERCISES
`• • • • •
`
`CHAPTER 3 THE BUFFER CACHE
`
`•
`
`•
`
`•
`
`•
`
`• • • • •
`3.1 BUFFER HEADERS
`3.2 STRUCTURE OF THE BUFFER POOL
`• •
`3.3 SCENARIOS FOR RETRIEVAL OF A BUFFER •
`3.4 READING AND WRITING DISK BLOCKS
`3.5 ADVANTAGES AND DISADVANTAGES OF THE BUFFER
`CACHE • • •
`• • • • • •
`3.6 SUMMARY
`3.7 EXERCISES
`
`19
`
`19
`
`22
`
`34
`
`34
`
`36
`
`37
`
`38
`
`39
`
`40
`
`42
`
`53
`
`56
`
`57
`
`58
`
`•
`
`CHAPTER 5 SYSTEM CALLS FOR THE FILE SYSTEM
`5.1 OPEN •
`• • • • •
`.
`5.2 READ • • • •
`• • • • • •
`.
`5.3 WRITE
`• • •
`5.4 FILE AND RECORD LOCKING
`5.5 ADJUSTING THE POSITION OF FILE 1/0 - LSEEK
`5.6 CLOSE
`• •
`.
`5.7 FILE CREATION • • •
`5.8 CREATION OF SPECIAL FILES
`5.9 CHANGE DIRECTORY AND CHANGE ROOT
`5.10 CHANGE OWNER AND CHANGE MODE
`5.11 STAT AND FSTAT
`5.12 PIPES • • • • •
`
`5.13 DUP
`5.14 MOUNTING AND UNMOUNTING FILE SYSTEMS
`
`91
`
`92
`
`96
`
`101
`
`103
`
`103
`
`103
`
`105
`
`107
`
`109
`
`110
`
`110
`
`111
`
`117
`
`•
`
`• 119
`
`• 128
`
`132
`
`138
`
`•
`
`•
`
`• 139
`
`60
`
`61
`
`73
`
`74
`
`76
`
`CHAPTER 4 INTERNAL REPRESENTATION OF FILES
`INODES
`• • • • • • • • • •
`4.1
`4.2 STRUCTURE OF A REGULAR FILE
`• • • • • • • • 67
`4.3 DIRECTORIES • • • • • • • • • • • • •
`4.4 CONVERSION OF A PATH NAME TO AN INODE
`4.5 SUPER BLOCK • • • • • • • • •
`
`•
`
`•
`
`•
`
`•
`
`INODE ASSIGNMENT TO A NEW FILE
`4.6
`4.7 ALLOCATION OF DISK BLOCKS
`4.8 OTHER FILE TYPES
`4.9 SUMMARY
`
`4.10 EXERCISES •
`
`77
`
`84
`
`88
`
`88
`
`89
`
`yj
`
`• • •
`.
`5.15 LINK • • • • • • •
`5.16 UNLINK • • • • • • • • • •
`5.17 FILE SYSTEM ABSTRACTIONS
`• • • • • • •
`5.18 FILE SYSTEM MAINTENANCE
`5.19 SUMMARY •
`
`.
`
`.
`
`.
`
`.
`
`.
`
`5.20 EXERCISES
`
`•
`
`• 140
`
`140
`
`CHAPTER 6 THE STRUCTURE OF PROCESSES
`
`6.1 PROCESS ST ATES AND TRANSITIONS •
`6.2 LAYOUT OF SYSTEM MEMORY
`.
`• •
`
`6.3 THE CONTEXT OF A PROCESS • • • •
`6.4 SA YING THE CONTEXT OF A PROCESS
`
`•
`
`•
`
`• 146
`
`• 147
`
`•
`
`151
`
`• 159
`
`•
`
`•
`
`• 162
`
`6.5 MANIPULATION OF THE PROCESS ADDRESS
`SPACE
`.
`.
`.
`•
`.
`.
`•
`.
`.
`.
`.
`.
`
`•
`
`•
`
`•
`
`• 171
`
`6.6 SLEEP • • • • • • •
`
`.
`
`•
`
`.
`
`• • • • • •
`
`182
`
`005
`
`

`
`• • • • •
`CHAPTER 2 INTRODUCTION TO THE KERNEL
`2.1 ARCHITECTURE OF THE UNIX OPERATING
`SYSTEM
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`.
`
`.
`
`.
`
`.
`
`2.2 INTRODUCTION TO SYSTEM CONCEPTS
`2.3 KERNEL DATA STRUCTURES
`
`2.4 SYSTEM ADMINISTRATION
`2.5 SUMMARY AND PREVIEW
`2.6 EXERCISES
`• • • • •
`
`CHAPTER 3 THE BUFFER CACHE
`
`•
`
`•
`
`•
`
`•
`
`• • • • •
`3.1 BUFFER HEADERS
`3.2 STRUCTURE OF THE BUFFER POOL
`• •
`3.3 SCENARIOS FOR RETRIEVAL OF A BUFFER •
`3.4 READING AND WRITING DISK BLOCKS
`3.5 ADVANTAGES AND DISADVANTAGES OF THE BUFFER
`CACHE • • •
`• • • • • •
`3.6 SUMMARY
`3.7 EXERCISES
`
`19
`
`19
`
`22
`
`34
`
`34
`
`36
`
`37
`
`38
`
`39
`
`40
`
`42
`
`53
`
`56
`
`57
`
`58
`
`•
`
`CHAPTER 5 SYSTEM CALLS FOR THE FILE SYSTEM
`5.1 OPEN •
`• • • • •
`.
`5.2 READ • • • •
`• • • • • •
`.
`5.3 WRITE
`• • •
`5.4 FILE AND RECORD LOCKING
`5.5 ADJUSTING THE POSITION OF FILE 1/0 - LSEEK
`5.6 CLOSE
`• •
`.
`5.7 FILE CREATION • • •
`5.8 CREATION OF SPECIAL FILES
`5.9 CHANGE DIRECTORY AND CHANGE ROOT
`5.10 CHANGE OWNER AND CHANGE MODE
`5.11 STAT AND FSTAT
`5.12 PIPES • • • • •
`
`5.13 DUP
`5.14 MOUNTING AND UNMOUNTING FILE SYSTEMS
`
`91
`
`92
`
`96
`
`101
`
`103
`
`103
`
`103
`
`105
`
`107
`
`109
`
`110
`
`110
`
`111
`
`117
`
`•
`
`• 119
`
`• 128
`
`132
`
`138
`
`•
`
`•
`
`• 139
`
`60
`
`61
`
`73
`
`74
`
`76
`
`CHAPTER 4 INTERNAL REPRESENTATION OF FILES
`INODES
`• • • • • • • • • •
`4.1
`4.2 STRUCTURE OF A REGULAR FILE
`• • • • • • • • 67
`4.3 DIRECTORIES • • • • • • • • • • • • •
`4.4 CONVERSION OF A PATH NAME TO AN INODE
`4.5 SUPER BLOCK • • • • • • • • •
`
`•
`
`•
`
`•
`
`•
`
`INODE ASSIGNMENT TO A NEW FILE
`4.6
`4.7 ALLOCATION OF DISK BLOCKS
`4.8 OTHER FILE TYPES
`4.9 SUMMARY
`
`4.10 EXERCISES •
`
`77
`
`84
`
`88
`
`88
`
`89
`
`yj
`
`• • •
`.
`5.15 LINK • • • • • • •
`5.16 UNLINK • • • • • • • • • •
`5.17 FILE SYSTEM ABSTRACTIONS
`• • • • • • •
`5.18 FILE SYSTEM MAINTENANCE
`5.19 SUMMARY •
`
`.
`
`.
`
`.
`
`.
`
`.
`
`5.20 EXERCISES
`
`•
`
`• 140
`
`140
`
`CHAPTER 6 THE STRUCTURE OF PROCESSES
`
`6.1 PROCESS ST ATES AND TRANSITIONS •
`6.2 LAYOUT OF SYSTEM MEMORY
`.
`• •
`
`6.3 THE CONTEXT OF A PROCESS • • • •
`6.4 SA YING THE CONTEXT OF A PROCESS
`
`•
`
`•
`
`• 146
`
`• 147
`
`•
`
`151
`
`• 159
`
`•
`
`•
`
`• 162
`
`6.5 MANIPULATION OF THE PROCESS ADDRESS
`SPACE
`.
`.
`.
`•
`.
`.
`•
`.
`.
`.
`.
`.
`
`•
`
`•
`
`•
`
`• 171
`
`6.6 SLEEP • • • • • • •
`
`.
`
`•
`
`.
`
`• • • • • •
`
`182
`
`006
`
`

`
`6.7 SUMMARY
`
`6.8 EXERCISES
`
`. . . . . . . . . . . . . .
`
`• • • • • • •
`
`188
`. 189
`
`CHAPTER 7 PROCESS CONTROL • •
`
`7 .1 PROCESS CREATION •
`
`7.2 SIGNALS
`
`7.3 PROCESS TERMINATION
`
`7.4 AWAITING PROCESS TERMINATION
`
`•
`.
`•
`INVOKING OTHER PROGRAMS
`7.5
`7 .6 THE USER ID OF A PROCESS • • • •
`
`7.7 CHANGING THE SIZE OF A PROCESS •
`
`• • • • • • • • • • • •
`7 .8 THE SHELL
`7 .9 SYSTEM BOOT AND THE INIT PROCESS •
`
`.
`
`7.10 SUMMARY • •
`
`7.11 EXERCISES
`
`CHAPTER 8 PROCESS SCHEDULING AND TIME •
`
`•
`
`• 191
`
`• 192
`
`• • 200
`
`212
`• 213
`
`217
`•• 227
`
`• 229
`
`• • • 232
`235
`• 238
`• • 239
`
`247
`
`•• 248
`
`CHAPTER I 0 THE 1/0 SUBSYSTEM
`
`IO.I DRIVER INTERFACES
`
`10.2 DISK DRIVERS
`
`I 0.3 TERMINAL DRIVERS •
`
`I 0.4 STREAMS
`
`10.5 SUMMARY
`10.6 EXERCISES
`
`CHAPTER 11 INTERPROCESS COMMUNICATION
`
`11.1 PROCESS TRACING
`11.2 SYSTEM V IPC
`
`11.3 NETWORK COMMUNICATIONS
`11.4 SOCKETS
`
`11.5 SUMMARY •
`11.6 EXERCISES
`
`CHAPTER 12 MULTIPROCESSOR SYSTEMS •
`
`.
`
`•
`
`.
`
`•
`
`312
`
`313
`
`• 325
`• • 329
`
`• 344
`
`• 351
`• 352
`
`355
`
`356
`359
`
`382
`383
`
`388
`389
`
`391
`
`8.1 PROCESS SCHEDULING
`
`8.2 SYSTEM CALLS FOR TIME
`
`8.3 CLOCK ••
`
`8.4 SUMMARY
`8.5 EXERCISES
`
`CHAPTER 9 MEMORY MANAGEMENT POLICIES
`
`9.1 SWAPPING
`
`•
`
`.
`
`• • • • • • •
`
`.
`
`• •
`
`258
`
`260
`
`268
`
`268
`
`271
`
`• 272
`
`9.2 DEMAND PAGING
`
`• • • • 285
`
`9.3 A HYBRID SYSTEM WITH SWAPPING AND DEMAND
`PAGING . . . . . . . . . . . . . ·

`
`9.4 SUMMARY
`
`9.5 EXERCISES
`
`•
`
`12.1 PROBLEM OF MULTIPROCESSOR SYSTEMS
`12.2 SOLUTION WITH MASTER AND SLAVE
`PROCESSORS
`.
`•
`.
`.
`•
`.
`.
`.
`
`12.3 SOLUTION WITH SEMAPHORES •
`
`12.4 THE TUNIS SYSTEM
`
`• •
`
`.
`
`•
`
`12.5 PERFORMANCE LIMITATIONS
`12.6 EXERCISES
`
`CHAPTER 13 DISTRIBUTED UNIX SYSTEMS
`
`13.1 SATELLITE PROCESSORS
`
`•
`
`.
`
`•
`
`13.2 THE NEWCASTLE CONNECTION
`
`13.3 TRANSPARENT DISTRIBUTED FILE SYSTEMS
`
`392
`
`393
`• • 395
`
`410
`
`410
`
`410
`
`• • 412
`
`• 414
`
`422
`• 426
`
`307
`
`307
`
`•• 308
`
`13.4 A TRANSPARENT DISTRIBUTED MODEL WITHOUT STUB
`PROCESSES
`•
`.
`.
`• •
`.
`.
`•
`.
`• • 429
`
`viii
`
`ix
`
`007
`
`

`
`6.7 SUMMARY
`
`6.8 EXERCISES
`
`. . . . . . . . . . . . . .
`
`• • • • • • •
`
`188
`. 189
`
`CHAPTER 7 PROCESS CONTROL • •
`
`7 .1 PROCESS CREATION •
`
`7.2 SIGNALS
`
`7.3 PROCESS TERMINATION
`
`7.4 AWAITING PROCESS TERMINATION
`
`•
`.
`•
`INVOKING OTHER PROGRAMS
`7.5
`7 .6 THE USER ID OF A PROCESS • • • •
`
`7.7 CHANGING THE SIZE OF A PROCESS •
`
`• • • • • • • • • • • •
`7 .8 THE SHELL
`7 .9 SYSTEM BOOT AND THE INIT PROCESS •
`
`.
`
`7.10 SUMMARY • •
`
`7.11 EXERCISES
`
`CHAPTER 8 PROCESS SCHEDULING AND TIME •
`
`•
`
`• 191
`
`• 192
`
`• • 200
`
`212
`• 213
`
`217
`•• 227
`
`• 229
`
`• • • 232
`235
`• 238
`• • 239
`
`247
`
`•• 248
`
`CHAPTER I 0 THE 1/0 SUBSYSTEM
`
`IO.I DRIVER INTERFACES
`
`10.2 DISK DRIVERS
`
`I 0.3 TERMINAL DRIVERS •
`
`I 0.4 STREAMS
`
`10.5 SUMMARY
`10.6 EXERCISES
`
`CHAPTER 11 INTERPROCESS COMMUNICATION
`
`11.1 PROCESS TRACING
`11.2 SYSTEM V IPC
`
`11.3 NETWORK COMMUNICATIONS
`11.4 SOCKETS
`
`11.5 SUMMARY •
`11.6 EXERCISES
`
`CHAPTER 12 MULTIPROCESSOR SYSTEMS •
`
`.
`
`•
`
`.
`
`•
`
`312
`
`313
`
`• 325
`• • 329
`
`• 344
`
`• 351
`• 352
`
`355
`
`356
`359
`
`382
`383
`
`388
`389
`
`391
`
`8.1 PROCESS SCHEDULING
`
`8.2 SYSTEM CALLS FOR TIME
`
`8.3 CLOCK ••
`
`8.4 SUMMARY
`8.5 EXERCISES
`
`CHAPTER 9 MEMORY MANAGEMENT POLICIES
`
`9.1 SWAPPING
`
`•
`
`.
`
`• • • • • • •
`
`.
`
`• •
`
`258
`
`260
`
`268
`
`268
`
`271
`
`• 272
`
`9.2 DEMAND PAGING
`
`• • • • 285
`
`9.3 A HYBRID SYSTEM WITH SWAPPING AND DEMAND
`PAGING . . . . . . . . . . . . . ·

`
`9.4 SUMMARY
`
`9.5 EXERCISES
`
`•
`
`12.1 PROBLEM OF MULTIPROCESSOR SYSTEMS
`12.2 SOLUTION WITH MASTER AND SLAVE
`PROCESSORS
`.
`•
`.
`.
`•
`.
`.
`.
`
`12.3 SOLUTION WITH SEMAPHORES •
`
`12.4 THE TUNIS SYSTEM
`
`• •
`
`.
`
`•
`
`12.5 PERFORMANCE LIMITATIONS
`12.6 EXERCISES
`
`CHAPTER 13 DISTRIBUTED UNIX SYSTEMS
`
`13.1 SATELLITE PROCESSORS
`
`•
`
`.
`
`•
`
`13.2 THE NEWCASTLE CONNECTION
`
`13.3 TRANSPARENT DISTRIBUTED FILE SYSTEMS
`
`392
`
`393
`• • 395
`
`410
`
`410
`
`410
`
`• • 412
`
`• 414
`
`422
`• 426
`
`307
`
`307
`
`•• 308
`
`13.4 A TRANSPARENT DISTRIBUTED MODEL WITHOUT STUB
`PROCESSES
`•
`.
`.
`• •
`.
`.
`•
`.
`• • 429
`
`viii
`
`ix
`
`008
`
`

`
`13.5 SUMMARY .
`
`13.6 EXERCISES
`
`APPENDIX - SYSTEM CALLS
`
`•
`
`BIBLIOGRAPHY
`
`INDEX • . • •
`
`• 430
`
`• 431
`
`434
`
`454
`
`• 458
`
`PREFACE
`
`The UNIX system was first described in a 1974 paper in the Communications of
`the ACM [Thompson 74) by Ken Thompson and Dennis Ritchie. Since that time,
`it has become increasingly widespread and popular throughout the computer
`industry where more and more vendors are offering support for it on their
`machines. It is especially popular in universities where it is frequently used for
`operating systems research and case studies.
`Many books and papers have described parts of the system, among them, two
`special issues of the Bell System Technical Journal in 1978 [BSTJ 78) and 1984
`[BLTJ 84). Many books describe the user level interface, 'particularly how to use
`electronic mail, how to prepare documents, or how to u~she command interpreter
`called the shell; some books such as The UNIX Pr. gramming Environment
`[Kernighan 84) and Advanced UNIX Programming [ ochkind 85) describe the
`programming interface. This book describes the internal algorithms and structures
`that form the basis of the operating system (called the kernel) and their
`It is
`thus applicable to several
`relationship to the programmer interface.
`environments. First, it can be used as a textbook for an operating systems course
`It is most
`at either the advanced undergraduate or first-year graduate level.
`beneficial to reference the system source code when using the book, but the book
`can be read independently, too. Second, system programmers can use the book as a
`reference to gain better understanding of how the kernel works and to compare
`algorithms used in the UNIX system to algorithms used in other operating systems.
`
`x
`
`:xi
`
`009
`
`

`
`18
`
`GENERAL OVERVIEW OF THE SYSTEM
`
`and data structures or the addresses of instructions such as functions. The compiler
`generates the addresses for a virtual machine as if no other program will execute
`simultaneously on the physical machine.
`When the program is to run on the machine, the kernel allocates space in main
`memory for it, but the virtual addresses generated by the compiler need not be
`identical to the physical addresses that they occupy in the machine. The kernel
`coordinates with the machine hardware to set up a virtual to physical address
`translation that maps the compiler-generated addresses to the physical machine
`addresses. The mapping depends on the capabilities of the machine hardware, and
`the parts of UNIX systems that deal with them are therefore machine dependent.
`For example, some machines have special hardware to support demand paging.
`Chapters 6 and 9 will discuss issues of memory management and how they relate to
`hardware in more detail.
`
`1.6 SUMMARY
`This chapter has described the overall structure of the UNIX system, the
`relationship between processes running in user mode versus kernel mode, and the
`assumptions the kernel makes about the hardware. Processes execute in user mode
`or kernel mode, where they avail themselves of system services using a well-defined
`set of system calls. The system design encourages programmers to write small
`programs that do only a few operations but do them well, and then to combine the
`programs using pipes and 1/0 redirection to do more sophisticated processing.
`The system calls allow processes to do operations that are otherwise forbidden to
`them. In addition to servicing system calls, the kernel does general bookkeeping for
`the user community, controlling process scheduling, managing the storage and
`protection of processes in main memory, fielding interrupts, managing files and
`devices, and taking care of system error conditions. The UNIX system kernel
`purposely omits many functions that are part of other operating systems, providing
`a small set of system calls that allow processes to do necessary functions at user
`level. The next chapter gives a more detailed introduction to the kernel, describing
`its architecture and some basic concepts used in its implementation.
`
`2
`
`INTRODUCTION
`TO THE KERNEL
`
`The last chapter gave a high-level perspective of the UNIX system environment.
`This chapter focuses on the kernel, providing an overview of its architecture and
`outlining basic concepts and structures essential for understanding the rest of the
`book.
`
`2.1 ARCIDTECTURE OF THE UNIX OPERATING SYSTEM
`
`It has been noted (see page 239 of [Christian 83)) that the UNIX system supports
`the illusions that the file system has "places" and that processes have "life." The
`two entities, files and processes, are the two central concepts in the UNIX system
`model. Figure 2.1 gives a block diagram of the kernel, showing various modules
`and their relationships to each other. In particular, it shows the file subsystem on
`the left and the process control subsystem on the right, the two major components
`of the kernel. The diagram serves as a useful logical view of the kernel, although
`in practice the kernel deviates from the model because some modules interact with
`the internal operations of others.
`Figure 2.1 shows three levels: user, kernel, and hardware. The system call and
`library interface represent the border between user programs and the kernel
`depicted in Figure 1.1. System calls look like ordinary function calls in C
`programs, and libraries map these function calls to the primitives needed to enter
`
`19
`
`010
`
`

`
`20
`
`INTRODUCTION TO THE KERNEL
`
`2.1
`
`ARCIDTECTURE OF THE UNIX OPERA TING SYSTEM
`
`21
`
`user programs ' libraries
`
`trap., ......
`""" .'.'. · "" · · · "· · · ...... ·~
`User Level
`Kerneft::iveC - - - - - - - - - - - - -
`- - - - - - - - -
`
`system call interface
`
`process
`
`control
`
`subsystem
`
`: inter-process
`;communication
`
`scheduler
`
`memory
`
`: management
`
`file subsystem
`
`buff er cache
`
`character
`
`block
`
`device drivers
`
`hardware control
`
`Kernel Level
`------~--~--------------------------------------
`Hardware Leve1
`
`hardware
`
`Figure 2.1. Block Diagram of the System Kernel
`
`the operating system, as covered in more detail in Chapter 6. Assembly language
`programs may invoke system calls directly without a system call library, however.
`Programs frequently use other libraries such as the standard I/O library to provide
`a more sophisticated use of the system calls. The libraries are linked with the
`programs at compile time and are thus part of the user program for purposes of
`
`this discussion. An example later on will illustrate these points.
`The figure partitions the set of system calls into those that interact with the file
`subsystem and those that interact with the process control subsystem. The file
`subsystem manages files, allocating file space, administering free space, controlling
`access to files, and retrieving data for users. Processes interact with the file
`subsystem via a specific set of system calls, such as open (to open a file for reading
`or writing), close, read, write, stat (query the attributes of a file), chown (change
`the record of who owns the file), and chmod (change the access permissions of a
`file). These and others will be examined in Chapter 5.
`The file subsystem accesses file data using a buffering mechanism that regulates
`data ftow between the kernel and secondary storage devices. The buffering
`mechanism interacts with block 1/0 device drivers to initiate data transfer to and
`from the kernel. Device drivers arc the kernel modules that control the operation
`of peripheral devices. Block 110 devices are random access storage devices;
`alternatively, their device drivers make them appear to be random access storage
`devices to the rest of the system. For example, a tape driver may allow the kernel
`to treat a tape unit as a random access storage device. The file subsystem also
`interacts directly with "raw" 1/0 device drivers without the intervention of a
`buffering mechanism. Raw devices, sometimes called character devices, include all
`devices that are not block devices.
`The process control subsystem is responsible for process synchronization,
`interprocess communication, memory management, and process scheduling. The
`file subsystem and the process control subsystem interact when loading a file into
`the process subsystem reads
`memory for execution, as will be seen in Chapter 7:
`executable files into memory before executing them.
`Some of the system calls for controlling processes are fork (create a new
`process), exec (overlay the image of a program onto the running process), exit
`(finish executing a process), wait (synchronize process execution with the exit of a
`previously forked process), brk (control the size of memory allocated to a process),
`and signal (control process response to extraordinary events). Chapter 7 will
`examine these system calls and others.
`The memory management module controls the allocation of memory. If at any
`time the system does not have enough physical memory for all processes, the kernel
`moves them between main memory and secondary memory so that all processes get
`a fair chance to execute. Chapter 9 will describe two policies for managing
`memory: swapping and demand paging. The swapper process is sometimes called
`the scheduler, because it "schedules" the allocation of memory for processes and
`influences the operation of the CPU scheduler. However, this text will refer to it as
`the swapper to avoid confusion with the CPU scheduler.
`The scheduler module allocates the CPU to processes. It schedules them to run
`in turn until they voluntarily relinquish the CPU while awaiting a resource or until
`the kernel preempts them when their recent run time exceeds a time quantum. The
`scheduler then chooses the highest priority eligible process to run; the original
`process will run again when it is the highest priority eligible process available.
`
`011
`
`

`
`20
`
`INTRODUCTION TO THE KERNEL
`
`2.1
`
`ARCIDTECTURE OF THE UNIX OPERA TING SYSTEM
`
`21
`
`user programs ' libraries
`
`trap., ......
`""" .'.'. · "" · · · "· · · ...... ·~
`User Level
`Kerneft::iveC - - - - - - - - - - - - -
`- - - - - - - - -
`
`system call interface
`
`process
`
`control
`
`subsystem
`
`: inter-process
`;communication
`
`scheduler
`
`memory
`
`: management
`
`file subsystem
`
`buff er cache
`
`character
`
`block
`
`device drivers
`
`hardware control
`
`Kernel Level
`------~--~--------------------------------------
`Hardware Leve1
`
`hardware
`
`Figure 2.1. Block Diagram of the System Kernel
`
`the operating system, as covered in more detail in Chapter 6. Assembly language
`programs may invoke system calls directly without a system call library, however.
`Programs frequently use other libraries such as the standard I/O library to provide
`a more sophisticated use of the system calls. The libraries are linked with the
`programs at compile time and are thus part of the user program for purposes of
`
`this discussion. An example later on will illustrate these points.
`The figure partitions the set of system calls into those that interact with the file
`subsystem and those that interact with the process control subsystem. The file
`subsystem manages files, allocating file space, administering free space, controlling
`access to files, and retrieving data for users. Processes interact with the file
`subsystem via a specific set of system calls, such as open (to open a file for reading
`or writing), close, read, write, stat (query the attributes of a file), chown (change
`the record of who owns the file), and chmod (change the access permissions of a
`file). These and others will be examined in Chapter 5.
`The file subsystem accesses file data using a buffering mechanism that regulates
`data ftow between the kernel and secondary storage devices. The buffering
`mechanism interacts with block 1/0 device drivers to initiate data transfer to and
`from the kernel. Device drivers arc the kernel modules that control the operation
`of peripheral devices. Block 110 devices are random access storage devices;
`alternatively, their device drivers make them appear to be random access storage
`devices to the rest of the system. For example, a tape driver may allow the kernel
`to treat a tape unit as a random access storage device. The file subsystem also
`interacts directly with "raw" 1/0 device drivers without the intervention of a
`buffering mechanism. Raw devices, sometimes called character devices, include all
`devices that are not block devices.
`The process control subsystem is responsible for process synchronization,
`interprocess communication, memory management, and process scheduling. The
`file subsystem and the process control subsystem interact when loading a file into
`the process subsystem reads
`memory for execution, as will be seen in Chapter 7:
`executable files into memory before executing them.
`Some of the system calls for controlling processes are fork (create a new
`process), exec (overlay the image of a program onto the running process), exit
`(finish executing a process), wait (synchronize process execution with the exit of a
`previously forked process), brk (control the size of memory allocated to a process),
`and signal (control process response to extraordinary events). Chapter 7 will
`examine these system calls and others.
`The memory management module controls the allocation of memory. If at any
`time the system does not have enough physical memory for all processes, the kernel
`moves them between main memory and secondary memory so that all processes get
`a fair chance to execute. Chapter 9 will describe two policies for managing
`memory: swapping and demand paging. The swapper process is sometimes called
`the scheduler, because it "schedules" the allocation of memory for processes and
`influences the operation of the CPU scheduler. However, this text will refer to it as
`the swapper to avoid confusion with the CPU scheduler.
`The scheduler module allocates the CPU to processes. It schedules them to run
`in turn until they voluntarily relinquish the CPU while awaiting a resource or until
`the kernel preempts them when their recent run time exceeds a time quantum. The
`scheduler then chooses the highest priority eligible process to run; the original
`process will run again when it is the highest priority eligible process available.
`
`012
`
`

`
`22
`
`INTRODUCTION TO THE KERNEL
`
`2.2
`
`INTRODUCTION TO SYSTEM CONCEPTS
`
`23
`
`There are several forms of interprocess communication, ranging from asynchronous
`signaling of events to synchronous transmission of messages between processes.
`Finally, the hardware control is responsible for handling interrupts and for
`communicating with the machine. Devices such as disks or terminals may interrupt
`the CPU while a process is executing. If so, the kernel may resume execution of
`the interrupted process after servicing the interrupt: Interrupts are not serviced by
`special processes but by special functions in the kernel, called in the context of the
`currently running process.
`
`2.2 INTRODUCTION TO SYSTEM CONCEPrS
`
`This section gives an overview of some major kernel data structures and describes
`the function of modules shown in Figure 2.1 in more detail.
`
`2.2.1 An Overview of the F'lle Subsystem
`
`The internal representation of a file is given by an inode, which contains a
`description of the disk layout of the file data and other information such as the file
`owner, access permissions, and access times. The term inode is a contraction of the
`term index node and is commonly used in literature on the UNIX system. Every
`file has one inode, but it may have several names, all of which map into the inode.
`Each name is called a link. When a process refers to a file by name, the kernel
`parses the file name one component at a time, checks that the process has
`permission to search the directories in the path, and eventually retrieves the inode
`for the file. For example, if a process calls
`
`open ("/fs2/mjb/rje/sourcefile", 1);
`
`the kernel retrieves the inode for "/fs2/mjb/rje/sourcefile". When a process
`creates a new file, the kernel assigns it an unused inode. Inodes are stored in the
`file system, as will be seen shortly, but the kernel reads them into an in-core1 inode
`table when manipulating files.
`The kernel contains two other data structures, the file table and the user file
`descriptor table. The file table is a global kernel structure, but the user file
`descriptor table is allocated per process. When a process opens or creats a file, the
`kernel allocates an entry from each table, corresponding to the file's inode. Entries
`in the three structures - user file descriptor table, file table, and inode table -
`maintain the state of the file and the user's access to it. The file table keeps track
`of the byte off set in the file where the user's next read or write will start, and the
`
`l. The term core refers to primary memory of a machine, not to hardware technology.
`
`User
`File Descriptor
`Table
`
`File
`Table
`
`I node
`Table
`
`'
`t------1''
`
`'
`
`--- ....... -- .......
`-- :::::
`' ' '
`
`'
`
`Figure 2.2. File Descriptors, File Table, and lnode Table
`
`access rights allowed to the opening process. The user file descriptor table
`identifies all open files for a process. Figure 2.2 shows the tables and their
`relationship to each other. The kernel returns a file descriptor for the open and
`creat system calls, which is an index into the user file descriptor table. When
`executing read and write system calls, the kernel uses the file descriptor to access
`the user file descriptor table, follows pointers to the file table and inode table
`entries, and, from the inode, finds the data in the file. Chapters 4 and 5 describe
`these data structures in great detail. For now, suffice it to say that use of three
`tables allows various degrees of sharing access to a file.
`The UNIX system keeps regular files and directories on block devices such as
`tapes or disks. Because of the difference in access time between the two, few, if
`any, UNIX system installations use tapes for their file systems. In coming years,
`diskless work stations will be common, where files are located on a remote system
`and accessed via a network (see Chapter 13). For simplicity, however, the ensuing
`text assumes the use of disks. An installation may have several physical disk units,
`each containing one or more file systems. Partitioning a disk into several file
`systems makes it easier for administrators to manage the data stored there. The
`kernel deals on a logical level with file systems rather than with disks, treating each
`one as a logical device identified by a logical device number. The conversion
`between logical device (file system) addresses and physical device (disk) addresses
`is done by the disk driver. This book will use the term device to mean a logical
`device unless explicitly stated otherwise.
`A file system consists of a sequence of logical blocks, each containing 512, 1024,
`2048, or any convenient multiple of 512 bytes, depending on the system
`implementation. The size of a logical block is homogeneous within a file system but
`may vary between different file systems in a system configuration. Using large
`logical blocks increases the effective data transfer rate between disk and memory,
`
`013
`
`

`
`22
`
`INTRODUCTION TO THE KERNEL
`
`2.2
`
`INTRODUCTION TO SYSTEM CONCEPTS
`
`23
`
`There are several forms of interprocess communication, ranging from asynchronous
`s

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket