`
PROCEEDINGS of the
`
`ELEVENTH HAWAII
`INTERNATIONAL CONFERENCE
`ON
`SYSTEM SCIENCES
`
`
`
`VOLUME III
`
`SELECTED PAPERS IN
`
`MINI AND MICRO COMPUTER SYSTEMS
`
`EDITED BY
`
`BRUCE SHRIVER
`
UNIVERSITY OF SOUTHWESTERN LOUISIANA
`
`RICHARD ECKHOUSE
`
`DIGITAL EQUIPMENT CORPORATION
`RALPH H. SPRAGUE, JR.
`
`
`UNIVERSITY OF HAWAII
`
`
`
`
`
`
TABLE OF CONTENTS

PART 3

MINI-MICRO COMPUTERS SYSTEMS

A Mathematical Model for a Virtual Machine Monitor and its Supportive
Structure
    Duc T. Nguyen, J. W. Anderson, B. D. Shriver and R. E. Michelsen,
    University of Southwestern Louisiana

A Model for Minicomputer Maintenance Evaluation and Penalty
Compensation
    Barry L. Bateman, *Jim C. Wetherbe and Chadwick H. Nestman,
    Southern Illinois University and *University of Houston

Optimal Design of Memory Hierarchies
    W. D. Strecker, Digital Equipment Corporation

Control Structures for Mini and Micro Computers
    Charles H. Kaman, Robert M. Glorioso, Fernando C. Colon,
    Digital Equipment Corporation

The TI Pascal System: Run-Time Support
    Edward E. Ferguson and George T. Ligler,
    Texas Instruments Incorporated

A Fault-Tolerant Computing System
    James A. Katzman, Tandem Computers

A "NonStop"* Operating System
    Joel F. Bartlett, Tandem Computers, Inc.

Minicomputers-Microcomputers: A Role in Datacommunication
    U. W. Pooch and G. N. Williams, Texas A&M University

A Programmable Multiplexer
    D. D. Drew, Texas A&M University and C. W. McMath, Jr.,
    Agency Records Control, Inc.

A Data Acquisition System for Laser Damage Experiments
    Thomas H. Kuckertz and Dennis H. Gill,
    Los Alamos Scientific Laboratory
`
`
`
`
`
`A "NonStop"* Operating System
`
`Joel F. Bartlett
`Tandem Computers Inc.
`19333 Vallco Parkway
`Cupertino, California
`
Copyright (C) 1977, Tandem Computers Inc.
All Rights Reserved
`
The Tandem/16 computer system is an attempt at providing a
general-purpose, multiple-computer system which is at least one
order of magnitude more reliable than conventional commercial
offerings. Through software abstractions a multiple-computer
structure, desirable for failure tolerance, is transformed into
something approaching a symmetric multiprocessor, desirable for
programming ease. Section 1 of this paper provides an overview of
the hardware structure. In section 2 are found the design goals for
the operating system, "Guardian". Section 3 provides a bottom-up
view of Guardian. The user-level interface is then discussed in
section 4. Section 5 provides an introduction to the mechanism used
to provide failure tolerance at the application level and to
application structuring. Finally, section 6 contains a few comments
on system reliability and implementation.
`
`
`
`
1. INTRODUCTION

1.1 Background

On-line computer processing has become a way of life for many
businesses. As they make the transition from manual or batch methods
to on-line systems, they become increasingly vulnerable to computer
failures. Whereas in a batch system the direct costs of a failure
might simply be increased overtime for the operations staff, a
failure of an on-line system results in immediate business losses.

1.2 System Overview

The Tandem/16 (1,2) was designed to provide a system for on-line
applications that would be significantly more reliable than currently
available commercial computer systems. The hardware structure
consists of multiple processor modules interconnected by redundant
interprocessor buses. A PMS (3) definition of the hardware is found
in Figure 1.
`
Each processor has its own power supply, memory, and I/O channel and
is connected to all other processors by redundant interprocessor
buses. Each I/O controller is redundantly powered and connected to
two different I/O channels. As a result, any interprocessor bus
failure does not affect the ability of a processor to communicate
with any other processor. The failure of an I/O channel or of a
processor does not cause the loss of an I/O device. Likewise, the
failure of a module (processor or I/O controller) does not disable
any other module or disable any inter-module communication. Finally,
certain I/O devices such as disc drives may be connected to two
different I/O controllers, and disc drives may in turn be duplicated
such that the failure of an I/O controller or disc drive will not
result in loss of data.
`
`* "Nonstop" is a trademark of Tandem Computers Inc.
`
`
`
`
`
`
`
`
[Figure 1. Hardware Structure]
`
The system is not a true multiprocessor (4), but rather a "multiple
computer" system. The multiple computer approach is preferable for
several reasons. First, since no module is shared by the entire
system, it increases the system's reliability. Second, a multiple
computer system does not require the complex hardware needed to
handle multiple access paths to a common memory. In smaller systems,
the cost of such a multiported memory is undesirable; and in larger
systems, performance suffers because of memory access interference.
`
On-line repair is as necessary as reliability in assuring system
availability. The modular structure of the Tandem/16 system allows
processors, I/O controllers, or buses to be repaired or replaced
while the rest of the system continues to operate. Once repaired,
they may then be reintegrated into the system.
`
The system structure allows a wide range of system sizes to be
supported. As many as sixteen processors, each with up to 512k bytes
of memory, may be connected into one system. Each processor may also
have up to 256 I/O devices connected to it. This provides for
tremendous growth of application programs and processing loads
without the requirement that the application be reimplemented on a
larger system with a different architecture.
`
Finally, the system is meant to provide a general solution to the
problem of providing a failure-tolerant, on-line environment suitable
for commercial use. As such, the system supports conventional
programming languages and peripherals and is oriented toward
providing large numbers of terminals with access to large data bases.
`
`
2. SYSTEM DESIGN GOALS

2.1 Integrated Hardware/Software Design

The Tandem/16 system was designed to solve a specific problem. This
problem was not stated in terms of hardware and software
requirements, but rather in terms of system requirements. The
hardware and software designs then proceeded in tandem to provide a
unified solution. The hardware design concerned itself with the
contents of each module, their interconnections to the common buses,
and error detection and correction within modules and on the
communication paths. The software design was given the problem of
control; that is, selection of which modules to use and which buses
to use to communicate with them. Furthermore, as errors were
detected, it was the responsibility of the software to control
recovery actions.
`
`
`
`
2.2 Operating System Design Goals

The first and foremost goal of the operating system, Guardian, was to
provide a failure-tolerant system. This translated into the following
design "axioms":

- the operating system should be able to remain operational after
  any single detected module or bus failure.

- the operating system should allow any module or bus to be repaired
  on-line and then reintegrated into the system.

- the operating system should be implemented in a reliable manner.
  Increased reliability provided by the hardware architecture must
  not be negated by software problems.

A second set of requirements came from the great numbers and sizes of
hardware configurations that are possible:

- the operating system should support all possible hardware
  configurations, ranging from a two-processor, discless system
  through a sixteen-processor system with billions of bytes of disc
  storage.

- the operating system should hide the physical configuration as
  much as possible such that applications could be written to run on
  a great variety of system configurations.

3. OPERATING SYSTEM STRUCTURE

To satisfy these requirements, the operating system was designed to
have the appearance of a true multiprocessor at the user level. The
design of the system was strongly influenced by Dijkstra's work on
the "THE" system (5), and Brinch Hansen's implementation of an
operating system nucleus for a single-processor system (6). The
primary abstractions are processes, which do work, and messages,
which allow interprocess communication.

3.1 Processes

At the lowest level of the system is the basic hardware as earlier
described. It provides the capability for redundant modules, i.e.,
I/O controllers, I/O devices, and processor modules consisting of a
processor, memory, and a power supply. These redundant modules are in
turn interconnected by redundant buses. Error detection is provided
on all communication paths and error correction is provided within
each processor's memory. The hardware does not concern itself with
the selection of communication paths or the assignment of tasks to
specific modules.

The first abstraction provided is that of the process. Each processor
module may have one or more processes residing in it. A process is
initially created in a specific processor and may not execute in
another processor. Each process has an execution priority assigned to
it. Processor time is allocated on a strict priority basis to the
highest priority ready process.

Process synchronization primitives include "counting semaphores" and
process local "event" flags. Semaphore operations are performed via
the functions PSEM and VSEM, corresponding to Dijkstra's P and V
operations. Semaphores may only be used for synchronization between
processes within the same processor. They are typically used to
control access to resources such as resident memory buffers, message
control blocks, and I/O controllers.
`
When certain low-level actions such as device interrupts, processor
power-on, message completion or message arrival occur, they result in
"event" flags being set for the appropriate process. A process may
wait for one or more events to occur via the function WAIT. The
process is activated as soon as the first WAITed-for event occurs.
Events are signaled via the function AWAKE. Event signals are queued
using a "wake up waiting" mechanism so that they are not lost if the
event is signaled when the process is not waiting on it. Like
semaphores, event signals may not be passed between processors. Event
flags are predefined for eight different events and may not be
redefined.
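
The "wake up waiting" behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Guardian's implementation: the class `EventFlags`, the lower-case method names, and the event names are hypothetical, and the sketch assumes each flag latches at most one pending signal.

```python
import threading

class EventFlags:
    """Process-local event flags with latched ("wake up waiting") signals."""

    def __init__(self):
        self._pending = set()              # events signaled but not yet consumed
        self._cond = threading.Condition()

    def awake(self, event):
        """Signal an event; it stays pending if the process is not waiting."""
        with self._cond:
            self._pending.add(event)
            self._cond.notify_all()

    def wait(self, events):
        """Block until one of `events` is pending, then consume and return it."""
        wanted = set(events)
        with self._cond:
            self._cond.wait_for(lambda: self._pending & wanted)
            event = min(self._pending & wanted)   # deterministic pick for the sketch
            self._pending.discard(event)
            return event

flags = EventFlags()
flags.awake("MSG_ARRIVED")                      # signaled before anyone waits
first = flags.wait({"IO_DONE", "MSG_ARRIVED"})  # returns at once: signal was latched
```

Because the signal is recorded in `_pending` before the waiter arrives, the later `wait` call does not block, which is the property the text describes.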
`
When a process blocks itself to wait for some event to occur or for a
semaphore to be allocated to it, it may specify a maximum time to
block. If the time limit expires and the event has not occurred or
the resource has not been obtained, then the process will continue
execution but an error condition will be returned to it. This timeout
allows "watch dog" timers to be easily placed on device interrupts or
on resource allocations where a failure may occur.
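
As a rough illustration of a timed block on a semaphore, the sketch below wraps Python's `threading.Semaphore`. The method names `psem` and `vsem` echo PSEM and VSEM, but their signatures and the boolean error indication are assumptions made for this example.

```python
import threading

class CountingSemaphore:
    """Counting semaphore whose P operation can give up after a time limit."""

    def __init__(self, count):
        self._sem = threading.Semaphore(count)

    def psem(self, timeout=None):
        """P operation: True if acquired, False if the time limit expired."""
        return self._sem.acquire(timeout=timeout)

    def vsem(self):
        """V operation: release one unit of the resource."""
        self._sem.release()

buffers = CountingSemaphore(1)
first = buffers.psem(timeout=0.05)    # resource is free: acquired immediately
second = buffers.psem(timeout=0.05)   # still held: the watch dog expires
buffers.vsem()
third = buffers.psem(timeout=0.05)    # available again after the release
```

The caller keeps running after the timeout and decides what to do with the error indication, which is how a watch dog on a resource allocation would be used.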
`
Each process in the system has a unique identifier or "processid" in
the form: <cpu #,process #>, which allows it to be referenced on a
system-wide basis. This leads to the next abstraction, the message
system, which provides a processor-independent, failure-tolerant
method for interprocess communication.
`
3.2 Messages

The message system provides five primitive operations which can be
illustrated in the context of a process making a request to some
server process, Figure 2. The process's request for service is sent
as a message to the appropriate server process via the procedure
LINK. The message will consist of parameters denoting the type of
request and any needed data. The message will be queued for the
server process, setting an event flag, and then the requestor process
may continue executing.

[Figure 2. Message System Primitive Operations (requestor sends a
message to the server; data copied; result copied)]

When the server process wishes to check for any messages, it calls
LISTEN. LISTEN returns the first message queued or an indication that
no messages are queued. The server process will then obtain a copy of
the requestor's data by calling the procedure READLINK.

Next, the server process will process the request. The status of the
operation and any result will then be returned by the WRITELINK
procedure, which will signal the requestor process via another event
flag. Finally, the requestor process will complete its end of the
transaction by calling BREAKLINK.
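
The request/reply cycle built from LINK, LISTEN, READLINK, WRITELINK and BREAKLINK can be sketched as follows. This is a single-threaded toy under stated assumptions: the `Link` record stands in for a message control block, the method signatures are invented for the example, and event-flag signaling is only noted in comments.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Link:
    """One in-flight request: a stand-in for a message control block."""
    request: bytes
    reply: Optional[bytes] = None
    done: bool = False

class MessageSystem:
    def __init__(self):
        self.queues = {}                  # server processid -> deque of Link

    def link(self, server, request):
        """Requestor: queue a message for the server, then keep running."""
        lk = Link(request)
        self.queues.setdefault(server, deque()).append(lk)
        return lk

    def listen(self, server):
        """Server: first queued message, or None if nothing is queued."""
        q = self.queues.get(server)
        return q.popleft() if q else None

    def readlink(self, lk):
        """Server: obtain a copy of the requestor's data."""
        return bytes(lk.request)

    def writelink(self, lk, reply):
        """Server: return status and result (would set the requestor's flag)."""
        lk.reply, lk.done = reply, True

    def breaklink(self, lk):
        """Requestor: collect the result and end the transaction."""
        reply, lk.reply = lk.reply, None
        return reply

ms = MessageSystem()
server = (1, 7)                           # <cpu #, process #>
lk = ms.link(server, b"READ block 42")
got = ms.listen(server)
data = ms.readlink(got)
ms.writelink(got, b"OK:" + data)
result = ms.breaklink(lk)
```

The requestor holds the handle returned by `link` for the whole exchange, so the reply always finds its way back without the server knowing where the requestor runs.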
`
A communications protocol was defined for the interprocessor buses
that would tolerate any single bus error during the execution of any
message system primitive. This design assures that a communications
failure will occur if and only if the sender or receiver processes or
their processors fail. Any bus errors which occur during a message
system operation will be automatically corrected in a manner
transparent to the communicating processes and logged on the system
console. The interprocessor buses are not used for communication
between processes in the same processor, which can be done faster in
memory. However, the processes involved in the message transfer are
unable to detect this difference.
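
The paper does not give the bus protocol itself, so the following is only a sketch of the general idea of masking a single bus error by resending on the other bus. `BusError`, `send_with_retry`, the bus names and the list used as a console log are all hypothetical.

```python
class BusError(Exception):
    """A detected error on one interprocessor bus (hypothetical)."""

def send_with_retry(buses, packet, log):
    """Try each redundant bus in turn; fail only if every bus fails."""
    for name, transmit in buses:
        try:
            return transmit(packet)
        except BusError as exc:
            log.append(f"bus {name}: {exc}")   # stand-in for the console log
    raise BusError("no bus could deliver the packet")

def bad_bus(packet):
    raise BusError("parity error")

def good_bus(packet):
    return ("delivered", packet)

log = []
result = send_with_retry([("X", bad_bus), ("Y", good_bus)], "hello", log)
```

The caller sees only the successful delivery; the error on the first bus surfaces in the log and nowhere else.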
`
The message system is designed such that resources needed for message
transmission (control blocks) are obtained at the start of a message
transfer request. Once LINK has been successfully completed, both
processes are assured that sufficient resources are in hand to be
able to complete the message transfer. Furthermore, a process may
reserve control blocks to guarantee that it will always be able to
send messages to process a request that it picks up from its message
queue. Such resource controls assure that deadlocks can be prevented
in complex producer/consumer interactions, if the programmer
correctly analyzes and anticipates potential deadlocks within the
application.
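
One way to picture control block reservation is a pool that sets blocks aside per process. The sketch below is an assumption about how such accounting could work, not a description of Guardian's data structures; `ControlBlockPool` and its methods are invented names.

```python
class ControlBlockPool:
    """Shared pool of message control blocks with per-process reservations."""

    def __init__(self, total):
        self.free = total
        self.reserved = {}            # process name -> blocks set aside for it

    def reserve(self, proc, n):
        """Set aside n blocks for proc; False if the pool cannot cover them."""
        if n > self.free:
            return False
        self.free -= n
        self.reserved[proc] = self.reserved.get(proc, 0) + n
        return True

    def allocate(self, proc):
        """Take one block for a LINK: reservation first, then the shared pool."""
        if self.reserved.get(proc, 0) > 0:
            self.reserved[proc] -= 1
            return True
        if self.free > 0:
            self.free -= 1
            return True
        return False

pool = ControlBlockPool(3)
pool.reserve("disc", 2)               # the disc process guarantees itself 2 blocks
app_first = pool.allocate("app")      # takes the last unreserved block
app_second = pool.allocate("app")     # pool exhausted for unreserved users
disc_ok = pool.allocate("disc")       # still succeeds from its reservation
```

Even with the shared pool drained by other processes, the server that reserved blocks can still send the messages it needs to finish a request.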
`
3.3 Process-pairs

With the implementation of processes and messages, the system is no
longer seen as separate modules. Instead, the system can be viewed as
a set of processes which may interact via messages in any arbitrary
manner, as shown in Figure 3.

By defining messages as the only legitimate method for
process-to-process interaction, interprocess communication is not
limited by the multiple-computer organization of the system. The
system then starts to take on the appearance of a true
multiprocessor. Processor boundaries have been blurred, but I/O
devices are still not accessible to all processes.
`
System-wide access to I/O devices is provided by the mechanism of
"process-pairs". An I/O process-pair consists of two cooperating
processes located in two different processors that control a
particular I/O device. One of the processes will be considered the
"primary" and one will be considered the "backup". The primary
process handles requests sent to it and controls the I/O device. When
a request for an operation such as a file open or close occurs, the
primary will send this information to the backup process via the
message system. These "checkpoints" assure that the backup process
will have all information needed to take over control of the device
in the event of an I/O channel error or a failure of the primary
process' processor. A process-pair for a redundantly-recorded disc
volume is illustrated in Figure 4.

[Figure 3. System Structure After the Addition of Processes and
Messages]
`
`Because of the distributed nature of the
`system, it is not possible to provide a
`block of "driver" code that could be
`called directly to access the device.
`While potentially more efficient, such an
`approach would preclude access to every
`device in the system by every process in
`the system.
`
[Figure 4. Process-pair for a Redundantly-Recorded Disc Volume]

The I/O process-pair and associated I/O device(s) are known by a
logical device name such as "$DISC1" or by a logical device number
rather than by the processid of either process. I/O device names are
mapped to the appropriate processes via the logical device table
(LDT) in every processor, which supplies two processids for each
device. A message request made on the basis of a device name or
number results in the message being sent to the first process in the
table. If the message cannot be sent or if the message is sent to the
backup process, an error indication will be returned. The processid
entries in the LDT will then be reversed and the message resent. Note
two things: first, the error recovery can be done in an automatic
manner; and second, the requestor is not concerned with what process
actually handled the request. Error recovery cannot always be done
automatically. For example, suppose the primary process of a pair
controlling a line printer fails while handling a request to print a
line on a check. The application process would prefer to see the
process failure as an error rather than have the request
automatically retried, which might result in two checks being
printed.
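
The LDT lookup with its reverse-and-resend step can be sketched as below. The dictionary layout, the `NotPrimary` exception and the `deliver` callback are assumptions for illustration; only the behaviour (send to the first entry, swap the pair on error, resend) comes from the text.

```python
class NotPrimary(Exception):
    """Target could not be reached or is currently the backup (hypothetical)."""

def send_to_device(ldt, name, request, deliver):
    """Send to the first LDT entry; on error reverse the pair and resend once."""
    pair = ldt[name]                      # [first processid, second processid]
    try:
        return deliver(pair[0], request)
    except NotPrimary:
        pair.reverse()                    # the other process is now listed first
        return deliver(pair[0], request)

ldt = {"$DISC1": [(0, 12), (1, 12)]}      # <cpu #, process #> for each of the pair
current_primary = {(1, 12)}               # processor 0 has failed in this scenario

def deliver(pid, request):
    if pid not in current_primary:
        raise NotPrimary(pid)
    return (pid, request)

reply = send_to_device(ldt, "$DISC1", "READ", deliver)
```

A caller that must not have a request repeated, as in the check printing case, would catch the error itself instead of using the automatic resend shown here.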
`
The two primitives, processes and messages, blur the boundaries
between processors and provide a failure-tolerant method for
interprocess communication. By defining a method of grouping
processes (process-pairs), a mechanism for uniform access to an I/O
device or other system-wide resource is provided. This access method
is independent of the functions performed within the processes, their
locations, or their implementations. Within the process-pair, the
message system is used to checkpoint state changes so that the backup
process may take over in the event of a failure. This checkpoint
mechanism is in turn independent of all other processes and messages
in the system.
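
A minimal sketch of checkpointing within a process-pair follows. The `DeviceProcess` class and its state layout are hypothetical, and a direct assignment stands in for the checkpoint message that the primary would send through the message system.

```python
import copy

class DeviceProcess:
    """One member of a process-pair; the primary checkpoints to its backup."""

    def __init__(self, name):
        self.name = name
        self.state = {"open_files": set()}
        self.backup = None

    def checkpoint(self):
        # Stand-in for a message to the backup carrying the changed state.
        if self.backup is not None:
            self.backup.state = copy.deepcopy(self.state)

    def open_file(self, fname):
        self.state["open_files"].add(fname)
        self.checkpoint()             # checkpoint after each state change

    def close_file(self, fname):
        self.state["open_files"].discard(fname)
        self.checkpoint()

primary = DeviceProcess("primary")
backup = DeviceProcess("backup")
primary.backup = backup
primary.open_file("PAYROLL")
primary.open_file("LEDGER")
primary.close_file("LEDGER")
# If the primary's processor failed here, the backup would already hold
# everything it needs to take over the device.
```

The backup never sees requests directly; it only applies checkpoints, so its copy of the state is exactly as current as the last checkpoint it received.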
`
The system structure can be summarized as follows. Guardian is
constructed of processes which communicate using messages. Fault
tolerance is provided by duplication of components in both the
hardware and the software. Access to I/O devices is provided by
process-pairs consisting of a primary process and a backup process.
The primary process must checkpoint state information to the backup
process so that the backup may take over on a failure. Requests to
these devices are routed using the logical device name or number so
that the request is always routed to the current primary process. The
result is a set of primitives and protocols which allow recovery and
continued processing in spite of bus, processor, I/O controller, or
I/O device failures. Furthermore, these primitives provide access to
all system resources from every process in the system.
`
3.4 System Processes

The next step in structuring the system comes in assigning functions
to processes. As previously shown, I/O devices are controlled by
process-pairs. Another process-pair known as the "operator" is
present in the system. This pair is responsible for formatting and
printing error messages on the system console. Here is an example of
where Guardian has not followed a strict level structure. The
operator makes requests to a terminal process to print the messages,
yet the terminal process wishes to send messages to the operator to
report I/O channel errors. An infinite cycle is prevented by having
the terminal process not send messages for errors on the operator
terminal and having I/O processes never wait for message completions
when sending errors to the operator. While it may be preferable to
prevent cycles of any type in system design, they have been allowed
in Guardian when it can be shown that they will terminate. The
ability to reserve message control blocks assures that no cycle will
be blocked because of resource problems.

Each processor has a "system monitor" process which handles such
functions as process creation and deletion, setting time of day, and
processor failure and reload cleanup operations.

A memory management process is also resident in each processor. This
process is responsible for allocating a page of physical memory and
then sending messages to the appropriate disc processes to do the
actual disc I/O. Pages are brought in on a demand basis and pages to
overlay are selected on a "least recently used" basis over the entire
memory of the processor.

The choice of relatively unsophisticated algorithms for scheduling
and memory management was a result of the fact that the system was
not intended to be a general-purpose timeshare system. Rather, it was
to be a system which supported multiple processes and terminals in an
extremely flexible manner.

3.5 Application Process Interface

Above the process and communication structure there exists a library
of procedures which are used to access system resources. These
procedures run in the calling process' environment and may or may not
send messages to other processes in the system. For example, the file
system procedures do not do the actual I/O operations. Instead, they
check the caller's parameters, and if all is in order a message is
sent to the appropriate I/O process-pair. Likewise, process creation
is seen as a procedure call to NEWPROCESS, which does nothing but
check the caller's parameters and then send a message to the system
monitor process in the processor where the process is to be created.
On the other hand, a procedure such as TIME which returns the current
time of day does not send any messages. In either case, the access to
system resources appears simply as procedure calls, effectively
hiding the process structure, message system, hardware organization,
and associated failure recovery mechanisms.

3.6 Initialization and Processor Reload

System initialization starts with one processor being cold loaded
from some disc on the system. The load file contains a memory image
of the operating system resident code and data, with all system
processes in existence and at their initial states. The system
monitor process then creates a command interpreter process.

Guardian may be brought up even though a processor or peripheral
device is down. This is possible because operating system disc images
may be kept on multiple disc drives, I/O controllers may be accessed
by two different processors, and the terminal that has the initial
command interpreter on it is selected by using the processor's switch
register.

After a cold load, the system logically consists of one processor and
any peripherals attached to it. More processors and peripherals may
be added to the system via the command interpreter command:

:RELOAD 1,$DISC

This command will read the disc image for processor 1 from the disc
$DISC and send it over either interprocessor bus to processor 1. Once
it is loaded, all processes residing in other processors in the
system will be notified that processor 1 is up.

This command is also used to reload a processor after it has been
repaired. Guardian does not differentiate between an initial load of
a processor and a later reload. In each case, resources are being
logically added to the system and processes must be notified so that
they may make use of them.

The previous example of a reload message being sent to all processes
is an example of how functions are split in Guardian. A mechanism is
provided for informing a process of a system status change. It may
then take some unspecified action (including doing nothing).
Similarly, a system power-on simply sets the PON event flag for all
processes. The operating system kernel must only insure that the
process structure and message system are correctly saved and
restored. It is then the responsibility of individual processes to do
such things as reinitialize their I/O controllers.
`
3.7 Operating System Error Detection

Besides the hardware-provided single error detection and correction
on memory, and single error detection on the interprocessor and I/O
buses, additional software error checks are provided. The first of
these is the detection of a down processor. Every second, each
processor in the system sends a special "I'm alive" message over each
bus to all processors in the system. Every two seconds, each
processor checks to see that it has received one of these messages
from each processor. If a message has not been received, then it
assumes that that processor is down.

Additionally, the operating system makes checks on the correctness of
data structures such as linked lists when operations are done on
them. Any processor detecting such an error will halt.

All I/O interrupts are bracketed by a "watch dog" timer such that the
system will not hang up if an I/O operation does not complete with
the expected interrupt. If an I/O bus error occurs then the backup
process will take over control of the device using the second I/O
bus.

As previously noted, the interprocessor bus protocol is designed to
correct single bus errors. In addition to this, extensive checks are
made on the control information received over the buses to verify
that it is consistent with the state of the receiving processor.

Power-fail/automatic restart is provided within each processor. A
power failure is detected independently by each processor module and
as a result is not a system-wide, synchronous event. The system was
designed to recover from either a complete system power-fail, or a
transient which will cause some of the processors to power-fail and
then immediately restart.

4. USER-LEVEL SYSTEM INTERFACE

Tools are provided for interactive program development using COBOL or
a block-structured implementation language, T/TAL. A file system with
facilities comparable to or exceeding those offered by other "midi"
computer systems allows access to disc files and other I/O devices.
Process creation, intercommunication, and checkpointing primitives
are also implemented.

The application process level facilities and the interactive program
development tools have been heavily influenced by the HP 3000 (7) and
by UNIX (8).

4.1 Interactive System Access
`
General-purpose, interactive access to the system is provided by the
command interpreter, COMINT, similar in many ways to the Shell of
UNIX. Normally a command interpreter is run interactively from a
terminal, but commands may be read from any type of file. The command
interpreter is seen by the operating system as simply another type of
application process.
`
`Commands are read from the terminal,
`prompted by a colon ( ":" ):
`
`command / process parameters / arguments
`
If the command is recognized, it will be directly executed. A command
of this type is:

:LOGON SOFTWARE.JOEL

which is used to gain access to the system. If the command is not
recognized, then a process will be created using the program file
"$SYSTEM.SYSTEM.command" and the arguments for the command will be
sent to this new process. The command interpreter will then suspend
itself until a message is received indicating that the process has
stopped. If this process cannot be created, then an error message is
printed. For example, the text editor is accessed by typing EDIT
followed by any command string:

:EDIT FILE
`
This will result in a process being created using the program file
$SYSTEM.SYSTEM.EDIT and the command string, "FILE", being sent to it.
Also a part of this command string message are the names of the files
that are being used for input and output by the command interpreter.
These are then used by the process for its input and output. If the
previous command was typed at a terminal, the input and output files
would be the device name of the terminal. Alternative names for the
input and output files may be specified. For example:

:EDIT /IN COMMANDS/

will create an editor process and pass it the file name "COMMANDS"
for the input file and the terminal's file name, the default, for the
output file. Finally, the processor to use and the priority at which
to run the process may also be specified:

:EDIT /PRI 100, CPU 3/

This will create an editor process in processor three with a priority
of 100.
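
The command form "command / process parameters / arguments" can be illustrated with a small parser. The grammar here is inferred only from the examples in this section (a space after the command name, comma-separated "KEY value" parameters between slashes), so it is an assumption and not COMINT's actual syntax; `parse_command` is a hypothetical helper.

```python
def parse_command(line):
    """Split ':command /process parameters/ arguments' into its three parts."""
    text = line.lstrip(":").strip()
    command, _, rest = text.partition(" ")
    rest = rest.strip()
    params = {}
    if rest.startswith("/"):
        # Everything up to the closing slash is the process parameter list.
        body, _, rest = rest[1:].partition("/")
        for item in body.split(","):
            key, _, value = item.strip().partition(" ")
            params[key] = value.strip()
    return command, params, rest.strip()

plain = parse_command(":EDIT FILE")
redirected = parse_command(":EDIT /IN COMMANDS/")
placed = parse_command(":EDIT /PRI 100, CPU 3/")
```

A command interpreter built this way would look up `command` among its built-in commands first and otherwise create a process from the matching program file, passing along the parameters and the argument string.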
`
`Additional features allow multiple
`processes to be started from one command
`interpreter and allow the previously typed
`command line to be edited.
`
4.2 Programming Languages

Compilers have been implemented for two languages, T/TAL, and ANSI 74
COBOL. T/TAL is a block-structured implementation language. Its
capabilities are similar to those offered by C on UNIX or SPL on the
HP3000. All Tandem software is written in T/TAL as are most user
applications.

Code generated by either compiler may be shared by multiple processes
in the same processor. Both compilers generate an object file which
may be immediately run without any intervening link edit operation.
However, the object file also contains enough information so that an
object editor, UPDATE, may combine the objects produced by several
compilations or selectively replace procedures in an object file.

4.3 Tools

Program development tools include an interactive text editor, object
file editor, text formatter, and interactive debugger. A screen
generation program and access routines are provided to facilitate
application interaction with page mode CRT terminals. File utilities
exist which allow file backup and restore, file copying and dumping,
and initial loading of key-sequenced files. A peripheral utility is
provided to do such operations as disc formatting, disc track
sparing, and mounting or demounting disc volumes.

4.4 Process Creation and Deletion

Processes are created by the command interpreter or by an application
process call to the procedure NEWPROCESS. Parameters supplied include
the name of the file holding the object code for the process, the
processor number to use, and the priority at which to run the
process. The parameters will be checked and then sent to the system
monitor process in the appropriate processor. The system monitor will
then create the process and return a "creationid" identifying the new
process to the calling process. Part of this value is the processid
previously defined, and the rest is the value of the processor clock
at the time of process creation. The clock is kept as a 48 bit value
which is the number of 10 ms intervals since 12 a.m. on December 31,
1975, which assures that creationid's will be unique over the life of
the system.

Processes are not grouped in classical ancestry trees. No process is
considered subservient to any other process on the basis of
parentage. Two processes, one created by the other, will be treated
as equals by the system. When a process, A, creates another process,
B, no record of B is attached to A. The only record kept is in
process B where the creationid of A is saved. This creationid is
known as B's "mom". When process B stops, process A is sent a stop
message indicating that process B no longer exists. A process's mom
is flexible and a process may adopt another process. For example
(Figure 5), process A creates process B. Process B in turn creates a
cooperating process, C. Since C would like to know if B stops, C will
adopt B.

A process may stop itself or some other process by calling STOP.
Process deletion is again a function of the system monitor process.
Resources will be released and a stop message will be sent to the
process' mom. If the mom process does not exist, then no message will
be sent.

4.5 Application Process-pairs

The process-pair concept introduced earlier is a powerful method for
making some resource available to all processes in the system in a
fault-tolerant manner. It is extended to the application processes as
follows. When a process is created via NEWPROCESS, a process-pair
name may be supplied. The creationid returned for this process
consists of the processid and the process name rather than the
processor clock value. For example (Figure 6), process A wishes to
create a process with the name "$SPOOL". Once B has been created, any
process in the system may send a message to that process via the name
"$SPOOL".

A creates B:     A <-- MOM -- B
B creates C:     A <-- MOM -- B <-- MOM -- C
C "adopts" B:    A            B <-- MOM --> C

Figure 5. Flexible Process Relationships
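
The creationid and "mom" rules of section 4.4, pictured in Figure 5, can be sketched as follows. The `System` class, its method names and the tuple layout of a creationid are hypothetical; the epoch and the 10 ms tick come from the text. At 10 ms per tick, a 48 bit counter lasts roughly 89,000 years, which is why the clock value never repeats in practice.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1975, 12, 31)            # "12 a.m. on December 31, 1975"
TICK = timedelta(milliseconds=10)

def clock_ticks(now):
    """Number of 10 ms intervals since the epoch; must fit in 48 bits."""
    ticks = (now - EPOCH) // TICK
    assert 0 <= ticks < 2**48
    return ticks

class System:
    def __init__(self):
        self.procs = {}       # creationid -> creationid of its mom (or None)
        self.stops = []       # (recipient creationid, stopped creationid)

    def newprocess(self, processid, now, mom=None):
        cid = (processid, clock_ticks(now))
        self.procs[cid] = mom             # the only record kept is in the child
        return cid

    def adopt(self, new_mom, child):
        self.procs[child] = new_mom

    def stop(self, cid):
        mom = self.procs.pop(cid)
        if mom in self.procs:             # no message if the mom does not exist
            self.stops.append((mom, cid))

t = datetime(1977, 1, 1)
s = System()
a = s.newprocess((0, 1), t)
b = s.newprocess((0, 2), t, mom=a)
c = s.newprocess((1, 1), t, mom=b)
s.adopt(c, b)                             # C adopts B: B's mom is now C
s.stop(b)                                 # the stop message goes to C, not A
```

Since the relationship is stored only in the child, adoption is a single assignment and no ancestry tree has to be maintained.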
`
`A
`
`<~—-—
`
`ANCE