RELIABLE
`COMPUTER
`SYSTEMS
`
`DESIGN AND EVALUATION
`
`SECOND EDITION
`
`DANIEL P. SIEWIOREK
`ROBERT S. SWARZ
`
`DIGITAL PRESS
`
`IBM-Oracle 1014
`Page 1 of 157
`
`

`
`Copyright © 1992 by Digital Equipment Corporation.
`
`All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or
`transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise,
`without prior written permission of the publisher.
`
`Printed in the United States of America.
`987654321
`
`Order number EY-Hag0E-DP
`
`The Publisher offers discounts on bulk orders of this book. For information, please write:
`
`Special Sales Department
`Digital Press
`One Burlington Woods Drive
`Burlington, MA 01003
`
`Design: Outside Designs
`Production: Technical Texts
`Composition: DEKR Corporation
`Printer: Arcata/Halliday
`
`Trademark products mentioned in this book are listed on page 890.
`
`Views expressed in this book are those of the authors, not of the publisher. Digital Equipment Corporation is
`not responsible for any errors that may appear in this book.
`
`Library of Congress Cataloging-in-Publication Data
`
`Siewiorek, Daniel P.
`Reliable computer systems : design and evaluation / Daniel P.
`Siewiorek, Robert S. Swarz. -- 2nd ed.
`p. cm.
`Rev. ed. of: The theory and practice of reliable system design.
`Bedford, MA : Digital Press, c1982.
`Includes bibliographical references and index.
`ISBN 1-55558-075-0
`
`1. Electronic digital computers--Reliability. 2. Fault-tolerant
`computing. I. Swarz, Robert S. II. Siewiorek, Daniel P. Theory
`and practice of reliable system design. III. Title.
`QA76.5.S537 1992
`004--dc20 92-10671
`CIP
`
`CREDITS
`
`Figure 1-3: Eugene Foley, "The Effects of Microelectronics Revolution on Systems and Board Test," Computer,
`Vol. 12, No. 10 (October 1979). Copyright © 1979 IEEE. Reprinted by permission.
`
`Figure 1-6: S. Russell Craig, "Incoming Inspection and Test Programs," Electronics Test (October 1980).
`Reprinted by permission.
`
`Credits are continued on p. 885, which is considered a continuation of the copyright page.
`
`To Karon and Lonnie
`
`A Special Remembrance:
`
`During the development of this book, a friend, colleague, and fault-tolerant pioneer
`passed away. Dr. Wing N. Toy documented his 37 years of experience in designing
`several generations of fault-tolerant computers for the Bell System electronic switching
`systems described in Chapter 8. We dedicate this book to Dr. Toy in the confidence
`that his writings will continue to influence designs produced by those who learn from
`these pages.
`
`CONTENTS
`
`Preface xv
`
`I THE THEORY OF RELIABLE SYSTEM DESIGN 1
`
`I FUNDAMENTAL CONCEPTS 3
`
`Physical Levels in a Digital System 5
`Temporal Stages of a Digital System
`Cost of a Digital System 18
`Summary 21
`References 21
`
`2 FAULTS AND THEIR MANIFESTATIONS 22
`
`System Errors 24
`Fault Manifestations 31
`Fault Distributions 49
`Distribution Models for Permanent Faults: The MIL-HDBK-217 Model 57
`Distribution Models for Intermittent and Transient Faults 65
`Software Fault Models 73
`Summary 76
`References 76
`Problems 77
`
`3 RELIABILITY TECHNIQUES 79
`Steven A. Elkind and Daniel P. Siewiorek
`
`System-Failure Response Stages 80
`Hardware Fault-Avoidance Techniques 84
`Hardware Fault-Detection Techniques 96
`Hardware Masking Redundancy Techniques 138
`Hardware Dynamic Redundancy Techniques 169
`Software Reliability Techniques 201
`Summary 219
`References 219
`Problems 221
`
`4 MAINTAINABILITY AND TESTING TECHNIQUES 228
`
`Specification-Based Diagnosis 229
`Symptom-Based Diagnosis 260
`
`Summary 268
`References 268
`Problems 269
`
`5 EVALUATION CRITERIA 271
`Stephen McConnel and Daniel P. Siewiorek
`
`Introduction 271
`Survey of Evaluation Criteria: Hardware 272
`Survey of Evaluation Criteria: Software 279
`Reliability Modeling Techniques: Combinatorial Models 285
`Examples of Combinatorial Modeling 294
`Reliability and Availability Modeling Techniques: Markov Models 305
`Examples of Markov Modeling 334
`Availability Modeling Techniques 342
`Software Assistance for Modeling Techniques 349
`Applications of Modeling Techniques to Systems Designs 356
`Summary 391
`References 391
`Problems 392
`
`6 FINANCIAL CONSIDERATIONS 402
`
`Fundamental Concepts 402
`Cost Models 408
`Summary 419
`References 419
`Problems 420
`
`II THE PRACTICE OF RELIABLE SYSTEM DESIGN 423
`
`Fundamental Concepts 402
`General-Purpose Computing 424
`High-Availability Systems 424
`Long-Life Systems 425
`Critical Computations 425
`
`7 GENERAL-PURPOSE COMPUTING 427
`
`Introduction 427
`Generic Computer 427
`DEC 430
`
`IBM 431
`
`The DEC Case: RAMP in the VAX Family 433
`Daniel P. Siewiorek
`
`The VAX Architecture
`
`First-Generation VAX Implementations 439
`Second-Generation VAX Implementations 455
`References 484
`
`The IBM Case Part I: Reliability, Availability, and Serviceability in IBM 308X
`and IBM 3090 Processor Complexes 485
`Daniel P. Siewiorek
`
`Technology 485
`Manufacturing 486
`Overview of the 3090 Processor Complex 493
`References 507
`
`The IBM Case Part II: Recovery Through Programming: MVS
`Recovery Management 508
`C.T. Connolly
`
`Introduction 508
`RAS Objectives 509
`Overview of Recovery Management 509
`MVS/XA Hardware Error Recovery 511
`MVS/XA Serviceability Facilities 520
`Availability 522
`Summary 523
`Bibliography 523
`Reference 523
`
`8 HIGH-AVAILABILITY SYSTEMS 524
`
`Introduction 524
`AT&T Switching Systems 524
`Tandem Computers, Inc. 528
`Stratus Computers, Inc. 531
`References 533
`
`The AT&T Case Part I: Fault-Tolerant Design of AT&T Telephone
`Switching System Processors 533
`W.N. Toy
`
`Introduction 533
`Allocation and Causes of System Downtime 534
`Duplex Architecture 535
`Fault Simulation Techniques 538
`First-Generation ESS Processors 540
`Second-Generation Processors 544
`Third-Generation 3B20D Processor 551
`Summary 572
`References 573
`
`The AT&T Case Part II: Large-Scale Real-Time Program Retrofit Methodology in
`AT&T 5ESS® Switch 574
`L.C. Toy
`
`5ESS Switch Architecture Overview 574
`Software Replacement 576
`Summary 585
`References 586
`
`The Tandem Case: Fault Tolerance in Tandem Computer Systems 586
`Joel Bartlett, Wendy Bartlett, Richard Carr, Dave Garcia, Jim Gray, Robert Horst,
`Robert Jardine, Doug Jewett, Dan Lenoski, and Dix McGuire
`
`Hardware 588
`Processor Module Implementation Details 597
`Integrity S2 618
`Maintenance Facilities and Practices 622
`Software 625
`Operations 647
`Summary and Conclusions 647
`References 648
`
`The Stratus Case: The Stratus Architecture 648
`Steven Webber
`
`Stratus Solutions to Downtime 650
`Issues of Fault Tolerance 652
`System Architecture Overview 653
`Recovery Scenarios 664
`Architecture Tradeoffs 665
`Stratus Software 666
`Service Strategies 669
`Summary 670
`
`9 LONG-LIFE SYSTEMS 671
`
`Introduction 671
`Generic Spacecraft 671
`Deep-Space Planetary Probes 676
`Other Noteworthy Spacecraft Designs 679
`References 679
`
`The Galileo Case: Galileo Orbiter Fault Protection System 679
`Robert W. Kocsis
`
`The Galileo Spacecraft 680
`Attitude and Articulation Control Subsystem 680
`Command and Data Subsystem 683
`AACS/CDS Interactions 687
`Sequences and Fault Protection 688
`
`Fault-Protection Design Problems and Their Resolution 689
`Summary 690
`References 690
`
`10 CRITICAL COMPUTATIONS 691
`
`Introduction 691
`C.vmp 691
`SIFT 693
`
`The C.vmp Case: A Voted Multiprocessor 694
`Daniel P. Siewiorek, Vittal Kini, Henry Mashburn, Stephen McConnel, and Michael Tsao
`
`System Architecture 694
`Issues of Processor Synchronization 699
`Performance Measurements 702
`Operational Experiences 707
`References 709
`
`The SIFT Case: Design and Analysis of a Fault-Tolerant Computer for
`Aircraft Control 710
`John H. Wensley, Leslie Lamport, Jack Goldberg, Milton W. Green, Karl N. Levitt,
`P.M. Melliar-Smith, Robert E. Shostak, and Charles B. Weinstock
`
`Motivation and Background 710
`SIFT Concept of Fault Tolerance 711
`The SIFT Hardware 719
`The Software System 723
`The Proof of Correctness 728
`Summary 733
`Appendix: Sample Special Specification 733
`References 735
`
`III A DESIGN METHODOLOGY AND EXAMPLE OF DEPENDABLE SYSTEM DESIGN 737
`
`11 A DESIGN METHODOLOGY 739
`Daniel P. Siewiorek and David Johnson
`
`Introduction 739
`A Design Methodology for Dependable System Design 739
`
`The VAXft 310 Case: A Fault-Tolerant System by Digital Equipment Corporation 745
`William Bruckert and Thomas Bissett
`
`Defining Design Goals and Requirements for the VAXft 310 746
`VAXft 310 Overview 747
`Details of VAXft 310 Operation 756
`
`Summary 766
`
`APPENDIXES 769
`
`APPENDIX A 771
`
`Error-Correcting Codes for Semiconductor Memory Applications:
`A State-of-the-Art Review 771
`C.L. Chen and M.Y. Hsiao
`
`Introduction 771
`Binary Linear Block Codes 773
`SEC-DED Codes 775
`SEC-DED-SBD Codes 778
`SBC-DBD Codes 779
`DEC-TED Codes 781
`Extended Error Correction 784
`Conclusions 786
`References 786
`
`APPENDIX B 787
`
`Arithmetic Error Codes: Cost and Effectiveness Studies for Application in Digital
`System Design 787
`Algirdas Avizienis
`
`Methodology of Code Evaluation 787
`Fault Effects in Binary Arithmetic Processors 790
`Low-Cost Radix-2 Arithmetic Codes 794
`Multiple Arithmetic Error Codes 799
`References 802
`
`APPENDIX C 803
`
`Design for Testability--A Survey 803
`Thomas W. Williams and Kenneth P. Parker
`
`Introduction 803
`Design for Testability 807
`Ad-Hoc Design for Testability 808
`Structured Design for Testability 813
`Self-Testing and Built-In Tests 821
`Conclusion 828
`References 829
`
`APPENDIX D 831
`
`Summary of MIL-HDBK-217E Reliability Model 831
`
`Failure Rate Model and Factors 831
`Reference 833
`
`APPENDIX E 835
`
`Algebraic Solutions to Markov Models 835
`Jeffrey P. Hansen
`
`Solution of MTTF Models 837
`Complete Solution for Three- and Four-State Models 838
`Solutions to Commonly Encountered Markov Models 839
`References 839
`
`GLOSSARY 841
`
`REFERENCES 845
`
`CREDITS 885
`
`TRADEMARKS 890
`
`INDEX 891
`
`8 HIGH-AVAILABILITY SYSTEMS
`
`INTRODUCTION
`
`Dynamic redundancy is the basic approach used in high-availability systems. These
`systems are typically composed of multiple processors with extensive error-detection
`mechanisms. When an error is detected, the computation is resumed on another
`processor. The evolution of high-availability systems is traced through the family history
`of three commercial vendors: AT&T, Tandem, and Stratus.
`
`AT&T SWITCHING SYSTEMS
`
`AT&T pioneered fault-tolerant computing in the telephone switching application.
`The two AT&T case studies given in this chapter trace the variations of duplication and
`matching devised for the switching systems to detect failures and to automatically
`resume computations. The primary form of detection is hardware lock-step duplication
`and comparison, which requires about 2.5 times the hardware cost of a nonredundant
`system. Thousands of switching systems have been installed, and their processors are
`currently commercially available in the form of the 3B20 processor. Table 8-1 summarizes
`the evolution of the AT&T switching systems. It includes system characteristics such as
`the number of telephone lines accommodated as well as the processor model used to
`control the switching gear.
`Telephone switching systems utilize natural redundancy in the network and its
`operation to meet an aggressive availability goal of 2 hours downtime in 40 years (3
`minutes per year). Telephone users will redial if they get a wrong number or are
`disconnected. However, there is a user aggravation level that must be avoided: users
`will redial as long as errors do not happen too frequently. User aggravation thresholds
`are different for failure to establish a call (moderately high) and disconnection of an
`established call (very low). Thus, a telephone switching system follows a staged failure
`recovery process, as shown in Table 8-2.
`Figure 8-1 illustrates that the telephone switching application requires quite a
`different organization than that of a general-purpose computer. In particular, a
`substantial portion of the telephone switching system complexity is in the peripheral
`hardware. As depicted in Figure 8-1, the telephone switching system is composed of
`four major components: the transmission interface, the network, signal processors,
`and the central controller. Telephone lines carrying analog signals attach to the voice
`band interface frame (VIF), which samples and digitally encodes the analog signals.
`The output is pulse code modulated (PCM). The echo suppressor terminal (EST)
`removes echoes that may have been introduced on long-distance trunk lines. The PCM
`signals are multiplexed onto a time-slotted digital bus. The digital bus enters a
`time-space-time network. The time slot interchange (TSI) switches PCM signals to
`different time slots on the bus. The output of the TSI goes to the time multiplexed
`switch (TMS), which switches the PCM signals in a particular time slot from any bus to
`any other bus. The output of the TMS returns to the TSI, where the PCM signals may be
`interchanged to another time slot. Signals intended for analog lines are converted
`from PCM to analog signals in the VIF. A network clock coordinates the timing for all
`of the switching functions.
`
`TABLE 8-1 Installed AT&T telephone switching systems
`
`System  Number of Lines  Year Introduced  Number Installed  Processor  Comments
`1 ESS   5,000-65,000     1965             1,000             No. 1      First processor with separate
`                                                                       control and data memories
`2 ESS   1,000-10,000     1969             500               No. 2
`1A ESS  100,000          1976             2,000             No. 1A     Four to eight times faster
`                                                                       than No. 1
`2B ESS  1,000-20,000     1975             >500              No. 3A     Combined control and data
`                                                                       store; microcoded; emulates No. 2
`3 ESS   500-5,000        1976             >500              No. 3A
`5 ESS   1,000-85,000     1982             >1,000            No. 3B     Multipurpose processor
`
`TABLE 8-2 Levels of recovery in a telephone switching system
`
`Phase 1
`  Recovery action: Initialize specific transient memory.
`  Effect: Temporary storage affected; no calls lost
`Phase 2
`  Recovery action: Reconfigure peripheral hardware. Initialize all transient memory.
`  Effect: Lose calls being established; calls in progress not lost
`Phase 3
`  Recovery action: Verify memory operation, establish a workable processor
`  configuration, verify program, configure peripheral hardware, initialize all
`  transient memory.
`  Effect: Lose calls being established; calls in progress not affected
`Phase 4
`  Recovery action: Establish a workable processor configuration, configure
`  peripheral hardware, initialize all memory.
`  Effect: All calls lost
`
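The availability goal quoted earlier in this section (2 hours of downtime in 40 years, or about 3 minutes per year) can be checked with a few lines of arithmetic. The Python sketch below assumes an average year of 365.25 days. The function names are illustrative only and are not taken from any AT&T tool.

```python
# Average year length in minutes (365.25 days, to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_per_year(total_downtime_hours: float, span_years: float) -> float:
    """Spread a total downtime budget evenly over span_years; result in minutes/year."""
    return total_downtime_hours * 60 / span_years


def availability(downtime_minutes_per_year: float) -> float:
    """Fraction of the year during which the system is in service."""
    return 1 - downtime_minutes_per_year / MINUTES_PER_YEAR


if __name__ == "__main__":
    per_year = downtime_per_year(2, 40)
    print(per_year)                         # 3.0 minutes per year
    print(f"{availability(per_year):.7f}")  # 0.9999943, better than 99.999%
```

The goal of 2 hours in 40 years therefore corresponds to an availability slightly better than 99.999 percent.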
`FIGURE 8-1 Diagram of a typical telephone switching system [figure; labels include service circuits, PCM, wire facilities, analog and digital carriers, peripheral unit (PU) bus, bus interface, central control (CC), program store (PS) bus, call store (CS), auxiliary unit (AU) bus, master control console (MCC), data link channel, interoffice signaling]
`
`The signal processors provide scanning and signal distribution functions, thus
`relieving the central processor of these activities. The common channel interoffice
`signaling (CCIS) provides an independent data line between telephone switching
`systems. The CCIS terminal is used to send supervisory switching information for the
`various trunk lines coming into the office. The entire peripheral hardware is interfaced
`to the central control (CC) over AC-coupled buses. A telephone switching processor
`is composed of the central control, which manipulates data associated with call
`processing, administrative tasks, and recovery; program store; call store for storing
`transient information related to the processing of telephone calls; a file store disk
`system used to store backup program copies; auxiliary units (magnetic tape storage)
`containing basic restart programs and new software releases; input/output (I/O)
`interfaces to terminal devices; and a master control console used as the control and
`display console for the system. In general, a telephone switching processor could be
`used to control more than one type of telephone switching system.
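The full duplication and matching used in these processors can be illustrated with a toy model. In the Python sketch below, two replicas execute the same stream of memory writes, and each pair of writes is compared before it is committed. This is in the spirit of processors that match only on writes to memory rather than on every machine cycle. The names `Replica`, `run_duplex`, and the `fault_at` fault-injection parameter are hypothetical and do not come from the AT&T designs. A real duplex system performs the comparison in hardware and then runs diagnostics to decide which side is faulty.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Write = Tuple[int, int]  # (address, value)


class MismatchError(Exception):
    """Raised when the two halves of the duplex pair disagree."""


@dataclass
class Replica:
    name: str
    memory: Dict[int, int] = field(default_factory=dict)
    fault_at: Optional[int] = None  # hypothetical fault injection: corrupt this step's write

    def execute(self, step: int, op: Write) -> Write:
        addr, value = op
        if self.fault_at == step:
            value ^= 1  # flip the low-order bit to model a hardware fault
        return addr, value


def run_duplex(ops: List[Write], a: Replica, b: Replica) -> Dict[int, int]:
    """Execute ops on both replicas, matching every write before it reaches memory."""
    for step, op in enumerate(ops):
        wa, wb = a.execute(step, op), b.execute(step, op)
        if wa != wb:
            # Stop before the bad value is stored; diagnosis would pick the faulty side.
            raise MismatchError(f"step {step}: {a.name} wrote {wa}, {b.name} wrote {wb}")
        a.memory[wa[0]] = wa[1]
        b.memory[wb[0]] = wb[1]
    return dict(a.memory)
```

Because the comparison happens before the store, neither copy of memory receives the corrupted value. The sketch does not attempt to decide which replica failed.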
`The history of AT&T processors is summarized in Table 8-3. Even though all the
`processors are based upon full duplication, it is interesting to observe the evolution
`from the tightly lock-stepped matching of every machine cycle in the early processors
`to a higher dependence on self-checking and matching only on writes to memory.
`Furthermore, as the processors evolved from dedicated, real-time controllers to
`multiple-purpose processors, the operating system and software not only became more
`sophisticated but also became a dominant portion of the system design and
`maintenance effort.
`
`TABLE 8-3 Summary of AT&T Telephone Switching Processors
`
`No. 1, introduced 1965
`  Complexity: 12,000 gates
`  Unit of switching: PS, CS, CC, buses
`  Matching: Six internal nodes, 24 bits per node; one node matched each machine
`    cycle; node selected to be matched dependent on instruction being executed
`  Other error detection/correction: Hamming code on PS; parity on CS; automatic
`    retry on CS, PS; watch-dog timer; sanity program to determine if
`    reorganization led to a valid configuration
`No. 2, introduced 1969
`  Complexity: 5,000 gates
`  Unit of switching: Entire computer
`  Matching: Single match point on call store input
`  Other error detection/correction: Diagnostic programs; parity on PS; detection
`    of multiword accesses in CS; watch-dog timer
`No. 1A, introduced 1976
`  Complexity: 50,000 gates
`  Unit of switching: PS, CS, CC, buses
`  Matching: 16 internal nodes, 24 bits per node; two nodes matched each machine cycle
`  Other error detection/correction: Two-parity bits on PS; roving spares (i.e.,
`    contents of PS not completely duplicated, can be loaded from disk upon error
`    detection); two-parity bits on CS; roving spares sufficient for complete
`    duplication of transient data; processor configuration circuit to search
`    automatically for a valid configuration
`No. 3A, introduced 1975
`  Complexity: 16,500 gates
`  Unit of switching: Entire computer
`  Matching: None
`  Other error detection/correction: On-line processor writes into both stores;
`    m-of-2m code on microstore plus parity; self-checking decoders; two-parity
`    bits on registers; duplication of ALU; watch-dog timer; maintenance channel
`    for observability and controllability of the other processor; 25% of logic
`    devoted to self-checking logic and 14% to maintenance access
`3B20D, introduced 1981
`  Complexity: 75,000 gates
`  Unit of switching: Entire computer
`  Matching: None
`  Other error detection/correction: On-line processor writes into both stores;
`    byte parity on data paths; parity checking where parity preserved,
`    duplication otherwise; modified Hamming code on main memory; maintenance
`    channel for observability and controllability of the other processor; 30% of
`    control logic devoted to self-checking; error-correction codes on disks;
`    software audits, sanity timer, integrity monitor
`
`Part I of the AT&T case study in this chapter, by Wing Toy, sketches the
`evolution of the telephone switching system processors and focuses on the latest
`member of the family, the 3B20D. Part II of the case study, by Liane C. Toy, outlines
`the procedure used in the 5ESS for updating hardware and/or software without
`incurring any downtime.
`
`TANDEM COMPUTERS, INC.
`
`Over a decade after the first AT&T computer-controlled switching system was installed,
`Tandem designed a high-availability system targeted for the on-line transaction pro-
`cessing (OLTP) market. Replication of processors, memories, and disks was used not
`only to tolerate failures, but also to provide modular expansion of computing re-
`sources. Tandem was concerned about the propagation of errors, and thus developed
`a loosely coupled multiple computer architecture. While one computer acts as primary,
`the backup computer is active only to receive periodic checkpoint information. Hence,
`1.3 physical computers are required to behave as one logical fault-tolerant computer.
`Disks, of course, have to be fully replicated to provide a complete backup copy of the
`database. This approach places a heavy burden upon the system and user software
`developers to guarantee correct operation no matter when or where a failure occurs.
`In particular, the primary memory state of a computation may not be available due to
`the failure of the processors. Some feel, however, that the multiple computer structure
`is superior to a lock-step duplication approach in tolerating design errors.
`The architecture discussed in the Tandem case study, by Bartlett, Bartlett, Garcia,
`Gray, Horst, Jardine, Jewett, Lenoski, and McGuire, is the first commercially available,
`modularly expandable system designed specifically for high availability. Design objec-
`tives for the system include the following:
`
`• "Nonstop" operation wherein failures are detected, components are reconfigured
`  out of service, and repaired components are configured back into the system
`  without stopping the other system components
`• Fail-fast logic whereby no single hardware failure can compromise the data
`  integrity of the system
`• Modular system expansion through adding more processing power, memory, and
`  peripherals without impacting applications software
`
`As in the AT&T switching systems, the Tandem architecture is designed to take advan-
`tage of the OLTP application to simplify error detection and recovery. The Tandem
`architecture is composed of up to 16 computers interconnected by two
`message-oriented Dynabuses. The hardware and software modules are designed to be
`fail-fast; that is, to rapidly detect errors and subsequently terminate processing.
`Software modules employ consistency checks and defensive programming techniques.
`Techniques employed in hardware modules include the following:
`
`• Checksums on Dynabus messages
`• Parity on data paths
`• Error-correcting code memory
`• Watch-dog timers
`
`All I/O device controllers are dual-ported for access by an alternate path in case of
`processor or I/O failure. The software builds a process-oriented system with all com-
`munications handled as messages on this hardware structure. This abstraction allows
`the blurring of the physical boundaries between processors and peripherals. Any I/O
`device or resource in the system can be accessed by a process, regardless of where
`the resource and process reside.
`Retry is extensively used to access an I/O device. Initially, hardware/firmware
`retries the access assuming a temporary fault. Next, software retries, followed by
`alternative path retry and finally alternative device retry.
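The escalating retry sequence just described can be sketched as nested loops. In this hypothetical Python version, each access routine stands for one path to one device and raises `IOFailure` on error. The retry counts are arbitrary assumptions. The innermost loop only stands in for the retry that Tandem performs in hardware/firmware.

```python
from typing import Callable, List

Attempt = Callable[[], bytes]  # one access over one path to one device


class IOFailure(Exception):
    """Raised by an access routine when the I/O operation fails."""


def hardware_retry(attempt: Attempt, tries: int) -> bytes:
    """Innermost level: repeat the access, assuming a temporary fault."""
    last_error = None
    for _ in range(tries):
        try:
            return attempt()
        except IOFailure as exc:
            last_error = exc
    raise IOFailure("hardware/firmware retries exhausted") from last_error


def access(devices: List[List[Attempt]], hw_tries: int = 2, sw_tries: int = 2) -> bytes:
    """devices[d][p] accesses device d over path p; primary device and path come first."""
    for paths in devices:              # 4. alternative device
        for attempt in paths:          # 3. alternative path to the same device
            for _ in range(sw_tries):  # 2. software retry
                try:
                    return hardware_retry(attempt, hw_tries)  # 1. hardware/firmware retry
                except IOFailure:
                    pass
    raise IOFailure("all devices and paths failed")
```

The ordering mirrors the text: cheap retries on the same path are exhausted before the system switches paths, and a different device is used only as the last resort.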
`A network systems management program provides a set of operators that helps
`reduce the number of administrative errors typically encountered in complex systems.
`The Tandem Maintenance and Diagnostic System analyzes event logs to successfully
`call out failed field-replaceable units 90 percent of the time. Networking software exists
`that allows interconnection of up to 255 geographically dispersed Tandem systems.
`Tandem applications include order entry, hospital records, bank transactions, and
`library transactions.
`Data integrity is maintained through the mechanisms of I/O "process pairs"; one
`I/O process is designated as primary and the other is designated as backup. All file
`modification messages are delivered to the primary I/O process. The primary sends a
`message with checkpoint information to the backup so that it can take over if the
`primary’s processor or access path to the I/O device fails. Files can also be duplicated
`on physically distinct devices controlled by an I/O process pair on physically distinct
`processors. All file modification messages are delivered to both I/O processes. Thus,
`in the event of physical failure or isolation of the primary, the backup file is up-to-date
`and available.
`User applications can also utilize the process-pair mechanism. As an example of
`how process pairs work, consider the nonstop application, program A, shown in Figure
`8-2. Program A starts a backup process, A1, in another processor. There are also
`duplicate file images, one designated primary and the other backup. Program A
`periodically (at user-specified points) sends checkpoint information to A1. A1 is the same
`program as A, but knows that it is a backup program. A1 reads checkpoint messages
`to update its data area, file status, and program counter.
`The checkpoint information is inserted in the corresponding memory locations of
`the backup process, as opposed to the more usual approach of updating a disk file.
`This approach permits the backup process to take over immediately in the event of
`failure without having to perform the usual recovery journaling and disk accesses
`before processing resumes.
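Here is a minimal sketch of this checkpointing scheme. It assumes a single backup and uses checkpoints every N steps as a stand-in for the user-specified points. The primary sends a copy of its state, and the backup simply overwrites the copy it holds in memory, so takeover needs no disk recovery. `ProcessState`, `Primary`, and `Backup` are hypothetical names, and file status is omitted for brevity. Real process pairs exchange these checkpoints as messages between processors rather than through shared Python objects.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProcessState:
    pc: int = 0                                         # program counter
    data: Dict[str, int] = field(default_factory=dict)  # data area


class Backup:
    def __init__(self) -> None:
        self.state = ProcessState()  # held in the backup's own memory, not on disk

    def receive_checkpoint(self, snapshot: ProcessState) -> None:
        self.state = snapshot        # replace with the latest checkpoint

    def take_over(self) -> ProcessState:
        return self.state            # resume immediately from the last checkpoint


class Primary:
    def __init__(self, backup: Backup, checkpoint_every: int) -> None:
        self.state = ProcessState()
        self.backup = backup
        self.checkpoint_every = checkpoint_every  # stand-in for user-specified points

    def step(self, key: str, value: int) -> None:
        self.state.data[key] = value
        self.state.pc += 1
        if self.state.pc % self.checkpoint_every == 0:
            # Send a deep copy: the backup must not share memory with the primary.
            self.backup.receive_checkpoint(copy.deepcopy(self.state))
```

Work done after the last checkpoint is not in the backup's copy and must be redone after takeover. This is why, as noted below, A1 may repeat I/O operations that A had already completed.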
`FIGURE 8-2 Shadow processor in Tandem [figure; labels: A1, Backup, "Backup exists?", Data, OS, I/O]
`
`Program A1 loads and executes if the system reports that A’s processor is down
`(error messages are sent from A’s operating system image, or A’s processor fails to
`respond to a periodic "I’m alive" message). All file activity by A is performed on both
`the primary and backup file copies. When A1 starts to execute from the last checkpoint,
`it may attempt to repeat I/O operations successfully completed by A. The system file
`handler will recognize this and send A1 a successfully completed I/O message.
`Like A, A1 periodically asks the operating system whether a backup process exists.
`Since one no longer does, it can request the creation and initialization of a copy of
`both the process and the file structure.
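The failure detection and re-pairing steps in this paragraph can be sketched as follows. The sketch assumes that each processor's "I'm alive" message simply records a timestamp, and that a processor silent for longer than a fixed timeout is declared down. `AliveMonitor`, `fail_over`, and the timeout value are hypothetical. Times are passed in explicitly so that the example is deterministic.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]  # (primary CPU, backup CPU) of one process pair


class AliveMonitor:
    """Tracks periodic "I'm alive" messages from each processor."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, cpu: str, now: float) -> None:
        self.last_seen[cpu] = now

    def down(self, now: float) -> List[str]:
        """Processors that have been silent for longer than the timeout."""
        return sorted(c for c, t in self.last_seen.items() if now - t > self.timeout)


def fail_over(pairs: Dict[str, Pair], down: List[str], cpus: List[str]) -> Dict[str, Pair]:
    """Promote backups of failed primaries, then request a new backup on a healthy CPU."""
    result: Dict[str, Pair] = {}
    for proc, (primary, backup) in pairs.items():
        if backup in down:
            backup = None
        if primary in down:
            primary, backup = backup, None  # backup resumes from its last checkpoint
        if primary is not None and backup is None:
            backup = next((c for c in cpus if c != primary and c not in down), None)
        result[proc] = (primary, backup)
    return result
```

The last step of `fail_over` corresponds to the surviving process noticing that it has no backup and asking for a new one.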
`A major issue in the design of loosely coupled duplicated systems is how both
`copies can be kept identical in the face of errors. As an example of how consistency
`is maintained, consider the interaction of an I/O processor pair as depicted in Table
`8-4. Initially, all sequence numbers (SeqNo) are set to zero. The requester sends a
`request to the server. If the sequence number is less than the server’s local copy, a
`failure has occurred and the status of the completed operation is returned. Note that
`the requested operation is done only once. Next, the operation is performed and a
`checkpoint of the request is sent to the server backup. The disk is written, the sequence
`number incremented to one, and the results checkpointed to the server backup, which
`also increments its sequence number. The results are returned from the server to the
`requester. Finally, the results are checkpointed to the requester backup, which also
`increments its sequence number.
`Now consider failures. If either backup fails, the operation completes successfully.
`If the requester fails after the request has been made, the server will complete the
`operation but be unable to return the result. When the requester backup becomes
`active, it will repeat the request. Since its sequence number is zero, the server test at
`step 2 will return the result without performing the operation again. Finally, if the
`server fails, the server backup either does nothing or completes the operation using
`checkpointed information. When the requester resends the request, the new server
`(that is, the old server backup) either performs the operation or returns the saved
`results. More information on the operating system and the programming of nonstop
`applications can be found in Bartlett [1978].
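The sequence-number test at the heart of Table 8-4 can be sketched in Python as follows. This is a simplified, hypothetical model that assumes a single requester. Method calls stand in for the messages exchanged between processors, and the server backup is represented only by the state that the server checkpoints into it. The `writes` counter exists only to show that a repeated request is not performed twice.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ServerState:
    seq_no: int = 0
    pending: Optional[Tuple[str, str]] = None  # checkpointed request
    saved_result: Optional[str] = None         # checkpointed result


class Server:
    def __init__(self, disk: Dict[str, str], backup: ServerState) -> None:
        self.disk = disk
        self.state = ServerState()
        self.backup = backup  # the server backup's copy of this state
        self.writes = 0       # actual disk writes performed

    def handle(self, seq_no: int, key: str, value: str) -> Optional[str]:
        if seq_no < self.state.seq_no:
            # Duplicate of a completed request: return saved status, do not redo it.
            return self.state.saved_result
        self.backup.pending = (key, value)      # checkpoint request
        self.disk[key] = value                  # write to disk
        self.writes += 1
        self.state.seq_no += 1
        self.state.saved_result = f"wrote {key}"
        self.backup.seq_no = self.state.seq_no  # checkpoint result
        self.backup.saved_result = self.state.saved_result
        return self.state.saved_result

    def promote_backup(self) -> "Server":
        """The server backup takes over with its checkpointed state (same disk)."""
        new = Server(self.disk, ServerState())
        new.state = ServerState(self.backup.seq_no, self.backup.pending, self.backup.saved_result)
        return new
```

Because the saved status is checkpointed along with the sequence number, both the original server and a promoted server backup answer a repeated request without writing the disk again.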
`
`TABLE 8-4 Sample process-pair transactions
`
`Initially: Requester SeqNo = 0; Requester Backup SeqNo = 0; Server SeqNo = 0;
`Server Backup SeqNo = 0
`
`Step 1  Requester: Issue request to write record
`Step 2  Server: If SeqNo < MySeqNo, then return saved status
`Step 3  Server: Otherwise, read disk, perform operation, checkpoint request
`        Server Backup: Saves request
`Step 4  Server: Write to disk; SeqNo = 1; checkpoint result
`        Server Backup: Saves result; SeqNo = 1
`Step 5  Server: Return results
`Step 6  Requester: Checkpoint results
`        Requester Backup: SeqNo = 1
`
`Source: Bartlett, 1981; © 1981 ACM.
`
`STRATUS COMPUTERS, INC.
`
`Whereas the Tandem architecture was based upon minicomputer technology, Stratus
`entered the OLTP market five years after Tandem by harnessing microprocessors. By
`1980, the performance of microprocessor chips was beginning to rival that of
`minicomputers. Because of the smaller form factor of microprocessor chips, it was possible to
`place two microprocessors on a single board and to compare their output pins on
`every clock cycle. Thus, the Stratus system appears to users as a conventional system
`that does not require special software for error detection and recovery. The case study
`by Steven Webber describes the Stratus approach in detail.
`The design goal for Stratus systems is continuous processing, which is defined as
`uninterrupted operation without loss of data, performance degradation, or special
`programming. The Stratus self-checking, duplicate-and-match architecture is shown in
`Figure 8-3. A module (or computer) is composed of replicated power and backplane
`buses (StrataBus) into which a variety of boards can be inserted. Boards are logically
`divided into halves that drive outputs to and receive inputs from both buses. The bus
`drivers/receivers are duplicated and controlled independently. The logical halves are
`driven in lock step by the same clock. A comparator is used to detect any disagreements
`between the two halves of the board. Multiple failures that affect the two independent
`halves of a board could cause the module to hang as it alternated between buses
`seeking a fault-free path. Up to 32 modules can be interconnected into a system via a
`message-passing Stratus intermodule bus (SIB). Access to the SIB is by dual
`14-megabyte-per-second links. Systems, in turn, are tied together by an X.25 packet-switched
`network.
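The behavior of a self-checking board and its spare can be modeled in a few lines. In this hypothetical Python sketch, each board has two halves that compute the same function on the same input, standing in for lock-step execution from one clock. A comparator check takes the board out of service on any disagreement, and the surviving board keeps supplying results. In the real hardware both healthy boards drive the buses together; here the first surviving result is simply taken.

```python
from typing import Callable, List, Optional

Half = Callable[[int], int]


class Board:
    """Self-checking board: two halves run in lock step and are compared every step."""

    def __init__(self, name: str, half_a: Half, half_b: Half) -> None:
        self.name = name
        self.half_a, self.half_b = half_a, half_b
        self.in_service = True

    def step(self, x: int) -> Optional[int]:
        a, b = self.half_a(x), self.half_b(x)
        if a != b:
            # Comparator disagreement: the board removes itself from service
            # (in hardware, a maintenance interrupt and a red light would follow).
            self.in_service = False
            return None
        return a


def pair_and_spare(boards: List[Board], inputs: List[int]) -> List[int]:
    outputs = []
    for x in inputs:
        results = []
        for board in boards:
            if board.in_service:
                r = board.step(x)
                if r is not None:
                    results.append(r)
        if not results:
            raise RuntimeError("no fault-free board remains")
        outputs.append(results[0])  # healthy boards produce the same value; take the first
    return outputs
```

The output stream is unaffected by a single board failure, which is the property that lets the system continue without special software for error recovery.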
`
`FIGURE 8-3 The Stratus pair-and-spare architecture [figure; labels: Power 0, Power 1, Bus A, Bus B, Processor (A half, B half), Memory, Disk, Communications, Link]
`
`Now consider how the system in Figure 8-3 tolerates failure. The two processor
`boards (each containing a pair of microprocessors), each a self-checking module, are
`used in a pair-and-spare configuration. Each board operates independently. Each half
`of each board (for example, side A) receives inputs from a different bus (for example,
`bus A) and drives a different bus (for example, bus A). Each bus is the wired-OR of
`one half of each board (for example, bus A is the wired-OR of all A board halves). The
`boards constantly compare their two halves, and upon disagreement, the board
`removes itself from service, a maintenance interrupt is generated, and a red light is
`illuminated. The spare pair on the other processor board continues processing and is
`now the sole driver of both buses. The operating system executes a diagnostic on the
`failed board to determine whether the error was caused by a transient or permanent
`fault. In the case of a transient, the board is returned to service. Permanent faults are
`reported by phone to the Stratus Customer Assistance Center (CAC). The CAC
`reconfirms the problem, selects a replacement board of the same revision, prints installation
`instructions, and ships the board by overnight courier. The first time the user realizes
`there is a problem is when the board is delivered. The user removes the old board
`and inserts the new board without disrupting the system (that is, makes a "hot" swap).
`The new board interrupts the system, and the processor tha
