RELIABLE COMPUTER SYSTEMS
DESIGN AND EVALUATION

SECOND EDITION

DANIEL P. SIEWIOREK
ROBERT S. SWARZ

DIGITAL PRESS

Copyright © 1992 by Digital Equipment Corporation.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the publisher.

Printed in the United States of America.
9 8 7 6 5 4 3 2 1

Order number EY-H880E-DP

The Publisher offers discounts on bulk orders of this book. For information, please write:
Special Sales Department
Digital Press
One Burlington Woods Drive
Burlington, MA 01803

Design: Outside Designs
Production: Technical Texts
Composition: DEKR Corporation
Printer: Arcata/Halliday

Trademark products mentioned in this book are listed on page 890.

Views expressed in this book are those of the authors, not of the publisher. Digital Equipment Corporation is not responsible for any errors that may appear in this book.

Library of Congress Cataloging-in-Publication Data

Siewiorek, Daniel P.
    Reliable computer systems : design and evaluation / Daniel P. Siewiorek, Robert S. Swarz. -- 2nd ed.
        p.  cm.
    Rev. ed. of: The theory and practice of reliable system design. Bedford, MA : Digital Press, c1982.
    Includes bibliographical references and index.
    ISBN 1-55558-075-0
    1. Electronic digital computers--Reliability.  2. Fault-tolerant computing.  I. Swarz, Robert S.  II. Siewiorek, Daniel P. Theory and practice of reliable system design.  III. Title.
QA76.5.S5377  1992
004--dc20                                    92-10671
                                                  CIP

CREDITS

Figure 1-3: Eugene Foley, "The Effects of Microelectronics Revolution on Systems and Board Test," Computer, Vol. 12, No. 10 (October 1979). Copyright © 1979 IEEE. Reprinted by permission.
Figure 1-6: S. Russell Craig, "Incoming Inspection and Test Programs," Electronics Test (October 1980). Reprinted by permission.

Credits are continued on p. 885, which is considered a continuation of the copyright page.

To Karon and Lonnie

A Special Remembrance:

During the development of this book, a friend, colleague, and fault-tolerant pioneer passed away. Dr. Wing N. Toy documented his 37 years of experience in designing several generations of fault-tolerant computers for the Bell System electronic switching systems described in Chapter 8. We dedicate this book to Dr. Toy in the confidence that his writings will continue to influence designs produced by those who learn from these pages.

CONTENTS

Preface xv

I THE THEORY OF RELIABLE SYSTEM DESIGN 1

1 FUNDAMENTAL CONCEPTS 3
  Physical Levels in a Digital System 5
  Temporal Stages of a Digital System 6
  Cost of a Digital System 18
  Summary 21
  References 21

2 FAULTS AND THEIR MANIFESTATIONS 22
  System Errors 24
  Fault Manifestations 31
  Fault Distributions 49
  Distribution Models for Permanent Faults: The MIL-HDBK-217 Model 57
  Distribution Models for Intermittent and Transient Faults 65
  Software Fault Models 73
  Summary 76
  References 76
  Problems 77

3 RELIABILITY TECHNIQUES 79
  Steven A. Elkind and Daniel P. Siewiorek
  System-Failure Response Stages 80
  Hardware Fault-Avoidance Techniques 84
  Hardware Fault-Detection Techniques 96
  Hardware Masking Redundancy Techniques 138
  Hardware Dynamic Redundancy Techniques 169
  Software Reliability Techniques 201
  Summary 219
  References 219
  Problems 221

4 MAINTAINABILITY AND TESTING TECHNIQUES 228
  Specification-Based Diagnosis 229
  Symptom-Based Diagnosis 260
  Summary 268
  References 268
  Problems 269

5 EVALUATION CRITERIA 271
  Stephen McConnel and Daniel P. Siewiorek
  Introduction 271
  Survey of Evaluation Criteria: Hardware 272
  Survey of Evaluation Criteria: Software 279
  Reliability Modeling Techniques: Combinatorial Models 285
  Examples of Combinatorial Modeling 294
  Reliability and Availability Modeling Techniques: Markov Models 305
  Examples of Markov Modeling 334
  Availability Modeling Techniques 342
  Software Assistance for Modeling Techniques 349
  Applications of Modeling Techniques to Systems Designs 356
  Summary 391
  References 391
  Problems 392

6 FINANCIAL CONSIDERATIONS 402
  Fundamental Concepts 402
  Cost Models 408
  Summary 419
  References 419
  Problems 420

II THE PRACTICE OF RELIABLE SYSTEM DESIGN 423
  Fundamental Concepts 423
  General-Purpose Computing 424
  High-Availability Systems 424
  Long-Life Systems 425
  Critical Computations 425

7 GENERAL-PURPOSE COMPUTING 427
  Introduction 427
  Generic Computer 427
  DEC 430
  IBM 431

  The DEC Case: RAMP in the VAX Family 433
  Daniel P. Siewiorek
  The VAX Architecture 433
  First-Generation VAX Implementations 439
  Second-Generation VAX Implementations 455
  References 484

  The IBM Case Part I: Reliability, Availability, and Serviceability in IBM 303x and IBM 3090 Processor Complexes 485
  Daniel P. Siewiorek
  Technology 485
  Manufacturing 486
  Overview of the 3090 Processor Complex 493
  References 507

  The IBM Case Part II: Recovery Through Programming: MVS Recovery Management 508
  C.T. Connolly
  Introduction 508
  RAS Objectives 509
  Overview of Recovery Management 509
  MVS/XA Hardware Error Recovery 511
  MVS/XA Serviceability Facilities 520
  Availability 522
  Summary 523
  Bibliography 523
  Reference 523

8 HIGH-AVAILABILITY SYSTEMS 524
  Introduction 524
  AT&T Switching Systems 524
  Tandem Computers, Inc. 528
  Stratus Computers, Inc. 531
  References 533

  The AT&T Case Part I: Fault-Tolerant Design of AT&T Telephone Switching System Processors 533
  W.N. Toy
  Introduction 533
  Allocation and Causes of System Downtime 534
  Duplex Architecture 535
  Fault Simulation Techniques 538
  First-Generation ESS Processors 540
  Second-Generation Processors 544
  Third-Generation 3B20D Processor 551
  Summary 572
  References 573

  The AT&T Case Part II: Large-Scale Real-Time Program Retrofit Methodology in AT&T 5ESS® Switch 574
  L.C. Toy
  5ESS Switch Architecture Overview 574
  Software Replacement 576
  Summary 585
  References 586

  The Tandem Case: Fault Tolerance in Tandem Computer Systems 586
  Joel Bartlett, Wendy Bartlett, Richard Carr, Dave Garcia, Jim Gray, Robert Horst, Robert Jardine, Doug Jewett, Dan Lenoski, and Dix McGuire
  Hardware 588
  Processor Module Implementation Details 597
  Integrity S2 613
  Maintenance Facilities and Practices 622
  Software 625
  Operations 647
  Summary and Conclusions 647
  References 648

  The Stratus Case: The Stratus Architecture 648
  Steven Webber
  Stratus Solutions to Downtime 650
  Issues of Fault Tolerance 652
  System Architecture Overview 653
  Recovery Scenarios 664
  Architecture Tradeoffs 665
  Stratus Software 666
  Service Strategies 669
  Summary 670

9 LONG-LIFE SYSTEMS 671
  Introduction 671
  Generic Spacecraft 671
  Deep-Space Planetary Probes 676
  Other Noteworthy Spacecraft Designs 679
  References 679

  The Galileo Case: Galileo Orbiter Fault Protection System 679
  Robert W. Kocsis
  The Galileo Spacecraft 680
  Attitude and Articulation Control Subsystem 680
  Command and Data Subsystem 683
  AACS/CDS Interactions 687
  Sequences and Fault Protection 688
  Fault-Protection Design Problems and Their Resolution 689
  Summary 690
  References 690

10 CRITICAL COMPUTATIONS 691
  Introduction 691
  C.vmp 691
  SIFT 693

  The C.vmp Case: A Voted Multiprocessor 694
  Daniel P. Siewiorek, Vittal Kini, Henry Mashburn, Stephen McConnel, and Michael Tsao
  System Architecture 694
  Issues of Processor Synchronization 699
  Performance Measurements 702
  Operational Experiences 707
  References 709

  The SIFT Case: Design and Analysis of a Fault-Tolerant Computer for Aircraft Control 710
  John H. Wensley, Leslie Lamport, Jack Goldberg, Milton W. Green, Karl N. Levitt, P.M. Melliar-Smith, Robert E. Shostak, and Charles B. Weinstock
  Motivation and Background 710
  SIFT Concept of Fault Tolerance 711
  The SIFT Hardware 719
  The Software System 723
  The Proof of Correctness 728
  Summary 733
  Appendix: Sample Special Specification 733
  References 735

III A DESIGN METHODOLOGY AND EXAMPLE OF DEPENDABLE SYSTEM DESIGN 737

11 A DESIGN METHODOLOGY 739
  Daniel P. Siewiorek and David Johnson
  Introduction 739
  A Design Methodology for Dependable System Design 739

  The VAXft 310 Case: A Fault-Tolerant System by Digital Equipment Corporation 745
  William Bruckert and Thomas Bissett
  Defining Design Goals and Requirements for the VAXft 310 746
  VAXft 310 Overview 747
  Details of VAXft 310 Operation 756
  Summary 766

APPENDIXES 769

APPENDIX A 771
  Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review 771
  C.L. Chen and M.Y. Hsiao
  Introduction 771
  Binary Linear Block Codes 773
  SEC-DED Codes 775
  SEC-DED-SBD Codes 778
  SBC-DBD Codes 779
  DEC-TED Codes 781
  Extended Error Correction 784
  Conclusions 786
  References 786

APPENDIX B 787
  Arithmetic Error Codes: Cost and Effectiveness Studies for Application in Digital System Design 787
  Algirdas Avizienis
  Methodology of Code Evaluation 787
  Fault Effects in Binary Arithmetic Processors 790
  Low-Cost Radix-2 Arithmetic Codes 794
  Multiple Arithmetic Error Codes 799
  References 802

APPENDIX C 803
  Design for Testability - A Survey 803
  Thomas W. Williams and Kenneth P. Parker
  Introduction 803
  Design for Testability 807
  Ad-Hoc Design for Testability 808
  Structured Design for Testability 813
  Self-Testing and Built-In Tests 821
  Conclusion 828
  References 829

APPENDIX D 831
  Summary of MIL-HDBK-217E Reliability Model 831
  Failure Rate Model and Factors 831
  Reference 833

APPENDIX E 835
  Algebraic Solutions to Markov Models 835
  Jeffrey P. Hansen
  Solution of MTTF Models 837
  Complete Solution for Three- and Four-State Models 838
  Solutions to Commonly Encountered Markov Models 839
  References 839

GLOSSARY 841

REFERENCES 845

CREDITS 885

TRADEMARKS 890

INDEX 891

8 HIGH-AVAILABILITY SYSTEMS

INTRODUCTION

Dynamic redundancy is the basic approach used in high-availability systems. These systems are typically composed of multiple processors with extensive error-detection mechanisms. When an error is detected, the computation is resumed on another processor. The evolution of high-availability systems is traced through the family history of three commercial vendors: AT&T, Tandem, and Stratus.

AT&T SWITCHING SYSTEMS

AT&T pioneered fault-tolerant computing in the telephone switching application. The two AT&T case studies given in this chapter trace the variations of duplication and matching devised for the switching systems to detect failures and to automatically resume computations. The primary form of detection is hardware lock-step duplication and comparison, which requires about 2.5 times the hardware cost of a nonredundant system. Thousands of switching systems have been installed, and they are currently commercially available in the form of the 3B20 processor. Table 8-1 summarizes the evolution of the AT&T switching systems; it includes system characteristics such as the number of telephone lines accommodated as well as the processor model used to control the switching gear.

Telephone switching systems utilize natural redundancy in the network and its operation to meet an aggressive availability goal of 2 hours downtime in 40 years (3 minutes per year). Telephone users will redial if they get a wrong number or are disconnected. However, there is a user aggravation level that must be avoided: users will redial as long as errors do not happen too frequently. User aggravation thresholds are different for failure to establish a call (moderately high) and disconnection of an established call (very low). Thus, a telephone switching system follows a staged failure recovery process, as shown in Table 8-2.

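The staged recovery process is essentially an escalation loop: each level reinitializes a wider scope of state, and the system moves to the next level only if the previous one fails to restore a workable configuration. The fragment below is a minimal Python sketch of that discipline, not AT&T code; perform_phase and sanity_check are hypothetical hooks standing in for the phase actions of Table 8-2 and the switch's sanity program.

def staged_recovery(perform_phase, sanity_check, max_phase=4):
    """Escalate through recovery phases 1..max_phase until the switch is sane.

    perform_phase(n) -- hypothetical hook that carries out the phase-n actions
                        (phase 1 clears only specific transient memory; phase 4
                        rebuilds the processor configuration and initializes all memory).
    sanity_check()   -- hypothetical hook reporting whether a workable
                        configuration has been reestablished.
    """
    for phase in range(1, max_phase + 1):
        perform_phase(phase)            # each phase widens the scope of reinitialization
        if sanity_check():
            return phase                # report the level that restored service
    raise RuntimeError("recovery exhausted; manual intervention required")

# Trivial stand-ins: the simulated fault clears once phase 2 has run.
state = {"last_phase": 0}
level = staged_recovery(lambda n: state.update(last_phase=n),
                        lambda: state["last_phase"] >= 2)
print("service restored at recovery level", level)   # prints 2
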
Figure 8-1 illustrates that the telephone switching application requires quite a different organization than that of a general-purpose computer. In particular, a substantial portion of the telephone switching system complexity is in the peripheral hardware. As depicted in Figure 8-1, the telephone switching system is composed of four major components: the transmission interface, the network, signal processors, and the central controller. Telephone lines carrying analog signals attach to the voice band interface frame (VIF), which samples and digitally encodes the analog signals. The output is pulse code modulated (PCM). The echo suppressor terminal (EST) removes echoes that may have been introduced on long-distance trunk lines.

TABLE 8-1  Summary of installed AT&T telephone switching systems

System   Number of Lines   Year Introduced   Number Installed   Processor   Comments
1 ESS    5,000-65,000      1965              1,000              No. 1       First processor with separate control and data memories
2 ESS    1,000-10,000      1969              500                No. 2
1A ESS   100,000           1976              2,000              No. 1A      Four to eight times faster than No. 1
2B ESS   1,000-20,000      1975              >500               No. 3A      Combined control and data store; microcoded; emulates No. 2
3 ESS    500-5,000         1976              >500               No. 3A
5 ESS    1,000-85,000      1982              >1,000             No. 3B      Multipurpose processor

TABLE 8-2  Levels of recovery in a telephone switching system

Level   Recovery Action                                                      Effect
1       Initialize specific transient memory.                                Temporary storage affected; no calls lost
2       Reconfigure peripheral hardware. Initialize all transient memory.    Lose calls being established; calls in progress not lost
3       Verify memory operation, establish a workable processor              Lose calls being established; calls in progress not affected
        configuration, verify program, configure peripheral hardware,
        initialize all transient memory.
4       Establish a workable processor configuration, configure              All calls lost
        peripheral hardware, initialize all memory.

The PCM signals are multiplexed onto a time-slotted digital bus. The digital bus enters a time-space-time network. The time slot interchange (TSI) switches PCM signals to different time slots on the bus. The output of the TSI goes to the time multiplexed switch (TMS), which switches the PCM signals in a particular time slot from any bus to any other bus. The output of the TMS returns to the TSI, where the PCM signals may be interchanged to another time slot. Signals intended for analog lines are converted from PCM to analog signals in the VIF. A network clock coordinates the timing for all of the switching functions.

The signal processors provide scanning and signal distribution functions, thus relieving the central processor of these activities. The common channel interoffice signaling (CCIS) provides an independent data link between telephone switching systems.

FIGURE 8-1  Diagram of a typical telephone switching system. [Figure: wire facilities, analog and digital carriers, and service circuits enter the transmission interface (voice band interface frame, echo suppressor terminal, digroup terminal); PCM buses connect the time slot interchange (TSI) and the time multiplexed switch (TMS), coordinated by the network clock; the signal processors, common channel interoffice signaling terminal, central control (CC), master control console (MCC), file store, auxiliary units, and I/O attach over the peripheral unit (PU), auxiliary unit (AU), program store (PS), and call store (CS) buses, with data links for signaling and control.]

The CCIS terminal is used to send supervisory switching information for the various trunk lines coming into the office. The entire peripheral hardware is interfaced to the central control (CC) over AC-coupled buses. A telephone switching processor is composed of the central control, which manipulates data associated with call processing, administrative tasks, and recovery; the program store; the call store, for storing transient information related to the processing of telephone calls; the file store, a disk system used to store backup program copies; auxiliary units, magnetic tape storage containing basic restart programs and new software releases; input/output (I/O) interfaces to terminal devices; and the master control console, used as the control and display console for the system. In general, a telephone switching processor could be used to control more than one type of telephone switching system.

The history of AT&T processors is summarized in Table 8-3. Even though all the processors are based upon full duplication, it is interesting to observe the evolution from the tightly lock-stepped matching of every machine cycle in the early processors to a higher dependence on self-checking and matching only on writes to memory.

TABLE 8-3  Summary of AT&T Telephone Switching Processors

No. 1, 1965
  Complexity (gates): 12,000
  Unit of switching: PS, CS, CC, buses
  Matching: Six internal nodes, 24 bits per node; one node matched each machine cycle; node selected to be matched dependent on instruction being executed
  Other error detection/correction: Hamming code on PS; parity on CS; automatic retry on CS, PS; watch-dog timer; sanity program to determine if reorganization led to a valid configuration

No. 2, 1969
  Complexity (gates): 5,000
  Unit of switching: Entire computer
  Matching: Single match point on call store input
  Other error detection/correction: Diagnostic programs; parity on PS; detection of multiword accesses in CS; watch-dog timer

No. 1A, 1976
  Complexity (gates): 50,000
  Unit of switching: PS, CS, CC, buses
  Matching: 16 internal nodes, 24 bits per node; two nodes matched each machine cycle
  Other error detection/correction: Two parity bits on PS; roving spares (i.e., contents of PS not completely duplicated, can be loaded from disk upon error detection); two parity bits on CS; roving spares sufficient for complete duplication of transient data; processor configuration circuit to search automatically for a valid configuration

No. 3A, 1975
  Complexity (gates): 16,500
  Unit of switching: Entire computer
  Matching: None
  Other error detection/correction: On-line processor writes into both stores; m-of-2m code on microstore plus parity; self-checking decoders; two parity bits on registers; duplication of ALU; watch-dog timer; maintenance channel for observability and controllability of the other processor; 25% of logic devoted to self-checking logic and 14% to maintenance access

3B20D, 1981
  Complexity (gates): 75,000
  Unit of switching: Entire computer
  Matching: None
  Other error detection/correction: On-line processor writes into both stores; byte parity on data paths; parity checking where parity preserved, duplication otherwise; modified Hamming code on main memory; maintenance channel for observability and controllability of the other processor; 30% of control logic devoted to self-checking; error-correction codes on disks; software audits, sanity timer, integrity monitor

Furthermore, as the processors evolved from dedicated, real-time controllers to multiple-purpose processors, the operating system and software not only became more sophisticated but also became a dominant portion of the system design and maintenance effort.

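To make the contrast concrete, the sketch below simulates the two matching styles in Python. It is an illustration only, not AT&T hardware: Replica, its toy instruction set, and the duplicated-run loop are all hypothetical, but they show why comparing only on memory writes shrinks the match hardware while leaving internal cycles to be covered by self-checking logic.

def run_duplicated(program, match_every_cycle):
    """Run two replicas in lock-step and compare them in one of two styles:
    on every machine cycle (early ESS processors) or only on values written
    to memory (No. 3A and 3B20D style, with self-checking covering the rest)."""
    class Replica:                          # hypothetical toy processor: one accumulator
        def __init__(self):
            self.acc = 0
        def step(self, op, val):
            if op == "add":
                self.acc += val
                return None                 # internal cycle, nothing leaves the processor
            if op == "store":
                return self.acc + val       # value that would be driven to the store
            raise ValueError(op)

    a, b = Replica(), Replica()
    for op, val in program:
        out_a, out_b = a.step(op, val), b.step(op, val)
        if match_every_cycle and a.acc != b.acc:
            raise RuntimeError("cycle mismatch: switch to the standby processor")
        if out_a is not None and out_a != out_b:
            raise RuntimeError("mismatch on memory write: switch to the standby processor")

# Fault-free run; nothing is raised and no output is produced.
run_duplicated([("add", 2), ("add", 3), ("store", 1)], match_every_cycle=False)
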
Part I of the AT&T case study in this chapter, by Wing Toy, sketches the evolution of the telephone switching system processors and focuses on the latest member of the family, the 3B20D. Part II of the case study, by Liane C. Toy, outlines the procedure used in the 5ESS for updating hardware and/or software without incurring any downtime.

TANDEM COMPUTERS, INC.

Over a decade after the first AT&T computer-controlled switching system was installed, Tandem designed a high-availability system targeted for the on-line transaction processing (OLTP) market. Replication of processors, memories, and disks was used not only to tolerate failures, but also to provide modular expansion of computing resources. Tandem was concerned about the propagation of errors, and thus developed a loosely coupled multiple-computer architecture. While one computer acts as primary, the backup computer is active only to receive periodic checkpoint information. Hence, 1.3 physical computers are required to behave as one logical fault-tolerant computer. Disks, of course, have to be fully replicated to provide a complete backup copy of the database. This approach places a heavy burden upon the system and user software developers to guarantee correct operation no matter when or where a failure occurs. In particular, the primary memory state of a computation may not be available due to the failure of the processors. Some feel, however, that the multiple computer structure is superior to a lock-step duplication approach in tolerating design errors.

The architecture discussed in the Tandem case study, by Bartlett, Bartlett, Garcia, Gray, Horst, Jardine, Jewett, Lenoski, and McGuire, is the first commercially available, modularly expandable system designed specifically for high availability. Design objectives for the system include the following:

- "Nonstop" operation wherein failures are detected, components are reconfigured out of service, and repaired components are configured back into the system without stopping the other system components
- Fail-fast logic whereby no single hardware failure can compromise the data integrity of the system
- Modular system expansion through adding more processing power, memory, and peripherals without impacting applications software

As in the AT&T switching systems, the Tandem architecture is designed to take advantage of the OLTP application to simplify error detection and recovery. The Tandem architecture is composed of up to 16 computers interconnected by two message-oriented Dynabuses. The hardware and software modules are designed to be fail-fast; that is, to rapidly detect errors and subsequently terminate processing. Software modules employ consistency checks and defensive programming techniques. Techniques employed in hardware modules include the following:

- Checksums on Dynabus messages
- Parity on data paths
- Error-correcting code memory
- Watch-dog timers

All I/O device controllers are dual ported for access by an alternate path in case of processor or I/O failure. The software builds a process-oriented system with all communications handled as messages on this hardware structure. This abstraction allows the blurring of the physical boundaries between processors and peripherals. Any I/O device or resource in the system can be accessed by a process, regardless of where the resource and process reside.

Retry is extensively used to access an I/O device. Initially, hardware/firmware retries the access assuming a temporary fault. Next, software retries, followed by alternative path retry and finally alternative device retry.

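A minimal sketch of this retry ladder, assuming a hypothetical controller interface (hw_retry, sw_retry, via_alternate_port) rather than Tandem's actual I/O system:

def access_io(request, primary, alternate):
    """Escalating retry: hardware/firmware first, then software, then the
    alternate path to the same controller, and finally the alternate device."""
    attempts = [
        ("hardware/firmware retry", lambda: primary.hw_retry(request)),
        ("software retry",          lambda: primary.sw_retry(request)),
        ("alternate-path retry",    lambda: primary.via_alternate_port(request)),
        ("alternate-device retry",  lambda: alternate.sw_retry(request)),
    ]
    last_error = None
    for name, attempt in attempts:
        try:
            return attempt()              # success at this rung: stop escalating
        except IOError as err:            # treat the failure as possibly transient
            last_error = err
    raise IOError("request failed on all paths and devices") from last_error

Only the last rung gives up; every earlier failure is assumed to be transient and simply escalates to the next alternative.
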
A network systems management program provides a set of operators that helps reduce the number of administrative errors typically encountered in complex systems. The Tandem Maintenance and Diagnostic System analyzes event logs to successfully call out failed field-replaceable units 90 percent of the time. Networking software exists that allows interconnection of up to 255 geographically dispersed Tandem systems. Tandem applications include order entry, hospital records, bank transactions, and library transactions.

Data integrity is maintained through the mechanism of I/O "process pairs": one I/O process is designated as primary and the other is designated as backup. All file modification messages are delivered to the primary I/O process. The primary sends a message with checkpoint information to the backup so that it can take over if the primary's processor or access path to the I/O device fails. Files can also be duplicated on physically distinct devices controlled by an I/O process pair on physically distinct processors. All file modification messages are delivered to both I/O processes. Thus, in the event of physical failure or isolation of the primary, the backup file is up-to-date and available.

User applications can also utilize the process-pair mechanism. As an example of how process pairs work, consider the nonstop application, program A, shown in Figure 8-2. Program A starts a backup process, A1, in another processor. There are also duplicate file images, one designated primary and the other backup. Program A periodically (at user-specified points) sends checkpoint information to A1. A1 is the same program as A, but knows that it is a backup program. A1 reads checkpoint messages to update its data area, file status, and program counter.

The checkpoint information is inserted in the corresponding memory locations of the backup process, as opposed to the more usual approach of updating a disk file. This approach permits the backup process to take over immediately in the event of failure without having to perform the usual recovery journaling and disk accesses before processing resumes.

FIGURE 8-2  Shadow processor in Tandem. [Figure: primary program A and backup program A1 run in different processors; A sends checkpoint messages (data, file status) to A1, both maintain primary and backup file copies, and each periodically asks whether its backup still exists.]

Program A1 loads and executes if the system reports that A's processor is down (error messages are sent from A's operating system image, or A's processor fails to respond to a periodic "I'm alive" message). All file activity by A is performed on both the primary and backup file copies. When A1 starts to execute from the last checkpoint, it may attempt to repeat I/O operations successfully completed by A. The system file handler will recognize this and send A1 a successfully completed I/O message. Program A periodically asks the operating system if a backup process exists. Since one no longer does, it can request the creation and initialization of a copy of both the process and file structure.

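The following Python sketch illustrates the checkpoint path just described. It is a simplified illustration, not Tandem's checkpointing API: PrimaryProcess and BackupProcess are hypothetical classes and the "memory image" is just a dictionary, but the flow (checkpoint directly into the backup's memory, immediate takeover without journal replay) mirrors the description above.

class BackupProcess:
    """Hypothetical backup: holds a mirror of the primary's state in memory."""
    def __init__(self):
        self.data, self.file_status, self.pc = {}, {}, 0
    def on_checkpoint(self, data, file_status, pc):
        # state lands directly in the backup's memory locations,
        # so takeover needs no recovery journaling or disk reads
        self.data.update(data)
        self.file_status.update(file_status)
        self.pc = pc
    def take_over(self):
        return "resuming at step %d with state %r" % (self.pc, self.data)

class PrimaryProcess:
    """Hypothetical primary: computes, and checkpoints at user-chosen points."""
    def __init__(self, backup):
        self.backup, self.data, self.pc = backup, {}, 0
    def run_step(self, key, value):
        self.data[key] = value
        self.pc += 1
    def checkpoint(self, file_status=None):
        # ship data area, file status, and program counter to the backup
        self.backup.on_checkpoint(dict(self.data), dict(file_status or {}), self.pc)

backup = BackupProcess()
primary = PrimaryProcess(backup)
primary.run_step("balance", 100)
primary.checkpoint()                      # backup now mirrors the primary
# ... the primary's processor stops answering "I'm alive" ...
print(backup.take_over())                 # backup resumes from the last checkpoint
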
A major issue in the design of loosely coupled duplicated systems is how both copies can be kept identical in the face of errors. As an example of how consistency is maintained, consider the interaction of an I/O process pair as depicted in Table 8-4. Initially, all sequence numbers (SeqNo) are set to zero. The requester sends a request to the server. If the sequence number is less than the server's local copy, a failure has occurred and the status of the completed operation is returned. Note that the requested operation is done only once. Next, the operation is performed and a checkpoint of the request is sent to the server backup. The disk is written, the sequence number incremented to one, and the results checkpointed to the server backup, which also increments its sequence number. The results are returned from the server to the requester. Finally, the results are checkpointed to the requester backup, which also increments its sequence number.

Now consider failures. If either backup fails, the operation completes successfully. If the requester fails after the request has been made, the server will complete the operation but be unable to return the result. When the requester backup becomes active, it will repeat the request. Since its sequence number is zero, the server test at step 2 will return the result without performing the operation again. Finally, if the server fails, the server backup either does nothing or completes the operation using checkpointed information. When the requester resends the request, the new server (that is, the old server backup) either performs the operation or returns the saved results. More information on the operating system and the programming of nonstop applications can be found in Bartlett [1978].

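The sequence-number test is compact enough to sketch directly. The Python fragment below is a simplified illustration of the interaction summarized in Table 8-4, not Tandem code; the Server and Requester classes are hypothetical, checkpointing to the backups is reduced to comments, and only the duplicate-suppression logic is shown.

class Server:
    """Hypothetical server of an I/O process pair."""
    def __init__(self):
        self.seq_no = 0                   # MySeqNo
        self.saved_result = None
    def write_record(self, requester_seq_no, record):
        if requester_seq_no < self.seq_no:
            return self.saved_result      # step 2: repeated request after a failure
        # steps 3-4: perform the operation exactly once
        # (request and result would also be checkpointed to the server backup)
        self.saved_result = "wrote %r" % (record,)
        self.seq_no += 1
        return self.saved_result          # step 5: return results

class Requester:
    """Hypothetical requester; its SeqNo would be checkpointed to its backup."""
    def __init__(self, server):
        self.server, self.seq_no = server, 0
    def request(self, record):
        result = self.server.write_record(self.seq_no, record)
        self.seq_no += 1                  # step 6
        return result

server = Server()
print(Requester(server).request("r1"))    # normal path: the record is written once
# The requester fails before seeing the reply; its backup (SeqNo still 0)
# repeats the request and receives the saved status instead of a second write.
print(Requester(server).request("r1"))
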
TABLE 8-4  Sample process-pair transactions

Initially the Requester, Requester Backup, Server, and Server Backup each hold SeqNo = 0.

Step 1  Requester: issues a request to write a record and sends it to the Server.
Step 2  Server: if SeqNo < MySeqNo, a failure has occurred; return the saved status.
Step 3  Server: otherwise, read the disk, perform the operation, and checkpoint the request to the Server Backup, which saves the request.
Step 4  Server: write to disk, set SeqNo = 1, and checkpoint the result to the Server Backup, which saves the result and sets SeqNo = 1.
Step 5  Server: return the results to the Requester.
Step 6  Requester: checkpoint the results to the Requester Backup, which sets SeqNo = 1.

Source: Bartlett, 1981; © 1981 ACM.

STRATUS COMPUTERS, INC.

Whereas the Tandem architecture was based upon minicomputer technology, Stratus entered the OLTP market five years after Tandem by harnessing microprocessors. By 1980, the performance of microprocessor chips was beginning to rival that of minicomputers. Because of the smaller form factor of microprocessor chips, it was possible to place two microprocessors on a single board and to compare their output pins on every clock cycle. Thus, the Stratus system appears to users as a conventional system that does not require special software for error detection and recovery. The case study by Steven Webber describes the Stratus approach in detail.

The design goal for Stratus systems is continuous processing, which is defined as uninterrupted operation without loss of data, performance degradation, or special programming. The Stratus self-checking, duplicate-and-match architecture is shown in Figure 8-3. A module (or computer) is composed of replicated power and backplane buses (StrataBus) into which a variety of boards can be inserted. Boards are logically divided into halves that drive outputs to and receive inputs from both buses. The bus drivers/receivers are duplicated and controlled independently. The logical halves are driven in lock-step by the same clock. A comparator is used to detect any disagreements between the two halves of the board. Multiple failures that affect the two independent halves of a board could cause the module to hang as it alternated between buses seeking a fault-free path. Up to 32 modules can be interconnected into a system via a message-passing Stratus intermodule bus (SIB). Access to the SIB is by dual 14-megabyte-per-second links. Systems, in turn, are tied together by an X.25 packet-switched network.

FIGURE 8-3  The Stratus pair-and-spare architecture. [Figure: duplicated power supplies (Power 0, Power 1) and duplicated backplane buses (Bus A, Bus B) serve paired processor boards, each board driving and receiving on both buses.]

Now consider how the system in Figure 8-3 tolerates failure. The two processor boards (each containing a pair of microprocessors), each a self-checking module, are used in a pair-and-spare configuration. Each board operates independently. Each half of each board (for example, side A) receives inputs from a different bus (for example, bus A) and drives a different bus (for example, bus A). Each bus is the wired-OR of one half of each board (for example, bus A is the wired-OR of all A board halves). The boards constantly compare their two halves, and upon disagreement, the board removes itself from service, a maintenance interrupt is generated, and a red light is illuminated. The spare pair on the other processor board continues processing and is now the sole driver of both buses. The operating system executes a diagnostic on the failed board to determine whether the error was caused by a transient or permanent fault. In the case of a transient, the board is returned to service. Permanent faults are reported by phone to the Stratus Customer Assistance Center (CAC). The CAC reconfirms the problem, selects a replacement board of the same revision, prints installation instructions, and ships the board by overnight courier. The first time the user realizes there is a problem is when the board is delivered. The user removes the old board and inserts the new board without disrupting the system (that is, makes a "hot" swap). The new board interrupts the system, and the processor that has been running brings the replacement into full synchronization, at which point the full configuration is available again. Detection and recovery are transparent to the application software.

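A minimal sketch of the detection side of this scheme, under stated assumptions: the Board and module_output names are hypothetical, the lock-stepped halves are modeled as two functions evaluated on the same inputs, and a real Stratus board compares output pins in hardware rather than in software.

class Board:
    """Hypothetical self-checking board: two lock-stepped halves plus a comparator."""
    def __init__(self, name, half_a, half_b):
        self.name, self.half_a, self.half_b = name, half_a, half_b
        self.in_service = True
    def drive(self, inputs):
        a, b = self.half_a(inputs), self.half_b(inputs)   # both halves see the same inputs
        if a != b:                        # comparator disagreement
            self.in_service = False       # board removes itself; maintenance interrupt
            return None
        return a

def module_output(boards, inputs):
    """Pair-and-spare: both boards normally drive the same value; the spare
    carries on alone when its partner takes itself out of service."""
    values = []
    for board in boards:
        if board.in_service:
            result = board.drive(inputs)
            if result is not None:
                values.append(result)
    if not values:
        raise RuntimeError("no processor board left in service")
    return values[0]

good = lambda x: 2 * x
bad = lambda x: 2 * x + 1                 # injected fault in one half of board1
boards = [Board("board1", good, bad), Board("board0", good, good)]
print(module_output(boards, 21))          # board1 drops out; board0 answers 42
print([b.in_service for b in boards])     # [False, True]
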
The detection and recovery procedures for other system components are similar, although the full implementation of pair-and-spare is restricted to only the processor and memory. The disk controllers contain duplicate read/write circuitry. Communications controllers are also self-checking. In addition, the memory controllers monitor the bus for parity errors. The controllers can declare a bus broken and instruct all boards to stop using that bus. Other boards monitor the bus for data directed to them. If the board detects an inconsistency but the memory controllers have not declared the bus broken, the board assumes that its bus receivers have failed and declares itself failed.
