simplex processor to 10⁻¹⁰ per hour for a processor that uses parallel hybrid redundancy. For those functions requiring fault masking, a triplex level of redundancy is provided. For low criticality functions or noncritical functions, the GPCs may be duplex or simplex. Parallel hybrid redundancy is used for extremely high levels of fault tolerance and/or for longevity (long mission durations). GPCs can also be made damage tolerant by physically dispersing redundant GPC elements and providing secure and damage tolerant communications between these elements. Within AIPS, computers of varying levels of fault tolerance can coexist such that less reliable computers are not a detriment to higher reliability computers.
The overall framework in which AIPS operates can be characterized as a limited form of a fully distributed multicomputer system. A fully distributed fault and damage tolerant system must satisfy several requirements. The following subsections describe these requirements and characterize the AIPS architecture in the context of these requirements.

2.2 FUNCTION MIGRATION
A fully distributed system must have a multiplicity of resources which are freely assignable to functions on a short-term basis [3]. AIPS has multiple processing sites; however, they are not freely assigned to functions on a short-term basis. During routine operations the General Purpose Computers at the various processing sites are assigned to perform a fixed set of functions, each computer doing a unique set of tasks. However, in response to some internal or external stimulus, the computers can be reassigned to a different set of functions. This results in some functions migrating from one processing site to another site in the system. Under certain conditions, it may also result in some functions being suspended entirely, for a brief time period or for the remainder of the mission. In AIPS this form of limited distributed processing is called semi-dynamic function migration.
The internal stimuli that result in function migration may consist of the detection of a fault in the system, a change in the system load due to a change in mission phase, etc. An example of an external stimulus is a crew-initiated reconfiguration of the system.
2.3 RESOURCE TRANSPARENCY

Another characteristic of a fully distributed system is that the multiplicity of resources should be transparent to the user. To a large extent, this is true in AIPS. Function migration is transparent to the function and to the person implementing that function in software. Interfunction communication is handled by the operating system such that the location of the two communicating functions is also transparent to both. The two functions could be collocated in a GPC, or they may be executing in different GPCs. Indeed, at one time they may be collocated, while at a later time one of them may have been migrated to another site. This transparency is achieved through a layered approach to interfunction communication. One of these layers determines the current processing site of the function with which one wishes to communicate. If it is another GPC, another layer in the communication hierarchy is invoked that takes care of the appropriate IC bus message formatting and of the interface to the bus transmitters and receivers, that is, the physical layer. This layered approach is responsible for hiding the existence of multiple computers from the applications programmer.

Figure 1. AIPS Architecture: A Software View (computers 1 through N interconnected by the global I/O, regional I/O, local I/O, intercomputer, and mass memory buses, with the system mass memory)
2.4 SYSTEM CONTROL

Another characteristic of a totally distributed system is that system control is through multiple cooperating autonomous operating systems. The AIPS operational philosophy differs considerably in this regard. The overall AIPS system management and control authority is vested in one GPC at any given time. This GPC is called the Global Computer. All other GPCs are subservient to this GPC as far as system level functions are concerned. However, all the local functions are handled quite independently by each computer. This philosophy is more akin to a hybrid of hierarchical and federated systems. This is explained in the following.

Under normal circumstances each GPC operates fairly autonomously of the other computers. Each GPC has a Local Operating System that performs all the functions necessary to keep that processing site operating in the desired fashion. The local operating system is responsible for an orderly start and initialization of the GPC; scheduling and dispatching of tasks; input/output services; task synchronization and communication services; and resource management. It is also responsible for maintaining the overall integrity of the processing site in the presence of faults. This involves fault detection, isolation, and reconfiguration (FDIR). The local operating system performs all of the redundancy management functions, including FDIR, background self tests, transient and hard fault analysis, and fault logging.

The services provided by the local operating systems at the various processing sites are similar, although they may differ in implementation. For example, the multiprocessor version of the operating system must take into account the multiplicity of processors for task scheduling. Similarly, it must also consider the more complex task of redundancy management and the cycling of spare units. The uniprocessor operating system can also have different variations depending upon the level of redundancy and the I/O configuration.

The Local Operating System in each computer interfaces with the Network Operating System. The Network Operating System is responsible for system level functions. These include an orderly start and initialization of the various buses and networks, communication between processes executing in different computers, system level resource management, and system level redundancy management. System level resources are the GPCs; the I/O, IC, and MM buses; and the shared data and programs stored in the mass memory or in some other commonly accessible location. System redundancy management includes FDIR in the I/O and IC node networks, correlation of faults in GPCs (both transient and hard faults), reassignment of computers to functions (function migration), and graceful degradation in case of the loss of a processing site.

Some of the functions of the Network Operating System are centralized in the Global Computer. The Global Computer is responsible for system start, resource management, redundancy management, and function migration. It needs status knowledge of all processing sites, and it must be able to command other GPCs to perform specific functions. This communication is accomplished via the Network Operating System, a portion of which is resident in each computer. The Global Computer does not participate in every system level transaction. Some of the system level functions performed by the Network Operating System may involve only a pair of nonglobal GPCs.

One of the GPCs is designated to be the Global Computer at system bootstrap time. However, this designation can be changed during system operation by an internal or an external stimulus.
2.5 DATA BASE

Another important attribute of a distributed system is the treatment of the data base. The data base can be completely replicated in all subsystems, or it can be partitioned among the subsystems. In addition, the data base directory can be centralized in one subsystem, duplicated in all subsystems, or partitioned among the subsystems. The AIPS approach is a combination of these.

For the mass memory data base, all GPCs will contain a directory of the MMU contents. This can be implemented as a 'directory to the directory' in order to limit the involvement of the GPCs in the directory change process. The MMU directory will be static over extended intervals.

The data base that reflects the global system state will be maintained by the Global Computer in its local memory. A copy will be maintained by any alternate Global Computer, also in local memory.

The data base that reflects the distribution of functions among GPCs will be contained in all GPCs.

2.6 FAULT TOLERANCE

There is a considerable amount of hardware redundancy and complexity associated with each of the elements shown in Figure 1. This redundancy allows each hardware element to be reliable, fault tolerant, and damage tolerant. From a software viewpoint, however, this underlying complexity of the system is transparent. This is true not only in the context of the applications programs but for most of the operating system as well; however, those elements of the operating system that are concerned with fault detection and recovery and other redundancy management functions have an intimate knowledge of the underlying complexity.

Hardware redundancy in AIPS is implemented at a fairly high level, typically at the processor, memory, and bus level. There are two fundamental reasons for providing redundancy in the system: one, to detect faults through comparison of redundant results, and two, to continue system operation after component failures. Processors, memories, and buses are replicated to achieve a very high degree of reliability and fault tolerance. In some cases coded redundancy is used to detect faults and to provide backups more efficiently than would be possible with replication.

The redundant elements are always operated in tight synchronism, which results in exact replication of computations and data. Fault detection coverage with this approach is one hundred percent once a fault is manifested. To uncover latent faults, temporal and diagnostic checks are employed. Given the low probability of latent faults, the checks need not be run frequently.

Fault detection and masking are implemented in hardware, relieving the software from the burden of verifying the correct operation of the hardware. Fault isolation and reconfiguration are largely performed in software with some help from the hardware. This approach has flexibility in reassigning resources after failures are encountered, and yet it is not burdensome since the isolation and reconfiguration procedures are rarely invoked.

2.7 DAMAGE TOLERANCE

One of the AIPS survivability related requirements is that the information processing system be able to tolerate those damage events that do not otherwise impair the inherent capability of the vehicle to fly, be it an aircraft or a spacecraft.

The requirement for damage tolerance will be applied to the redundant GPCs, to intercomputer communications, and to the communication links between GPCs and sensors, effectors, and other vehicle subsystems.
The internal architecture of the redundant computers supports the damage tolerance requirement in several ways. First, the links between redundant channels of a computer are point-to-point; that is, each channel has a dedicated link to every other channel. Second, these links can be several meters long. This makes it possible to physically disperse the redundant channels in the target vehicle. The channel interface hardware is such that the long links do not pose a problem in synchronizing widely dispersed processors.

For communication between GPCs, and between a GPC and I/O devices, a damage and fault tolerant network is employed. The basic concept of the network is as follows.

The network consists of a number of full duplex links that are interconnected by circuit switched nodes to form a conventional multiplex bus. In steady state, the network configuration is static, and the circuit switched nodes pass information through them without the delays that are associated with packet switched networks. The protocols and operation of the network are identical to those of a multiplex bus. Every transmission by any subscriber on a node is heard by all the subscribers on all the nodes, just as if they were all linked together by a linear bus.

The network performs exactly as a virtual bus. However, the network concept has many advantages over a bus. First of all, a single fault can disable only a small fraction of the virtual bus, typically a link connecting two nodes, or a node. The network is able to tolerate such faults due to a richness of interconnections between nodes. By reconfiguring the network around the faulty element, a new virtual bus is constructed. Except for such reconfigurations, the structure of the virtual bus remains static.

The nodes are sufficiently smart to recognize reconfiguration commands from the network manager, which is one of the GPCs. The network manager can change the bus topology by sending appropriate reconfiguration commands to the affected nodes.
Second, weapons-effect-induced damage, or other damage caused by electrical shorts, overheating, or localized fire, would affect only the subscribers in the damaged portion of the vehicle. The rest of the network, and the subscribers on it, can continue to operate normally. If the sensors and effectors are themselves physically dispersed for damage tolerance or other reasons, and the damage event does not affect the inherent capability of the vehicle to continue to fly, then the control system would continue to function in a normal manner, or in some degraded mode as determined by sensor/effector availability. The communication mechanism, that is, the network itself, would not be a reliability bottleneck.

Third, fault isolation is much easier in the network than in multiplex buses. For example, a remote terminal transmitting out of turn, a rather common failure mode, can be easily isolated in the network through a systematic search in which one terminal is disabled at a time. This, in fact, is a standard algorithm for isolating faults in the network.
Fourth, the network can be expanded very easily by adding more nodes. In fact, nodes and subscribers to the new nodes (I/O devices or GPCs) can be added without shutting down the existing network. In bus systems, power to the buses must be turned off before new subscribers or remote terminals can be added.
Finally, there are no topological constraints of the kind that might be encountered with linear or ring buses.

2.8 SOURCE CONGRUENCY

An important consideration in designing AIPS is the interface between redundant and simplex elements. This interface design is crucial in avoiding single point faults in a redundant system. One must perform source congruency operations on all simplex data coming into a redundant computer. It is not sufficient to distribute simplex data to the redundant elements in one step. The redundant elements must exchange their copies of the data with each other to make sure that every element has a congruent value of the simplex data. The AIPS architecture not only takes this requirement into account but also provides efficient ways of performing simplex source congruency through a mix of hardware and software. The simplex-to-redundant interface is also the place where the applications programmer gets involved in the processor redundancy and where the applications code complexity multiplies. The AIPS processor level architecture is designed such that it separates the source congruency and computational tasks into two distinct functional areas. This reduces the applications code complexity and aids validation.

2.9 MASS MEMORY

The mass memory in AIPS provides the following capabilities:

1. System Cold Start/Restart.
2. Function Migration Support.
3. Overlays for the local memory of General Purpose Computers.
4. System Table Backup.
5. Storage for system-wide common files.
6. Program Checkpointing.

3.0 PROOF-OF-CONCEPT SYSTEM

To demonstrate the feasibility of the Advanced Information Processing System concept described in the preceding sections, a laboratory proof-of-concept system will be built. Such a system is now in the detailed design phase. The POC system configuration is shown in Figure 2. It consists of five processing sites which are interconnected by a triplex circuit switched network. Four of the five GPCs are uniprocessors: one simplex, one duplex, and two triplex processors. The fifth GPC is a multiprocessor that uses parallel hybrid redundancy. The redundant GPCs are to be built such that they can be physically dispersed for damage tolerance. Each of the redundant channels of a GPC could be as far as 5 meters from the other channels of the same GPC.

Figure 2. Proof-of-Concept AIPS System Configuration

Each of the triplex fault tolerant processors (FTPs) and the fault tolerant multiprocessor (FTMP) interfaces with three nodes of the Intercomputer (IC) node network. The duplex and the simplex processors interface with two nodes and one node, respectively.

The mass memory is a highly encoded memory that interfaces with the GPCs on a triplex multiplex bus.

The Input/Output is mechanized using a 16 node circuit switched network that interfaces with each of the GPCs on one to six nodes, depending on the GPC redundancy level.

Redundant system displays and controls are driven by the Global Computer and interface through the I/O network.

Each GPC has a Local Operating System and a portion of the Network Operating System. For the proof-of-concept system, initially the FTMP will be the Global Computer.
3.1 ARCHITECTURE OF AIPS BUILDING BLOCKS

The architecture of the major hardware building blocks of the AIPS Proof-of-Concept System configuration is described in the following sections.

3.1.1 Fault Tolerant Processor

The architectural description of the FTP is divided into three sections: Software View, Hardware View, and External Interfaces.

3.1.1.1 Fault Tolerant Processor: Software View

The FTP, or uniprocessor, architecture from a software viewpoint appears as shown in Figure 3.
Figure 3. Fault Tolerant Processor Architecture: Software View (the computational core and the I/O channel, each with a CPU, RAM and ROM, real time clock, interval timer, and watchdog timer, linked through a shared memory)
The uniprocessor can be thought of as consisting of two separate and rather independent sections: the computational core and the Input/Output channel.

The computational core has a conventional processor architecture. It has a CPU, memory (RAM and ROM), a Real Time Clock, and interval timer(s). The Real Time Clock counts up and can be read as a memory location (a pair of words) on the CP bus. Interval timers are used to time intervals for scheduling tasks and for keeping time-out limits on applications tasks (task watchdog timers). An interval timer can be loaded with a given value, which it immediately starts counting down; when the counter has been decremented to zero, the CPU is interrupted with a timer interrupt. A watchdog timer is provided to increase fault coverage and to fail safe in case of hardware or software malfunctions. The watchdog timer resets the processor and disables all its outputs if the timer is not reset periodically. The watchdog timer is mechanized independently of the basic processor timing circuitry.
There also appears on the processor bus a set of registers called the data exchange registers. These are used in the redundant fault tolerant processor to exchange data amongst the redundant processors. From a software viewpoint, this is the only form in which the hardware redundancy is manifested.

On a routine basis the only data that needs to be exchanged consists of error latches and cross channel comparisons of results for fault detection. These operations can easily be confined to the program responsible for Fault Detection, Isolation, and Reconfiguration. Voting of the results of the redundant computational processors is performed by the Input/Output processors. Therefore, the remaining pieces of the Operating System software and the applications programs need not be aware of the existence of the data exchange registers. The task scheduler and dispatcher, for example, can view the computational core as a single reliable processor.
The other half of the processor is the Input/Output channel. The I/O channel has a CPU (with the same instruction set architecture as the CP), memory (RAM and ROM), a Real Time Clock, and interval timer(s). This part of the I/O channel is identical to the CP except that it has less memory than the CP.

The IOP has interfaces to the intercomputer bus, to one or more I/O buses, and to memory mapped I/O devices. The CP and the IOP also have a shared interface to the system mass memory. These external interfaces of the FTP will be discussed in the next two sections.
The IOP and CP exchange data through a shared memory. The IOP and CP have independent operating systems that cooperate to assure that the sensor values and other data from input devices are made available to the control laws and other applications programs running in the CP in a timely and orderly fashion. Similarly, the two processors cooperate on the outgoing information so that the actuators and other output devices receive commands at the appropriate times. This is necessary to minimize the transport lag for closed loop control functions such as flight control and structural control.

The CP and IOP actions are therefore synchronized to some extent. To help achieve this synchronization in software, a hardware feature has been provided. This feature enables one processor to interrupt the other processor. By writing to a reserved address in shared memory the CP can interrupt the IOP, and by writing to another reserved location the IOP can interrupt the CP. Different meanings can be assigned to this interrupt by leaving an appropriate message, consisting of commands and/or data, in some other predefined part of the shared memory just before the cross-processor interrupt is asserted.
For routine flow of information in both directions, the shared memory will be used without interrupts but with suitable locking semaphores to pass a consistent set of data. The interrupts can be used to synchronize this activity as well as to pass time critical data that must meet tight response time requirements. In order to assure data consistency it is necessary that while one side is updating a block of data the other side does not access that block of data. This can be implemented either through semaphores in software or through double buffering. Hardware support for semaphores, in the form of a test & set instruction, is provided in the IOPs and CPs.
There are many attractive features of this architecture from an operational viewpoint. The most important of these is the decoupling of the computational stream and the input/output stream of transactions. The computational processor is totally unburdened from having to do any I/O transactions. To the CP, all I/O appears memory mapped. And this includes not only I/O devices but also all the other computers in the system. That is, each sensor, actuator, switch, computer, etc. to which the FTP interfaces can simply be addressed by writing to a word or words in the shared memory.

Data from other processing sites is received by each IOP on the redundant IC buses, hardware voted, and then deposited in the respective shared memories. Simplex source data, such as that from I/O devices, local processors, etc., is received by the single I/O processor that is connected to the target device. This data is then sent to the other two I/O processors using the IOP data exchange hardware. The congruent data is then deposited in all three shared memory modules. In either case, the computational processors obtain all data from the outside that has already been processed for faults and source congruency requirements by the I/O processors.
The data exchange mechanism appears to the software as a set of registers on the processor bus. Data exchange between redundant processors takes place one word at a time. Two types of data exchanges are possible: a simplex exchange or a voted exchange. The purpose of a simplex exchange is to distribute congruent copies of data that is available in only one channel of the FTP to all other channels. The purpose of a voted exchange is to compare and vote computational results produced by the redundant processors. In the FTP architecture, these exchanges are mechanized as follows.
To perform a voted exchange, each processor writes the value to be voted into a transmit register called X_V. Writing to this register initiates a sequence of events in hardware which culminates with the voted value being deposited in the receive register of each processor. The processor can read the receive register at this point to fetch the voted value. The whole transaction takes on the order of 5 microseconds. The hardware is designed to lock out access to the receive register while the exchange is in progress. If the processor tries to read the receive register before the transaction has completed, the processor hangs up. As soon as the data becomes available, the processor is released and the register read cycle completes normally. The processor wait is transparent to the software. It is not necessary to time, in software, the interval between the writing of the transmit register and the reading of the receive register. The two operations can be performed as a sequence of two instructions without an intermediate wait.
To perform a simplex exchange, the data to be transmitted is written to one of the simplex transmit registers. In the triply redundant version of the FTP there are three such registers. They are called X_A, X_B, and X_C. X_A is used to transmit simplex data from channel A to all others. Similarly, X_B transmits data from B, and X_C transmits data from C. Writing to one of these registers initiates a sequence of events in hardware which culminates with a congruent copy of the data word being deposited in the receive register of each processor. The receive register can be read at this point by each processor to fetch the congruent copy of the simplex data.
It has been pointed out earlier that the software appearance of the redundant FTP is the same as that of a simplex processor. All redundant processors have identical software and execute identical instructions at all times. This architecture is carried forth in the data exchange hardware and software as well. The data exchange hardware is designed such that all redundant processors execute identical instructions when exchanging data. As an example, consider a simplex source transmission from channel A. Assume that channel A has a sensor value in its internal memory location, called MEMORY, that it needs to send to channels B and C. This requires execution of the following sequence of four instructions:
1   LOAD  R0,MEMORY
2   STORE R0,X_A
3   LOAD  R0,X_R
4   STORE R0,MEMORY
The data to be transmitted is fetched from memory (instruction 1) and written to transmit register X_A (instruction 2). All three processors execute these instructions; however, only processor A's value is transmitted to the receive registers of A, B, and C. Transmissions from B and C are ignored by the hardware. This will be explained in the next section, which deals with the FTP architecture from a hardware viewpoint. In instruction 3, all processors read their receive register (X_R) to accept the congruent value of the data transmitted by A. In instruction 4, this value is transferred to an internal memory location.
A voted data exchange requires a similar sequence of instructions. The only difference is that in instruction 2, rather than storing the value in one of the simplex transmit registers, it is stored in the voted exchange register, X_V.
3.1.1.2 Fault Tolerant Processor: Hardware View

The triplex FTP architecture from a hardware viewpoint appears as shown in Figure 4.

There are three identical hardware channels. Each channel has a computational processor, an I/O processor, and some hardware that is shared by the CP and the IOP. The internal details of the CP and the IOP, such as the CPU, memory, timers, etc., have been described in the preceding section.
A very important aspect of the FTP architecture is the interconnection hardware between the redundant channels. This hardware serves three purposes. First of all, it provides a path for distributing simplex data available in only one channel to all other channels. Second, it provides a mechanism for comparing results of the redundant channels. And third, it provides a path for distributing and comparing timing and control signals such as the fault tolerant clock and external interrupts.
To distribute simplex data from one channel to all others without introducing single point faults in the design, it is necessary to adhere to source congruency requirements. One of these dictates that, in order to tolerate single faults, it is necessary to provide four fault containment regions. In the triplex FTP architecture, six fault containment regions are provided. The triplex processor provides the basic three fault containment regions. Three additional regions are provided in the form of interstages, which receive data from the processors and rebroadcast it back to the processors. The interstages are mechanized such that they have independent voltage and timing references. This assures that faults in the processors will not propagate to the interstages, and vice versa. Since an interstage is essentially a buffer with receivers and transmitters, it is a relatively small and simple piece of electronics. It is, therefore, much more convenient to provide three additional fault containment regions rather than just the one required for source congruency. It also makes the FTP architecture symmetric.
As explained in the preceding section, the data exchange hardware appears as a set of five registers on the processor bus. Four of these (X_A, X_B, X_C, and X_V) are the transmit registers, and the fifth one is the receive register, X_R. For simplex source exchanges, say a 'from A' exchange, the data in the X_A register in channel A is transmitted to the three interstages. The interstages rebroadcast this data to every processor. The three copies received by each processor are voted in hardware on a bit-by-bit basis. The voted result is deposited in X_R. For voted exchanges, each channel writes the data to be voted in the X_V register. Writing to X_V results in the data being transmitted to the channel's own interstage. The second half of the operation is the same as for a simplex exchange. In both cases, the exchange hardware masks any single fault while voting on the three copies, and it also records the source of the fault in an error latch. The error latch can be read by software as a memory location.
3.1.1.3 Fault Tolerant Processor: External Interfaces

The external devices that interface with the FTP are the mass memory, the Intercomputer network, and the I/O network.

Figure 4 shows the interface between a triplex FTP and the triply redundant mass memory bus. This interface hardware is shared in each channel by the CP and the IOP. Each channel of the FTP is enabled on one of the three buses. The FTP transmits commands and data synchronously on the three buses to the mass memory, where they are received and voted in hardware. The interface hardware performs the necessary parallel-to-serial data conversion, appends a cyclic redundancy check (CRC) byte, and transmits the serial data on the bus. Each processor channel listens to all three mass memory buses. Data received
Figure 4. Fault Tolerant Processor Architecture: Hardware View
The internal details of the CP and the IOP are not shown in Figure 4, so that other details, such as the redundancy dimension, can be shown more clearly.

The common hardware consists of a shared memory, the data exchange registers, and the mass memory interface. The shared memory is used to exchange information between the CP and the IOP, while the data exchange registers are used to exchange information between redundant copies of the CP or the IOP. Common hardware access conflicts between the CP and the IOP are resolved by a bus arbitrator. The bus arbi