Durable Memory RS/6000 System Design

M. Abbott, D. Har, L. Herger, M. Kauffmann, K. Mak, J. Murdock,
C. Schulz, T. B. Smith, B. Tremaine, D. Yeh, L. Wong

IBM T. J. Watson Research Center, 30 Saw Mill River Rd
Hawthorne, NY 10532

Abstract
The DM/6000 prototype is a fault-tolerant, durable-memory RS/6000. The main storage of this system is battery backed so as to maintain memory content across prolonged power interruptions. In addition, there are no single points of failure, and all likely multiple failure scenarios are covered. The prototype is intended to match the data integrity and availability characteristics of RAID5 disks. Redundancy is managed in hardware and is transparent to the software; application programs and the operating system (AIX) can run unmodified. The prototype is based on the IBM PowerPC 601 microprocessor operating at 80 MHz and is equivalent in performance and software appearance to a conventional 4-way shared bus, cache coherent, symmetric multiprocessor (SMP), with 4 gigabytes of non-volatile main storage.

1.0 Introduction
A Fault Tolerant, Durable Memory RS/6000 technology prototype is under development at IBM's Watson Research Center. The main storage of this system is battery backed and will maintain memory content across prolonged power interruptions. This allows main store to be used as a durable or persistent storage medium in place of disk, when the improved performance of main-memory storage of data could prove beneficial. Use of main store as a storage medium obligates the designer to meet traditional data integrity and availability expectations for a storage medium. Our goal was to match the data integrity and availability of the best RAID5 disk subsystems. Hardware redundancy assures that there are no single points of failure, and that all likely multiple failure scenarios are covered. This redundancy is software transparent. Application programs and the operating system (AIX) can run without modification, and are protected from hardware faults. Capitalizing on this hardware platform, certain augmentations have been made to the software environment to facilitate rapid recovery from software failures.

Hardware fault tolerance and memory durability are achieved without significant performance penalty for the implementing technology. The prototype design is based on the IBM PowerPC 601 microprocessor operating at 80 MHz. The software appearance is that of a conventional shared bus, cache coherent, symmetric multiprocessor (SMP) of up to four PowerPC 601 processors, with up to 4 gigabytes of non-volatile main storage.

(RS/6000, PowerPC, AIX, and Micro Channel are trademarks of International Business Machines Corporation.)

2.0 System Architecture
A large obstacle in the introduction of a new technology or concept into a mainstream product architecture is operating system and application compatibility with that base of experience and products. Most fault tolerant systems require major modification or a total rewrite of existing software. This has limited the application of many fault tolerant systems to those situations where the costs of these software modifications are justified by the need for improved reliability. Even when this software customization is affordable, there is frequently a very serious lag between when new functions are available on mainstream platforms and when they become available for fault tolerant system equivalents. This forces a choice between availability and the more open flexibility of mainstream platforms.

The trends in hardware technology are trivializing the costs of producing internally redundant or fault tolerant systems. The hardware cost for the triple redundant core DM/6000 plus 1 gigabyte of durable memory is less than the cost of two conventional systems with the same raw performance. Further, there is no performance penalty for the management of hardware fault tolerance, as this is transparently handled in hardware. Duplication of conventional machines sacrifices some system performance, since a fraction of the system's throughput must be dedicated to the software processes that mirror recovery data to shared disk or via messaging directly to the backup processor.

A design choice was thus made to use straightforward hardware redundancy techniques to handle hardware failures in a fashion that is transparent to the software. The DM/6000 can use existing uniprocessor and symmetric multiprocessor (SMP) operating systems and application software. The DM/6000 runs an unmodified AIX kernel for the PowerPC 601 and standard RS/6000 applications, with hardware availability characteristics equal to or superior to those of existing commercial fault tolerant computing systems [1][9]. Conversely, specific software concepts and features of the DM/6000 that are intended to address software availability and recovery issues can be used in standard RS/6000 system environments to improve software availability in those environments. This is important, for it implies that the effort to improve software availability and recovery characteristics is not specific to the DM/6000 hardware, but is more generally useful across the entire product line.

The physical architecture, illustrated in Figure 2, incorporates redundancy to meet reliability and availability requirements. The processor core, the interfaces to the L3 memory array, and the interfaces to the I/O channel subsystems are triplicated. All of the components of one rail of this Triplicated Processor Core (TPC), including up to 4 processors, the L2 memory/cache, and the I/O channel interfaces, with all of the high speed, wide data width, shared data buses interconnecting these components, are packaged together on a single card (the PC card). Three PC cards together form the TPC. The PC cards operate in tight synchronism with one another. The data paths on to and off of the PC cards are point to point byte or halfword serial links with clocks accompanying all data lines. The result is that all the performance critical data buses in the system are localized on the PC cards and can operate at full processor clocking rates. The off card interconnect is (pin) manageable and hot-pluggable.

Figure 1: DM/6000 Logical Architecture

Figure 1 illustrates the logical, or software, appearance of the hardware architecture of a DM/6000 system. Hardware redundancy is not visible to the operating system or application software. This logical architecture is a classic high performance computer architecture in the mold of emerging RISC based server class machines. The processor core consists of four PowerPC 601 RISC engines making up a tightly coupled shared bus symmetric multiprocessor (SMP). The memory structure is that of a multi-level coherent cached memory, ultimately backed by a large main memory array (4 GB). Each processor has a private unified 8-way associative 32 KB Level 1 (L1) cache. There is a shared 64 MB Level 2 (L2) memory, split between 32 MB of directly addressable memory and 32 MB of cache that is backed by the large main store (L3 memory array). The system has from two to four I/O channels, which link the core to industry standards based I/O subsystems. The I/O channel link is flexible enough to accommodate a variety of standards (Micro Channel, ISA, EISA, PCI, IBM ES/9000 channels, etc.), but the prototype I/O subsystems are exclusively Micro Channel based.

Figure 2: Physical Architecture

The memory in the large L3 memory array is partitioned across six Symbol Planes (SP's). This partitioning provides an economical physical size for the memory card while facilitating physical partitioning of the memory into independent fault containment regions. The system is tolerant to the partial or total failure of any one of the Symbol Planes. Each SP is a single pluggable circuit card with on-card power regulation and battery backup.

Hardware Fault Tolerance
There are a wide variety of techniques available to achieve fault tolerance. Each of these techniques has its own strengths and weaknesses. No single technique is optimum in all situations. The DM/6000 employs a variety of fault tolerant techniques to optimize the reliability of the system, while minimizing system costs and impact on system performance. Each major area of the system uses a redundancy approach that most closely matches the performance requirements of that area without sacrificing economy.
• Alternate path redundancy is used to tolerate failures in I/O channel paths. In this approach, each critical peripheral resource is attached to the system through at least two independent paths. For example, a token ring might be accessed by either of two paths when a token ring adapter is installed on each of two independent Micro Channel subsystems. If a fault disables the path through one of these adapters, then an alternate path is available through the other.
• Triplicated redundancy with majority voting is used to mask faults in the TPC. Voting in each I/O channel path, and in each symbol plane, masks any fault in the TPC. Since the voters are external to the TPC, a voter failure causes, and is equivalent to, a failure of the fault containment region in which it is located (a single I/O channel, or a single symbol plane). Voter failures are tolerated as simply one of the contributing sources that could cause the failure of the region in which they are packaged. (A minimal sketch of the voting function follows this list.)
• Error correcting codes are used to mask symbol plane failures. Data is striped across all symbol planes. Single symbol error correction circuitry in each rail is able to reconstruct any lost data, should a symbol plane fail. The advantage of this structure is that it minimizes the cost of protecting memory, while at the same time providing a very high performance memory system. Since all planes operate in parallel, this memory structure is inherently high performance. Each symbol plane also includes internal single bit error correction to suppress soft errors.
• A distributed fault tolerant clock provides all logic timing in the system. Each of the three rails of the TPC has an independent voltage controlled crystal oscillator, and they are mutually phase locked to one another. Other fault containment regions of the system receive triplex FT clock transmissions from the TPC and phase lock to the majority phase to produce a local copy of the FT clock for their use. This clocking is essentially the same in concept as that of [7][8] and is not discussed further in this paper.
• The primary power input and the cooling systems use dual redundancy with full system capacity available in each of the redundant paths. For this prototype, each of the primary power supplies converts 220 VAC to -48 VDC. This intermediate power is distributed by dual -48 VDC power buses. Normally, the load is shared between these two buses. When a failure occurs in either of the primary power supplies, or they lose their AC feed, the remaining bus rail automatically provides current to support the full load until the failed unit is repaired.
• Final power regulation and the backup batteries are distributed. Each fault containment region, or circuit card, has its own private power regulator that draws input power from either or both of the two intermediate -48 VDC power buses. The failure of a power regulator will cause the loss of the associated fault containment region, but will not affect the operation of any other fault containment region in the system. If a fault containment region requires battery backup (the PC cards and the SP cards), it is fitted with a private battery and a charger that draws power from the intermediate power network.
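
To make the voting concrete, the following C sketch shows a bitwise 2-of-3 majority function and the degraded behavior when a rail has fail-stopped (as happens when a rail detects an internal parity error, described later for the L2 memory). This is an illustration of the concept only, not the DM/6000 voter hardware; the per-rail valid flags are an illustrative stand-in for the fail-stop signalling between a rail and the surrounding voters.

    #include <stdint.h>
    #include <stdbool.h>

    /* Bitwise 2-of-3 majority: every output bit takes the value driven by
     * at least two of the three rails, so a single corrupted rail is masked. */
    static uint64_t vote3(uint64_t a, uint64_t b, uint64_t c)
    {
        return (a & b) | (a & c) | (b & c);
    }

    /* Degraded voting once one or two rails have fail-stopped.  A rail that
     * has signalled an internal error is simply ignored; with two survivors
     * the voter passes data only if they agree, and with one survivor it
     * passes that rail's data through. */
    static bool vote_with_failstop(const uint64_t in[3], const bool valid[3],
                                   uint64_t *out)
    {
        uint64_t survivor[3];
        int good = 0;

        for (int i = 0; i < 3; i++)
            if (valid[i])
                survivor[good++] = in[i];

        switch (good) {
        case 3:
            *out = vote3(survivor[0], survivor[1], survivor[2]);
            return true;
        case 2:
            if (survivor[0] != survivor[1])
                return false;            /* disagreement: uncorrectable */
            *out = survivor[0];
            return true;
        case 1:
            *out = survivor[0];          /* single surviving rail       */
            return true;
        default:
            return false;                /* no usable inputs            */
        }
    }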

Processor (PowerPC 601)
To meet demanding high end server requirements, the DM/6000 design point was chosen as a 4-way SMP with high performance I/O channels.

Each of the four PowerPC 601 microprocessors on a PC card runs in cycle synchronism with the associated PowerPC 601 chip in the adjacent rails. At system startup a maintenance processor pre-loads the PowerPC 601's memory with its initial program (bootcode) and synchronously releases triads of PowerPC 601's in the rails from reset. Each triad of PowerPC 601's continues to operate in tight synchronism as a single logical processor.

An internal rail fault will cause that rail to diverge from or lose sync with the other two rails. Majority voting in the symbol planes and in the I/O channels will mask the faulty rail. Because of the tight level of integration on the PC card, any fault on the card generally causes the whole card to lose sync and fail.

The PowerPC 601's operate at a bus clock rate of 40 MHz and an internal clock rate of 80 MHz. The on chip L1 cache of each PowerPC 601 maintains cache coherency using the standard PowerPC 601 bus snooping protocol. Snoops are also presented to the L1 caches during I/O by the memory controller so that I/O activity is cache coherent.

L2 Memory and PC Card Interconnect
The L2 memory is the PC card's primary storage array. The L2 memory is shared by the PowerPC 601 processors, the I/O channel interfaces, and the symbol plane (L3) memory interface. The array is 64 Mbytes of high speed DRAM, physically organized into two independent "banks" that are interleaved by modulo 256-byte address blocks. A multi-ported interconnect fabric/controller manages all data exchanges between PC card components. This interconnect provides for simultaneous access to both L2 banks at 320 MB/s for each bank. A 32 MB portion of this L2 memory is logically partitioned from the array and operated by the interconnect as a cache for the SP (L3) memory. Tag arrays for this cache are internal to this interconnect.

The shared L2 cache has the following features (an address-decomposition sketch follows the list):
• 32,768 direct mapped lines.
• 1024 byte lines with four 256 byte blocks per line, each with individual valid and modified status bits.
• Store-in operation with hardware managed automatic write-back of dirty lines on reuse.
• Critical L1 line first, block fill, on read misses.
• Programmable line fill; a miss can fill either the entire line or just the accessed block.
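
These figures (32,768 direct-mapped lines of 1024 bytes, four 256-byte blocks per line) describe a 32 MB cache and fix how a physical address splits into block offset, block number, line index and tag. The short C sketch below works through that decomposition for a 32-bit address covering the 4 GB real memory; the field widths follow from the numbers above, but the exact bit layout of the real tag arrays is not given in the text, so treat this as an illustrative model.

    #include <stdint.h>
    #include <stdio.h>

    /* L2 cache geometry as stated: 32,768 direct-mapped lines, 1024-byte
     * lines, four 256-byte blocks per line => 32 MB of cache. */
    #define L2_LINE_BYTES   1024u                              /* 2^10 */
    #define L2_BLOCK_BYTES   256u                              /* 2^8  */
    #define L2_NUM_LINES   32768u                              /* 2^15 */
    #define L2_CACHE_BYTES  (L2_LINE_BYTES * L2_NUM_LINES)     /* 32 MB */

    struct l2_addr {
        uint32_t tag;       /* remaining high-order address bits           */
        uint32_t line;      /* which of the 32,768 lines (direct mapped)   */
        uint32_t block;     /* which 256-byte block within the 1024-B line */
        uint32_t offset;    /* byte offset within the 256-byte block       */
    };

    /* Decompose a 32-bit physical address (4 GB real memory) into the
     * fields a direct-mapped L2 of this geometry would use. */
    static struct l2_addr l2_decompose(uint32_t paddr)
    {
        struct l2_addr a;
        a.offset = paddr & (L2_BLOCK_BYTES - 1);                  /* bits  7..0  */
        a.block  = (paddr / L2_BLOCK_BYTES) & 3u;                 /* bits  9..8  */
        a.line   = (paddr / L2_LINE_BYTES) & (L2_NUM_LINES - 1);  /* bits 24..10 */
        a.tag    = paddr / L2_CACHE_BYTES;                        /* bits 31..25 */
        return a;
    }

    int main(void)
    {
        uint32_t p = 0x12345678u;
        struct l2_addr a = l2_decompose(p);
        printf("addr %#x -> tag %#x line %u block %u offset %u\n",
               p, a.tag, a.line, a.block, a.offset);
        return 0;
    }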

All processor access to L3 memory is through the L2 cache. The remaining portion of the L2 array is directly accessible. This L2 direct memory can be used for kernel code storage, application processor private storage (stacks, automatic variables, ...) and for I/O buffers, to avoid undesirable interactions between code, private memory, or I/O buffers and L3 caching. The actual details of the memory hierarchy are of course not visible to the software (real memory appears as a single flat address space), but the varying access characteristics can be used by the virtual memory manager to tune performance.

Byte parity is maintained in the L2 memory, with parity checks performed by the processors, I/O interfaces, and the SP interface. After a failure in one rail, and until that faulty rail is repaired, the system is susceptible to soft errors in the L2 memories of either of the two operating rails. The parity in the L2 memory provides for the detection of these soft errors before they propagate off rail. A rail fail-stops if it detects a parity error, signaling the surrounding voters of this condition. The voters use this notification to ignore inputs from the stopped rail. The remaining rail is able to continue after a second failure, when the second failure is detected by the on-card parity checks. This provides tolerance of the most common double fault mode, a hard rail fail followed by a soft error in the L2 memory in one of the two remaining rails.
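
The role of the byte parity can be illustrated with a small C sketch: one parity bit per byte, checked wherever the byte is consumed, with a mismatch forcing the rail to fail-stop instead of letting the corrupted byte propagate off the rail. The parity polarity and the fail-stop signalling below are assumptions for illustration; the text does not specify them.

    #include <stdint.h>
    #include <stdbool.h>

    /* Even parity over one byte: parity bit = XOR of the eight data bits. */
    static unsigned parity_bit(uint8_t b)
    {
        b ^= b >> 4;
        b ^= b >> 2;
        b ^= b >> 1;
        return b & 1u;
    }

    /* Check a byte read from L2 against its stored parity bit.  On a
     * mismatch the rail "fail-stops": it raises a condition that the
     * external voters use to ignore this rail's outputs.  The callback
     * is an illustrative stand-in for that hardware signalling. */
    static bool l2_check_byte(uint8_t data, unsigned stored_parity,
                              void (*fail_stop)(void))
    {
        if (parity_bit(data) != (stored_parity & 1u)) {
            fail_stop();            /* soft error detected before it can */
            return false;           /* propagate off the rail            */
        }
        return true;
    }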

When a faulty rail is repaired, its internal state must be aligned to that of the remaining rails before it can be successfully resynchronized with them. The bulk of this state is contained in the L2 memory. A special hardware assist performs this resynchronization of the L2 memory. The 64 MB L2 (which includes the L3 cache content) can be resynchronized at 320 MB/s, or in 0.2 seconds.

The L2 memory is backed up by batteries, protecting against a total primary power outage. The memory automatically switches to battery as the power fails and will hold its state for a minimum of 48 hours. When primary power is restored, the L2 switches back to primary power and the system can be restarted without rebooting. A primary power fail interrupt provides adequate warning so that volatile processor state can be flushed to L2 memory before power is lost.

Symbol Plane (L3) Memory
The SP memory provides up to 4 GB of battery backed memory. At the PC card boundary, L3 data words are 8 bytes (64 bits) wide plus 4 bytes (32 bits) of single symbol error correcting code (SSECC). This code word is striped across the six symbol planes, 4 data SP's and 2 ECC planes. The SSECC can recover lost data even after the failure of an entire symbol plane.

The SP memory has two distinct components, the Symbol Plane Interface (SPI) in the TPC and the SP's themselves. Communication between the SP's and the TPC is tightly synchronous, with active Vernier skew compensation being used to remove all clock skew effects [6].
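
Each 8-byte data word and its 4 bytes of check code are thus striped across the six planes as six 16-bit symbols: two bytes to each of the four data SP's and two bytes to each of the two ECC SP's. The C sketch below illustrates that striping and the reconstruction of a known-failed data plane. For simplicity it uses plain XOR parity for the check symbols (RAID5-style erasure recovery) rather than the actual S4EC Hamming-type code of [3], which can also locate and correct a failed plane without being told which one it is; the function names and the simplified check computation are illustrative assumptions.

    #include <stdint.h>

    #define NUM_DATA_PLANES   4      /* four data symbol planes           */
    #define NUM_CHECK_PLANES  2      /* two check symbol planes           */
    #define NUM_PLANES        (NUM_DATA_PLANES + NUM_CHECK_PLANES)

    /* Stripe a 64-bit L3 word into four 16-bit data symbols, one per data
     * plane, and compute check symbols for the two check planes.  Here the
     * first check symbol is a plain XOR parity; the second is left as a
     * placeholder for the stronger code used by the real SSECC. */
    static void stripe_word(uint64_t word, uint16_t sym[NUM_PLANES])
    {
        uint16_t parity = 0;
        for (int p = 0; p < NUM_DATA_PLANES; p++) {
            sym[p] = (uint16_t)(word >> (16 * p));
            parity ^= sym[p];
        }
        sym[4] = parity;        /* check plane 0: XOR parity (simplified)    */
        sym[5] = parity;        /* check plane 1: placeholder; the real code */
                                /* is an S4EC symbol code, not a duplicate   */
    }

    /* Reconstruct the symbol of one known-failed data plane ("erasure")
     * from the surviving data planes and the XOR parity plane. */
    static uint16_t rebuild_erased(const uint16_t sym[NUM_PLANES], int failed)
    {
        uint16_t x = sym[4];                    /* start from the parity     */
        for (int p = 0; p < NUM_DATA_PLANES; p++)
            if (p != failed)
                x ^= sym[p];
        return x;
    }

    /* Reassemble the 64-bit word from the four data symbols. */
    static uint64_t gather_word(const uint16_t sym[NUM_PLANES])
    {
        uint64_t w = 0;
        for (int p = 0; p < NUM_DATA_PLANES; p++)
            w |= (uint64_t)sym[p] << (16 * p);
        return w;
    }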

TPC Symbol Plane Interface
The SPI is in the TPC. It attaches to the SP memory port of the PC card interconnect fabric/controller and is responsible for detailed control of the PC interface to the six SP's. Beyond normal sequencing and flow management, the SPI also provides the SSECC circuitry that tolerates the loss of any single SP.

The Symbol Plane Interface consists of two identical SPI ASIC's. During writes they generate the check codes for the data before it is passed out of the TPC. Read data from the SP's is checked and lost symbols are reconstructed by these same ASIC's. The generation and checking of the SSECC is performed entirely within the TPC, with the SP's voting the write data from the three rails. The voting makes the SP's tolerant to the failure of any one of the SPI ASIC's on any single rail, and a SPI ASIC failure only affects the read data bound for that particular rail.

Pinning requirements and circuit density demanded that the SPI be partitioned into two ASIC's. By providing half of the data and half of the check codes to each of the ASIC's, these two circuits were made identical and independent. Each SPI ASIC supports 8 of the 16 data connections to each of the SP's. Thus each SPI ASIC processes 32 data bits and 16 check bits on the out-board side and 32 data bits and 4 parity bits on the in-board side.

Internally, each of the SPI ASIC's implements two error detection and correction (EDAC) logic blocks. Each block handles four 4-bit data nibbles (one nibble from each of the four data SP's) and two 4-bit check nibbles (one nibble from each of the two check SP's). A S4EC Hamming-type code [3] is implemented on each of these nibble groups.

Each EDAC is able to detect and correct up to one nibble failure. 99.96% of all double faults are detected. The four EDAC's operating in parallel are able to pipe the 64 bits of data through the SPI chip set at full bus speeds.

The prototype made several expediency compromises which would not necessarily be reflected in a product. A nibble code was chosen for decoder/encoder efficiencies, minimizing ASIC design complexity. An eight bit symbol would probably be chosen for a product, as it provides better scalability to larger memories. Two error correcting planes can protect up to 128 data planes [3] in such a configuration. Additionally, a sixteen bit wide data path operating at a relatively modest 40 MHz was used. This avoided the need for special driver and receiver macros for the ASIC's. A product would likely invest in these macros, boosting the serial data rate by a factor of 8 and narrowing the data path to individual symbol planes accordingly. With narrower data paths to each plane, and a scalable error correction code, it is possible to design the ASIC's to handle 4, 8 or 16 data planes without exhausting available pins. A large memory of 16 planes has a more attractive 12.5% ECC overhead, while retaining the option of operating with fewer planes in smaller configurations.

The prototype's 16 bit wide interface to each symbol plane is also used to transmit commands and addresses to the symbol planes. Note that identical commands and addresses must be transmitted to each symbol plane. During address and command transfers the EDAC circuits are bypassed and identical information is sent to each symbol plane. The resultant SPI/SP bandwidth is 320 MB/s for data (useful, excluding SSECC) and 80 MB/s for addresses and commands. As an example, a READ cache-line command can be transmitted to the Symbol Planes in 3 cycles (3 bytes of identical command and address information to each plane), and the line can be returned in 128 cycles (1024 bytes at 8 bytes/cycle). The SPI also provides inter-rail exchange of L2 data during L2 resynchronization, and maintenance system access to the Symbol Planes.
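
Those figures imply the effective read bandwidth once the command overhead is included. A small worked sketch in C, assuming the 40 MHz interface clock and the 3-cycle command / 128-cycle data transfer described above:

    #include <stdio.h>

    int main(void)
    {
        const double clock_hz     = 40e6;   /* SPI/SP interface clock      */
        const double bytes_per_cy = 8.0;    /* 2 bytes from each of 4 data */
                                            /* planes per cycle            */
        const int    cmd_cycles   = 3;      /* READ cache-line command     */
        const int    data_cycles  = 128;    /* 1024 bytes at 8 bytes/cycle */
        const double line_bytes   = 1024.0;

        double peak      = bytes_per_cy * clock_hz;              /* 320 MB/s  */
        double effective = line_bytes * clock_hz / (cmd_cycles + data_cycles);

        printf("peak data bandwidth      : %.0f MB/s\n", peak / 1e6);
        printf("effective for 1 KB reads : %.1f MB/s\n", effective / 1e6);
        return 0;
    }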

Symbol Plane
Each symbol plane may be populated with up to 1 GByte (2^30 = 1,073,741,824 bytes) of battery backed DRAM. All logic and data path functions in a SP are implemented in a single ASIC, the SPC-ASIC. Each SPC manages a logical half duplex 2 byte data interface with the SPI's in the TPC. Data, command, and address information is transferred at 2 bytes per clock cycle. The SP has a local clock system that is slaved to the fault tolerant clock system in the TPC. Refresh of the DRAM is managed on the SPC, with provisions for synchronizing the refresh between all of the SP's in the system. When primary power is lost to the SP, the SPC automatically switches to battery power until primary power is restored.

The SPC handles the triplication in the SPI interface. The logical 2 byte interface consists of three replicated 2 byte interfaces, one to each of the three rails of the TPC. Data, commands and address information arriving from the TPC are majority voted at the receiving SPC interface to mask (and detect) errors caused by failed TPC rails and rail out-of-sync conditions. Data to be transmitted to the TPC is first triplicated, then buffered and sent to each of the three PC cards. In a hard rail fail scenario, the SPC voter circuits additionally function so as to pass good data when one of the two operational rails transmits a fault code instead of a valid data/command symbol. This occurs when a parity error is detected in a PC card.

Data is stored on a SP in 64-bit words. Access to a SP is on a 64-bit word basis only. Since multiple SP's always operate in parallel, this corresponds to a 32 byte block minimum access granularity to the TPC. All processor accesses to SP address space are cached in the TPC. Thus processor loads or stores can only result in cache-line operations at the SP memory. I/O operations can be as small as the 32 byte granularity of the SP memory.

Within a symbol plane each 64-bit data word is coded into a 72-bit SEC-DED code. This ECC prevents common single bit soft errors in the DRAM from causing symbol errors at the SPI. The DRAM chip organization is not by-one, so this code is not adequate to correct all chip failures, but it does correct many single chip hard failures. The ECC does detect all single chip errors, and these are flagged to the SSECC in the TPC. Chip fails which cannot be corrected on the SP are corrected by the SPI's SSECC code.

Contaminated or invalid data in an SP must be regenerated using the SSECC and the data from good SP's. This most commonly occurs after an SP failure has been repaired, or after a routine upgrade to SP memory, when an entire SP is uninitialized. The entire content of a SP can be regenerated by reading and rewriting all of memory and using the SPI SSECC circuitry to regenerate each word as it is read from SP memory. This scrubbing of the symbol plane data is performed under the control of the SPC upon command from the maintenance processor. A scrub is accomplished by sequentially reading the words in the symbol plane memory, transmitting them to the SPI, which passes the data through its SSECC circuitry, regenerating any lost symbols, internally wrapping the data in the SPI, and transmitting it back to the SPC's, where it is ultimately rewritten, correcting the content of SP DRAM memory. Two types of scrub are supported (a simplified scrub loop is sketched after this list):
1. High Speed Scrub (160 MB/s peak). Normal system operations can be performed while a high speed scrub is in progress, but the scrub has some impact on system performance.
2. Background Scrub (<1 MB/s). Background scrub is performed at a slow enough rate that it has no impact on system throughput.
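
The round trip described above (read a word from SP DRAM, pass it through the SPI's SSECC correction, write the corrected word back) can be sketched as a simple loop. Everything below, the array standing in for SP DRAM, the pass-through correction routine, and the pacing parameter that distinguishes the two scrub modes, is an illustrative assumption rather than the SPC's actual command interface.

    #include <stdint.h>
    #include <stddef.h>

    /* A small in-memory array stands in for SP DRAM in this sketch. */
    #define SP_WORDS 1024u
    static uint64_t sp_dram[SP_WORDS];

    static uint64_t sp_read_word(size_t i)              { return sp_dram[i]; }
    static void     sp_write_word(size_t i, uint64_t w) { sp_dram[i] = w; }

    /* Stand-in for the SPI's SSECC path: the real hardware regenerates any
     * lost symbols from the surviving planes; here it is a pass-through so
     * the sketch compiles and runs. */
    static uint64_t spi_ssecc_correct(uint64_t w) { return w; }

    /* Crude pacing stand-in: a busy wait. */
    static void pace_delay(unsigned n)
    {
        for (volatile unsigned i = 0; i < n; i++)
            ;
    }

    /* Scrub a region: every word is read, routed through SSECC correction,
     * and rewritten, restoring correct content after an SP repair or a
     * memory upgrade.  pace == 0 models the high speed scrub; a large pace
     * models the background scrub, slow enough not to disturb normal
     * traffic. */
    static void scrub_region(size_t first, size_t nwords, unsigned pace)
    {
        for (size_t i = 0; i < nwords && first + i < SP_WORDS; i++) {
            sp_write_word(first + i, spi_ssecc_correct(sp_read_word(first + i)));
            if (pace)
                pace_delay(pace);
        }
    }

    int main(void)
    {
        scrub_region(0, SP_WORDS, 0);     /* "high speed" scrub of the region */
        return 0;
    }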

There are three normal access modes supported by the SPC. These are:
1. Cache block fetch - reads a 256 byte block from the SP memory for storing in an L2 cache line. The ordering of this read returns the critical double word of the critical L1 line first, with the remainder of the critical L1 line filled in a round robin fashion, and then the remaining L1 lines of the L2 line filled in a round robin fashion with the double words within these L1 lines filled in ascending address order (see the ordering sketch after this list). Returning the critical double word first minimizes the latency as seen by a PowerPC 601 for a read access that misses both the L1 and L2 caches. The likelihood of a subsequent PowerPC 601 stall due to the same L2 cache miss is minimized by returning the remainder of the critical L1 line immediately following the critical double word.
2. Cache block write - supports write-back of a 256 byte L2 cache line as it is being cast out. The data ordering for cache writes is strictly ascending address order.
3. Asynchronous read/write operation - is used for in-memory data moves which are asynchronous with respect to processor instruction execution. Asynchronous memory to memory moves are most commonly used by I/O device drivers.
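
The return ordering of the cache block fetch can be made concrete with a short C sketch that lists the 8-byte double words of a 256-byte block in the order described in item 1: the critical double word first, the rest of the critical L1 line in wrap-around order, then the remaining L1 lines in wrap-around order with their double words in ascending address order. The 64-byte L1 line size used here is an assumption (it is not stated in this section), so the constants are illustrative.

    #include <stdio.h>

    /* Geometry assumed for illustration: 256-byte L2 block, 8-byte double
     * words, 64-byte L1 lines => 4 L1 lines per block, 8 double words per
     * L1 line.  The 64-byte L1 line size is an assumption, not taken from
     * the text. */
    #define BLOCK_BYTES     256
    #define DW_BYTES          8
    #define L1_LINE_BYTES    64
    #define DWS_PER_BLOCK   (BLOCK_BYTES / DW_BYTES)        /* 32 */
    #define DWS_PER_LINE    (L1_LINE_BYTES / DW_BYTES)      /*  8 */
    #define LINES_PER_BLOCK (BLOCK_BYTES / L1_LINE_BYTES)   /*  4 */

    /* Emit the double-word indices (0..31 within the block) in the order a
     * cache block fetch returns them: critical double word first, the rest
     * of the critical L1 line in wrap-around order, then the remaining L1
     * lines in wrap-around order with ascending double words inside each. */
    static void fetch_order(int critical_dw, int order[DWS_PER_BLOCK])
    {
        int n = 0;
        int crit_line = critical_dw / DWS_PER_LINE;
        int crit_off  = critical_dw % DWS_PER_LINE;

        /* 1) critical L1 line, starting at the critical double word and
         *    wrapping around within the line */
        for (int i = 0; i < DWS_PER_LINE; i++)
            order[n++] = crit_line * DWS_PER_LINE
                       + (crit_off + i) % DWS_PER_LINE;

        /* 2) remaining L1 lines in wrap-around order after the critical
         *    line, double words within each line in ascending order */
        for (int l = 1; l < LINES_PER_BLOCK; l++) {
            int line = (crit_line + l) % LINES_PER_BLOCK;
            for (int i = 0; i < DWS_PER_LINE; i++)
                order[n++] = line * DWS_PER_LINE + i;
        }
    }

    int main(void)
    {
        int order[DWS_PER_BLOCK];
        fetch_order(13, order);             /* e.g. a miss on double word 13 */
        for (int i = 0; i < DWS_PER_BLOCK; i++)
            printf("%d ", order[i]);
        printf("\n");
        return 0;
    }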

I/O Subsystem
Logically, I/O interfacing follows the model of a bridge between the processor private memory interconnect (bus) and industry standard I/O buses. The architecture has been tailored and optimized to support the requirements of both the server environment (high bandwidth, numerous diverse device connectivity) and fault tolerance (fault containment, fault masking, and redundant paths).

Typical desk top and desk side machines bridge to the I/O bus in a single VLSI chip. This implementation is physically constrained, as this single chip must be electrically close to two high performance buses -- the processor private interconnect and the I/O bus. This has the practical effect of limiting the number of I/O buses to one or two, which is problematic for servers, and does not provide good granularity with respect to failures.

A fault tolerant machine such as the DM/6000 needs to limit the impact of channel path failures. This demands a minimum of two I/O bus subsystems, preferably more for a large server. The DM/6000 additionally requires voting (for output) and replication (for input) in the bridge. All these requirements must be provided in a fashion that supports concurrent maintenance with hot-pluggability.

The DM/6000 solution to all of these requirements is a "split-chip" bridge design. The bridge chip is split into two pieces, one to interface with the processor/memory bus and one to interface with the I/O bus. This is shown in Figure 3 (triplication of the processor core is not shown). Between the two halves is a (triplicated), full duplex, point-to-point byte serial link. The link signaling technology supports full processor speed communications with an I/O subsystem located up to 10 meters away from the TPC.

The DM/6000 PC card supports up to four Link Interface ASIC's (the LI ASIC's) connected to the PC interconnect's I/O port. On the I/O channel side the LI connects over the high speed full duplex byte serial link to a remote (off card) bus controller ASIC (the Remote Bus Controller or RBC ASIC). The link implements a transaction protocol with retry and allows two transactions to be outstanding on the link in each direction. The RBC ASIC is I/O bus specific and, in the case of the prototype, bridges to a Micro Channel (RBC-MCA).

Figure 3: Logical I/O Structure

The central processor burden in managing the system I/O is of critical importance in a high performance server. The DM/6000 relegates the mundane tasks of controlling the peripheral devices to I/O processors located in the I/O bus subsystems. Communications between the TPC processors and the I/O controllers consist of 'START-IO' commands (messages) from the TPC processors and completion status or exception interrupts from the I/O controllers. This technique has been widely used in high performance computing systems and has proved capable of insulating the processor core from the relatively slow response characteristics of I/O subsystems.

The I/O subsystem is allowed DMA access to the memory of the TPC complex. The TPC processors have memory mapped program (load/store) access to the I/O subsystem address space. A TPC processor may start I/O by a single memory mapped store to an interrupt request register in the RBC, passing a pointer to an argument list (or control block) in memory. This 'START-IO' write executes in a single cycle of the main processor; the processor execution stream is not interlocked with successful completion of the requested write, but is immediately released.

When the I/O subsystem gets the START-IO it fetches the argument list or control block from storage, performs any requested functions, returning status through the control block, and signals the TPC upon completion. Any data movement to or from main store is the I/O subsystem's responsibility. Hardware exceptions which might occur during the START-IO write are captured by the LI in status registers and presented to the TPC processor as a normal I/O exception interrupt.
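
The START-IO handshake can be sketched from the processor's side as follows. The control block layout, the doorbell register, and the status values are hypothetical; the text states only that a single memory-mapped store passes a pointer to an argument list and that completion is signalled by an interrupt or by status returned through the control block.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical control block passed to the I/O processor.  The real
     * argument-list format is not described in the text. */
    struct start_io_cb {
        uint32_t command;            /* operation requested of the I/O proc. */
        uint32_t device;             /* target device on the Micro Channel   */
        uint64_t buffer_addr;        /* main-store buffer for the transfer   */
        uint32_t length;             /* transfer length in bytes             */
        volatile uint32_t status;    /* written back by the I/O processor    */
    };

    #define CB_STATUS_PENDING  0u
    #define CB_STATUS_DONE     1u

    /* Hypothetical memory-mapped interrupt-request ("doorbell") register in
     * the RBC; its address would come from the system's I/O map. */
    static volatile uint64_t *rbc_start_io_reg;

    /* Issue a START-IO: one memory-mapped store of the control-block
     * pointer.  The store completes in a single processor cycle and the
     * processor is released immediately; it is not interlocked with the
     * completion of the I/O operation. */
    static void start_io(struct start_io_cb *cb)
    {
        cb->status = CB_STATUS_PENDING;
        *rbc_start_io_reg = (uint64_t)(uintptr_t)cb;   /* the doorbell write */
    }

    /* Completion is normally signalled by an interrupt from the I/O
     * controller; a driver could also poll the status field, guarded by a
     * software time-out against a passive I/O-subsystem failure. */
    static bool io_done(const struct start_io_cb *cb)
    {
        return cb->status == CB_STATUS_DONE;
    }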

The TPC processor must also protect itself against passive failures of an I/O subsystem by appropriate software time-outs and other defensive measures. Protection from errant I/O DMA (unrestricted writes by the I/O subsystem to main memory) is provided by memory protection and mapping functions in the LI ASIC, which can be used to restrict I/O DMA access to a (relocatable) I/O buffer. Because this protection mechanism is within the triplicated portion of the system, normal majority voting suffices to protect the system from LI mapping hardware failures.
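
The LI's protection function can be pictured as a bounds check and relocation against a relocatable I/O buffer window, as in the C sketch below. The window registers and their semantics are assumptions for illustration; the text says only that memory protection and mapping functions in the LI restrict I/O DMA to a relocatable I/O buffer.

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical relocatable I/O-buffer window maintained in the LI. */
    struct dma_window {
        uint64_t io_base;     /* lowest I/O-side address accepted            */
        uint64_t io_limit;    /* one past the highest accepted address       */
        uint64_t mem_base;    /* where the window is relocated in main store */
    };

    /* Check an inbound DMA request against the window.  Accesses outside
     * the window are rejected, so an errant I/O subsystem cannot write
     * over arbitrary main memory; accepted addresses are relocated into
     * the I/O buffer.  Because this logic sits inside the triplicated
     * portion of the system, an LI mapping failure is masked by ordinary
     * majority voting. */
    static bool li_map_dma(const struct dma_window *w,
                           uint64_t io_addr, uint32_t len, uint64_t *mem_addr)
    {
        if (io_addr < w->io_base || io_addr + len > w->io_limit)
            return false;                          /* reject errant DMA */
        *mem_addr = w->mem_base + (io_addr - w->io_base);
        return true;
    }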

The LI-RBC design is a rather straightforward imitation of a plausible but unexceptional I/O architecture for a non-FT machine that addresses server connectivity and performance issues. This unexceptional characteristic is considered a virtue here in that

loss of sync in the TPC. The classic Byzantine fault handling algorithm, used for example in [7], has the serious drawback that it requires that all incoming data be fully exchanged among the TPC rails. With a 9 bit wide data path (a control flag and an 8 bit data byte) this exchange requires 36 pins per I/O channel path for inter-rail data exchange. Total pin count for each channel, including the full duplex RBC-LI path, is about 62 pins (including clocking and parity lines), or about 250 pins for all four channels. This was not feasible for this design, as there are inadequate pins on the TPC card or on the LI ASIC to dedicate 250 pins to I/O channel paths.

A new Byzantine fault resistant input algorithm was developed to address this problem. This algorithm is illustrated in Figure 5. Incoming data passes through a pipeline. A byte (9 bits) and a parity bit are replicated by the RBC and transmitted to the LI in each of the three rails of the TPC during each clock. On receipt, each LI computes a single bit parity signature (PS) per cycle using only the 9 bit byte (received parity is ignored in this calculation). This PS is computed ov
