JAN/FEB 1998

Chips, Systems, Software, and Applications

STARFIRE:
Extending the SMP Envelope

Copyright © 1998 Institute of Electrical and Electronics Engineers. Reprinted, with permission, from Jan/Feb IEEE Micro.

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Sun Microsystems' products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by sending a blank email message to info.pub.permission@ieee.org.

By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
STARFIRE:
Extending the SMP Envelope

Alan Charlesworth
Sun Microsystems

Point-to-point routers and an active centerplane with four address routers are key components of today's largest uniform-memory-access symmetric multiprocessor system.
New and faster processors get all the glory, but the interconnect "glue" that cements the processors together should get the headlines. Without high-bandwidth and low-latency interconnects, today's high-powered servers wouldn't exist.
The current mainstream server architecture is the cache-coherent symmetric multiprocessor (SMP), illustrated in Figure 1. The technology was developed first on mainframes and then moved to minicomputers. Sun Microsystems and other server vendors brought it to microprocessor-based systems in the early 1990s.
In an SMP, one or more levels of cache for each processor retain recently accessed data that can be quickly reused, thereby avoiding contention for further memory accesses. When a processor can't find needed data in its cache, it sends out the address of the memory location it wants to read onto the system bus. All processors check the address to see if they have a more up-to-date copy in their caches. If another processor has modified the data, it tells the requester to get the data from it rather than from memory. For a processor to modify a memory location, it must gain ownership of it. Upon modification, any other processor that has a cached copy of the data being modified must invalidate its copy.
This checking of the address stream on a system bus is called snooping, and a protocol called cache coherency keeps track of where everything is. For more background on cache-coherent multiprocessing systems, see Hennessy and Patterson (chapter 8).1
A central system bus provides the quickest snooping response and thus the most efficient system operation, especially if many different processors frequently modify shared data. Since all memory locations are equally accessible by all processors, there is no concern about trying to have applications optimally place data in one memory module or another. This feature of a traditional SMP system is called uniform memory access.
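To make the snooping protocol concrete, the following minimal Python sketch models a read-miss and a write on a shared bus. It uses a simplified three-state (modified/shared/invalid) protocol rather than the five-state MOESI protocol Sun's servers actually implement, and the class and method names are illustrative only.

```python
# Minimal snoopy-bus coherency sketch (simplified MSI states, not the UPA's MOESI).
M, S, I = "modified", "shared", "invalid"

class Processor:
    def __init__(self, name):
        self.name = name
        self.cache = {}          # block address -> (state, data)

    def snoop_read(self, block):
        """Another processor is reading: supply the data if we hold it modified."""
        state, data = self.cache.get(block, (I, None))
        if state == M:
            self.cache[block] = (S, data)    # keep a now-shared clean copy
            return data                      # cache-to-cache transfer
        return None

    def snoop_invalidate(self, block):
        """Another processor wants ownership: drop our copy."""
        self.cache.pop(block, None)

class SnoopyBus:
    def __init__(self, memory):
        self.memory = memory                 # block address -> data
        self.processors = []

    def read(self, requester, block):
        state, data = requester.cache.get(block, (I, None))
        if state != I:
            return data                      # cache hit, no bus traffic
        for p in self.processors:            # broadcast the address: everyone snoops
            if p is not requester:
                supplied = p.snoop_read(block)
                if supplied is not None:
                    self.memory[block] = supplied        # write back on the way
                    requester.cache[block] = (S, supplied)
                    return supplied
        data = self.memory[block]            # no dirty copy anywhere: read memory
        requester.cache[block] = (S, data)
        return data

    def write(self, requester, block, data):
        for p in self.processors:            # gain ownership: invalidate other copies
            if p is not requester:
                p.snoop_invalidate(block)
        requester.cache[block] = (M, data)

mem = {0x40: "old"}
bus = SnoopyBus(mem)
a, b = Processor("A"), Processor("B")
bus.processors = [a, b]
bus.write(a, 0x40, "new")        # A owns the block; memory is momentarily stale
print(bus.read(b, 0x40))         # B's miss is satisfied from A's cache -> "new"
```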
Keeping pace
A snoopy bus is a good solution only so long as it is not overloaded by too many processors making too many memory requests. Processor speeds have been increasing by 55% per year.1 With increased levels of component integration, we can squeeze more processors into a single cabinet. Sun has pushed the envelope of SMP systems through three generations of snoopy-bus-based uniform-memory-access interconnects: MBus,2 XDBus,2 and the Ultra Port Architecture.3 Table 1 shows the characteristics of these architectures.

Figure 1. Symmetric multiprocessor. (Processors with their caches, memory modules, and an I/O system share a snoopy system bus.)
Our system architects have made the following cumulative improvements in bus capacity:

• Increased bus clock rates from 40 MHz to 100 MHz by using faster bus-driving logic.4
• Changed from a circuit-switched protocol to a packet-switched protocol. In a circuit-switched organization, each processor's bus request must complete before the next can begin. Packet switching separates the requests from the replies, letting bus transactions from several processors overlap.
• Separated the addresses and data onto distinct wires so that addresses and data no longer have to compete with each other for transmission.
• Interleaved multiple snoop buses. Using four address buses allows four addresses to be snooped in parallel. The physical memory space is divided into quarters, and each address bus snoops a different quarter of the memory.
• Doubled the cache block size from 32 bytes to 64. Since each cache block requires a snoop, doubling the cache line width allows twice as much data bandwidth for a given snoop rate.
• Widened the data wires from 8 bytes to 16 bytes to move twice as much data per clock.

The combined effect of these improvements has been to increase bus snooping rates from 2.5 million snoops per second on our 600MP in 1991 to 167 million snoops per second on the Starfire in 1997, a 66-fold increase in six years. Combined with a two-times wider cache line, this has allowed data bandwidths to increase by 133 times.
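The arithmetic behind those two ratios follows directly from the Table 1 figures; the snippet below simply restates it.

```python
# Snoop-rate and bandwidth growth from the 1991 600MP to the 1997 Starfire (Table 1 figures).
snoops_1991, block_1991 = 2.5e6, 32      # snoops/s, bytes per cache block
snoops_1997, block_1997 = 166.7e6, 64    # 2 snoops/clock at 83.3 MHz

bw_1991 = snoops_1991 * block_1991       # 80e6 bytes/s  = 80 MBps
bw_1997 = snoops_1997 * block_1997       # ~10,667e6 bytes/s = 10,667 MBps

print(snoops_1997 / snoops_1991)         # ~66.7 -> the 66-fold snoop-rate increase
print(bw_1997 / bw_1991)                 # ~133  -> the 133-times data-bandwidth increase
```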
Ultra Port Architecture interconnect
All current Sun workstations and servers use Sun's Ultra Port Architecture.3 The UPA provides writeback MOESI (exclusive modified, shared modified, exclusive clean, shared clean, and invalid) coherency on 64-byte-wide cache blocks. The UPA uses packet-switched transactions with separate address and 18-byte-wide data lines, including two bytes of error-correcting code (ECC).
We have developed small, medium, and large implementations of the Ultra Port Architecture, as shown in Figure 2, to optimize for different parts of the price spectrum.
Table 1. Sun interconnect generations.

Architecture                             MBus (1990)         XDBus (1993)        Ultra Port Architecture (1996)

Bus improvements
  Bus clock                              40 MHz              40-55 MHz           83.3-100 MHz
  Bus protocol                           Circuit switched    Packet switched     Packet switched
  Address and data information           Multiplexed on      Multiplexed on      Separate wires
                                         same wires          same wires
  Maximum number of interleaved buses    1                   4                   4
  Cache block size                       32 bytes            64 bytes            64 bytes
  Data port width                        8 bytes             8 bytes             16 bytes

Maximum interconnect performance
  Snoops/bus/clock                       1/16                1/11                1/2
  Maximum snooping rate                  2.5 million/s       20 million/s        167 million/s
  Corresponding maximum data bandwidth   80 MBps             1,280 MBps          10,667 MBps
Figure 2. Three Ultra Port Architecture implementations: (a) a small system (Ultra 450, 1-4 processors) consisting of a single board with four processors, I/O interfaces, and memory joined by a system controller and a 5 × 5 data crossbar; (b) a medium-sized system (Ultra 6000, 6-30 processors) with one address bus and a 32-byte-wide data bus between system boards; and (c) a large system (Starfire Ultra 10000, 24-64 processors) with four address buses and a 16 × 16 data crossbar between system boards.
The small system's centralized coherency controller and small data crossbar provide the lowest possible cost and memory latency within the limited expansion needs of a single-board system. The medium-sized system's Gigaplane bus4 provides a broad range of expandability and the lowest possible memory latency. In the large system, Starfire's multiple-address broadcast routers and data crossbar extend the UPA family's bandwidth by four times and provide the unique ability to dynamically repartition and hot swap the system boards.

Starfire design choices
In the fall of 1993, we set out to implement the largest centerplane-connected, Ultra Port Architecture-based system that could be squeezed into one large cabinet. Our goals were to

• increase system address and data bandwidth by four times over the medium-sized Ultra 6000 system,
• provide a new dimension of Unix server flexibility with Dynamic System Domains, and
• improve system reliability, availability, and serviceability.

Our team had already implemented three previous-generation enterprise servers. When we started designing the Starfire, our top-of-the-line product was the 64-processor CS6400, which used the SuperSparc processor and the XDBus interconnect. Since the scale of the CS6400 worked well, we decided to carry over many of its concepts to the new UltraSparc/UPA technology generation.
We made the following design choices:

• Four-way interleaved address buses for the necessary snooping bandwidth. This approach had worked well on our 64-processor XDBus-generation system. Each address bus covers 1/4 of the physical address space. The buses snoop on every other cycle and update the duplicate tags in alternate cycles. At an 83.3-MHz system clock, Starfire's coherency rate is 167 million snoops per second. Multiplied by the Ultra Port Architecture's 64-byte cache line width, this is enough for a 10,667-megabyte-per-second (MBps) data rate.
• A 16 × 16 data crossbar. To match the snooping rate, we chose a 16 × 16 interboard data crossbar having the same 18-byte width as the UPA data bus. Figure 3 shows how the snooping and data bandwidths relate as the system is expanded. Since the snooping rate is a constant two snoops per clock, while the data crossbar capacity expands as boards are added, there is only one point of exact balance, at about 13 boards. For 12 and fewer boards, the data crossbar governs the interconnect capacity; for 14 to 16 boards, the snoop rate sets the ceiling. A quick check of this balance point appears in the sketch after this list.
• Point-to-point routing. We wanted to keep failures on one system board from affecting other system boards, and we wanted the capability to dynamically partition the system. To electrically isolate the boards, we used point-to-point router ASICs (application-specific integrated circuits) for the entire interconnect: data, arbitration, and the four address buses. Also, across a large cabinet, point-to-point wires can be clocked faster than bused signals.
• An active centerplane. The natural place to mount the router ASICs was on the centerplane, which is physically and electrically in the middle of the system.
• A system service processor (SSP). On our previous system it was very useful to have a known-good system that was physically separate from the server. We connected the SSP via Ethernet to Starfire's control boards, where it has access to internal ASIC status information.
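As a rough check of Figure 3's balance point: the snoop-limited ceiling is fixed by two snoops per clock times the 64-byte block at 83.3 MHz, while the crossbar curve grows with board count. The sketch below assumes each board's 16-byte-payload data port targets a destination board uniformly at random on every clock, so colliding requests lose slots; the article does not state its conflict model, so this is only an illustrative approximation that happens to reproduce the published crossover.

```python
# Rough model of Figure 3: flat snoop-limited bandwidth vs. crossbar capacity that grows
# with board count. The uniform-random-destination conflict model is an assumption made
# here for illustration; the article only states that the curves balance at about 13 boards.
CLOCK_MHZ = 83.3
SNOOP_LIMIT = 2 * 64 * CLOCK_MHZ            # 2 snoops/clock x 64-byte blocks ~= 10,667 MBps
PORT_MBPS = 16 * CLOCK_MHZ                  # 16 payload bytes/clock per board ~= 1,333 MBps

def crossbar_capacity(boards: int) -> float:
    """Expected MBps when each board targets a random destination board each clock."""
    p_port_busy = 1 - (1 - 1 / boards) ** boards   # fraction of destination ports in use
    return boards * p_port_busy * PORT_MBPS

for boards in range(4, 17):
    xbar = crossbar_capacity(boards)
    limiter = "crossbar" if xbar < SNOOP_LIMIT else "snoop"
    print(f"{boards:2d} boards: crossbar ~{xbar:7.0f} MBps, {limiter}-limited")
# The limiter flips from "crossbar" to "snoop" between 12 and 13 boards,
# consistent with the balance point of about 13 boards cited in the text.
```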
Figure 3. Snooping and data interconnect capacity. (The plot shows bandwidth at an 83.3-MHz clock, in MBps and bytes per clock, against the number of system boards: memory bandwidth, the flat snooping capacity, and the data-crossbar capacity with random addresses, which together divide the chart into a data-crossbar-limited region and a snoop-limited region.)

Starfire interconnect
Like most multiboard systems, Starfire has a two-level interconnect. The on-board interconnect conveys traffic from the processors, SBus cards, and memory to the off-board address and data ports. The centerplane interconnect transfers addresses and data between the boards.
Memory accesses always traverse the global interconnect, even if the requested memory location is physically on the same board. Addresses must be sent off board anyway to accomplish global snooping. Data transfers are highly pipelined, and local shortcuts to save a few cycles would have unduly complicated the design. As with the rest of the Ultra server family, Starfire's uniform-memory-access time is independent of the board where memory is located.
Address interconnect. Table 2 characterizes the address interconnect. Address transactions take two cycles. The two low-order cache-block address bits determine which address bus to use.
Data interconnect. Table 3 characterizes the data interconnect. Data packets take four cycles. In the case of a load-miss, the missed-upon 16 bytes are sent first.
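As a concrete reading of the address-interleaving rule above, the sketch below derives the bus number from a physical address, assuming the UPA's 64-byte coherency block; the function name is illustrative, not a Sun interface.

```python
# Pick the snooping address bus for a physical address. The two low-order bits of the
# cache-block address (64-byte blocks, so bits 6 and 7 of the byte address) select one
# of Starfire's four interleaved address buses.
BLOCK_SHIFT = 6          # 64-byte cache blocks

def address_bus(physical_address: int) -> int:
    block_address = physical_address >> BLOCK_SHIFT
    return block_address & 0b11

assert address_bus(0x0000) == 0
assert address_bus(0x0040) == 1     # the next 64-byte block lands on the next bus
assert address_bus(0x0080) == 2
assert address_bus(0x00C0) == 3
assert address_bus(0x0100) == 0     # and the interleave wraps around
```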
`
`

Table 2. Address interconnect.

Unit                            ASIC type               Purpose                                          ASICs per       ASICs on
                                                                                                         system board    centerplane
Port controller                 PC                      Controls two UPA address-bus ports               3               0
Coherency interface controller  CIC                     Maintains duplicate tags to snoop local caches   4               0
Memory controller               MC                      Controls four DRAM banks                         1               0
UPA to SBus                     SYSIO                   Bridges UPA to SBus                              2               0
Local address arbiter           LAARB mode of XARB      Arbitrates local address requests                1               0
Global address arbiter          GAARB mode of XARB      Arbitrates global requests for an address bus    0               4
Global address bus              GAB mode of 4 XMUXes    Connects a CIC on every board                    0               16
Table 3. Data interconnect.

Unit                      ASIC type                Purpose                                               ASICs per       ASICs on
                                                                                                         system board    centerplane
UltraSparc data buffer    UDB                      Buffers data from the processor; generates and        8               0
                                                   checks ECC
Pack/unpack               Pack mode of 2 XMUXes    Assembles and disassembles data into 72-byte          4               0
                                                   memory blocks
Data buffer               DB                       Buffers data from two UPA data-bus ports              4               0
Local data arbiter        LDARB mode of XARB       Arbitrates on-board data requests                     1               0
Local data router         LDR mode of 4 XMUXes     Connects four Starfire data buffers to a              4               0
                                                   crossbar port
Global data arbiter       GDARB                    Arbitrates requests for the data crossbar             0               2
Global data router        GDR mode of 12 XMUXes    16 × 16 × 18-byte crossbar between the boards         0               12
The Starfire data buffer ASICs provide temporary storage for packets that are waiting their turn to be moved across the centerplane. The local and global routers are not buffered, and transfers take a fixed eight clocks from the data buffer on the sending board to the data buffer on the receiving board.
Interconnect operation. An example of a load-miss to memory illustrates Starfire's interconnect operation. The interconnect diagram in Figure 4 shows the steps listed in Table 4.
Buses versus point-to-point routers. Starfire's pin-to-pin latency for a load-miss is 38 clocks (468 nanoseconds), counting from the cycle when the address request leaves the processor through the cycle when data arrives at the processor. The medium-sized Ultra 6000's bus takes only 18 clocks (216 ns).
Buses have lower latencies than routers: a bus takes only 1 clock to move information from one system component to another. A router, on the other hand, takes 3 clocks to move information: a cycle on the wires to the router, a cycle inside the routing chip, and another cycle on the wires to the receiving chip. Buses are the preferred interconnect topology for small and medium-sized systems.
Ultra 10000 designers used routers with point-to-point interconnects to emphasize bandwidth, partitioning, and reliability, availability, and serviceability. Ultra 6000 designers used a bus to optimize for low latency and economy over a broad product range.
Starfire packaging
Starfire's cabinet is 70 inches tall × 34 inches wide × 46 inches deep. Inside are two rows of eight system boards mounted on either side of a centerplane. Starfire is our fourth generation of centerplane-based systems.
Besides the card cage, power supply, and cooling system, the cabinet has room for three disk trays. The remaining peripherals are housed separately in standard Sun peripheral racks.
Starfire performs two levels of power conversion. Up to eight N + 1 redundant bulk supplies convert from 220 Vac to 48 Vdc, which is then distributed to each board. On-board supplies convert from 48 Vdc to 3.3 and 5 Vdc. Having local power supplies facilitates the hot swap of system boards.
Starfire uses 12 hot-pluggable fan trays, half above and half below the card cage. Fan speed is automatically controlled to reduce noise in normal environmental conditions.
Centerplane. The centerplane holds the 20 address ASICs and 14 data ASICs that route information between the 16 system-board sockets. It is 27 inches wide × 18 inches tall × 141 mils thick, with 14 signal layers and 14 power layers. The net density utilization is nearly 100%. We routed approximately 95% of the 14,000 nets by hand. There are approximately 2 miles of wire etch and 43,000 holes.
Board spacing is 3 inches to allow enough airflow to cool the four 45-watt processor modules on each system board. Signal lengths had to be minimized to run a 10-ns system clock across 16 boards.
`

Figure 4. Interconnect steps for a load-miss from memory. (The diagram traces the address, read, data, and write phases across the requesting board, the four global address buses and arbiters, the global data arbiter and data crossbar on the centerplane, and the responding board's coherency interface controllers, memory controller, pack/unpack units, and data buffers.)
Table 4. Interconnect sequence for a load-miss to memory.

Send address and establish coherency (13 clocks):
1. Processor makes a request to its port controller.
2. Port controller sends the address to a coherency interface controller, and sends the request to the local address arbiter.
3. Local address arbiter requests a global address bus cycle.
4. Global address arbiter grants an address bus cycle.
5. Coherency interface controller sends the address through the global address bus to the rest of the coherency interface controllers on the other boards.
6. All coherency interface controllers relay the address to their memory controllers and snoop the address in their duplicate tags.
7. All coherency interface controllers send their snoop results to the global address arbiter.
8. Global address arbiter broadcasts the global snoop result.
9. Memory is not aborted by its coherency interface controller because the snoop did not hit.

Read from memory (13 clocks):
1. Memory controller recognizes that this address is for one of its memory banks.
2. Memory controller orchestrates a DRAM cycle and requests a data transfer from its local data arbiter.
3. Memory sends 72 bytes of data to the unpack unit.
4. Unpack splits the data into four 18-byte pieces.
5. Unpack sends data to the data buffer to be buffered for transfer.

Transfer data (8 clocks):
1. Local data arbiter requests a data transfer.
2. Global data arbiter grants a data transfer and notifies the receiving local data arbiter that data is coming.
3. Sending local data arbiter tells the data buffer to begin the transfer.
4. Sending data buffer sends data to the local data router.
5. Data moves through the local data router to the centerplane crossbar.
6. Data moves through the centerplane crossbar to the receiving board's local data router.
7. Data moves through the receiving local data router to the receiver's data buffer.

Write data (4 clocks):
1. Port controller tells the data buffer to send the data packet.
2. Data buffer sends data to the UltraSparc data buffer on the processor module.
3. UltraSparc data buffer sends data to the processor.
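The four phase totals in Table 4 account exactly for the 38-clock pin-to-pin load-miss latency quoted earlier; the snippet below just adds them up.

```python
# Clock budget for a load-miss to memory, straight from Table 4.
phases = {
    "Send address and establish coherency": 13,
    "Read from memory": 13,
    "Transfer data": 8,
    "Write data": 4,
}
assert sum(phases.values()) == 38    # the 38-clock pin-to-pin latency cited for Starfire
```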
The maximum etched wire length is approximately 20 inches. After extensive cross-talk analysis, we developed and implemented a novel method for absolute cross-talk minimization on long, minimally coupled lines.
We distributed clock sources through the centerplane to each system board ASIC. Routing all traces on identical topologies minimizes skew between clock arrivals at the ASICs.
We used unidirectional, point-to-point, source-terminated CMOS to implement the 144-bit-wide 16 × 16 data crossbar and the four 48-bit-wide address-broadcast routers. We designed and tested the system to run at 100 MHz at worst-case temperatures and voltages. However, the UltraSparc-II processor constrains the system clock to be 1/3 or 1/4 of the processor clock. As of this writing, the processor clock is 250 MHz, so the system clock is 250/3 = 83.3 MHz.
System boards. The system boards, shown in Figure 5, each hold six mezzanine modules on the top side: the memory module, the I/O module, and four processor modules. The bottom side has nine address ASICs, nine data ASICs, and five 48-volt power converters. The boards are 16 × 20 inches with 24 layers.
Processor module. Starfire uses the same UltraSparc processor module as the rest of Sun's departmental and data center servers. As of this writing, the server modules have a 250-MHz processor with 4 Mbytes of external cache.
Memory module. The memory module contains four 576-bit-wide banks of memory composed of 32 standard 168-pin ECC DIMMs (dual in-line memory modules). It also has four ASICs, labeled Pk, which pack and unpack 576-bit-wide memory words into 144-bit-wide data-crossbar blocks. The memory module contains 4 Gbytes of memory using 64-Mbit DRAMs.
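The Pk ASICs' packing job can be pictured in a few lines: a 576-bit (72-byte) memory word crosses the 144-bit (18-byte) crossbar path as four slices. The slice ordering and helper names below are assumptions for illustration; the real ASICs also manage the ECC bytes carried in each slice.

```python
# Unpack a 72-byte (576-bit) memory word into four 18-byte (144-bit) crossbar slices,
# and pack them back. Slice ordering is an assumption for illustration only.
SLICE = 18   # bytes per data-crossbar transfer (16 data + 2 ECC)

def unpack(memory_word: bytes) -> list[bytes]:
    assert len(memory_word) == 4 * SLICE
    return [memory_word[i:i + SLICE] for i in range(0, 4 * SLICE, SLICE)]

def pack(slices: list[bytes]) -> bytes:
    return b"".join(slices)

word = bytes(range(72))
assert pack(unpack(word)) == word
assert all(len(piece) == SLICE for piece in unpack(word))
```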
I/O module. The current I/O module interfaces between the UPA and two SBuses and provides four SBus card slots. Each SBus has an achievable bandwidth of 100 MBps.
ASIC types. We designed seven unique ASIC types. Six of them implement an entire functional unit on a single chip, while the seventh is a multiplexer part used to implement the local and global routers, as well as the pack/unpack function. The ASICs are fabricated in 3.3-V, 0.5-micron CMOS technology. The largest die is 9.95 × 10 mm, with five metal layers.
`

Figure 5. Bottom (left) and top (right) of the system boards, with the centerplane shown in the middle. (The top side carries the memory module with its 32 DIMMs and four Pk pack/unpack ASICs, four SBus cards on the I/O module, and the four UltraSparc processor modules. The bottom side carries the nine address ASICs, nine data ASICs, and the 48-V power converters. The centerplane carries the four address buses, with 20 GAARB and GAR ASICs, and the 16 × 16 data crossbar, with the GDARB and GDR ASICs.)
The ASICs are all packaged in 32 × 32-mm ceramic ball grid arrays with 624 pins.
For more details on Starfire's physical implementation, see Charlesworth et al.5
Interconnect reliability. In addition to the ECC for data that is generated and checked by the processor module, Starfire ASICs also generate and check ECC for address packets. To help isolate faults, the Starfire data-buffer chips check data-packet ECC along the way through the interconnect.
Failed components. If an UltraSparc module, DIMM, SBus board, memory module, I/O module, system board, control board, centerplane support board, power supply, or fan fails, the system tries to recover without service interruption. Later, the failed component can be hot swapped out of the system and replaced.
Redundant components. Customers can optionally configure a Starfire to have 100% hardware redundancy of configurable components: control boards, support boards, system boards, disk storage, bulk power subsystems, bulk power supplies, cooling fans, peripheral controllers, and system service processors. If the centerplane experiences a failure, it can operate in a degraded mode. If one of the four address buses fails, the remaining buses will allow access to all system resources. The data crossbar is divided into separate halves, so it can operate at half-bandwidth if an ASIC fails.
Crash recovery. A fully redundant system can always recover from a system crash, utilizing standby components or operating in degraded mode. Automatic system recovery enables the system to reboot immediately following a failure, automatically disabling the failed component. This approach prevents a faulty hardware component from causing the system to crash again or from keeping the entire system down.
Dynamic System Domains
Dynamic System Domains make Starfire unique among Unix servers. Starfire can be dynamically subdivided into multiple computers, each consisting of one or more system boards. System domains are similar to partitions on a mainframe. Each domain is a separate shared-memory SMP system that runs its own local copy of Solaris and has its own disk storage and network connections. Because individual system domains are logically isolated from other system domains, hardware and software errors are confined to their respective domain and do not affect the rest of the system. Thus, a system domain can be used to test device drivers, updates to Solaris, or new application software without impacting production usage.
Dynamic System Domains can serve many purposes, enabling the site to manage the Starfire resources effectively:

• LAN consolidation. A single Starfire can replace two or more smaller servers. It is easier to administer because it uses a single system service processor (SSP), and it is more robust because it has better reliability, availability, and serviceability features. Starfire offers the flexibility to shift resources quickly from one "server" to another. This is beneficial as applications grow, or when demand reaches peak levels and requires rapid reassignment of computing resources.
• Development, production, and test environments. In a production environment, most sites require separate development and test facilities. With Starfire, those functions can safely coexist in the same box. Having isolated facilities enables development work to continue on a regular schedule without impacting production.
• Software migration. Dynamic System Domains may be used as a way to migrate systems or application software to updated versions. This applies to the Solaris operating system, database applications, new administrative environments, and applications.
• Special I/O or network functions. A system domain can be established to deal with specific I/O devices or functions. For example, a high-end tape device could be attached to a dedicated system domain, which is alternately merged into other system domains that need to use the device for backup or other purposes.
• Departmental systems. Multiple projects or departments can share a single Starfire system, simplifying cost-justification and cost-accounting requirements.
`

`
Many domain schemes are possible with Starfire's 64 processors and 16 boards. For example, we could make domain 1 a 12-board (48-processor) production domain running the current release of Solaris. Domain 2 could be a two-board (8-processor) domain for checking out an early version of the next Solaris release. Domain 3 could be a two-board (8-processor) domain running a special application: for instance, proving that the application is fully stable before allowing it to run in the production domain. Each domain has its own boot disk and storage, as well as its own network connection.
Domain administration. System administrators can dynamically switch system boards between domains or remove them from active domains for upgrade or servicing. After service, boards can be reintroduced into one of the active domains, all without interrupting system operation. Each system domain is administered from the SSP, which services all the domains.
The system service processor is a Sparc workstation that runs standard Solaris plus a suite of diagnostics and management programs. It is connected via Ethernet to a Starfire control board. The control board has an embedded control processor that interprets the TCP/IP Ethernet traffic and converts it to JTAG control information. Figure 6 shows an example of Starfire's hardware and domain status in the Hostview main screen. In this instance there are four domains, plus an additional board being tested in isolation.

Figure 6. System service processor Hostview main screen. (The screen maps the 16 system-board slots, the address and data interconnect, the control and support boards, and the failure, power, temperature, and fan indicators. In this example the boards are grouped into domain 1 with five boards, domain 2 with three boards, domain 3 with one board, and domain 4 with six boards, plus one board being tested in isolation.)

Domain implementation. Domain protection is implemented at two levels: in the centerplane arbiters and in the coherency interface controllers on each board.
Centerplane filtering. Global arbiters provide the top-level separation between unrelated domains. Each global arbiter contains a 16 × 16-bit set of domain control registers. For each system board there is a register that, when the bits are set to one, establishes the set of system boards in a particular board's domain group.
Board-level filtering. Board-level filtering lets a group of domains view a region of each other's memory to facilitate interdomain networking, a fast form of communication between a group of domains. As Figure 7 illustrates, all four coherency interface controllers on a system board have identical copies of the following registers:
• Domain mask. Sixteen bits identify which other system boards are in the board's domain.
• Group memory mask. Sixteen bits identify which other boards are in a board's domain group, to facilitate memory-based networking between domains.
• Group memory base and limit registers. These registers contain the lower and upper physical addresses of the board's memory that are visible to other domains in a group of domains. The granularity of these addresses is 64 Kbytes.

Figure 7. Domain registers in the Starfire interconnect. (Each of the four address routers and the data crossbar arbiter holds 16 domain control registers; each system board's four coherency interface controllers hold identical copies of the domain mask, group memory mask, group memory base, and group memory limit registers.)
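The following sketch shows how the domain and group-memory registers just listed might combine into an accept/reject decision for an incoming transaction. The article describes the registers but not the exact predicate the hardware evaluates, so the check below is an illustrative reading of those descriptions, and all names are hypothetical.

```python
# Illustrative model of Starfire's domain filtering registers (names are hypothetical;
# the article describes the registers but not the exact hardware predicate).
GRANULARITY = 64 * 1024          # group memory base/limit granularity: 64 Kbytes

class BoardFilter:
    def __init__(self, domain_mask, group_memory_mask, group_base, group_limit):
        self.domain_mask = domain_mask                # 16 bits: boards in this board's domain
        self.group_memory_mask = group_memory_mask    # 16 bits: boards in the domain group
        self.group_base = group_base * GRANULARITY    # window of local memory visible to the group
        self.group_limit = group_limit * GRANULARITY

    def accepts(self, source_board: int, local_address: int) -> bool:
        """Would this board honor a transaction from source_board to local_address?"""
        if self.domain_mask & (1 << source_board):
            return True                               # same domain: full access
        if self.group_memory_mask & (1 << source_board):
            return self.group_base <= local_address < self.group_limit
        return False                                  # unrelated domain: filtered out

# Board 0's filter: boards 0-3 form its domain; board 8 may see a 1-Mbyte window.
f = BoardFilter(domain_mask=0b0000_0000_0000_1111,
                group_memory_mask=0b0000_0001_0000_1111,
                group_base=16, group_limit=32)
assert f.accepts(2, 0x0000)                   # same domain, any local address
assert f.accepts(8, 20 * GRANULARITY)         # group member, inside the shared window
assert not f.accepts(8, 0x0000)               # group member, outside the window
assert not f.accepts(12, 20 * GRANULARITY)    # unrelated domain
```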
Dynamic reconfiguration. With dynamic reconfiguration, system administrators can logically move boards between running domains on the fly. The process has two phases: attach and detach.
Attach. This phase connects a system board to a domain and makes it possible to perform online upgrades, redistribute system resources for load balancing, or reintroduce a board after it has been repaired. Attach diagnoses and configures the candidate system board so that it can be introduced safely into the running Solaris operating system. There are two steps:
1. The board is added to the target domain's board list in the domain configuration files on the SSP. Power-on self-test (POST) executes, testing and configuring the board. POST also creates a single-board domain group that isolates the candidate board from the centerplane's other system boards. The processors shift from a reset state into a spin mode, preparing them for code execution. The centerplane and board-level domain registers are configured to include the candidate board in the target domain.
`
`

`
2. When these operations are complete, the Solaris
