The Design of Nectar:
A Network Backplane for Heterogeneous Multicomputers

Emmanuel A. Arnould    Francois J. Bitz    Eric C. Cooper
H. T. Kung    Robert D. Sansom    Peter A. Steenkiste

January 1989

CMU-CS-89-101

School of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213

Also published in The Proceedings of the Third International Conference on Architectural Support for Programming Languages and Operating Systems, ACM, April 1989.

ABSTRACT

Nectar is a "network backplane" for use in heterogeneous multicomputers. The initial system consists of a star-shaped fiber-optic network with an aggregate bandwidth of 1.6 gigabits/second and a switching latency of 700 nanoseconds. The system can be scaled up by connecting hundreds of these networks together.

The Nectar architecture provides a flexible way to handle heterogeneity and task-level parallelism. A wide variety of machines can be connected as Nectar nodes and the Nectar system software allows applications to communicate at a high level. Protocol processing is off-loaded to powerful communication processors so that nodes do not have to support a suite of network protocols.

We have designed and built a prototype Nectar system that has been operational since November 1988. This paper presents the motivation and goals for Nectar and describes its hardware and software. The presentation emphasizes how the goals influenced the design decisions and led to the novel aspects of Nectar.

This research was supported in part by the Defense Advanced Research Projects Agency (DOD), monitored by the Space and Naval Warfare Systems Command under Contract N00039-87-C-0251, and in part by the Office of Naval Research under Contracts N00014-87-K-0385 and N00014-87-K-0533.

1 Introduction

Parallel processing is widely accepted as the most promising way to reach the next level of computer system performance. Currently, most parallel machines provide efficient support only for homogeneous, fine-grained parallel applications. There are three problems with such machines.

First, there is a limit to how far the performance of these machines can be scaled up. When that limit is reached, it becomes desirable to exploit coarse-grained, or task-level, parallelism by connecting together several such machines as nodes in a multicomputer. Second, there is a limit to the usefulness of homogeneous systems; as we will argue below, heterogeneity (at hardware and software levels) is inherent in a whole class of important applications. Third, parallel machines are typically built from custom-made processor boards, although they sometimes use standard microprocessor components. These machines cannot readily take advantage of rapid advances in commercially-available sequential processors.

Current local area networks (LANs) can be used to connect together existing machines, but this approach is unsatisfactory for a heterogeneous multicomputer with both general-purpose and specialized, high-performance machines. It is often not possible to implement efficiently the required communication protocols on special-purpose machines, and typical applications for such systems require higher bandwidth and lower latency than current LANs can provide.

The Nectar (network computer architecture) project attacks the problem of heterogeneous, coarse-grained parallelism on several fronts, from the underlying hardware, through the communication protocols and node operating system support, to the application interface to communication. The solution embodied in the Nectar architecture is a two-level structure, with fine-grained parallelism within tasks at individual nodes, and coarse-grained parallelism among tasks on different nodes. This system-level approach is influenced by our experience with previous projects such as the Warp¹ systolic array machine [1] and the Mach multiprocessor operating system [12].

The Nectar architecture provides a general and systematic way to handle heterogeneity and task-level parallelism. A variety of existing systems can be plugged into a flexible, extensible network backplane. The Nectar system software allows applications to communicate at a high level, without requiring each node to support a suite of network protocols; instead, protocol processing is off-loaded to powerful network interface processors.

We have designed and built a prototype Nectar system that has been operational since November 1988. This paper discusses the motivation and goals of the Nectar project, the hardware and software architectures, and our design decisions for the prototype system. We will evaluate the system with real applications in the coming year.

Section 2 of this paper summarizes the goals for Nectar. An overview of the Nectar system and the prototype implementation is given in Section 3. The two major functional units of the system, the HUB and CAB, are described in Sections 4 and 5. Section 6 describes the Nectar software. Some of the applications that Nectar will support are discussed in Section 7. The paper concludes with Section 8.

2 Nectar Goals

There are three major technical goals for the Nectar system: heterogeneity, scalability, and low-latency, high-bandwidth communication. These goals follow directly from the desire to support an emerging class of large-grained parallel programs whose characteristics are described below.

¹Warp is a service mark of Carnegie Mellon University.

2.1 Heterogeneity

One of the characteristics of these applications is the need to process information at multiple, qualitatively different levels. For example, a computer vision system may require image processing on its raw input at the lowest level, and scene recognition using a knowledge base at the highest level. A speech understanding system has a similar structure, with low-level signal processing and high-level natural language parsing. The processing required by an autonomous robot might range from handling sensor inputs to high-level planning.

At the lowest levels, these applications deal with simple data structures and highly regular number-crunching algorithms. The large amount of data at high rates often requires specialized hardware. At the highest level, these applications may use complicated symbolic data structures and data-dependent flow of control. Specialized inference engines or database machines might be appropriate for these tasks. The very nature of these applications dictates a heterogeneous hardware environment, with varied instruction sets, data representations, and performance.

Software heterogeneity is equally significant. The most natural programming language for each task ranges from Fortran and C to query languages and production systems. As a result, the system must handle differences in programming languages, operating systems, and data representations.

2.2 Scalability

Often the most cost-effective way of extending a system to support new applications is to add hardware rather than replacing the entire system. Also, by including a variety of processors, the system can take advantage of performance improvements in commercially available computers. In the Nectar system, it must therefore be possible to add or replace nodes without disruption: the bandwidth and latency between existing tasks should not be affected significantly, and it should not be necessary to change existing system software. Using the same hardware design, Nectar should scale up to a network of hundreds of supercomputer-class machines.

2.3 Low-Latency, High-Bandwidth Communication

The structure of these parallel applications requires communication among different tasks, both "horizontally" (among tasks operating at the same level of representation) and "vertically" (between levels). The lower levels in particular require a high data rate (megabyte images at video rates, for example). Moreover, applications often have response-time requirements that can only be satisfied by low-latency communication; two examples are continuous speech recognition and the control of autonomous vehicles.

In general, by providing low-latency, high-bandwidth communication the system can rapidly distribute computations to multiple processors. This allows the efficient parallel implementation of many applications.

Lowering latency is much more challenging than increasing bandwidth, since the latter can always be achieved by using pipelined architectures with wide data paths and high-bandwidth communication media such as fiber-optic lines. Latency can be particularly difficult to minimize in a large system where multi-hop communication is necessary. Nectar has the following performance goals for communication latency: excluding the transmission delays of the optical fibers, the latency for a message sent between processes on two CABs should be under 30 microseconds; the corresponding latency for processes residing in nodes should be under 100 microseconds; and the latency to establish a connection through a single HUB should be under 1 microsecond.

Figure 1: Nectar system overview

3 The Nectar System

3.1 System Overview

The Nectar system consists of a Nectar-net and a set of CABs (communication accelerator boards), as illustrated in Figure 1. It connects a number of existing systems called nodes. The Nectar-net is built from fiber-optic lines and one or more HUBs. A HUB is a crossbar switch with a flexible datalink protocol implemented in hardware. A CAB is a RISC-based processor board serving three functions: it implements higher-level network protocols; it provides the interface between the Nectar-net and the nodes; and it off-loads application tasks from nodes whenever appropriate. Every CAB is connected to a HUB via a pair of fiber lines carrying signals in opposite directions. A HUB together with its directly connected CABs forms a HUB cluster.

Figure 2: A single-HUB system

In a system with a single HUB, all the CABs are connected to the same HUB (Figure 2). The number of CABs in the system is therefore limited by the number of I/O ports of the HUB.

Figure 3: HUB cluster

To build larger systems, multiple HUBs are needed. In such systems, some of the I/O ports on each HUB are used for inter-HUB fiber connections, as shown in Figure 3. The HUB clusters may be connected in any topology appropriate to the application environment. Since the I/O ports used for HUB-HUB and for CAB-HUB connections are identical, there is no a priori restriction on how many links can be used for inter-HUB connections. Figure 4 depicts a multi-HUB system using a 2-dimensional mesh to connect its clusters.

The Nectar-net offers at least an order of magnitude improvement in bandwidth and latency over current LANs. Moreover, the use of crossbar switches substantially reduces network contention. Moderate-size, high-speed crossbars (with setup latency under one microsecond) are now practical; 8-bit wide 32 x 32 crossbars can be built with off-the-shelf parts, and 128 x 128 crossbars are possible with custom VLSI.

Re-engineering the software in the critical path of communication is as important for achieving low latency as building fast network hardware. Typical profiles of networking implementations on UNIX² show that the time spent in the software dominates the time spent on the wire [3,5,11].

There are three main sources of inefficiency in current networking implementations. First, existing application interfaces incur excessive costs due to context switching and data copying between the user process and the node operating system. Second, the node must incur the overhead of higher-level protocols that ensure reliable communications for applications. Third, the network interface burdens the node with interrupt handling and header processing for each packet.

The Nectar software architecture alleviates these problems by restructuring the way applications communicate. User processes have direct access to a high-level network interface mapped into their address spaces. Communication overhead on the node is substantially reduced for three reasons. First, no system calls are required during communication. Second, protocol processing is off-loaded to the CAB. Third, interrupts are required only for high-level events in which the application is interested, such as delivery of complete messages, rather than low-level events such as the arrival of control packets or timer expiration.

²UNIX is a trademark of AT&T Bell Laboratories.
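
As an illustration of the kind of application interface this restructuring makes possible, the following C sketch shows a user-level send over a transmit buffer mapped into the application's address space. The slot layout, field names, and the polled ownership flag are our own illustrative assumptions; they are not the actual Nectar interface.

    /* Illustrative sketch only: a user-level send over a memory-mapped
     * transmit ring shared with the CAB.  Layout and names are assumed,
     * not taken from the Nectar implementation. */
    #include <stdint.h>
    #include <string.h>

    #define SLOT_BYTES 1024

    struct tx_slot {
        volatile uint32_t owned_by_cab;   /* 0 = free, 1 = handed to the CAB */
        uint32_t length;                  /* valid bytes in data[]           */
        uint8_t  data[SLOT_BYTES];
    };

    /* Assume the CAB's transmit ring was mapped into this process's
     * address space once at start-up, so no system call is needed here. */
    extern struct tx_slot *tx_ring;
    extern unsigned        tx_ring_size;

    int user_send(const void *msg, uint32_t len, unsigned *next)
    {
        struct tx_slot *slot = &tx_ring[*next % tx_ring_size];

        if (len > SLOT_BYTES || slot->owned_by_cab)
            return -1;                    /* no free slot; caller may retry */

        memcpy(slot->data, msg, len);     /* one copy, from the user buffer */
        slot->length = len;
        slot->owned_by_cab = 1;           /* hand the slot to the CAB       */
        *next += 1;
        return 0;
    }

The point of the sketch is that the fast path contains no system call and no interrupt; the CAB picks up the slot and performs all protocol processing itself.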

Figure 4: Multi-HUB system connected in a 2-D mesh

3.2 Prototype Implementation

We have built a prototype Nectar system to carry out extensive systems and applications experiments. The system includes three board types: CAB, HUB I/O board, and HUB backplane. As of early 1989 the prototype consists of 2 HUBs and 4 CABs. The system will be expanded to about 30 CABs in Spring 1989.

In the prototype, a node can be any system running UNIX or Mach [12] with a VME interface. The initial Nectar system at Carnegie Mellon will have Sun-3s, Sun-4s and Warp systems as nodes.

To speed up hardware construction, the prototype uses only off-the-shelf parts and 16 x 16 crossbars. The serial to parallel conversion is performed by a pair of TAXI chips manufactured by Advanced Micro Devices. The effective bandwidth per fiber line is 100 megabits/second, a limit imposed by the TAXI chips.

When the prototype has demonstrated that the Nectar architecture and software works well for applications, we plan to re-implement the system in custom or semi-custom VLSI. This will lead to larger systems with higher performance and lower cost.

4 The HUB

The Nectar HUB establishes connections and passes messages between its input and output fiber lines. There are four design goals for the HUB:

1. Low latency. The HUB provides custom hardware to minimize latency. In the prototype system, the latency to set up a connection and transfer the first byte of a packet through a single HUB is ten cycles (700 nanoseconds). Once a connection has been established, the latency to transfer a byte is five cycles (350 nanoseconds), but the transfer of multiple bytes is pipelined to match the 100 megabits/second peak bandwidth of the fibers.

2. High switching rate. In the prototype Nectar system, the HUB central controller can set up a new connection through the crossbar switch every 70 nanosecond cycle.

3. Efficient support for multi-HUB systems. Because of the low switching and transfer latency of a single HUB, the latency of process to process communication in a multi-HUB system is not significantly higher. Flow control for inter-HUB communication is implemented in hardware (see Section 4.2.3).

4. Flexibility and high efficiency. The HUB hardware implements a set of simple commands for the most frequently used operations such as opening and closing connections. These commands can be executed in one cycle by the central HUB controller. By sending different combinations of these simple commands to the HUB, CABs can implement more complicated datalink protocols such as multicast and multi-HUB connections. The HUB hardware is flexible enough to implement point-to-point and multicast connections using either circuit or packet switching. In addition, HUB commands can be used to implement various network management functions such as testing, reconfiguration, and recovery from hardware failures.

4.1 HUB Overview

The HUB has a number of I/O ports, each capable of connecting to a CAB or a HUB via a pair of fiber lines. The I/O port contains circuitry for optical to electrical and electrical to optical conversion. From the functional viewpoint, a port consists of an input queue and an output register as depicted in Figure 5.

Figure 5: HUB overview

The HUB has a crossbar switch, which can connect the input queue of a port to the output register of any other port (see Figure 7). An input queue can be connected to multiple output registers (for multicast), but only one input queue can be connected to an output register at a time. A status table is used to keep track of existing connections and to ensure that no new connections are made to output registers that are already in use. The status table is maintained by a central controller and can be interrogated by the CABs.

The I/O port extracts commands from the incoming byte stream, and inserts replies to the commands in the outgoing byte stream. Commands that require serialization, such as establishing a connection, are forwarded to the central controller, while "localized" commands, such as breaking a connection, are executed inside the I/O port.
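
The connection rules enforced by the status table can be pictured with a small C model. This is only a model of the invariant described above (one input queue may feed several output registers, but each output register has at most one source); it is not the HUB's actual hardware structure.

    /* Toy model of the HUB status table and its connection invariant. */
    #include <stdbool.h>

    #define NPORTS    16           /* the prototype HUB has 16 I/O ports */
    #define NO_SOURCE (-1)

    /* For each output register, record which input queue drives it. */
    static int source_of[NPORTS];

    void hub_reset(void)
    {
        for (int p = 0; p < NPORTS; p++)
            source_of[p] = NO_SOURCE;
    }

    /* Open a connection from input port `in` to output port `out`.
     * Fails if the output register is already in use, which is exactly
     * the check the central controller makes against the status table. */
    bool hub_open(int in, int out)
    {
        if (source_of[out] != NO_SOURCE)
            return false;
        source_of[out] = in;       /* the same `in` may appear for several
                                      outputs: that is a multicast */
        return true;
    }

    /* Close the connection leading to output port `out`. */
    void hub_close(int out)
    {
        source_of[out] = NO_SOURCE;
    }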

Figure 6: HUB packaging in the Nectar prototype (the backplane contains the crossbar and central controller)

For the prototype Nectar system, the HUB has 16 I/O ports. Two HUB I/O boards, each consisting of eight I/O ports, can be plugged into the HUB backplane. The backplane contains an 8-bit wide 16 x 16 crossbar and the central controller. Each I/O port interfaces to a pair of fiber lines at the front of the I/O board. This packaging scheme is depicted in Figure 6. An additional instrumentation board can be plugged into the backplane, as shown in the figure; it can monitor and record events related to the crossbar and its controller.

Each I/O board in the prototype uses 305 chips and has a typical power consumption of 110 watts; the boards are 15 x 17 inches. The backplane uses 92 chips for the 16 x 16 crossbar and 132 chips for the central controller. (47 chips in the crossbar and 20 chips in the controller are for hardware debugging.) The backplane has a typical power consumption of 70 watts.

4.2 HUB Commands and Usage

The HUB hardware supports 38 user commands and 14 supervisor commands for various datalink protocols. Supervisor commands are for system testing and reconfiguration purposes, whereas user commands are for operations concerning connections, locks, status, and flow control.

For the Nectar prototype each command is a sequence of three bytes:

    command   HUB ID   param

The first byte specifies a HUB command, the second byte specifies the HUB to which the command is directed, and the third byte is a parameter for the command, typically the ID of one of the ports on that HUB.
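
The three-byte format can be written down directly; the struct and helper below are a sketch of one possible encoding (the report does not give numeric opcode values, so any used in the sketches are placeholders).

    /* A HUB command as it appears in the byte stream:
     * one opcode byte, one HUB ID byte, one parameter byte. */
    #include <stdint.h>

    struct hub_cmd {
        uint8_t opcode;   /* e.g. "open with retry"; numeric values assumed */
        uint8_t hub_id;   /* HUB to which the command is directed           */
        uint8_t param;    /* typically the ID of a port on that HUB         */
    };

    /* Append one command to an outgoing command packet and return the
     * position where the next command (or the data) should be written. */
    static inline uint8_t *emit_cmd(uint8_t *p, uint8_t opcode,
                                    uint8_t hub_id, uint8_t param)
    {
        *p++ = opcode;
        *p++ = hub_id;
        *p++ = param;
        return p;
    }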

In the following we mention some of the user commands and describe how they can be used to implement several datalink protocols. The four-HUB system depicted in Figure 7 will be used in all the examples.

4.2.1 Circuit Switching

Using circuit switching, the entire route is set up first using a command packet, before a data packet is transmitted. (A data packet is framed by start of packet and end of packet.) To send data from CAB3 to CAB1, CAB3 first establishes the route by sending out the following command packet:

    open with retry            HUB2 P8
    open with retry and reply  HUB1 P8

Figure 7: Connections on a 4-HUB system

HUB2 will keep trying to open the connection from P4 to P8. After the connection is made, the open with retry and reply command is forwarded to HUB1 over the connection. After the connection from P3 to P8 in HUB1 is established, HUB1 sends a reply over the route established in the opposite direction, using another set of fiber lines, input queues, and output registers. By stealing cycles from these resources whenever necessary, the reply is never blocked and can reach CAB3 within a bounded amount of time. After receiving the reply, CAB3 knows that all the requested connections have been established. Then CAB3 sends data, followed by a close all command that travels over the established route.

The close all command is recognized at the output register of each HUB in the route. After detecting the close all, the HUB closes the connection leading to the output register. Therefore all the connections will be closed after the data has flowed through them. Alternatively, close all can be replaced with a set of individual close commands, closing the connections in reverse order.

If CAB3 does not receive a reply soon enough, it can try to get the connection status of the HUBs involved to find out what connections have been made, and can send another command packet requesting a different route (possibly starting from some existing connections). CAB3 can also decide to take down all the existing connections by using close all, and attempt to re-establish an entire route.
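
In terms of the three-byte command format of Section 4.2, CAB3's setup packet could be laid out as below; the numeric opcode values are placeholders, since the report does not list them.

    /* Sketch: CAB3's command packet for the route to CAB1.
     * Opcode values are assumptions; HUB and port numbers follow the text. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    enum { OPEN_WITH_RETRY = 0x01, OPEN_WITH_RETRY_AND_REPLY = 0x02 };
    enum { HUB1 = 1, HUB2 = 2, P8 = 8 };

    size_t build_route_cab3_to_cab1(uint8_t *buf)
    {
        static const uint8_t pkt[] = {
            OPEN_WITH_RETRY,           HUB2, P8,  /* HUB2: open P4 -> P8 */
            OPEN_WITH_RETRY_AND_REPLY, HUB1, P8,  /* HUB1: open P3 -> P8,
                                                     then send a reply back */
        };
        memcpy(buf, pkt, sizeof pkt);
        /* After the reply reaches CAB3, the data follows (framed by start of
         * packet / end of packet) and a close all command tears the route down. */
        return sizeof pkt;
    }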

4.2.2 Circuit Switching for Multicasting

To illustrate how multicasting is implemented, consider the case in which CAB2 wants to send a data packet to both CAB4 and CAB5, as depicted in Figure 7. To establish the required connections, CAB2 can use the following command packet:

    open with retry            HUB1 P6
    open with retry and reply  HUB4 P5
    open with retry            HUB4 P3
    open with retry and reply  HUB3 P4

After receiving replies to both of the open with retry and reply commands, CAB2 sends the data packet.
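
The same three-byte layout applies to this multicast setup packet; again the opcode values are placeholders.

    /* Sketch: CAB2's multicast setup packet for CAB4 and CAB5.
     * Opcode values are assumptions; HUB and port numbers follow the
     * command packet shown above. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    enum { OPEN_WITH_RETRY = 0x01, OPEN_WITH_RETRY_AND_REPLY = 0x02 };
    enum { HUB1 = 1, HUB3 = 3, HUB4 = 4, P3 = 3, P4 = 4, P5 = 5, P6 = 6 };

    size_t build_multicast_setup(uint8_t *buf)
    {
        static const uint8_t pkt[] = {
            OPEN_WITH_RETRY,           HUB1, P6,
            OPEN_WITH_RETRY_AND_REPLY, HUB4, P5,  /* first destination, with reply  */
            OPEN_WITH_RETRY,           HUB4, P3,
            OPEN_WITH_RETRY_AND_REPLY, HUB3, P4,  /* second destination, with reply */
        };
        memcpy(buf, pkt, sizeof pkt);
        /* CAB2 sends the data packet only after both replies have arrived. */
        return sizeof pkt;
    }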

4.2.3 Packet Switching

The HUB has two facilities to support packet switching:

1. An input queue for each port. For the prototype Nectar system the length of the input queue, and thus the maximum packet size, is 1 kilobyte. (Circuit switching must be used for larger packets but, since the overhead of circuit setup is small compared to the packet transmission time, this does not add significantly to latency.)

2. Support for flow control. A ready bit is associated with each port of a HUB. The ready bit indicates whether the input queue of the next HUB connected to it is ready to store a new packet. Consider for example port P8 of HUB2 in Figure 7. This port is connected to port P3 of HUB1. If the ready bit associated with P8 of HUB2 is 1, then the input queue of P3 of HUB1 is guaranteed to be ready to store a new packet.

   The ready bit associated with each port is set to 1 initially. When start of packet is detected at the output register of the port, the ready bit is set to 0. Upon receipt of a signal from the next HUB indicating that the start of packet has emerged from the input queue connected to the port, the bit is set to 1 (see the sketch following this list).
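
The ready-bit behavior in item 2 can be modeled in a few lines of C; this is our model of the rule described above, not the HUB circuitry.

    /* Model of the per-port ready bit used for inter-HUB flow control. */
    #include <stdbool.h>

    struct hub_port {
        bool ready;   /* true: the next HUB's input queue can take a packet */
    };

    void port_reset(struct hub_port *p)
    {
        p->ready = true;                    /* ready bit starts at 1 */
    }

    /* Start of packet detected at this port's output register. */
    void on_start_of_packet(struct hub_port *p)
    {
        p->ready = false;                   /* downstream queue now in use */
    }

    /* The next HUB signals that the start of packet has emerged from the
     * input queue connected to this port. */
    void on_downstream_signal(struct hub_port *p)
    {
        p->ready = true;
    }

    /* A test open with retry succeeds only when the downstream queue is
     * ready; otherwise the HUB keeps retrying. */
    bool test_open_may_succeed(const struct hub_port *p)
    {
        return p->ready;
    }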

Suppose that CAB3 wants to send a data packet to CAB1 using the route shown in Figure 7. Using packet switching, CAB3 can send out the following packet:

    test open with retry  HUB2 P8
    test open with retry  HUB1 P8
    data
    close all

The test open with retry command is used to enforce flow control. For example, the first test open with retry command ensures that HUB2 will not succeed in making the connection from P4 to P8 until port P3 of HUB1 is ready to store the entire data packet. Otherwise HUB2 will keep trying to make the connection. Thus the packet is forwarded to the next HUB as soon as the input queue in that HUB becomes available.

4.2.4 Packet Switching for Multicasting

Consider again the multicasting example of Section 4.2.2. Using packet switching, CAB2 can use the following commands to multicast a packet to CAB4 and CAB5:

    test open with retry  HUB1 P6
    test open with retry  HUB4 P5
    test open with retry  HUB4 P3
    test open with retry  HUB3 P4
    data
    close all

5 The CAB

The CAB is the interface between a node and the Nectar-net. It handles the transmission and reception of data over the fibers connected to the network. Communication protocol processing is off-loaded from the node to the CAB, thus freeing the node from the burden of handling packet interrupts, processing packet headers, retransmitting lost packets, fragmenting large messages, and calculating checksums.

Figure 8: CAB block diagram (CPU, data memory bus, CPU bus, DMA controller, fiber in and fiber out interfaces to the HUB, memory protection, serial line, and VME interface to the node)

5.1 CAB Design Issues

The design of the CAB is driven by three requirements:

1. The CAB must be able to keep up with the transmission rate of the optical fibers (100 megabits/second in each direction).

2. The CAB should ensure that messages can be transmitted over the Nectar-net with low latency. The HUB can set up a connection and begin transferring a data packet in less than one microsecond. Thus any latency added by the CAB can contribute significantly to the overall latency of message transmission.

3. The CAB should provide a flexible environment for the efficient implementation of protocols and selected applications. Specifically, it should be possible to implement a simple operating system on the CAB that allows multiple lightweight processes to share CAB resources.

Together these design goals require that the CAB be able to handle incoming and outgoing data at the same time as meeting local processing needs. This is accomplished by including a hardware DMA controller on the CAB. The DMA controller is able to manage simultaneous data transfers between the incoming and outgoing fibers and CAB memory, as well as between VME and CAB memory, leaving the CAB CPU free for protocol and application processing.

To meet the protocol and application processing requirements, we designed the CAB around a high-performance RISC CPU and fast local memory. The choice of a high-speed CPU, rather than a custom microengine or lower performance CPU, distinguishes the CAB from many I/O controllers. The latest RISC chips are competitive with microsequencers in speed, but offer additional flexibility and a familiar development environment. Older general-purpose microprocessors would be unable to keep up with the protocol processing requirements at fiber speeds, let alone provide cycles for user tasks.

Allowing application software to run on the CAB is important to many applications but has dangers. In particular, incorrect application software may corrupt CAB operating system data structures. To prevent such problems, the CAB provides memory protection on a per-page basis and hardware support for multiple protection domains.

The CAB design also includes various devices to support high-speed communication: hardware checksum computation removes this burden from protocol software; hardware timers allow time-outs to be set by the software with low overhead.

5.2 CAB Implementation

The prototype CAB implementation uses as its RISC CPU a SPARC processor running at 16 megahertz. A block diagram of the CAB is shown in Figure 8.

A VME interface to the node was the natural choice in our environment, allowing Sun workstations and Warp systems to be used in the Nectar prototype. The initial CAB implementation supports a VME bandwidth of 10 megabytes/second, which is close to the speed of the current fiber interface.

Two fibers (one for each direction) connect each CAB to the HUB. The fiber interface uses the same circuit as the HUB I/O port (see Section 4.1). Data can be read or written to the fiber input or output queue by the CPU, but for data transfers of more than small numbers of words the DMA controller should be used to achieve higher transmission rates. The DMA controller also handles flow control during a transfer: the DMA controller waits for data to arrive if the input queue is empty, or for data to drain if the output queue is full.

The on-board CAB memory is split into two regions: one intended for use as program memory, the other as data memory. DMA transfers are supported for data memory only; transfers to and from program memory must be performed by the CPU. The memory architecture is thus optimized for the expected usage pattern, although still allowing code to be executed from data memory or packets to be sent from program memory.

In the prototype, the total bandwidth of the data memory is 66 megabytes/second, sufficient to support the following concurrent accesses: CPU reads or writes, DMA to the outgoing fiber, DMA from the incoming fiber, and DMA to or from VME memory. The program memory has the same bandwidth as data memory and is thus able to sustain the peak CPU execution rate.

The program memory region contains 128 kilobytes of PROM and 512 kilobytes of RAM. The data memory region contains 1 megabyte of RAM. Both memories are implemented using fast (35 nanosecond) static RAM. Using static RAM in the prototype rather than less expensive dynamic RAM plus caching was worth the additional cost: it allowed us to focus on the more innovative aspects of the design instead of expending effort on a cache.

The CAB's memory protection facility allows each 1 kilobyte page to be protected separately. Each page of the CAB address space (including the CAB registers and devices) can be assigned any subset of
read, write, and execute permissions. All accesses from the CAB CPU or from over the VME bus are checked in parallel with the operation so that no latency is added to memory accesses. The flexibility, safety, and debugging support that memory protection affords the CAB software is worth the non-trivial cost in design time and board area.

The memory protection includes hardware support for multiple protection domains, with a separate page protection table for each domain. Currently the CAB supports 32 protection domains. The assignment of protection domains is under the control of the CAB operating system kernel. The kernel can therefore ensure that the CAB system software is protected from user tasks and that user tasks are protected from one another. In addition, accesses from over the VME bus are assigned to a VME-specific protection domain.
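
A small C model of the per-domain protection tables is given below. The page count is an assumption (a 24-bit address space with 1-kilobyte pages would give 16,384 pages); the 32 domains and the read, write, and execute bits follow the text.

    /* Model of the CAB's per-domain page protection tables:
     * 1 KB pages, 32 protection domains, read/write/execute bits per page.
     * Table sizes are illustrative assumptions. */
    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT  10                     /* 1 kilobyte pages         */
    #define NUM_DOMAINS 32                     /* protection domains       */
    #define NUM_PAGES   (1u << 14)             /* assumes a 24-bit space   */

    enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

    static uint8_t perm[NUM_DOMAINS][NUM_PAGES];   /* one table per domain */

    /* Check an access; the hardware does this in parallel with the memory
     * operation itself, so no latency is added. */
    bool access_allowed(unsigned domain, uint32_t addr, uint8_t needed)
    {
        uint32_t page = addr >> PAGE_SHIFT;
        if (domain >= NUM_DOMAINS || page >= NUM_PAGES)
            return false;
        return (perm[domain][page] & needed) == needed;
    }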

The CAB occupies a 24-bit region of the node's VME address space. Every device accessible to the CAB CPU is also visible to the node, allowing complete control of the CAB from the node. In normal operation, however, the node and CAB communicate through shared buffers, DMA, and VME interrupts.

The CAB prototype is a 15 x 17 inch board, with a typical power consumption of 100 watts. Of the nearly 360 components on the densely packed board, about 25% are for the data memory and DMA ports, 15% for the VME interface, 15% for the CPU and program memory, and 13% for the I/O ports. The remaining 120 or so chips are divided among the DMA controller, CAB registers, hardware checksum computation, memory protection, and clocks and timers.

6 Software

The design of the Nectar software has two goals:

1. Minimize communication latency between user processes. Since processing overhead on the sending and receiving nodes accounts for most of the communication latency over local area networks [3,5,11], the software organization plays a critical role in reducing the latency. To achieve low latency, data copying and context switching must be minimized.

2. Provide a flexible software environment on the CAB. This will convert a bare "protocol engine" into a customizable network interface. When appropriate, the CAB can also be used to off-load application tasks from the node.

The software currently running on the prototype system consists of the CAB kernel, communica