Network-Based Multicomputers: An Emerging Parallel Architecture

H.T. Kung, Robert Sansom, Steven Schlick, Peter Steenkiste, Matthieu Arnould,
Francois J. Bitz, Fred Christianson, Eric C. Cooper, Onat Menzilcioglu, Denise Ombres, Brian Zill

School of Computer Science / Carnegie Mellon University / Pittsburgh, PA 15213

Abstract

Multicomputers built around a general network are now a viable alternative to multicomputers based on a system-specific interconnect because of architectural improvements in two areas. First, the host-network interface overhead can be minimized by reducing copy operations and host interrupts. Second, the network can provide high bandwidth and low latency by using high-speed crossbar switches and efficient protocol implementations. While still enjoying the flexibility of general networks, the resulting network-based multicomputers achieve high performance for typical multicomputer applications that use system-specific interconnects. We have developed a network-based multicomputer called Nectar that supports these claims.
1 Introduction

Current commercial parallel machines cover a wide spectrum of architectures: shared-memory parallel computers such as the Alliant, Encore, Sequent, and CRAY Y-MP; and distributed-memory computers including MIMD machines such as the Transputer [15], iWarp [5, 6], and various hypercube-like systems [3, 17], and SIMD machines such as the Connection Machine [26], DAP, and MasPar. Like SIMD machines, distributed-memory MIMD computers, or multicomputers, are inherently scalable. Multicomputers, however, can handle a larger set of applications than SIMD machines because they allow different programs to run on different processors. Multicomputers with over 1,000 processors have been used successfully in some application areas [14].

Multicomputers are traditionally built by using system-specific interconnections to link a set of dedicated processors. Examples of these traditional multicomputers are any of the high-performance hypercube systems such as the iPSC-2 and
interface supported by phone carriers. The gigabit Nectar system is one of the five testbeds in a US national effort to develop gigabit per second wide-area networks [24].

This paper describes architectural features that are desirable for general-purpose networks if they are to be used as interconnects for high-performance multicomputers. In Section 2 we summarize the advantages and interconnection requirements of network-based multicomputers and in Section 3 we give an overview of the prototype Nectar system. We then look at design tradeoffs of critical network components: the host-network interface (Section 4) and the network interconnect (Section 5). We summarize our results in Section 6.

2 Network-Based Multicomputers

We first summarize the advantages of network-based multicomputers and then describe the architectural requirements for their interconnect.

2.1 Network-Based Multicomputer Advantages

Network-based multicomputers offer substantial advantages over traditional multicomputers that use dedicated interconnects:

1. Supports heterogeneous architectures. A network-based multicomputer can incorporate nodes specially selected to suit a given application. Instead of having computations with different characteristics use a single architecture as in conventional computer systems, network-based multicomputers support a new computation paradigm that matches architectures to different computational needs.

2. Use of existing architectures. By incorporating existing systems as hosts, a network-based multicomputer can take advantage of rapid improvements of commercially available computers. In addition, it can reuse existing systems software and applications.

In fact, a network-based multicomputer provides a graceful environment for moving applications to new architectures such as special-purpose, parallel systems. In the beginning, an application can execute part of the computation on these new systems while using more conventional systems to run the rest of the application. The application can increase its use of the new systems as more software and application code for the new systems is developed.

3. Availability of large data memory. In a network-based multicomputer, applications can use the aggregate of the memory available in all the nodes. Since each node can have a sizable memory, the amount of memory is potentially huge, and it becomes possible to solve very large problems using relatively modest systems such as workstations.

4. High-speed I/O. The underlying high-speed network is inherently suited to support high-speed I/O to devices such as displays, sensors, file systems, mass stores, and interfaces to other networks. For example, via such a network, disk arrays [18, 21] can deliver very high data transfer rates to applications. Thus, because of the combination of their I/O capability and their ability to incorporate powerful computing nodes, network-based multicomputers represent a balanced architectural approach capable of speeding up both computation and I/O.
2.2 Multicomputer Interconnect Requirements

The requirements for a multicomputer network are a combination of the requirements of general-purpose networks and dedicated interconnects. On the one hand, a multicomputer network must have the high performance of the dedicated interconnects found in traditional multicomputers so that it can support multicomputer applications efficiently; on the other hand, since it has to operate in the same environment as a general-purpose network, it must have the same flexibility and reliability as general-purpose networks.

Performance has both throughput and latency requirements: for large messages, the network bandwidth determines how quickly a message can be delivered; for small messages, a low overhead on the sender and receiver side is more important. In the case of network-based multicomputers, the network efficiency must increase in proportion to the computation rate of the hosts on the network. For today's traditional multicomputers, the bandwidth of each interconnection link can be as high as several hundred Mb/s [5] and the communication latency between processes on two processors can be as low as 50 to 500 microseconds [4]. These numbers set performance goals for network-based multicomputers, and they have an impact on both the design of the network and the host-network interface.
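A simple first-order model (our own illustration, not from the paper) makes this tradeoff explicit. If an n-byte message incurs a fixed per-message software overhead T_ovh and is carried over a link of bandwidth B, the end-to-end time is roughly

    T(n) ≈ T_ovh + n / B,

so for large messages the bandwidth term dominates, while for small messages the fixed sender and receiver overhead dominates, which is why reducing per-message software cost matters as much as raw link speed.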
In terms of flexibility, the network must be general-purpose enough to allow the attachment of many types of computers. Moreover, the nodes of network-based multicomputers are typically distributed over a building or campus, so regular interconnect architectures such as a torus or hypercube are not practical. Although regular interconnects are attractive when mapping regular algorithms onto homogeneous multicomputers, they are not flexible enough to allow the matching of network bandwidth to the requirements of heterogeneous hosts, or to handle the adding and removing of nodes while the network is operating.

As in any general-purpose LAN, the underlying network of a network-based multicomputer needs to cope with data errors and network failures, since the same physical media are used in both cases. This is in contrast with traditional multicomputers, which typically assume that the interconnect is reliable. One of the challenges of building multicomputers around general networks is to make sure that the techniques employed to ensure reliability do not hamper system performance.

In the rest of the paper we describe methods of achieving these requirements. We start by describing the Nectar system, our first system experiment in this area.
3 Nectar System

To demonstrate the feasibility of network-based multicomputers, we started developing the Nectar system [2] in 1987. Nectar is a high-bandwidth, low-latency computer network for connecting high-performance hosts. Hosts are attached using Communication Acceleration Boards (CABs). The Nectar network consists of fiber-optic links and crossbar switches (HUBs). The HUBs are controlled by the CABs using a command set that supports circuit switching, packet switching, multi-hop routing, and multicast communication. Figure 1 gives an overview of the Nectar system.

[Figure 1: Nectar system. Hosts attach to the Nectar-Net through CABs; each CAB is connected by a fiber pair to a HUB, and the HUBs are interconnected to form the network.]

3.1 Nectar Prototype

We have built a 26-host Nectar prototype system to support early system software and applications development. The prototype uses 100 Mb/s fiber links and 16×16 HUBs. The CAB is implemented as a separate board on the host VMEbus. The CAB is connected to the network via a fiber port that supports data transmission rates up to 100 Mb/s in each direction. The fiber port contains the optoelectronics interfaces to the two fiber lines and FIFOs to buffer data transferred over the fibers. Each CAB has 1 megabyte of data memory, 512 kilobytes of program memory, and a 16.5 MHz SPARC processor. The CAB also has a DMA controller to provide high-speed transfers between the fiber port and the data memory, and between the data memory and the VMEbus interface.
3.2 Nectar Systems Software

The Nectar system software consists of a CAB runtime system and libraries on the host that exchange messages with the CAB on behalf of the application. The CAB runtime system manages hardware devices such as timers and DMA controllers, supports multiprogramming (the threads package), and manages data buffers (the mailbox module). The threads package, derived from the Mach C Threads package [9], supports lightweight threads in a single address space. Threads provide a low-cost, flexible method of sharing the CAB CPU between concurrent activities, which is important for communication protocol implementation. Mailboxes provide flexible and efficient management of buffer space in the CAB memory and form the endpoints of communication between processes on hosts or CABs.
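As a rough illustration of this structure, the sketch below uses POSIX threads as a stand-in for the C Threads-derived package: independent activities (here, protocol processing and timer handling) run as lightweight threads sharing one address space. The function names and division of work are our own invention, not the Nectar CAB runtime.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Illustrative only: two concurrent activities sharing one address space,
 * in the spirit of the CAB runtime's threads package. */
static void *protocol_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 3; i++) {
        printf("protocol: processing a packet\n");
        usleep(1000);
    }
    return NULL;
}

static void *timer_thread(void *arg)
{
    (void)arg;
    for (int i = 0; i < 3; i++) {
        printf("timer: checking retransmission timers\n");
        usleep(1000);
    }
    return NULL;
}

int main(void)
{
    pthread_t p, t;
    pthread_create(&p, NULL, protocol_thread, NULL);
    pthread_create(&t, NULL, timer_thread, NULL);
    pthread_join(p, NULL);
    pthread_join(t, NULL);
    return 0;
}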
The streamlined structure of the CAB software has made it possible for Nectar to achieve low communication latency. For the existing Nectar prototype, the latency to establish a connection through a single HUB is under 1 microsecond. The latency is under 100 microseconds for a message sent between processes on two CABs, and about 200 microseconds between processes residing in two workstation hosts. The above figures do not include the fiber transmission latency of approximately 5 microseconds per kilometer. These performance results are similar to those of traditional multicomputers.
The CAB runtime system currently supports several transport protocols with different reliability/overhead tradeoffs [10]. They include the standard TCP/IP protocol suite as well as a number of Nectar-specific protocols. For TCP, when TCP checksums are not computed, the throughput between two CABs is over 80 Mb/s for 8 kilobyte packets. When checksums are computed, TCP throughput drops to about 30 Mb/s. This indicates that hardware support for checksum calculation can significantly improve performance, at least for this type of system.
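For reference, the sketch below shows the standard 16-bit one's-complement Internet checksum (in the style of RFC 1071) that TCP uses. It is our own minimal version, not the Nectar CAB code, but it makes clear why a software checksum adds a full pass over every word of the packet.

#include <stdint.h>
#include <stddef.h>

/* One's-complement sum over the buffer, RFC 1071 style.
 * Every 16-bit word is read once, so a software checksum adds
 * a full pass over the data on top of any copy operations. */
uint16_t inet_checksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += (uint32_t)p[0] << 8 | p[1];   /* 16-bit big-endian word */
        p += 2;
        len -= 2;
    }
    if (len == 1)                            /* odd trailing byte */
        sum += (uint32_t)p[0] << 8;

    while (sum >> 16)                        /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}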
3.3 Nectar Applications

The network-based multicomputer architecture has made it possible to parallelize a new class of large applications [19]. These applications were previously either too large or too complex to be implemented on parallel systems. We have successfully ported several such applications onto the Nectar prototype system. Examples are COSMOS [7], a switch-level circuit simulator; NOODLES [8], a solid-modeling package; and a simulation of air pollution in the Los Angeles area.

Because Nectar uses existing general-purpose computers as hosts, applications can make direct use of code that has
previously been developed for these computers, and as a result, the Nectar implementation of the above applications took relatively little effort in spite of their relatively high complexity. In the distributed versions of both COSMOS and NOODLES, for example, each node executes a sequential version of the program that was modified to do only part of the computation.

The COSMOS simulator was ported to Nectar by partitioning the circuit across the Nectar nodes. This makes it possible to simulate very large circuits that cannot be handled on a single node, and it illustrates the benefit of being able to use the aggregate memory of a number of systems for a single application. Other applications, such as a chemical flow sheet application, were able to use a group of workstations plus a Warp systolic array.

In addition to the use of existing code, the implementation of applications on Nectar has emphasized the use of general-purpose support for large-grain parallelization. This development approach complements existing fine-grain efforts in parallelizing the innermost loops of computations. The combined capability should significantly increase the applicability of parallel processing.

4 Host-Network Interface

The critical factor in parallelizing applications on a multicomputer is how quickly tasks on different hosts can communicate. The latency is often determined by software overhead on the sending and receiving hosts, so reducing this overhead is a primary goal in the design of a host-network interface. In Nectar we decreased message latency by reducing the number of data copies for each message (each copy adds significant latency because main memory bandwidth is limited on most hosts), and by offloading protocol processing to an outboard processor so that the number of host interrupts is reduced (each host interrupt adds 10-20 microseconds to latency [1]).

We first describe three design alternatives for the host-network interface, and examine how different software implementations can utilize these designs. We then discuss specific hardware and software issues for building interfaces for workstations, based on our experience with Nectar.
4.1 Host-Network Interface Design

The three components that play a role in the host-network interface are the host CPU, main memory, and the network interface. Figure 2 shows three ways in which data can flow between these components when sending messages. The grey arrows indicate the building of the message by the application; the black arrows are copy operations performed by the system.

The architecture depicted in Figure 2(a) is the network interface found in many computer systems, including most workstations. When a system call is made to write data to the network, the host operating system copies the data from user space to system buffers. Packets are sent to the network by providing a list of descriptors to the network controller, which uses DMA to transfer the data to the network. The main disadvantage of this design is that three bus transfers are required for every word sent and received. However, this is not really a problem if the speed of the network medium is sufficiently slow compared to the memory bandwidth, as is the case for current workstations connected to an Ethernet.

The communication activity relative to the processing activity is much higher on multicomputers than on traditional general-purpose networks, and as a result, the network speed of multicomputers has to be a significant fraction of the processor and memory bandwidth of the computer to prevent the network from becoming a bottleneck. Existing workstations connected to a high-speed network (100 Mb/s or higher bandwidth) are an example. In these systems, the memory bus will be a bottleneck if the architecture of Figure 2(a) is used for the network interface. The performance can be improved by reducing the number of data copies done over the memory bus by using external memory in the network interface. Figures 2(b) and 2(c) show two alternative ways of utilizing external memory.
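A rough calculation (our own arithmetic, based on the figures above) shows the pressure on the memory bus. With the design of Figure 2(a), every word crosses the memory bus three times, so sustaining a network rate of B requires roughly

    memory-bus traffic ≈ 3 × B, e.g. 3 × 100 Mb/s = 300 Mb/s

of bus bandwidth for sending alone, with a comparable amount again for receiving, all on top of the application's own memory accesses.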
Figure 2(b) depicts an alternative in which the system buffers have been moved from host memory to external memory on the network interface. When data is sent or received, the data is copied between the user buffer in main memory and the system buffers in the network interface. The copying requires two bus transfers per word if it is done by the CPU, and one bus transfer per word if it is done by a DMA controller.

The approach used in Nectar is shown in Figure 2(c). With this approach the user buffers are located on the network interface (the CAB), and the data does not have to be copied: data packets are formed and consumed in place by the user process. As a result, communication latency is minimized and main memory bandwidth is conserved.

4.2 Network Interface Software

The design alternatives shown in Figure 2 are linked to the ways in which applications send and receive messages. With the Unix socket interface [20], users specify messages with a pointer-length pair. The semantics of the Unix socket read call is that the call returns when the message is available in the specified area. The socket write call returns when the message can be overwritten. The semantics of both calls requires that the data is logically copied as part of the call. This requirement is naturally met by the standard host interface implementation of Figure 2(a), although the design of Figure 2(b) can also be used in the socket model.

To support the host interface of Figure 2(c), the Nectar interface implements "buffered" send and receive primitives [23] using the mailboxes mentioned in Section 3.2.
[Figure 2: Design alternatives for the host-network interface. Each panel shows the data flow among the host CPU, main memory (user and system buffers), and the network when a message is sent: (a) user and system buffers both in main memory; (b) system buffers moved to memory on the network interface; (c) user buffers located on the network interface.]
With a buffered send, the application builds its message in a message buffer that was previously obtained from the system. As part of the send operation, the application gives up the right to access the message buffer, so the system can free it automatically when the data has been sent and appropriately acknowledged. Similarly, for a receive, the system returns a pointer to a message buffer to the user process. The application has to return the message buffer to the system after the message has been consumed.
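The following sketch shows what such buffered primitives might look like from the application's point of view. The names and signatures are hypothetical, chosen only to illustrate the ownership-passing style described above; they are not the actual Nectar library interface.

#include <stddef.h>
#include <string.h>

/* Hypothetical buffered-communication interface (illustrative only). */
typedef struct msg_buf msg_buf_t;

msg_buf_t *buf_alloc(size_t len);                 /* get a message buffer from the system   */
void      *buf_data(msg_buf_t *b);                /* pointer to the buffer's data area      */
void       buffered_send(int dest, msg_buf_t *b); /* caller gives up the buffer on send     */
msg_buf_t *buffered_recv(int src, size_t *len);   /* returns a buffer owned by the caller   */
void       buf_free(msg_buf_t *b);                /* return a consumed buffer to the system */

/* Example usage: the message is built and consumed in place, so neither
 * the send nor the receive path needs to copy the payload. */
void example(int peer)
{
    msg_buf_t *out = buf_alloc(1024);
    memset(buf_data(out), 0, 1024);     /* application fills the buffer in place */
    buffered_send(peer, out);           /* buffer now belongs to the system      */

    size_t len;
    msg_buf_t *in = buffered_recv(peer, &len);
    /* ... consume buf_data(in) directly ... */
    buf_free(in);                       /* hand the buffer back when done        */
}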
The advantage of buffered sends and receives over socket-like primitives is that data no longer has to be copied as part of the send and receive calls. This gives the implementation more freedom in implementing the communication interface, and can result in a lower overhead to the application and lower message latency.

Our experience with the Nectar system indicates that buffered primitives are faster for large messages, but that the immediate (socket-like) primitives are faster for short messages. The reason is that for immediate sends and receives of short messages, the data can be included with the request that is exchanged between the host and the CAB. For short messages, the extra complexity of the buffer management for buffered primitives is more expensive than the cost of simply copying the data. In the Nectar prototype, the buffered primitives are more efficient for messages larger than about 50 words.

4.3 Design Choices

We review the major design choices that were made for the prototype Nectar system and draw some general conclusions based on our experience with the prototype.
4.3.1 Shared Memory

One problem with the shared memory interface in the Nectar prototype is the relatively high latency of CAB memory accesses from the host. The access time to CAB memory is approximately 2 to 3 times greater than to main memory, depending on the particular host. A large part of this latency is due to the asynchronous VME bus, which requires both the host and CAB to synchronize on every transfer. More recently developed high-speed synchronous busses [12, 25] couple the host more tightly to the I/O bus, thus reducing the latency of accesses across the I/O bus.

Maintaining consistency between the host cache and the CAB memory is another problem that must be addressed. The current solution is to mark all buffers as uncacheable, which increases the latency for accesses to messages that are handled using buffered sends and receives. If the CAB memory were cached, it would be necessary to invalidate cache lines when new data arrives on the CAB.

Immediate sends and receives are implemented by having the system software copy the data between the user buffer in main memory and the CAB. Alternatively, the network interface can implement block transfers between user buffers in main memory and the network interface using DMA. Compared with copying using the CPU, using DMA incurs an overhead to pin the affected user virtual memory pages and to invalidate cache lines. Whether CPU copies or DMA transfers are more efficient depends on the relative costs for the particular system. For large transfers these costs can often be amortized, while for small transfers it is typically better to have the CPU read and write the user buffers directly, avoiding the overheads of DMA.
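A common way to exploit this tradeoff is to pick a crossover size below which the CPU copies directly and above which DMA is used. The sketch below illustrates the idea; the threshold value and helper functions are made up for illustration and are not taken from the Nectar software.

#include <stddef.h>
#include <string.h>

/* Illustrative only: names, threshold, and helpers are hypothetical. */
#define DMA_THRESHOLD 1024   /* bytes; the real crossover is system dependent */

void dma_transfer(void *dst, const void *src, size_t len);  /* pin pages, start DMA, wait */

void copy_to_interface(void *if_buf, const void *user_buf, size_t len)
{
    if (len < DMA_THRESHOLD) {
        /* Small transfer: per-transfer DMA setup (pinning pages,
         * invalidating cache lines) would cost more than copying. */
        memcpy(if_buf, user_buf, len);
    } else {
        /* Large transfer: the setup cost is amortized and only one
         * bus transfer per word is needed. */
        dma_transfer(if_buf, user_buf, len);
    }
}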
4.3.2 Checksum Hardware

Most software implementations calculate the transport-level checksum in a separate pass over the data. As a result, one memory read is added for every word sent or received. This extra memory access can be avoided by calculating the checksum while the data is copied, either using the CPU or
special hardware in the DMA unit. The design shown in Figure 2(b), together with copying and checksumming by the CPU, corresponds to Jacobson's proposed "WITLESS" interface [16].

Since the cost of implementing checksum hardware is typically small, it is attractive to calculate the transport-level checksum in hardware. In particular, when the user buffers reside on the network interface, or when DMA is used to transfer data between host memory and the network interface, calculating the checksum on the network interface means that the host system software does not have to touch the user data at all.
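A minimal sketch of the copy-with-checksum idea (our illustration of the general technique, not code from the WITLESS or Nectar implementations): the one's-complement sum is accumulated as each word is moved, so the data is read from memory only once.

#include <stdint.h>
#include <stddef.h>

/* Copy 16-bit words from src to dst while accumulating a one's-complement
 * sum, so no separate checksum pass over the data is needed. */
uint16_t copy_and_checksum(uint16_t *dst, const uint16_t *src, size_t nwords)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint16_t w = src[i];   /* single read of the data ...              */
        dst[i] = w;            /* ... reused for both the copy and the sum */
        sum += w;
    }
    while (sum >> 16)          /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}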
4.3.3 Programmable CPU

The programmable CPU on the CAB is a key feature of the Nectar prototype. The motivations for building the CAB around a general-purpose CPU with a flexible runtime system were that we wanted to experiment with different protocol implementations and open up the CAB to applications. For a prototype system, this flexibility is more important than the increase in performance that could be achieved with a complete hardware implementation.

In the Nectar prototype, protocol processing can be completely offloaded to the CAB. The main benefit of doing so is that the interaction between the host and the network interface is reduced to a single operation in which the application presents or accepts the data. The actual protocol processing overhead is small compared with the OS overhead of handling interrupts and system calls. A comparison of the host-host performance of the Nectar-native reliable message protocol (RMP) and the standard TCP/IP protocol shows that throughput is similar, although RMP has a smaller latency [10].

Applications have access to the CAB so that they can exploit the low CAB-to-CAB latency. How useful this capability is largely depends on whether restructuring application software significantly improves performance. An example in which this is the case is the dynamic load balancing performed by the CAB CPU in applications such as NOODLES (see Section 3.3). The centralized load balancer is placed on a CAB, allowing it to respond to about 10,000 requests for tasks per second.
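To make the structure concrete, here is a highly simplified sketch of a centralized task dispatcher of the kind that could run on the CAB CPU; the queue layout and function names are invented for illustration and are not the NOODLES implementation.

#include <stddef.h>

/* Illustrative centralized load balancer loop (hypothetical API). */
typedef struct { int node; } request_t;
typedef struct { int task_id; } task_t;

int  recv_request(request_t *req);          /* next "give me work" message    */
int  pop_task(task_t *t);                   /* take a task off the work queue */
void send_task(int node, const task_t *t);  /* hand the task to the requester */
void send_done(int node);                   /* no work left                   */

void balancer_loop(void)
{
    request_t req;
    while (recv_request(&req)) {     /* each request is one small message  */
        task_t t;
        if (pop_task(&t))
            send_task(req.node, &t); /* respond immediately from the CAB,  */
        else                         /* avoiding a trip through the host   */
            send_done(req.node);
    }
}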
4.3.4 Inter-Device DMA

The ability to transfer data directly from the CAB to another device on the VME bus has been very useful in the prototype Nectar. Data can be sent to or received from devices such as frame buffers without going through host memory. Direct DMA both reduces latency to the device and conserves host memory bandwidth. The ability to provide this feature depends on the specification of the target bus. Some newer synchronous busses (such as TurboChannel) do not permit inter-device transfers, the justification being that it complicates the system design [12].

4.3.5 Other Host Architectures

Multicomputers based on general-purpose networks have the advantage that they can include a variety of hosts. Some of these hosts place special requirements on the network interface. We look at the communication needs of some interesting classes of hosts: special-purpose supercomputers (such as iWarp and the Connection Machine); general-purpose supercomputers (such as Crays and IBM mainframes); and "dumb" devices (such as disk farms and frame buffers).

Special-purpose supercomputers currently often lack adequate external I/O bandwidth. Although the internal aggregate memory and interprocessor bandwidths of these machines are very high, the bandwidth to external networks and the amount of network buffer space are often small. In addition, such machines are often not suited to tasks such as calculating checksums. As a result, it is desirable that the network interface of a special-purpose supercomputer provide functionality similar to that provided by our CAB, such as a large buffer area, checksumming hardware, and switch control and datalink hardware.

General-purpose supercomputers are often capable of calculating packet checksums at near-network bandwidth rates using pipelined vector units. In addition, such machines have paths into large central memories with rates greater than a gigabit per second. Thus for general-purpose supercomputers it is less desirable to provide external buffering and checksum hardware.

For "dumb" devices, it is highly appropriate to have a network interface with a general-purpose processor. Such an interface can provide sufficient intelligence to allow a dumb device to be connected directly to the network.
5 Interconnect Architecture

The other crucial component of a network-based multicomputer is the interconnect or network itself. This section describes alternatives for the interconnect and explains why a general network, as used in Nectar, is suitable for a multicomputer.

5.1 Interconnect Design

The requirements for an interconnect for a network-based multicomputer are: high throughput (100-800 Mb/s available to each host); low latency (100-500 microseconds between hosts); good scalability (10-1000 attached computers); support for high-bandwidth multicast communication; and robustness so that recovery from network failures is simple. Some of these requirements are similar to those for general-purpose networks, while others, such as the performance goals, are much stronger.
General-purpose LANs based on a shared medium, such as Ethernet, token ring, or FDDI, do not meet the bandwidth and latency requirements of a multicomputer. As more computers are attached to the medium, the bandwidth available to each system decreases and the communication latency increases. On the other hand, LANs based on high-speed switches (such as Autonet [22], HPC [13], and Nectar) do meet the performance requirements of a multicomputer. In such LANs, each attached computer has an exclusive link to a local switch and can communicate with other computers on the same switch at the full bandwidth of the link. The aggregate bandwidth of the network is very high.

During the design of Nectar, we concentrated on the network features that are important for multicomputer usage. For example, in the host-network interface we concentrated on minimizing overhead, and in the design of the switch we concentrated on providing a low connection setup time and hardware multicast (see Section 5.3.3), since these features are important for multicomputer applications. Some features that would be needed to operate a large network in a general-purpose environment were not included. For example, we do not attempt to provide automatic network reconfiguration, since the configuration of a Nectar system is relatively stable and changes can be made manually. These features would require additional hardware (including switch control processors) and software to manage the network. Autonet, on the other hand, explored the problems involved in managing large, general, switch-based networks. It presents exactly the same interface as an Ethernet to the workstations attached to it, and it supports automatic network reconfiguration when hosts or switches are added to or removed from the network.

In the rest of this section, we briefly describe the Nectar network and then discuss specific issues in the design of a network for a multicomputer, including flow control, routing, multicast, and robustness.
5.2 Nectar Network Overview

As described earlier in Section 3, the Nectar network is built around HUBs, which are 16×16 crossbar switches. A HUB implements a command set that allows source CABs to open and close connections through the switch. These commands can be used by the CABs to implement both packet switching, in which case the maximum packet size is determined by the size of the input FIFO on each port of the switch, and circuit switching, in which case the maximum amount of data that can be sent is arbitrarily large.

In the prototype HUB, the latency to set up a connection and transfer the first byte of a packet through a single HUB is ten cycles (700 nanoseconds). Once a connection has been established, the latency to transfer a byte is five cycles (350 nanoseconds), but the transfer of multiple bytes is pipelined to match the 100 Mb/s peak bandwidth of each fiber link. Furthermore, the HUB controller can set up a new connection every 70 nanosecond cycle.

Because of the low switching and transfer latency of a single HUB, the additional latency in a multi-HUB system is not significant. Thus it is practical to build multicomputers consisting of small numbers of these HUBs and up to 100 hosts. Although it is possible to build larger multicomputers using the current HUB, the network management and configuration problems become greater with large numbers of HUBs (see below). A larger HUB crossbar switch (for example, 32×32 or 64×64) would reduce these problems and allow networks of hundreds of hosts to be built and managed.
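As a quick sanity check on these figures (our own arithmetic, derived from the cycle counts just quoted):

    10 cycles × 70 ns = 700 ns for connection setup and the first byte;
    5 cycles × 70 ns = 350 ns for a single byte once a connection is open;
    100 Mb/s = 12.5 megabytes/s, i.e. one byte every 80 ns,

so pipelined transfers at roughly one byte per 70 ns cycle are sufficient to keep a 100 Mb/s fiber link busy.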
5.3 Design Choices

We review the major design choices that were made for the prototype Nectar system and draw some general conclusions based on our experience with the prototype.
5.3.1 Flow Control

The network must provide flow control hardware to ensure that input buffers at the switches and hosts do not overflow. Lost packets have to be retransmitted by (software on) the sender and have a negative impact on both throughput and latency. Because of the strict performance requirements for multicomputers, avoiding lost packets is even more important for multicomputer networks than for general-purpose networks.

Flow control in Nectar is performed by hardware on the CABs and HUBs under the control of CAB software. The source CAB tests the status of the input buffer on its local HUB before sending a packet. It can check the status of buffers on intermediary HUBs and the destination CAB by using HUB commands provided for this purpose. These commands allow the next packet to start flowing when there is room in the buffer at the other end of a link.

The acknowledgments that tell a CAB or a HUB that there is space in the next buffer are generated by hardware on the next HUB or by software on a destination CAB. On the HUB, the acknowledgment is generated when a packet that is currently stored in the input buffer starts flowing out of the buffer. On the destination CAB, the acknowledgment is generated by the datalink software because only the software can know when there will be space in the input buffer. Typically the datalink software will send the acknowledgment when it has started the DMA to transfer the packet in the buffer to local memory.

Implementing flow control in software has a small overhead cost: it reduces the datalink throughput by less than 10% for 1 kilobyte packets. Nectar implements the flow control in software for flexibility reasons: we want to try out and compare different strategies. Providing full hardware flow control between HUBs and CABs would reduce the overhead.
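The sketch below illustrates the send-side pattern described above, with hypothetical names for the buffer-status and transmit operations; it is not the actual CAB datalink code, and for simplicity it waits for each acknowledgment instead of overlapping the wait with preparation of the next packet.

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical primitives for illustration only. */
bool hub_input_buffer_has_room(int link);             /* query local HUB buffer status */
bool ack_received(int link);                          /* space freed at the other end  */
void transmit_packet(int link, const void *p, size_t len);

/* Send-side flow control: make sure there is room in the buffer at the
 * other end of the link before letting the next packet flow. */
void send_with_flow_control(int link, const void *pkt, size_t len)
{
    while (!hub_input_buffer_has_room(link))
        ;                               /* wait for space in the HUB input buffer */

    transmit_packet(link, pkt, len);

    while (!ack_received(link))
        ;                               /* wait for the acknowledgment that the
                                           packet has started draining downstream */
}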
5.3.2 Routing

For a network the size of Nectar, the routing need not be dynamic or adaptive. Deterministic routing is sufficient, and is in fact preferable since it avoids deadlocks. Deterministic routing can be implemented using source routing. Source routing has the advantage of making the switches very simple as they need no control software or hardware
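As a generic illustration of source routing (our own sketch, not the Nectar packet format): the sender computes the path in advance and places the list of switch output ports in the packet header, so each switch only has to read its entry and forward the packet.

#include <stdint.h>

/* Generic source-routing header sketch (not the actual Nectar format). */
#define MAX_HOPS 8

struct src_route_hdr {
    uint8_t hop_count;              /* number of switches on the path          */
    uint8_t next_hop;               /* index of the entry the next switch uses */
    uint8_t out_port[MAX_HOPS];     /* output port to take at each switch      */
};

/* What each switch conceptually does: no routing tables or control
 * software are needed; it simply follows the precomputed entry. */
static inline uint8_t route_step(struct src_route_hdr *h)
{
    return h->out_port[h->next_hop++];
}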
