DEFS-ALAOO 10586
`Ex.1025.001
`
`DELL
`
`

`

`The Association for Computing Machinery
`11 West 42nd Street
`New York, New York 10036
`
`Copyright © 1988 by the Association for Computing Machinery, Inc. Copying without
`fee is permitted provided that the copies are not made or distributed for direct
`commercial advantage, and credit to the source is given. Abstracting with credit
`is permitted. For other copying of articles that carry a code at the bottom of
`the first page, copying is permitted provided that the per-copy fee indicated in the
`code is paid through the Copyright Clearance Center, 27 Congress Street, Salem, MA
`01970. For permission to republish write to: Director of Publications,
`Association for Computing Machinery. To copy otherwise, or republish, requires a
`fee and/or specific permission.
`
`ISBN 0-89791-279-9
`
`Additional copies may be ordered prepaid from:
`
`ACM Order Department
`P.O. Box 64145
`Baltimore, MD 21264
`
`Price: Members $20.00; all others $26.00
`
`ACM Order Number: 533880
`
`The VMP Network Adapter Board (NAB):
`High-Performance Network Communication
`for Multiprocessors
`
`Hemant Kanakia
`Computer Systems Laboratory
`Stanford University
`
`David R. Cheriton
`Computer Science Department
`Stanford University
`
`Abstract
`
`High-performance computer communication between multiprocessor
`nodes requires significant improvements over conventional
`host-to-network adapters. Current host-to-network adapter
`interfaces impose excessive processing, system bus and interrupt
`overhead on a multiprocessor host. Current network adapters are
`either limited in function, wasting key host resources such as the
`system bus and the processors, or else intelligent but too slow,
`because of complex transport protocols and because of an
`inadequate internal memory architecture. Conventional transport
`protocols are too complex for hardware implementation and too
`slow without it.
`In this paper, we describe the design of a network adapter
`board for the VMP multiprocessor machine that addresses these
`issues. The adapter uses a host interface that is designed for
`minimal latency, minimal interrupt processing overhead and
`minimal system bus and memory access overhead. The network
`adapter itself has a novel internal memory and processing
`architecture that implements some of the key performance-critical
`transport layer functions in hardware. This design is integrated
`with VMTP, a new transport protocol specifically designed for
`efficient implementation on an intelligent high-performance
`network adapter. Although targeted for the VMP system, the design
`is applicable to other multiprocessors as well as uniprocessors.
`
`1 Introduction
`
`Performance of transport protocols on multi-megabit communication
`networks tends to be limited by overhead at both the transmitter
`and receiver. For example, measurements of the V kernel
`[5] indicate that network transmission time on Ethernet accounts
`for only about 20 percent of the elapsed time for transport-level
`communication operations, even with its highly optimized protocol.
`Similar performance figures have been reported in [15, 17].
`Although processor and memory cycle times keep improving,
`with communication networks moving to the gigabit range, we expect
`the processing to persist as a bottleneck unless significant
`improvements in network adapter and transport protocol designs
`are achieved. We identify three major problems with current
`designs.
`
`Permission to copy without fee all or part of this material is granted provided
`that the copies are not made or distributed for direct commercial advantage,
`the ACM copyright notice and the title of the publication and its date appear,
`and notice is given that copying is by permission of the Association for
`Computing Machinery. To copy otherwise, or to republish, requires a fee and/
`or specific permission.
`
`1988 ACM 0-89791-279-9/88/008/0175 $1.50
`
`First, the host-to-network adapter interface imposes excessive
`overhead, particularly on a multiprocessor host, in the form of
`processor cycles, system bus capacity and host interrupts. The
`processing overhead arises from calculating end-to-end checksums,
`packetizing and depacketizing, as well as encryption in
`the case of secure communication. The memory-intensive processing
`required by these functions reduces the average instruction
`execution rate, especially for a high-performance processor
`such as MIPS [14] in which memory reference operations
`are proportionally much slower than register-only operations.
`This processing causes the data to move at least twice over the
`system bus: once from global memory to the processor (or its
`cache), and once when the packet is copied to a network adapter.
`The increased traffic wastes system bus bandwidth, a critical
`resource in multiprocessor machines. In current host-to-network
`adapter interfaces, a host is interrupted for each packet received
`or transmitted. These per-packet interrupts force frequent context
`switching with the attendant overheads, with a high penalty
`in a multiprocessor system with processor caches, where it
`may be necessary to fault code and data into the cache before
`responding to the interrupt. In addition, the context switch may
`also incur contention overhead when data associated with the
`network module is resident in another cache. The problem is further
`aggravated by the prospect of networks moving to the 100
`megabit up to the gigabit range using fiber optics [11, 13, 16].
`For instance, in a file server attached to a 100 megabit network
`with the interface interrupting on every 2 kilobyte packet, the
`network interrupts every 200 microseconds under load, hardly
`sufficient to do even minimal packet processing.
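
The per-packet interrupt interval quoted above can be checked with a short calculation. This is only a sketch: it assumes that "2 kilobyte" means 2048 bytes and that the link is saturated with back-to-back packets, ignoring framing and inter-packet gaps. The function name is illustrative and not from the paper. Under those assumptions the raw wire time is about 164 microseconds, the same order as the rounded 200 microsecond figure in the text.

```python
def packet_interval_us(packet_bytes: int, link_bits_per_s: float) -> float:
    """Time to send one packet on a saturated link, in microseconds."""
    return packet_bytes * 8 / link_bits_per_s * 1e6


# Assumed values: 2048-byte packets on a 100 megabit/second network.
interval = packet_interval_us(2048, 100e6)
print(f"one interrupt roughly every {interval:.0f} microseconds")
```

At a gigabit per second the same packet size would leave roughly a tenth of that time per interrupt, which is the trend the paragraph warns about.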
`Second, the so-called "intelligent" network adapters that implement
`transport-level functions have lower performance at the
`transport level than the alternative system where a network
`adapter does programmed I/O transfers and a host performs
`transport protocol functions. The primary reason is an inadequate
`internal memory architecture. Currently, the data transfers into
`and out of the buffer memory reduce the number of memory
`cycles available for packet processing. Future system bus
`technology, with a high transfer rate and burst-mode transfer,
`and future networks, with a high data rate, will make this
`problem even more acute.
`Finally, conventional transport protocols are too complex or
`awkward for hardware implementation and too slow without it.
`For a large packet, the processing cost incurred in checksumming
`and encryption dominates the packet processing, since the cost
`increases in proportion to the size of a packet. Hardware
`implementation for such key performance-critical functions would
`substantially increase performance, but the packet formats of
`conventional transport protocols do not facilitate hardware
`support or implementations.
`An additional factor that motivates rethinking network adapter
`architecture is the problem of a host being bombarded by packets
`from one or more other hosts. The packet arrival rate, especially
`in a high-speed network, can exceed the rate at which a host can
`process and discard these packets, effectively incapacitating the
`host for useful computation. Excessive packet traffic can arise
`either from failures or malicious behavior of a remote host. A
`well-designed network adapter acts as a "firewall" between the
`network and the host.
`In this paper, we describe the Network Adapter Board (NAB)
`for the VMP multiprocessor [3], focusing on the architectural
`issues in the host interface, the adapter board, and the transport
`protocol. The adapter host interface architecture is designed
`for minimal latency, minimal interrupt processing overhead and
`minimal data transfer on the system bus. The NAB uses a novel
`internal memory and pipelined processing architecture that
`implements some of the key performance-critical transport layer
`functions in hardware. The design is coupled to a new transport
`protocol called Versatile Message Transaction Protocol (VMTP)
`[1]. VMTP is a request-response transport protocol specifically
`designed to facilitate implementation by a high-performance
`network adapter. VMTP assumes an underlying network (or
`internetwork) service providing a datagram packet service, and
`includes support for multicast, real-time, security and streaming.
`The interested reader is referred to an Internet RFC [1] for
`further details. For brevity in this report, we focus on aspects of the
`protocol relevant to the cost of communication and the design
`of the VMP network adapter. We describe the expected
`performance of the NAB prototype being built. Although targeted for
`the VMP system, the design is applicable to other multiprocessors
`as well as uniprocessors.
`The next section describes the host-to-network adapter interface
`architecture. Section 3 describes the internal architecture of
`the network adapter board. Section 4 describes details of a
`prototype of the network adapter. Section 5 indicates the (expected)
`performance for this adapter. Section 6 compares our design to
`related work on this problem. We conclude with a summary of
`the current status, our plans to further evaluate this design and
`the current open problems in this area.
`
`2 Host-Network Adapter Interface
`
`A request/response model of communications is used for information
`exchange between a network device and a host. A host
`requests network services with a control block sent to the device, and
`the device returns a response after the service is provided. These
`messages are transferred via an interface that appears to
`the host software as a 1024-byte control register.
`To transmit data, the host software writes a control block, the
`Transmit Authorization Record (TAR), to this control register.
`The TAR contains control information describing the data to be sent,
`including the pointer to the data in physical memory. If the data
`fits entirely within the control register, the data segment description
`is omitted from the TAR. In both cases, the network adapter
`transmits the data, checksumming and encrypting (if required).
`The interface interrupts the host to inform it that a TAR has been
`used to successfully transmit the data.
`To receive data, the host software writes a control block, a
`Receive Authorization Record (RAR), that "arms" the network
`interface to receive packets, specifying the maximum size to
`receive and providing a list of pointers to memory pages in host
`memory into which to deliver the data. The RAR can specify
`any one of: (1) a specific source, (2) a class of allowable sources,
`or (3) any source for the received data, where a source is either
`a transport-level or network-level address.
`The interface interrupts the host when the received packet(s)
`satisfy one of these RARs, returning the RAR, along with the
`packet header, to the appropriate host via the control register.
`The returned RAR itself may contain small amounts of data in
`addition to the corresponding packet header. When the RAR is
`returned, the data has already been stored in the host memory at
`the location pointer(s) contained in it, unless the data is contained
`in the RAR. Incoming packets are discarded if they cannot be
`matched to an outstanding RAR.
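
The three source-matching modes and the discard rule can be sketched in a few lines. This is a minimal illustration, not the NAB's actual record layout: the `RAR` class, its field names, and `match_packet` are hypothetical, sources are modelled as plain integers, and packet groups, buffer pointers and interrupts are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class RAR:
    """Hypothetical, simplified Receive Authorization Record."""
    max_size: int                                  # largest amount of data to accept
    source: Optional[int] = None                   # mode 1: one specific source
    source_class: Optional[FrozenSet[int]] = None  # mode 2: a class of allowed sources
    # Both left as None means mode 3: any source.

    def accepts(self, src: int, size: int) -> bool:
        if size > self.max_size:
            return False
        if self.source is not None:
            return src == self.source
        if self.source_class is not None:
            return src in self.source_class
        return True


def match_packet(outstanding: List[RAR], src: int, size: int) -> Optional[RAR]:
    """Return the first armed RAR that accepts the packet, or None (discard)."""
    for rar in outstanding:
        if rar.accepts(src, size):
            return rar
    return None


rars = [RAR(max_size=1024, source=7),
        RAR(max_size=4096, source_class=frozenset({1, 2}))]
print(match_packet(rars, 9, 100))  # no RAR authorizes source 9, so the packet is dropped
```

Because a packet with no matching record is dropped inside the adapter, the host spends no cycles on traffic it has not authorized.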
`Type, a subfield in the control block, distinguishes the records
`passed via the control register. The four major types of records used
`in transmission and reception of data are an RAR with small
`amounts of data, an RAR with data descriptors, a TAR with small
`amounts of data and a TAR with data descriptors. To optimize
`host-network adapter interactions, the host is allowed to issue
`a combined TAR and RAR, using the same buffers to send and
`to receive data. Additional types of records are used by a host for
`various purposes such as to add or delete acceptable destinations,
`to restrict traffic from a source, to provide decryption/encryption
`keys to the NAB, to get status information, and to reset the NAB.
`The RARs and TARs include a packet header including a
`network-level header, either small amounts of data or a list of
`data descriptors, and various control information. The buffer
`descriptors in a TAR point to locations in the physical memory
`space where data to transmit is available. The buffer descriptors
`in an RAR point to locations in the physical memory space
`where the data is to be received. The returned RAR or TAR
`contains, in addition to the buffer descriptors, the number of data
`words actually received or transmitted. The control information
`used by the NAB includes a link field, the type of RAR matching to be
`used, transport-level source and destination addresses, interrupt
`control, timeout control, a local host number, and a local process
`identifier. The buffer descriptors are omitted for TARs and RARs
`containing small amounts of data. A link field, used by the
`adapter, allows chaining of these records as necessary. The type
`of RAR matching to be done indicates if the RAR can be used
`for receiving traffic only from the specified source or any source.
`Interrupt control is used to determine when to interrupt the
`host identified in the record. On reception, the host indicated
`by the host number is interrupted either when the data of the
`first packet is stored in the system memory, or when the data is
`completely received, or both. The completed RAR is returned via
`the control register with the length of data received before the
`host indicated in the RAR is interrupted. On transmission, one may
`have the host interrupted either when the NAB begins processing
`a TAR, or when the last of the data segment is transmitted.
`The TAR is written into the control register before a host is
`interrupted.
`In the following, we discuss how this interface efficiently
`handles small amounts of data, large amounts of data, and also
`allows the interface to act as a firewall, protecting the host from
`the network.
`
`2.1 Short Message Handling
`
`The latency with short messages is minimized because the short
`message is written to the interface as part of the TAR and read
`as part of the returned RAR on reception. The operating system
`interrupt handlers for the network adapter can directly copy the
`message data between the control register and the operating system
`data structures, moving the higher-layer data to its intended
`destination with minimal cost. For multiprocessor hosts with
`per-processor caches this procedure also avoids additional cache
`and bus activity, as we expect the small amount of data to be
`already available in cache before being sent with a TAR or used
`immediately after having been received in an RAR. Thus, the delay
`introduced in transmitting and receiving a packet with a small
`amount of data is no more than that incurred with a host directly
`handling the packet and using the interface as a staging area to
`send and receive packets.
`Note that including the packet header in the TAR means that
`the processor writes a small amount of data to the interface for
`transmission, yet the network adapter has minimal processing to do
`on the data to prepare packets for transmission. In particular,
`for small data appended to the header, the network adapter need
`not do anything before starting network transmission, given that
`checksumming and encryption occur as part of transmission.
`
`2.2 Long Message Handling
`
`Host overhead is minimized for the transmission and reception
`of large amounts of data, typically in the range of 4-16 kilobytes,
`by passing descriptors rather than actual data. On transmission,
`the host writes one TAR and receives one completion interrupt,
`with the network adapter transferring the data from host memory
`with minimal bus overhead (footnote 1). The network adapter handles the
`per-packet overhead of packetizing, checksumming, encryption
`and per-packet coordination. On reception, the host receives a
`single interrupt for each RAR returned after the data has been
`transferred into global memory. Again, the per-packet interrupt
`overhead is handled by the network adapter.
`Latency in transfer of large amounts of data is reduced by
`ensuring minimal bus and memory references. Moving byte-based
`processing functions such as checksumming and encryption
`to the network adapter reduces memory references to one per
`data word. Passing buffer descriptors to the network adapter, which
`retrieves data from host memory, ensures that only one bus transfer
`per word transferred is required. For multiprocessors with
`per-processor caches, the buffer-passing model helps reduce the number
`of cache misses one would otherwise incur, as a cache is
`neither likely to have most of the data transferred nor going to
`use it in the near future. The cache pollution resulting from using
`the cache for network data transfers would also increase the cache
`miss ratio for other applications. An additional factor reducing
`latency is the use of burst-mode transfer on the host bus, which
`decreases transfer time of data on the bus by a factor of 4.
`In sending and receiving groups of packets, the interface can
`afford to introduce some latency for the first packet of the group
`as long as the whole transmission and reception has less delay
`than a host processor handling per-packet processing.
`That is, for a small number of packets K, it should be the case
`that
`
`    $$K \cdot P_{host} > D + K \cdot P_{interface}$$
`
`where writing the control record to the interface introduces a
`delay $D$ in transmission over the host processor writing the data
`directly to the network, $K$ is the number of packets to be
`transmitted, $P_{host}$ is the time for the host to packetize and send one
`full-sized packet and $P_{interface}$ is the time for the interface to
`transmit one full-sized packet. The value of $K$ for which this is
`true should be as small as possible, ideally 1 but certainly less
`than the common size of a multi-packet packet group. In this
`interface, the value is close to 1 primarily because $P_{interface}$ is
`much smaller than $P_{host}$.
`
`1 The VMP memory supports block transfer using the VME serial bus protocol,
`thereby minimizing bus occupancy and arbitration overhead.
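
The break-even group size implied by this inequality can be worked out directly. Rearranging K * P_host > D + K * P_interface gives K > D / (P_host - P_interface), provided P_host exceeds P_interface. The sketch below computes the smallest integer K that satisfies it. The function name and the timing values are purely illustrative assumptions, since this section gives no measured figures for D, P_host or P_interface.

```python
import math


def min_group_size(d: float, p_host: float, p_interface: float) -> int:
    """Smallest integer K with K * p_host > d + K * p_interface."""
    if p_host <= p_interface:
        raise ValueError("the interface never wins unless p_host > p_interface")
    # Strict inequality: K must exceed d / (p_host - p_interface).
    return math.floor(d / (p_host - p_interface)) + 1


# Hypothetical timings in microseconds, chosen only to show the shape of the result.
print(min_group_size(d=25, p_host=400, p_interface=170))   # a large per-packet gap
print(min_group_size(d=100, p_host=150, p_interface=100))  # a small per-packet gap
```

When the per-packet saving is large relative to D, the interface is ahead from the first packet, which matches the claim that K is close to 1.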
`
`2.3 Network Firewall
`
`The interface architecture is designed to allow the network
`adapter to function as a firewall, protecting the host from network
`packet pollution, both accidental and malicious. In essence, a
`host incurs no overhead for network packets whose reception
`it has not authorized; the interface discards all packets that are
`not compatible with an RAR provided by the host and are not
`directed to an end-point registered with the adapter. Some examples
`of its use follow. If the host does not provide an RAR
`for broadcast packets, then garbage broadcast packets incur zero
`overhead on the host processor(s). Multiple responses generated
`by a multicast request can be limited to only those that fit into
`the buffer area provided; the rest are discarded without incurring
`any host overhead. By providing RARs for only those sources
`it wants to listen to, a process can avoid the host overhead otherwise
`required for pruning out the traffic from unauthorized
`sources. In general, the authorization model of packet reception
`plus the speed of the network adapter insulates the host from
`packet pollution on the network.
`
`3 Network Adapter Internal Architecture
`
`The network adapter internal architecture is designed to provide
`maximal performance between the host interface and the network
`architecture. The internal architecture is structured as five
`major components, interconnected as shown in Figure 1. These
`components serve the following functions:
`
`Network access controller (NAC) : implements the network
`access protocol and transfers data between the network and
`the packet pipeline.
`
`Packet pipeline : generates and checks transport-level checksums
`and performs encryption and decryption of data in
`secure communication.
`
`Buffer memory : a staging and speed-matching area for data in
`transit between the packet pipeline and the host memory. Its
`specialized buffer memory permits fast block data transfers
`between the network and host and provides the on-board
`processor with contention-free memory access to the packet
`data.
`
`Host block copier : moves data between the buffer memory
`and the host memory using a burst-transfer bus protocol,
`minimizing the latency as well as the bus and memory overhead
`for transfers.
`
`On-board processor : a general-purpose processor that manages
`the packet processing pipeline and various bookkeeping
`functions associated with the protocol.
`
`[Figure 1: Network Adapter Internal Architecture. Block diagram of the
`Network Adapter Board (NAB). Recoverable labels: host bus, host interface,
`host block copier, buffer memory, packet pipeline (checksum logic,
`encryption logic), network access controller, network link.]
`
`Transmission is handled in three steps. When a Transmit
`Authorization Record is written to the interface, it is moved into
`the interface from the host memory by the host block copier.
`The TAR provides a description of the segment of data in the
`host memory to transmit. Next, the on-board processor forms the
`first packet from the TAR and first data blocks of the message
`and queues the packet for the packet pipeline using the information
`provided by the TAR. Finally, the packet is processed
`and transmitted by the packet processing pipeline and NAC at
`the network data rate. The pipeline calculates a checksum and
`optionally encrypts the data as the packet is transmitted.
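
The idea of computing the checksum while the packet streams out, rather than in a separate pass over memory, can be illustrated with a generator. This is only a sketch under stated assumptions: the additive 32-bit sum is a stand-in chosen for simplicity and is not claimed to be the checksum the NAB or VMTP uses, the trailing placement of the checksum word is assumed, and the function names are hypothetical.

```python
from typing import Iterable, Iterator

MASK32 = 0xFFFFFFFF


def transmit_with_checksum(words: Iterable[int]) -> Iterator[int]:
    """Yield each 32-bit word unchanged, then one trailing checksum word."""
    total = 0
    for w in words:
        total = (total + w) & MASK32  # update the running sum as the word passes
        yield w
    yield total                       # stand-in checksum, appended at the end


def verify(stream: Iterable[int]) -> bool:
    """Receiver side: recompute the sum over the payload and compare."""
    *payload, trailer = list(stream)
    return (sum(payload) & MASK32) == trailer


sent = list(transmit_with_checksum([1, 2, 0xFFFFFFFF]))
print(verify(sent))
```

Each word is touched once on its way to the network, which is the property that lets a hardware pipeline run at the network data rate.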
`On reception, a packet is accepted by the NAC and passed
`through the packet pipeline which decrypts the data, verifies
`the checksum, and deposits the received packet into the buffer
`memory. If the received checksum is verified, the packet is
`matched to the appropriate RAR. For the first packet in a group of
`packets, this may involve locating and allocating a non-specific
`RAR, making it dedicated to receiving more packets from this
`source. The packet data is then delivered into the host memory
`associated with this RAR. The reception of a packet into the
`buffer memory proceeds concurrently with both the checksum
`verification and the transfer to the host of previous packets. On
`reception of the final packet or on timeout, the host is interrupted
`and informed of the receipt of this packet group by returning
`the RAR in the interface control register. Thus, the host is
`interrupted only once per RAR used by the NAB.
`If there is no matching RAR for a packet, the adapter determines
`whether the destination address is locally acceptable. An
`address is locally acceptable if the address is in the local host's
`list of destination addresses, both individual and group addresses.
`The adapter then transmits a response to the local host indicating
`that the packet was discarded. When the destination address
`is valid but no RAR is found, the interface discards the segment
`data and returns an indication to the destination host via
`the control register. The interface does not time out a
`partially-filled RAR; the partially-filled RAR is returned to the host only
`on an explicit request from a host. The host handles sending
`out retransmission requests and various timeouts. The host also
`sends acknowledgments and handles retransmission requests by
`issuing a new TAR.
`Three key aspects to the design are the buffer memory, the
`packet processing pipeline, and the use of the general-purpose
`processor, as discussed in the following sections.
`
`3.1 The Buffer Memory
`
`The network adapter requires buffer memory in order to speed-match
`between the host bus and the network as well as to provide
`a staging area for the transmission and reception of packets.
`Three issues arise with the buffer memory design. First, the
`buffer memory must provide sufficient contention-free memory
`access to support simultaneous uses by the on-board processor,
`NAC, and block copier, as occurs under load. The performance
`of many so-called "smart" network adapters suffers from this
`contention. Second, the buffer memory must minimize latency for
`packet transmission and reception over direct transmission between
`the host memory and the network. Finally, a provision is
`required to prevent overcommitting the buffer memory to either
`transmission or reception, which would interfere with the functioning
`of the adapter. Our approach to each of these issues is
`discussed below.
`
`3.1.1 Buffer Memory Contention
`
`To minimize contention, the buffer memory uses dual-port static
`column RAM components, also referred to as Video RAM ICs.
`The Video RAM-based buffer memory, shown in Figure 2, provides
`multiple buffers to hold and to process packets while a
`packet is being received or being transmitted.
`
`[Figure 2: Video RAM-based Buffer Memory at Network Adapter
`Board (NAB). Recoverable labels: bank of Video RAM ICs forming a
`32-bit word, address, random-access data port.]
`
`This IC provides two independently accessed ports: a serial-access
`port providing high-speed burst-mode transfer, and a random-access port. The
`serial-access port is used to move a packet from the network into
`the buffer memory or the data from host memory to the buffer
`memory. The random-access port provides memory access for
`on-board processing of packets by the adapter's general-purpose
`processor. The serial access does not need address set-up and
`decoding time, so read-write times on this port are faster than
`read-write times for a RAM array. For instance, in our prototype,
`memory is 32 bits wide, the serial access time is 40 ns per
`word and the cycle time for random read/write access is 200 ns.
`This gives an effective transfer rate of 800 megabits/second over
`the serial port and 160 megabits/second over the random-access
`port. To provide the equivalent memory bandwidth on a
`single-ported standard 32-bit wide memory would require a memory IC
`with read/write cycle time of 33 ns. Currently, such fast memories
`are available, but they cost more and have less memory
`density than video RAMs.
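
The bandwidth figures above follow from the word width and the two cycle times. The short calculation below reproduces them using only the numbers given in the text. The variable and function names are illustrative.

```python
WORD_BITS = 32  # buffer memory width given in the text


def rate_mbit_per_s(cycle_ns: float) -> float:
    """Transfer rate in megabits/second for one 32-bit word per cycle."""
    return WORD_BITS * 1000 / cycle_ns  # bits per ns times 1000 gives Mbit/s


serial = rate_mbit_per_s(40)           # serial-access port
random_access = rate_mbit_per_s(200)   # random-access port
combined = serial + random_access      # both ports active at once

# Cycle time a single-ported 32-bit memory would need to match the combined rate.
equivalent_cycle_ns = WORD_BITS * 1000 / combined

print(serial, random_access, combined, round(equivalent_cycle_ns, 1))
```

The combined 960 megabits/second corresponds to a cycle time of about 33.3 ns, which is where the 33 ns figure comes from.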
`Data transfer operations in this memory proceed as follows.
`A packet (or data) is received from the network (or the host) via
`the serial port into the shift register contained in Video RAM
`ICs. The shift register acts as the temporary storage. When
`the block is completely received, it is transferred from the shift
`register, in a single memory cycle, to a row of the memory cell
`array constituting the buffer memory. The processor manipulates
`header fields of packets stored in the array via the random-access
`port. The processing of a packet continues without interference
`while the next packet is being received, except for one memory
`cycle stolen for each received packet transferred to the memory
`cell array from the shift register.
`Video RAM ICs provide performance that closely approximates
`the performance of true multiport memories, but at a fraction
`of the cost. A triple-port memory cell would triple the area
`of the memory cell, reducing memory density and increasing its
`access time. Video RAMs provide full memory bandwidth to
`the processor at low cost. They also allow high-speed block
`data transfer between the buffer memory and the host and the
`network. The separate serial port avoids the processor losing
`memory bandwidth to arbitration overhead on the random-access
`port. The serial transfer ports, accessed in parallel across a bank
`of video RAM ICs, maximize the data unit to be transferred
`per arbitration between the host block copier and the NAC and
`minimize the transfer time, thereby minimizing the arbitration
`penalty.
`High-speed FIFOs could be used as an alternative to the buffer
`memory described above to speed-match between the host and
`the network. Intuitively, a FIFO-based design should have minimal
`delay for two reasons. With short FIFOs, one begins to
`process and transmit the packet before the entire packet has been
`copied. With FIFOs, one avoids the software overhead of managing
`input and output packet queues. Nevertheless, our buffer
`memory design provides better performance for reasons outlined
`below.
`When the system bus is lightly loaded, our design compares
`favorably with the FIFO approach for large and small amounts
`of data transfers. For large data transfers, our design amortizes
`the buffering latency of the first packet over multiple packets.
`For a short single packet transfer, the difference in delay is small
`because the main source of delay in this case is the processing
`done at a host and the adapter, not the time of copying data to
`the interface.
`However, at even moderate levels of bus traffic, our design
`outperforms the FIFO-based approach. When the bus is congested,
`the demand for bus access may not be satisfied in time
`to avoid underrunning or overrunning a FIFO, resulting in packet
`loss at a sender and at a receiver. The time thus lost in transmitting
`aborted packets as well as the retransmission delay increases
`the total time needed to transfer a data segment.
`Mismatch between transmission data rates on the host bus
`and the network channel also supports using the buffer memory.
`When host bus speed is much higher than network channel speed,
`the additional delay for bringing the first packet in full before
`beginning transmission is negligible compared to the total
`transmission time for a large data segment. When host bus speed is
`comparable to (or lower than) the network channel speed, bringing
`in a full packet before starting transmission is necessary to
`avoid (frequent) loss of packets by contention on the bus.
`
`3.1.2 Buffer Latency
`
`Buffer latency refers here to the additional delay caused by the
`on-board buffering of a packet over the direct transmission by
`host to network. We minimize it by using a hardware block
`copier and by using contention-less memory accessing. A block
`copier transfers data between host memory and NAB using a
`serial memory transfer protocol, which minimizes transfer time
`and bus occupancy. Moreover, the NAB memory design described
`above facilitates high-speed block data transfers (footnote 2) and
`provides contention-less memory accessing, both minimizing the
`buffering latency.
`In this design, the cost of the buffering latency of the first
`packet is amortized over the subsequent packets sent in the
`packet group. The subsequent packets are copied from the host
`memory or the network link in parallel with the processing of
`the previous packet transferred, hiding the cost of their buffering
`latency. For large amounts of data, the buffering latency of
`the first packet forms only a small fraction of the total delay.
`For instance, the minimum transmission time for 16 Kbytes of
`data over a 100 megabits/sec network is 1.31 ms, whereas the
`buffering latency for the first packet in this packet group is
`approximately 25 microseconds (footnote 3), which is less than 2 percent of
`the total transmission time. The latency is even less significant
`compared to request-response delay for large data transfers, as
`this delay includes processing times at the host processors.
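
The two figures in this example can be reproduced with a few lines of arithmetic. The sketch assumes that 16 Kbytes means 16 * 1024 bytes and takes the 25 microsecond buffering latency directly from the text. The variable names are illustrative.

```python
data_bits = 16 * 1024 * 8          # 16 Kbytes, assuming 1 Kbyte = 1024 bytes
link_bits_per_s = 100e6            # 100 megabits/second network
first_packet_latency_s = 25e-6     # buffering latency quoted in the text

transmission_s = data_bits / link_bits_per_s
latency_fraction = first_packet_latency_s / transmission_s

print(f"transmission time: {transmission_s * 1e3:.2f} ms")
print(f"buffering latency share: {latency_fraction * 100:.1f} percent")
```

The result is about 1.31 ms of wire time with the first-packet buffering latency at roughly 1.9 percent of it, consistent with the "less than 2 percent" statement.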
`We note that the data transfer between host and interface
`buffer memory does not constitute an extra copy because the
`NAB performs all the functions that the host processor would
`have otherwise performed if it were performing the transfer. That
`is, the interface memory is required as a staging area for incoming
`and outgoing network traffic in any case.
`
`3.1.3 Reception of Garbage Packets
`
`Ideally, the mechanism for matching a packet to an RAR is
`fast enough so that it cannot be overrun by receiving garbage
`packets from the network. The packets with correct node address
`or multicast packets may still be undesirable, if
