An Analysis of TCP Processing Overhead

David D. Clark
Van Jacobson
John Romkey
Howard Salwen
WHILE NETWORKS, ESPECIALLY LOCAL AREA networks, have been getting faster, perceived throughput at the application has not always increased accordingly. Various performance bottlenecks have been encountered, each of which has to be analyzed and corrected.

One aspect of networking often suspected of contributing to low throughput is the transport layer of the protocol suite. This layer, especially in connectionless protocols, has considerable functionality, and is typically executed in software by the host processor at the end points of the network. It is thus a likely source of processing overhead.
While this theory is appealing, a preliminary examination suggested to us that other aspects of networking may be a more serious source of overhead. To test this proposition, a detailed study was made of a popular transport protocol, the Transmission Control Protocol (TCP) [1]. This paper provides the results of that study. Our conclusions are that TCP is in fact not the source of the overhead often observed in packet processing, and that it could support very high speeds if properly implemented.
TCP

TCP is the transport protocol from the Internet protocol suite. In this set of protocols, the functions of detecting and recovering lost or corrupted packets, flow control, and multiplexing are performed at the transport level. TCP uses sequence numbers, cumulative acknowledgment, windows, and software checksums to implement these functions.
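For reference, the fixed part of the TCP header that carries these fields is shown below as a C structure. The layout is the standard 20-byte TCP header; the field names are our own shorthand rather than those of any particular implementation, and a real implementation must also convert the multi-byte fields from network to host byte order before using them.

    #include <stdint.h>

    /* The fixed 20-byte TCP header (options, if present, follow it). */
    struct tcp_hdr {
        uint16_t src_port;    /* sending port                          */
        uint16_t dst_port;    /* receiving port                        */
        uint32_t seq;         /* sequence number of first data byte    */
        uint32_t ack;         /* cumulative acknowledgment number      */
        uint8_t  data_off;    /* header length in 32-bit words, << 4   */
        uint8_t  flags;       /* SYN, FIN, RST, PSH, ACK, URG bits     */
        uint16_t window;      /* receiver's advertised window          */
        uint16_t checksum;    /* ones'-complement software checksum    */
        uint16_t urgent;      /* urgent pointer                        */
    };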
TCP is used on top of a network level protocol called the Internet Protocol (IP) [2]. This protocol, which is a connectionless or datagram packet delivery protocol, deals with host addressing and routing, but the latter function is almost totally the task of the Internet level packet switch, or gateway. IP also provides the ability for packets to be broken into smaller units (fragmented) on passing into a network with a smaller maximum packet size. The IP layer at the receiving end is responsible for reassembling these fragments. For a general review of TCP and IP, see [3] or [4].

Under IP is the layer dealing with the specific network technology being used. This may be a very simple layer in the case of a local area network, or a rather complex layer for a network such as X.25. On top of TCP sits one of a number of application protocols, most commonly for remote login, file transfer, or mail.
The Analysis

This study addressed the overhead of running TCP and IP (since TCP is never run without IP) and that of the operating system support needed by them. It did not consider the cost of the driver for some specific network, nor did it consider the cost of running an application.

The study technique is very simple: we compiled a TCP, identified the normal path through the code, and counted the instructions. However, more detail is required to put our work in context.
The TCP we used is the currently distributed version of TCP for UNIX from Berkeley [5]. By using a production-quality TCP, we believe that we can avoid the charge that our TCP is not fully functional.

While we used a production TCP as a starting point for our analysis, we made significant changes to the code. To give TCP itself a fair hearing, we felt it was necessary to remove it from the UNIX environment, lest we be confused by some unexpected operating system overhead.
For example, the Berkeley implementation of UNIX uses a buffering scheme in which data is stored in a series of chained buffers called mbufs. We felt that this buffering scheme, as well as the scheme for managing timers and other system features, was a characteristic of UNIX rather than of the TCP, and that it was reasonable to separate the cost of TCP from the cost of these support functions. At the same time, we wanted our evaluation to be realistic. So it was not fair to altogether ignore the cost of these functions.
Our approach was to take the Berkeley TCP as a starting point, and modify it to give a better measure of intrinsic costs. One of us (Romkey) removed from the TCP code all references to UNIX-specific functions such as mbufs, and replaced them with working but specialized versions of the same functions.

To ensure that the resulting code was still operational, it was compiled and executed. Running the TCP in two UNIX address spaces and passing packets by an interprocess communication path, the TCP was made to open and close connections and pass data. While we did not test the TCP against other implementations, we can be reasonably certain that the TCP that resulted from our test was essentially a correctly implemented TCP.

The compiler used for this experiment generated reasonably efficient code for the Intel 80386. Other experiments we have performed tend to suggest that for this sort of application, the number of instructions is very similar for an 80386, a Motorola 68020, or even a RISC chip such as the SPARC.
Finding the Common Path

One observation central to the efficient implementation of TCP is that while there are many paths through the code, there is only one common one. While opening or closing the connection, or after errors, special code will be executed. But none of this code is required for the normal case. The normal case is data transfer, while the TCP connection is established. In this state, data flows in one direction, and acknowledgment and window information flows in the other.
[Fig. 1. Analysis terminology. The figure shows a data sender and a data receiver. Data packets flow from the sender to the receiver; control packets (acknowledgments and windows) flow back. Parenthesized numbers give instruction counts for the input and output processing steps at each end, e.g., 191-213 for input processing at the data sender and 235 for output processing.]
In writing a TCP, it is important to optimize this path. In studying the TCP, it was necessary to find and follow it in the code. Since the Berkeley TCP did not separate this path from all the other cases, we were not sure if it was being executed as efficiently as possible. For this reason, and to permit a more direct analysis, we implemented a special "fast path" TCP. When a packet was received, some simple tests were performed to see whether the connection was in an established state, the packet had no special control flags on, and the sequence number was expected. If so, control was transferred to the fast path. The version of the TCP that we compiled and tested had this fast path, and it was this fast path that we audited.
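The tests that guard the fast path are of roughly the following form. This is a sketch of the idea, with hypothetical structure and field names, not the code we audited.

    #include <stdint.h>
    #include <stdbool.h>

    #define TH_FIN 0x01            /* control flags that force the slow path */
    #define TH_SYN 0x02
    #define TH_RST 0x04
    #define TH_URG 0x20

    #define ESTABLISHED 4          /* connection state of interest */

    struct tcb {                   /* per-connection state (hypothetical) */
        int      state;
        uint32_t rcv_nxt;          /* next in-order sequence number expected */
    };

    /* A packet may take the fast path only if the connection is
     * established, no special control flags are set, and the sequence
     * number is exactly the one expected. */
    static bool fast_path_ok(const struct tcb *tp, uint8_t flags, uint32_t seq)
    {
        return tp->state == ESTABLISHED &&
               (flags & (TH_SYN | TH_FIN | TH_RST | TH_URG)) == 0 &&
               seq == tp->rcv_nxt;
    }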
There are actually two common paths through the TCP, the end sending data and the end receiving it. In general, TCP permits both ends to do both at once, although in the real world it happens only in some limited cases. But in any bulk data example, where throughput is an issue, data almost always flows in only one direction. One end, the sending end, puts data in its outgoing packets. When it receives a packet, it finds only control information: acknowledgments and windows.

The other receiving end finds data in its incoming packets and sends back control information. In this paper, we will use the terms sender and receiver to describe the direction of data flow. Both ends really do receive packets, but only one end tends to receive data. In Figure 1, we illustrate how our terms describe the various steps of the packet processing.
A First Case Study-Input Processing

A common belief about TCP is that the most complex, and thus most costly, part is the packet-receiving operation. In fact, as we will discuss, this belief seems false. When receiving a packet, the program must proceed through the packet, testing each field for errors and determining the proper action to take. In contrast, when sending a packet, the program knows exactly what actions are intended and essentially has to format the packet and start the transmission.

A preliminary investigation tended to support this model, so for our first detailed analysis, we studied the program that receives and processes a packet.

There are three general stages to the TCP processing. In the first, a search is made to find the local state information (called the Transmission Control Block, or TCB) for this TCP connection. In the second, the TCP checksum is verified. This requires computing a simple function of all the bytes in the packet. In the third stage, the packet header is processed. (These steps can be reordered for greater efficiency, as we will discuss later.)
We chose not to study the first two stages. The checksum cost depends strongly on the raw speed of the environment and the detailed coding of the computation. The lookup function similarly depends on the details of the data structure, the assumed number of connections, and the potential for special hardware and algorithms. We will return to these two operations in a later section, but in the present analysis, they are omitted.

The following analysis thus covers the TCP processing from the point where the packet has been checksummed and the TCB has been found. It covers the processing of all the header data and the resulting actions.
The packet input processing code has rather different paths for the sender and receiver of data. The overall numbers are the following:
• Sender of data: 191 to 213 instructions
• Receiver of data: 186 instructions
A more detailed breakdown provides further insight.

Both sides contain a common path of 154 instructions. Of these, 15 are either procedure entry and exit or initialization. For the receiver of data, an additional 15 instructions are spent sequencing the data and calling the buffer manager, and another 17 are spent processing the window field in the packet.

The sender of data, which is receiving control information, has more steps to perform. In addition to the 154 common instructions, it takes 9 to process the acknowledgment, 20 to process the window, 17 to compute the outgoing congestion window (so-called "slow-start" control), and 44 instructions (but not for each packet) to estimate the round trip time. The round-trip delay is measured not for every packet, but only once per round trip. For short delay paths, where one packet can be sent in one round trip, this cost could occur for every acknowledgment. Since the Berkeley TCP acknowledges at most every other packet in a bulk data transfer, the cost in this case is 22 instructions per packet. For longer paths, the cost will be spread over more packets, so 22 instructions is an upper bound.
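To give a feel for what the dynamic-control code does, the fragment below sketches a smoothed round trip time estimator of the kind used in the Berkeley TCP (after Jacobson's algorithm). It is an illustration, not the audited instruction sequence; the gains of 1/8 and 1/4 are the conventional values, and the congestion window computation is similar in spirit: a few additions, shifts, and comparisons per acknowledgment.

    /* One measurement per round trip updates a smoothed mean and mean
     * deviation; the retransmit timeout is derived from both.  Simple
     * integer arithmetic keeps the cost to a handful of instructions. */
    struct rtt_state {
        int srtt;      /* smoothed round trip time, in timer ticks */
        int rttvar;    /* smoothed mean deviation, in timer ticks  */
    };

    static int rtt_update(struct rtt_state *r, int measured)
    {
        int err = measured - r->srtt;
        r->srtt += err / 8;                     /* srtt += err/8               */
        if (err < 0)
            err = -err;
        r->rttvar += (err - r->rttvar) / 4;     /* rttvar += (|err|-rttvar)/4  */
        return r->srtt + 4 * r->rttvar;         /* retransmit timeout estimate */
    }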
From this level of analysis of one part of the code, it is possible to draw a number of conclusions. First, there are actually very few instructions required. In the process of making this study, we found several opportunities for shortening the path length; none of those are in this version. Second, a significant amount of code is involved in control of the protocol dynamics: computing the congestion window and the round trip delay. These activities have nothing to do with the actual data flow. The actual management of the sequence numbers is very quick. But between 17 and 61 instructions are spent on computation of dynamic control parameters.

The analysis made clear that some changes to the protocol would provide a slight speedup. But any improvement achieved would be fractional, not a gross change in overall performance. In a later section, we will return to a more global speculation on what these numbers mean.
TCP Output Processing

We subjected the output side of TCP to an analysis that was somewhat less detailed than that of the input side. We did not program a fast path, and we did not attempt to separate the paths for data sending and receiving. Thus, we have a single number that is a combination of the two paths and which (by inspection) could be significantly improved by an optimization of the common path.

We found 235 instructions to send a packet in TCP. This number provides a rough measure of the output cost, but it is dangerous to compare it closely with the input processing cost. Neither the output side nor the input side had been carefully tuned, and both could be reduced considerably. Only if both paths had received equivalent attention would a direct comparison be justified. Our subjective conclusion in looking at the two halves of the TCP is that our starting assumption (the receiving side is more complex than the sending side) is probably wrong. In fact, TCP puts most of its complexity in the sending end of the connection. This complexity is not a part of packet sending, but a part of receiving the control information about that data in an incoming acknowledgment packet.
The Cost of IP

In the normal case, IP performs very few functions. Upon input of a packet, it checks the header for correct form, extracts the protocol number, and calls the TCP processing function. The executed path is almost always the same. Upon output, the operation is even simpler.

The instruction counts for IP were as follows:
• Packet receipt: 57 instructions
• Packet sending: 61 instructions
An Optimization Example-Header Prediction

The actual sending of a packet is less complex than the receiving of one. There is no question of testing for malformed packets, or of looking up a TCB. The TCB is known, as is the desired action.

One example of a simple operation is the actual generation of the outgoing packet header. IP places a fixed-size, 20-byte header on the front of every IP packet, plus a variable amount of options. Most IP packets carry no options. Of the 20-byte header, 14 of the bytes will be the same for all IP packets sent by a particular TCP connection. The IP length, ID, and checksum fields (6 bytes total) will probably be different for each packet. Also, if a packet carries any options, all packets for that TCP connection will be likely to carry the same options.

The Berkeley implementation of UNIX makes some use of this observation, associating with each connection a template of the IP and TCP headers with a few of the fixed fields filled in. To get better performance, we designed an IP layer that created a template with all the constant fields filled in. When TCP wished to send a packet on that connection, it would call IP and pass it the template and the length of the packet. Then IP would block-copy the template into the space for the IP header, fill in the length field, fill in the unique ID field, and calculate the IP header checksum.
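In outline, the template scheme looks like the following. The structures and helper names are illustrative, not the code we measured, and ip_cksum is assumed to be the standard Internet ones'-complement checksum routine.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    struct ip_hdr {                       /* 20-byte IP header, no options */
        uint8_t  ver_ihl, tos;
        uint16_t total_len, id, frag_off;
        uint8_t  ttl, protocol;
        uint16_t hdr_checksum;
        uint32_t src_addr, dst_addr;
    };

    struct ip_template {                  /* one per connection             */
        struct ip_hdr hdr;                /* constant fields filled at setup */
        uint16_t      next_id;
    };

    uint16_t ip_cksum(const void *buf, size_t len);   /* assumed: Internet checksum */

    /* Emit an IP header: block-copy the template, then fill in only the
     * per-packet fields (total length, ID, header checksum). */
    static void ip_fill_header(struct ip_template *t, struct ip_hdr *out,
                               uint16_t total_len)
    {
        memcpy(out, &t->hdr, sizeof *out);
        out->total_len    = htons(total_len);
        out->id           = htons(t->next_id++);
        out->hdr_checksum = 0;
        out->hdr_checksum = ip_cksum(out, sizeof *out);
    }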
This idea can also be used with TCP, as was demonstrated in an earlier, very simple TCP implemented by some of us at MIT [6]. In that TCP, which was designed to support remote login, the entire state of the output side, including the unsent data, was stored as a preformatted output packet. This reduced the cost of sending a packet to a few lines of code.

A more sophisticated example of header prediction involves applying the idea to the input side. In the most recent version of TCP for Berkeley UNIX, one of us (Jacobson) and Mike Karels have added code to precompute what values should be found in the next incoming packet header for the connection. If the packets arrive in order, a few simple comparisons suffice to complete header processing. While this version of TCP was not available in time to permit us to compile and count the instructions, a superficial examination suggests that it should substantially reduce the overhead of processing compared to the version that we reviewed.
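As an illustration of the input-side idea (our own sketch, since that code was not available to us), the receiver can keep a precomputed image of the header values it expects in the next in-order segment and compare them directly:

    #include <stdint.h>
    #include <stdbool.h>

    /* Precomputed expectations for the next in-order segment on a
     * connection; the field names are hypothetical. */
    struct predicted_hdr {
        uint32_t seq;          /* expected sequence number          */
        uint16_t flags_off;    /* expected flags and header length  */
        uint16_t window;       /* expected advertised window        */
    };

    /* If every comparison succeeds, full header processing is skipped
     * and the segment is handed straight to the data path. */
    static bool header_predicted(const struct predicted_hdr *p, uint32_t seq,
                                 uint16_t flags_off, uint16_t window)
    {
        return seq == p->seq &&
               flags_off == p->flags_off &&
               window == p->window;
    }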
Support Functions

The Buffer Layer

The most complex of the support functions is the layer that manages the buffers which hold the data at the interface to the layer above. Our buffer layer was designed to match high-throughput bulk data transfer. It supports an allocate and free function and a simple get and put interface, with one additional feature to support data sent but not yet acknowledged. All the bookkeeping about out-of-order packets was performed by TCP itself.
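The shape of that interface is roughly the following. The names are ours and the details were specific to this experiment, so this should be read as a sketch of the layer's functionality rather than a reusable design.

    #include <stddef.h>
    #include <stdint.h>

    struct buffer;                          /* holds packet data for the layer above */

    /* Allocate and free. */
    struct buffer *buf_alloc(size_t len);
    void           buf_free(struct buffer *b);

    /* Simple get/put for moving data in and out of a buffer. */
    size_t buf_put(struct buffer *b, const void *data, size_t len);
    size_t buf_get(struct buffer *b, void *data, size_t len);

    /* The one additional feature: data that has been sent is held until
     * the cumulative acknowledgment covers it, and only then released. */
    void buf_hold_until_acked(struct buffer *b, uint32_t last_seq);
    void buf_ack(uint32_t cumulative_ack);  /* may free held buffers */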
The buffer layer added the following costs to the processing of a packet:
• Sending a data packet: 40 instructions
• Receiving a data packet: 35 instructions
• Receiving an acknowledgment (may free a buffer): 30 instructions
It might be argued that our buffer layer is too simple. We would accept that argument, but are not too concerned about it. All transport protocols must have a buffer layer. In comparing two transport protocols, it is reasonable to assume (to first order) that if they have equivalent service goals, then they will have equivalent buffer layers.
A buffer layer can easily grow in complexity to swamp the protocol itself. The reason for this is that the buffer layer is the part of the code in which the demand for varieties of service has a strong effect. For example, some implementations of TCP attempt to provide good service to application clients who want to deal with data one byte at a time, as well as others who want to deal in large blocks. To serve both types of clients requires a buffer layer complex enough to fold both of these models together. In an informal study, done by one of us (Clark), of another transport protocol, an extreme version of this problem was uncovered: of 68 pages of code written in C, which seemed to be the transport protocol, over 60 were found to be the buffer layer and interfaces to other protocol layers, and only about 6 were the protocol.
The problem of the buffer layer is made worse by the fact that the protocol specifiers do not admit that such a layer exists. It is not a part of the ISO reference model, but is left as an exercise for the implementor. This is reasonable, within limits, since the design of the buffer layer has much to do with the particular operating system. (This, in fact, contributed to the great simplicity of our buffer layer; since there was no operating system to speak of, we were free to structure things as needed, with the right degree of generality and functionality.)
However, some degree of guidance to the implementor is necessary, and the specifiers of a protocol suite would be well served to give some thought to the role of buffering in their architecture.
Timers and Schedulers

In TCP, almost every packet is coupled to a timer. On sending data, a retransmit timer is set. On receipt of an acknowledgment, this timer is cleared. On receiving data, a timer may be set to permit dallying before sending the acknowledgment. On sending the acknowledgment, if that timer has not expired, it must be cleared.

The overhead of managing these timers can sometimes be a great burden. Some operating systems' designers did not think that timers would be used in this demanding a context, and made no effort to control their costs.

In this implementation, we used a specialized timer package similar to the one described by Varghese [7]. It provides very low-cost timer operations. In our version the costs were:
• Set a timer: 35 instructions
• Clear a timer: 17 instructions
• Reset a timer (clear and set together): 41 instructions
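The flavor of such a package is sketched below: a timing wheel with one slot per clock tick, and timers kept on doubly linked lists so that clearing is a constant-time unlink. This is our own illustration of the technique rather than the package we used, and it omits details a real package needs (intervals longer than one revolution of the wheel, and the tick routine that expires timers).

    #include <stddef.h>

    #define WHEEL_SLOTS 256                  /* one slot per tick; power of two */

    struct timer {
        struct timer *next, *prev;           /* doubly linked: O(1) unlink */
        void (*expire)(void *arg);           /* called if the timer fires  */
        void *arg;
    };

    static struct timer wheel[WHEEL_SLOTS];  /* sentinel head per slot     */
    static unsigned     now_tick;            /* advanced by the clock tick */

    static void timers_init(void)
    {
        for (int i = 0; i < WHEEL_SLOTS; i++)
            wheel[i].next = wheel[i].prev = &wheel[i];
    }

    /* Set: link the timer into the slot the clock reaches after 'ticks'. */
    static void timer_set(struct timer *t, unsigned ticks)
    {
        struct timer *head = &wheel[(now_tick + ticks) & (WHEEL_SLOTS - 1)];
        t->next = head->next;
        t->prev = head;
        head->next->prev = t;
        head->next = t;
    }

    /* Clear: unlink in place; no search of a global timer list. */
    static void timer_clear(struct timer *t)
    {
        t->prev->next = t->next;
        t->next->prev = t->prev;
        t->next = t->prev = t;
    }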
Checksums and TCBs-The Missing Steps

In the discussion of TCP input processing above, we intentionally omitted the costs for computing the TCP checksum and for looking up the TCB. We now consider each of these costs.

The TCP checksum is a point of long-standing contention among protocol designers. Having an end-to-end checksum that is computed after the packet is actually in main memory provides a level of protection that is very valuable [8]. However, computing this checksum using the central processor rather than some outboard chip may be a considerable burden on the protocol. In this paper, we do not want to take sides on this matter. We only observe that "you get what you pay for." A protocol designer might try to make the cost optional, and should certainly design the checksum to be as efficient as possible.
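For concreteness, the computation in question is the 16-bit ones'-complement sum of the segment (plus a pseudo-header, which we omit here). A straightforward, unoptimized version is sketched below; production implementations unroll the loop and accumulate 32 or more bits at a time to approach the memory bandwidth.

    #include <stddef.h>
    #include <stdint.h>

    /* 16-bit ones'-complement sum over 'len' bytes, RFC 1071 style.
     * 'sum' lets the caller fold in a pseudo-header or a partial sum. */
    static uint16_t internet_checksum(const uint8_t *data, size_t len, uint32_t sum)
    {
        while (len > 1) {                         /* add 16-bit words          */
            sum  += ((uint32_t)data[0] << 8) | data[1];
            data += 2;
            len  -= 2;
        }
        if (len == 1)                             /* odd trailing byte, padded */
            sum += (uint32_t)data[0] << 8;
        while (sum >> 16)                         /* fold carries back in      */
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)~sum;
    }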
There are a number of processing overheads associated with processing the bytes of the packet rather than the header fields. The checksum computation is one of these, but there are others. In a later section, we consider all the costs of processing the bytes.

Looking up the TCB is also a cost somewhat unrelated to the details of TCP. That is, any transport protocol must keep state information for each connection, and must use a search function to find this for an incoming packet. The only variation is the number of bits that must be matched to find the state (TCP uses 96, which may not be minimal), and the number of connections that are assumed to be open.
Using the principle of the common path and caching, one can provide algorithms that are very cheap. The simplest algorithm is to assume that the next packet is from the same connection as the last packet. To check this, one need only pick up a pointer to the TCB saved from last time, extract from the packet and compare the correct 96 bits, and return the pointer. This takes very few instructions. One of us (Jacobson) added such a single-entry TCB cache to his TCP on UNIX, and measured the success rate. Obviously, for any bulk data test, where the TCP is effectively dedicated to a single connection, the success rate of this cache approaches 100%. However, for a TCP in general operation, the success rate (often called the "hit ratio") was also very high. For a workstation in general use (opening 5,715 connections over 38 days and receiving 353,238 packets), the single-entry cache matched the incoming packet 93.2% of the time. For a mail server, which might be expected to have a much more diverse set of connections, the measured ratio was 89.8% (over two days, 2,044 connections, and 121,676 incoming packets).

If this optimization fails too often to be useful, the next step is to hash the 96 bits into a smaller value, perhaps an 8-bit field, and use this to index into an array of linked lists of TCBs, with the most recently used TCB sorted first. If the needed TCB is indeed first on the list selected by the hash function, the cost is again very low. A reasonable estimate is 25 instructions. We will use this higher estimate in the analysis to follow.
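The two-level lookup just described can be sketched as follows; the structure layout and hash function are ours, chosen only to illustrate the idea.

    #include <stddef.h>
    #include <stdint.h>

    /* The 96 bits that identify a TCP connection: two 32-bit addresses
     * and two 16-bit ports. */
    struct conn_id {
        uint32_t local_addr, remote_addr;
        uint16_t local_port, remote_port;
    };

    struct tcb {
        struct conn_id id;
        struct tcb    *next;                /* hash chain, most recently used first */
        /* ... the rest of the per-connection TCP state ... */
    };

    #define TCB_HASH_SIZE 256               /* an 8-bit hash, as in the text */
    static struct tcb *tcb_hash[TCB_HASH_SIZE];
    static struct tcb *tcb_last;            /* single-entry cache */

    static int id_equal(const struct conn_id *a, const struct conn_id *b)
    {
        return a->local_addr == b->local_addr && a->remote_addr == b->remote_addr &&
               a->local_port == b->local_port && a->remote_port == b->remote_port;
    }

    static struct tcb *tcb_lookup(const struct conn_id *id)
    {
        /* Common case: same connection as the previous packet. */
        if (tcb_last != NULL && id_equal(&tcb_last->id, id))
            return tcb_last;

        /* Otherwise hash the 96 bits down to 8 bits and walk the chain,
         * moving the match to the front so it is found first next time. */
        unsigned h = (id->local_addr ^ id->remote_addr ^
                      ((uint32_t)id->local_port << 16 | id->remote_port)) %
                     TCB_HASH_SIZE;
        for (struct tcb **pp = &tcb_hash[h]; *pp != NULL; pp = &(*pp)->next) {
            struct tcb *t = *pp;
            if (id_equal(&t->id, id)) {
                *pp = t->next;              /* unlink ...                   */
                t->next = tcb_hash[h];      /* ... and reinsert at the head */
                tcb_hash[h] = t;
                tcb_last = t;               /* refresh the one-entry cache  */
                return t;
            }
        }
        return NULL;                        /* no such connection */
    }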
Some Speed Predictions

Adding all these costs together, we see that the overhead of receiving a packet with control information in it (which is the most costly version of the processing path) is about 335 instructions. This includes the TCP and IP level processing, our crude estimate of the cost of finding the TCB and the buffer layer, and resetting a timer. Adding up the other versions of the sending and receiving paths yields instruction counts of the same magnitude.

With only minor optimization, an estimate of 300 instructions could be justified as a round number to use as a basis for some further analysis. If the processing overhead were the only bottleneck, how fast could a stream of TCP packets forward data?
Obviously, we must assume some target processor to estimate processing time. While these estimates were made for an Intel 80386, we believe the obvious processor is a 32-bit RISC chip, such as a SPARC chip or a Motorola 88000. A conservative execution rate for such a machine might be 10 MIPS, since chips of this sort can be expected to have a clock rate of twice that or more, and execute most instructions in one clock cycle. (The actual rate clearly requires a more detailed analysis; it depends on the number of data references, the data fetch architecture of the chip, the supporting memory architecture, and so on. For this paper, which is only making a very rough estimate, we believe that a working number of 10 MIPS is reasonable.)
In fairness, the estimate of 300 instructions should be adjusted for the change from the 80386 to a RISC instruction set. However, based on another study of packet processing code, we found little expansion of the code when converting to a RISC chip. The operations required for packet processing are so simple that no matter what processor is being used, the instruction set actually utilized is a RISC set.
A conservative adjustment would be to assume that 300 instructions for an 80386 would be 400 instructions for a RISC processor.

At 10 MIPS, a processor can execute 400 instructions in 40 µs, or 25,000 packets/s. These processing costs permit rather high data rates.

If we assume a packet size of 4,000 bytes (which would fit in an FDDI frame, for example), then 25,000 packets/s provides 800 Mb/s. For TCP, this number must be reduced by taking into account the overhead of the acknowledgment packet. The Berkeley UNIX sends one acknowledgment for every other data packet during bulk data, so we can assume that only two out of every three packets actually carry data. This yields a throughput of 530 Mb/s.
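The arithmetic behind these figures is a back-of-the-envelope computation, reproduced below with the same assumptions as the text.

    #include <stdio.h>

    int main(void)
    {
        double mips       = 10e6;              /* assumed instruction rate       */
        double instr_pkt  = 400;               /* per-packet cost, RISC estimate */
        double pkts_per_s = mips / instr_pkt;  /* 25,000 packets/s               */
        double bytes_pkt  = 4000;              /* fits in an FDDI frame          */

        /* One acknowledgment per two data packets: only two of every
         * three packets processed carry data. */
        double bulk_bps = pkts_per_s * (2.0 / 3.0) * bytes_pkt * 8;

        printf("packet rate: %.0f packets/s\n", pkts_per_s);
        printf("bulk TCP throughput: %.0f Mb/s\n", bulk_bps / 1e6);   /* ~533 */
        return 0;
    }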
Figuring another way, if we assume an FDDI network with 100 Mb/s bandwidth, how small can the packets get before the processing per packet limits the throughput? The answer is 500 bytes.

These numbers are very encouraging. They suggest that it is not necessary to revise the protocols to utilize a network such as FDDI. It is only necessary to implement them properly.
Why Are Protocols Slow?

The numbers computed above may seem hard to believe. While the individual instruction counts may seem reasonable, the overall conclusion is not consistent with observed performance today.

We believe that the proper conclusion is that protocol processing is not the real source of the processing overhead. There are several others that are more important. They are just harder to find, and the TCP is easier to blame. The first overhead is the operating system. As we discussed above, packet processing requires considerable support from the system. It is necessary to take an interrupt, allocate a packet buffer, free a packet buffer, restart the I/O device, wake up a process (or two or three), and reset a timer. In a particular implementation, there may be other costs that we did not identify in this study.

In a typical operating system, these functions may turn out to be very expensive. Unless they were designed for exactly this function, they may not match the performance requirements at all.

A common example is the timer package. Some timer packages are designed under the assumption that the common operations are setting a timer and having a timer expire. These operations are made less costly at the expense of the operation of unsetting or clearing the timer. But that is what happens on every packet.

It may seem as if these functions, even if not optimized, are small compared to TCP. This is true only if TCP is big. But, as we discovered above, TCP is small. A typical path through TCP is 200 instructions; a timer package could cost that much if not carefully designed.
The other major overhead in packet processing is performing operations that touch the bytes. The example associated with the transport protocol is computing the checksum. The more important one is moving the data in memory.

Data is moved in memory for two reasons. First, it is moved to separate the data from the header and get the data into the alignment needed by the application. Second, it is copied to get it from the I/O device to system address space and to user address space.

In a good implementation, these operations will be combined to require a minimal number of copies. In the Berkeley UNIX, for example, when receiving a packet, the data is moved from the I/O device into the chained mbuf structure, and is then moved into the user address space in a location that is aligned as the user needs it. The first copy may be done by a DMA controller or by the processor; the second is always done by the processor.

To copy data in memory requires two memory cycles, read and write. In other words, the bandwidth of the memory must be twice the achieved rate of the copy. Checksum computation has only one memory operation, since the data is being read but not written. (In this analysis, we ignore the instruction fetch operations to implement the checksum, under the assumption that they are in a processor cache.) In this implementation of TCP, receiving a packet thus requires four memory cycles per word, one for the input DMA, one for the checksum, and two for the copy.¹
A 32-bit memory with a cycle time of 250 ns, typical for dynamic RAMs today, would thus imply a memory limit of 32 Mb/s. This is a far more important limit than the TCP processing limits computed above. Our estimates of TCP overhead could be off by several factors of two before the overhead of TCP would intrude into the limitations of the memory.
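The memory-limit arithmetic is equally simple, using the parameters assumed above.

    #include <stdio.h>

    int main(void)
    {
        double cycle_s         = 250e-9;   /* DRAM cycle time                     */
        double bytes_per_cyc   = 4;        /* 32-bit-wide memory                  */
        double cycles_per_word = 4;        /* input DMA + checksum + 2-cycle copy */

        double raw_bps  = bytes_per_cyc / cycle_s * 8;      /* 128 Mb/s */
        double recv_bps = raw_bps / cycles_per_word;        /*  32 Mb/s */

        printf("raw memory bandwidth:    %.0f Mb/s\n", raw_bps / 1e6);
        printf("achievable receive rate: %.0f Mb/s\n", recv_bps / 1e6);
        return 0;
    }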
A Direct Measure of Protocol Overhead

In an independent experiment, one of us (Jacobson) directly measured the various costs associated with running TCP on a UNIX system. The measured system was the Berkeley TCP running on a Sun-3/60 workstation, which is based on a 20-MHz 68020. The measurement technique was to use a sophisticated logic analyzer that can be controlled by special start, stop, and chaining patterns triggered whenever selected addresses in the UNIX kernel were executed. This technique permits actual measurement of path lengths in packet processing. A somewhat subjective division of these times into categories permits a loose comparison with the numbers reported above, obtained from counting instructions.

The measured overheads were divided into two groups: those that scale per byte (the user-system and network-memory copy and the checksum), and those that are per packet (system overhead, protocol processing, interrupts, and so on). See Table I.
TABLE I. Measured Overheads*

Per byte:
  User-system copy             200 µs
  TCP checksum                 185 µs
  Network-memory copy          386 µs

Per packet:
  Ethernet driver              100 µs
  TCP + IP + ARP protocols     100 µs
  Operating system overhead    240 µs

*Idle time: 200 µs
The per-byte costs were measured for a maximum-length Ethernet packet of 1,460 data bytes. Thus, for example, the checksum essentially represents the 125-ns/byte average memory bandwidth of the Sun-3/60. The very high cost of the network-memory copy seems to represent specifics of the particular Ethernet controller chip used (the LANCE chip), which additionally locks up the memory bus during the transfer, thus stalling the processor.

The times reported above can be converted to instruction counts (the measure used earlier in the paper) using the fact (recorded by the logic analyzer) that this is essentially a 2-MIPS machine.

¹ The four memory cycles per received word are artifacts of the Berkeley UNIX we examined, and not part of TCP in general. An experimental version of Berkeley UNIX being developed by one of us (Jacobson) uses at most three cycles per received word, and one cycle per word if the network interface uses on-board buffer memory rather than DMA for incoming packets.
