Modeling and Analysis of the Unix Communication Subsystems

Yi-Chun Chu and Toby J. Teorey

Electrical Engineering and Computer Science Department
The University of Michigan
Ann Arbor, MI 48109-2122, USA

Abstract

The performance of host communication subsystems is an important research topic in computer networks.¹ Performance metrics such as throughput, delay, and packet loss are important indices for observing system behavior. Most research in this area is conducted by experimental measurement; far less attention is paid to the analytic modeling approach. The well-known complexity and dynamic nature of the Transmission Control Protocol/Internet Protocol (TCP/IP) make the performance modeling of communication subsystems extremely difficult. The purpose of this study is to analyze and model the overhead in Unix communication subsystems, which is caused by protocol processing as well as by kernel functions for fair allocation of system resources. Our approach is to build analytic models of the communication overhead for sending and receiving a message. These models can be applied to analyze the communication overhead of Internet information systems, such as web servers, or of software servers built above middleware, such as Distributed Computing Environment/Remote Procedure Call (DCE/RPC) servers, that require intensive network I/O.

¹This work is funded by the IBM Canada Ltd. Laboratory Centre for Advanced Studies, Toronto.

1 Introduction

With recent advances in networking technology, many services that used to reside in a single host system are now provided in a distributed fashion; distributed file service is a good example. Providing services across networks increases communication costs: requests and results must be transported between client and server machines. The intensive overhead caused by network I/O limits the capacity of many distributed servers, which makes the study of host communication subsystems an important research topic.

Earlier research closely examined the overhead generated in host communication subsystems [3,5,6]. The overhead is caused by both protocol-specific processing and operating system (OS) activities such as data movement, context switching, and interrupt handling. Careful analysis of the overhead breakdown can improve the design of communication subsystems, but it cannot reveal how server machines behave under heavy network traffic [10]. This question has received more attention recently because many distributed servers, such as World-Wide Web (WWW) servers, have generally experienced performance problems in response time and service availability [11,12].

To answer this question, we need an analytic way to study server system behavior under a varied network load. In this paper, we develop analytic models of the communication overhead for both sending and receiving a message. These models can be applied to estimate the communication overhead in distributed server systems.

The rest of this paper is organized as follows. In Section 2, we introduce the software architecture of communication subsystems derived from Berkeley Software Distribution (BSD) Unix; the overhead breakdown is organized according to the software and protocol layers.

`
`in communication subsystems is also described
`here. In Section 3, analytic models for communi-
`cation overhead are developed, along with a
`detailed analysis of the critical path for sending
`and receiving a message. Both transport layer ser-
`vices, TCP and User Datagram Protocol (UDP),
`are considered in our study. In Section 4 we
`present two case studies: the Internet web server
`and the DCE/RPC server with our analytic mod-
`els. Conclusions and future work are outlined in
`Section 5.
`
2 Unix Communication Subsystems

The Unix communication subsystem is divided into three software layers: 1) the socket layer, 2) the protocol layer, and 3) the network-interface layer [8]. The software architecture is shown in Figure 1.

[Figure 1. Software Architecture of Unix Communication Subsystems. The application layer opens stream (TCP) and datagram (UDP) sockets; the socket layer maintains the socket send and receive buffers; the protocol layer comprises TCP/UDP over IP, with the protocol input queue feeding IP; the network-interface layer maintains the interface send queue.]

The socket layer hides the complexity of network communication and provides an abstract interface similar to that of a generic I/O device. The protocol layer covers protocol-specific processing in the transport layer (TCP/UDP) and the network layer (IP). The network-interface layer is mainly concerned with link-layer encapsulation/decapsulation and with driving the transmission media. The TCP/IP protocol specification puts no restriction on the layered structure of network software; most implementations, however, put the code in the kernel, with tightly integrated software layers, for reasons of efficiency [2].

The specific communication subsystem we studied is a DEC Alpha AXP workstation running OSF/1 version 1.0 [1]. The workstation is attached to a departmental LAN through an Ethernet adaptor. The implementation of the OSF/1 network software follows the design of the 4.3BSD Reno release [8].

2.1 The Processing Overhead

In communication subsystems, processing overhead is caused by data-touching operations, such as data movement and checksum computation, as well as by non-data-touching operations, such as context switching and interrupt handling. Before developing analytic models to describe the system behavior, a thorough understanding of this overhead is necessary. The major kinds of overhead found in communication subsystems are described below.

Data Movement

Data movement is a principal overhead in communication subsystems, and it is significant because memory bandwidth has not kept pace with the speed of microprocessors [3,6]. Generally, two data movements are needed on both the sending and the receiving path. First, data has to be copied between user space and kernel space for protection reasons. This work is done by the CPU, and we denote it as Muk(m) for the user-to-kernel direction and Mku(m) for the kernel-to-user direction, where m is the size of the data to be moved.¹ The other data movement is between kernel space and the network adaptor buffer (which is in I/O space). This work can be done either by the CPU (programmed I/O, or PIO) or by Direct Memory Access (DMA), depending on the hardware I/O architecture and on whether the network adaptor has DMA capability. In the system we studied, the data movement from kernel to adaptor is done by PIO, denoted Mka(m), but the data movement in the reverse direction is done by DMA, denoted Mak(m).

¹Since both Muk(m) and Mku(m) are memory-to-memory copies, we use Mmm(m) to represent either of them.

Checksum Computation

Network communication relies on checksums to preserve end-to-end data integrity. In the Internet protocols, a 16-bit checksum field is used for error detection over the IP header (20 bytes), the UDP datagram (12-byte IP pseudo header, 8-byte UDP header, and UDP data), and the TCP segment (12-byte IP pseudo header, 20-byte TCP header, and TCP data) [4,15]. We denote the overhead of checksum computation as CS(m). Both checksum computation and data movement are data-touching operations; their overhead therefore grows linearly with the size of the data to be processed.

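As an illustration of this data-touching operation, the following C sketch computes the 16-bit one's-complement checksum in the straightforward way defined by RFC 1071. It shows the per-byte work the CS(m) term charges for; production kernels use heavily optimized, often architecture-specific, variants rather than code like this.

    #include <stddef.h>
    #include <stdint.h>

    /* One's-complement Internet checksum (RFC 1071), unoptimized sketch. */
    uint16_t inet_checksum(const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        uint32_t sum = 0;

        while (len > 1) {                     /* sum 16-bit words */
            sum += (uint32_t)p[0] << 8 | p[1];
            p += 2;
            len -= 2;
        }
        if (len == 1)                         /* pad a trailing odd byte */
            sum += (uint32_t)p[0] << 8;

        while (sum >> 16)                     /* fold carries back in */
            sum = (sum & 0xffff) + (sum >> 16);

        return (uint16_t)~sum;                /* one's complement of the sum */
    }
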
Protocol-specific Processing

Protocol-specific processing contributes a different overhead in each protocol layer. Measurement results show that this overhead tends to be fixed when the checksum computation is excluded [6]. We can therefore use constants to represent the fixed part of the protocol-specific processing overhead in each layer, denoted TCPin, TCPout, UDPin, UDPout, IPin, and IPout.

Demultiplexing

Demultiplexing is a table-lookup operation in the transport layer: it searches the protocol control blocks (PCBs) for the socket connection associated with an incoming packet. Most implementations derived from BSD Unix use a linked-list structure with a one-entry cache, the "1-behind" cache, which holds the latest lookup result to improve performance [9]. The search cost depends on the number of socket connections in the system. Here we consider it part of the fixed overhead of the transport-layer input routines, TCPin and UDPin. It has been shown, however, that this overhead can grow significantly in a busy Internet information server handling peaks of more than a thousand connections [10].

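A C sketch of this lookup is shown below. The linear list walk and the one-entry cache mirror the BSD approach, but the structure and function here are simplified stand-ins (the real routine in BSD-derived kernels, in_pcblookup(), has more fields and wildcard matching), so treat this as illustrative only.

    #include <stdint.h>
    #include <stddef.h>

    struct inpcb {
        struct inpcb *inp_next;
        uint32_t inp_faddr, inp_laddr;    /* foreign/local IPv4 addresses */
        uint16_t inp_fport, inp_lport;    /* foreign/local ports */
    };

    static struct inpcb *pcb_cache;       /* "1-behind" cache: latest hit */

    struct inpcb *pcb_lookup(struct inpcb *head,
                             uint32_t faddr, uint16_t fport,
                             uint32_t laddr, uint16_t lport)
    {
        struct inpcb *inp = pcb_cache;

        if (inp && inp->inp_faddr == faddr && inp->inp_fport == fport &&
            inp->inp_laddr == laddr && inp->inp_lport == lport)
            return inp;                   /* cache hit: O(1) */

        for (inp = head; inp != NULL; inp = inp->inp_next) {  /* O(n) scan */
            if (inp->inp_faddr == faddr && inp->inp_fport == fport &&
                inp->inp_laddr == laddr && inp->inp_lport == lport) {
                pcb_cache = inp;          /* remember the latest hit */
                return inp;
            }
        }
        return NULL;
    }

The cache makes back-to-back packets for the same connection cheap, which is why the cost is treated as fixed here; with many interleaved connections the O(n) scan dominates, which is the effect reported in [10].
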
Interrupt Handling

Network communication generates two device hardware interrupts: the receiving interrupt and the transmission-complete interrupt.

The overhead of interrupt handling receives less attention than other processing overhead, probably because of its asynchronous nature and the difficulty of measuring it. However, careful analysis of interrupt handling in the network device driver reveals how critically it affects performance during heavy network traffic [14]. In the system we studied, the overhead of the receiving interrupt, denoted Ir, covers the entire link-layer processing; it does not include the data-movement overhead from the network adaptor to the kernel, Mak(m), which is done by DMA. The overhead of the transmission-complete interrupt involves the data movement from kernel to network adaptor (done by PIO) and the initiation of the next packet transmission; hence, it is further divided into a fixed part, denoted Is, and a variable part, denoted Mka(m).

Context Switch

The socket system calls for sending or receiving a message are synchronous. As a consequence, a call may block the currently running process and cause a context switch. Incoming packet processing, which is driven by asynchronous interrupts, "wakes up" the blocked receiving process at its final stage. We denote the fixed overhead of a context switch as C.

Transmission Time

Transmission time depends on the speed of the transmission media to which the workstation is attached, and transmission speed can vary by several orders of magnitude, e.g., from 10 Mb/s (Ethernet) to 622 Mb/s (ATM). Packet transmission and reception are also data-touching operations; we denote their overhead as Tx(m) and Rx(m). Generally, it takes equal time to transmit or to receive a packet.

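To give a sense of scale (our own arithmetic, not a measurement from the paper): a maximum-size Ethernet frame of 1514 bytes takes

$$Tx(1514) = \frac{1514 \times 8\ \text{bits}}{10^{7}\ \text{b/s}} \approx 1.2\ \text{ms}$$

at 10 Mb/s, but only about 19 µs at 622 Mb/s, which is why transmission time can either dominate or nearly vanish from the totals developed in Section 3.
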
Others

Other kinds of overhead not listed above, such as mbuf allocation, are not significant to our analysis. As a result, we treat them as part of the fixed overhead of protocol-specific processing.

2.2 Overhead Breakdown in Software Layers

The breakdown of processing overhead in communication subsystems is categorized in Table 1. The overhead of data-touching and of non-data-touching operations is organized according to the software layer in which it applies.

Table 1. Overhead Breakdown in Software Layers and Network Adaptor

  Layer                    Overhead      Description                                          Device
  Socket Layer             Muk(m)        data copy from user space to kernel space            CPU
                           Mku(m)        data copy from kernel space to user space            CPU
  Protocol Layer           CS(m)         checksum computation                                 CPU
                           TCPin/TCPout  TCP protocol-specific processing                     CPU
                           UDPin/UDPout  UDP protocol-specific processing                     CPU
                           IPin/IPout    IP protocol-specific processing                      CPU
  Network Interface Layer  Mka(m)        data copy from kernel space to I/O space (adaptor)   CPU
                           Mak(m)        data copy from I/O space (adaptor) to kernel space   DMA
                           ETHout        link-layer processing                                CPU
                           Is            transmit-complete interrupt                          CPU
                           Ir            receive interrupt and link-layer processing          CPU
  Network Adaptor          Tx(m)         packet transmission time                             Adaptor
                           Rx(m)         packet reception time                                Adaptor

The overhead breakdown helps us see where the overhead is generated. In Section 3, we use this table to develop analytic models of the communication overhead through a detailed analysis of the sending and receiving paths.

2.3 Queueing Delay in Communication Subsystems

The processing overhead described above does not account for the entire delay accumulated in communication subsystems. There is also a queueing delay introduced by buffers or queues within or between the software layers. These queues and buffers are shown in Figure 1 and described below.

Socket Send Buffer

The socket send buffer holds data not yet sent, or sent but not yet acknowledged by the receiving end. Since UDP provides neither flow control nor reliable message delivery, a UDP message is never placed in the socket send buffer.¹ For TCP, the queueing delay is determined by its flow-control algorithms, such as slow start and congestion avoidance.

¹Although a UDP message is never copied into the socket send buffer, the buffer size restricts the maximum size of UDP message that can be sent.

Socket Receive Buffer

The socket receive buffer holds data that has been received but not yet delivered to the application. The queueing delay therefore depends on how quickly the receiving process can accept the data. For TCP, the flow-control algorithm prevents the sender from sending more data when the buffer is full. For UDP, which has no flow control, any message that arrives when the buffer is full is simply dropped.

Protocol Input Queue (IP Queue)

The protocol input queue holds IP datagrams, delivered by the network interface, that are waiting for protocol-layer input processing. This queue normally does not build up unless bursts of packets arrive at the network adaptor. The IP input routine is scheduled as an asynchronous software interrupt in the kernel; this software interrupt is posted by the receiving-interrupt handler and has a lower priority than the device hardware interrupt. The IP input routine is thus usually scheduled to process incoming IP datagrams immediately after the receiving hardware interrupt returns.

Interface Send Queue

Outgoing packets waiting to be transmitted by the network adaptor are placed in the interface send queue. The queueing delay depends on the Medium Access Control (MAC) protocol and the bandwidth of the transmission media.

Developing analytic models to estimate the delay accumulated in communication subsystems is a challenging task, and several factors make it extremely complicated. First, incoming packet processing is divided into two stages in the kernel and is scheduled as asynchronous activities with two different priorities; this applies to both TCP and UDP. Second, the dynamics of transport-layer processing, in TCP especially, is sensitively influenced by the flow-control algorithms. This turns out to be an end-to-end issue: we must also consider how quickly the remote peer can accept packets, as well as the end-to-end network latency.

We cannot yet develop analytic models of the delay accumulated in communication subsystems, because further study is required to capture the end-to-end dynamics of TCP. For now, we develop a mean-value model of the overhead, to be used in the future as the service demands of queueing models of delay.

3 Analytic Models for Overall Communication Overhead

In Section 2, we introduced the different categories of overhead generated in communication subsystems. In this section, we use them to develop analytic models of the overall overhead for sending and receiving a message. Since TCP has much richer transport functionality than UDP, it is impractical to describe both with a single model. Four overhead models, TCPsend(m), TCPrecv(m), UDPsend(m), and UDPrecv(m), are built, with m denoting the size of the message to be sent or received.

3.1 Processing Overhead for Sending and Receiving a Packet in the Bottom Layer

We analyze the bottom layer first because TCP and UDP employ the same processing steps in this layer. The bottom layer corresponds to the link layer, or the MAC sublayer, of the Open Systems Interconnection (OSI) reference model.

Table 2. Breakdown of Sending Overhead in the Bottom Layers

  Layer                    Calling Sequence            Processing Overhead
  Network Interface Layer  ether_output(), enoutput()  Link-layer encapsulation
                           enstart(), en_senddone()    Data copy from kernel space to I/O space (by PIO)
  Network Adaptor                                      Packet transmission

Table 3. Breakdown of Receiving Overhead in the Bottom Layers

  Layer                    Calling Sequence            Processing Overhead
  Network Interface Layer  ether_input(), en_recv(),   Link-layer decapsulation
                           en_srecv() or en_lrecv()
  Network Adaptor                                      Packet reception and data copy from I/O space
                                                       to kernel space (by DMA)

Within the communication subsystems, the bottom layer is the network-interface layer.

For sending a packet, control enters the bottom layer when IP makes an output request, ether_output(), to the interface chosen by the routing algorithm. The network-interface layer encapsulates the datagram in its link-layer format and places the outgoing packet in the interface send queue. If the network adaptor is already active, control returns directly; otherwise, it returns after starting the network adaptor for transmission by calling enstart(). After the transmission is finished, the network adaptor generates a hardware interrupt for transmission completion. The interrupt-handling routine, en_senddone(), removes the next packet from the interface send queue, copies it to the adaptor buffer, and restarts the adaptor.

The processing overhead, hence, is equal to

$$ETH_{out} + M_{ka}(m) + I_s + Tx(m)$$

This formula includes the link-layer encapsulation overhead, ETHout; the data movement from kernel to adaptor, Mka(m); the interrupt service for transmission completion, Is; and the packet transmission time, Tx(m). The data movement is done by the CPU in enstart(): it takes place before control returns if the network adaptor is not active, and otherwise occurs later, during the interrupt-service interval, while the adaptor is busy. The sending-overhead breakdown in the bottom layer is shown in Table 2.

Upon receiving a packet, the network adaptor first DMAs the packet from the adaptor into the kernel. When the DMA is complete, a receiving hardware interrupt (SPLNET) is generated by the network adaptor, so for receiving a packet, control starts from the interrupt service routine, en_srecv() or en_lrecv(). The first step is link-layer decapsulation. Next, the packet is placed in the protocol input queue, and a software interrupt (SPLIMP) is posted to initiate higher-layer protocol processing later. Before the interrupt returns, it restarts the adaptor to receive the next packet.

The processing overhead, hence, is equal to

$$Rx(m) + M_{ak}(m) + I_r$$

This formula includes the packet reception time, Rx(m); the data movement from adaptor to kernel, Mak(m); and the interrupt service time, Ir. Since the entire link-layer processing is accomplished within the interrupt-service interval, the interrupt service time includes the link-layer decapsulation overhead. The receiving-overhead breakdown is shown in Table 3.

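The interplay between enstart() and en_senddone() on the sending side can be captured in a toy model. The following self-contained C sketch is our own schematic of that logic (the ring buffer, busy flag, and print statements stand in for real mbufs and hardware); it shows how the PIO copy happens inline when the adaptor is idle but is deferred to the transmission-complete interrupt when it is busy.

    #include <stdbool.h>
    #include <stdio.h>

    struct ifnet {
        int  sendq[16];             /* interface send queue (packet sizes) */
        int  head, tail;            /* monotonically increasing counters */
        bool busy;                  /* adaptor currently transmitting? */
    };

    static void enstart(struct ifnet *ifp)
    {
        if (ifp->busy || ifp->head == ifp->tail)
            return;                 /* copy deferred until en_senddone() */
        int m = ifp->sendq[ifp->head++ % 16];
        printf("PIO copy Mka(%d), start Tx(%d)\n", m, m);
        ifp->busy = true;           /* adaptor now owns the packet */
    }

    static void en_senddone(struct ifnet *ifp)  /* tx-complete interrupt: Is */
    {
        ifp->busy = false;
        enstart(ifp);               /* copy and launch the next packet */
    }

    static void ether_output(struct ifnet *ifp, int m)  /* ETHout + enqueue */
    {
        ifp->sendq[ifp->tail++ % 16] = m;
        enstart(ifp);               /* returns immediately if adaptor busy */
    }

    int main(void)
    {
        struct ifnet en0 = {0};
        ether_output(&en0, 1514);   /* copied and started inline */
        ether_output(&en0, 512);    /* queued: adaptor is busy */
        en_senddone(&en0);          /* interrupt drives the next copy */
        en_senddone(&en0);
        return 0;
    }
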
3.2 Processing Overhead for Sending a UDP Message

The UDP sending path is a sequence of kernel subroutine calls traversing down the software layers. The calling sequence and overhead breakdown in the upper software layers are shown in Table 4.

Table 4. Overhead Breakdown in the UDP Sending Path

  Layer           Calling Sequence  Processing Overhead
  Socket Layer    sendmsg()
                  sosend()          Data copy from user space to kernel space
  Protocol Layer  udp_output()      UDP checksum
                  ip_output()       IP header checksum and fragmentation of large UDP datagrams

Control enters the kernel through the system call sendmsg() in the socket interface. The only significant overhead in the socket layer is the data movement from user space to kernel space. The UDP output routine, udp_output(), contributes a fixed protocol-specific overhead plus the overhead of checksum computation. Processing overhead in the network layer can be complicated if a large IP datagram must be fragmented to fit the path Maximum Transmission Unit (MTU). In that case, the total overhead below the UDP layer is the product of the fragmentation factor

$$f = \left\lceil \frac{m+8}{MTU-20} \right\rceil$$

and the sending overhead for an IP datagram of size MTU.

From the above analysis, we derive the total overhead for sending a UDP message of size m:

$$UDP_{send}(m) = C + M_{uk}(m) + UDP_{out} + CS(m+20) + f\left\{IP_{out} + ETH_{out} + M_{ka}(MTU+14) + I_s + Tx(MTU+14)\right\}$$

This formula includes the context-switch overhead, C; the data movement from user space to kernel space, Muk(m); the UDP protocol-specific overhead, UDPout; the checksum computation, CS(m+20); and the total overhead below the UDP layer. Applying the fragmentation factor f, and noting that each of the f fragments carries a 20-byte IP header and a 14-byte Ethernet header, we get:

$$UDP_{send}(m) = C + M_{mm}(m) + UDP_{out} + CS(m+20) + M_{ka}(m+8+34f) + Tx(m+8+34f) + f\,(IP_{out} + ETH_{out} + I_s)$$

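To show how the model is meant to be used, here is a self-contained C sketch that evaluates the simplified UDPsend(m) expression. Every coefficient below is an invented placeholder (the paper reports no such numbers), so only the structure of the computation, not the values, is meaningful.

    #include <math.h>
    #include <stdio.h>

    #define MTU 1500.0                        /* Ethernet MTU in bytes */

    /* Linear cost models in microseconds: fixed + per-byte.
     * All coefficients are hypothetical, for illustration only. */
    static double mmm(double m) { return 10.0 + 0.020 * m; } /* Muk/Mku copy */
    static double cs(double m)  { return  5.0 + 0.015 * m; } /* checksum */
    static double mka(double m) { return  8.0 + 0.025 * m; } /* PIO to adaptor */
    static double tx(double m)  { return m * 8.0 / 10.0; }   /* 10 Mb/s wire */

    static double udp_send_cost(double m)     /* simplified UDPsend(m) */
    {
        const double C = 15, UDPOUT = 20, IPOUT = 25, ETHOUT = 10, IS = 30;
        double f = ceil((m + 8.0) / (MTU - 20.0));   /* fragmentation factor */
        double w = m + 8.0 + 34.0 * f;               /* bytes on the wire */
        return C + mmm(m) + UDPOUT + cs(m + 20.0)
             + mka(w) + tx(w) + f * (IPOUT + ETHOUT + IS);
    }

    int main(void)
    {
        printf("UDPsend(1024) = %.1f us\n", udp_send_cost(1024));
        printf("UDPsend(8192) = %.1f us\n", udp_send_cost(8192));
        return 0;
    }
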
3.3 Processing Overhead for Receiving a UDP Message

The UDP receiving path is a control sequence consisting of a hardware interrupt, a software interrupt, and "upcalls" to kernel subroutines that traverse up the software layers. The calling sequence and overhead breakdown in the upper software layers are shown in Table 5.

Table 5. Overhead Breakdown in the UDP Receiving Path

  Layer           Calling Sequence  Processing Overhead
  Socket Layer    recvmsg()
                  soreceive()       Data copy from kernel space to user space
  Protocol Layer  udp_input()       UDP checksum
                  ipintr()          IP header checksum and reassembly of IP fragments

Control enters the kernel through the receiving hardware interrupt (SPLNET); link-layer decapsulation is done in the interrupt service. Control then enters the protocol layer through the software interrupt (SPLIMP), with the IP input routine, ipintr(), as its handler. For large UDP messages, the reassembly of IP fragments is accomplished in this routine; as a result, the total overhead below the UDP layer is the product of the fragmentation factor and the receiving overhead for an IP datagram of size MTU. The UDP input routine, udp_input(), contributes a fixed protocol-specific overhead plus the overhead of checksum computation. After that, the socket receiving routine, soreceive(), "wakes up" the receiving process blocked in the system call recvmsg(), which introduces a context-switch overhead. The only significant overhead in the socket layer is the data movement from kernel space to user space.

From the above analysis, we derive the total overhead for receiving a UDP message of size m:

$$UDP_{recv}(m) = f\left\{Rx(MTU+14) + M_{ak}(MTU+14) + I_r + IP_{in}\right\} + UDP_{in} + CS(m+20) + M_{ku}(m) + C$$

This formula includes the total overhead below the UDP layer; the UDP protocol-specific overhead, UDPin; the checksum computation, CS(m+20); the data movement from kernel space to user space, Mku(m); and the context-switch overhead, C. Applying the same fragmentation factor f, we get:

$$UDP_{recv}(m) = C + M_{mm}(m) + UDP_{in} + CS(m+20) + M_{ak}(m+8+34f) + Rx(m+8+34f) + f\,(IP_{in} + I_r)$$

3.4 Processing Overhead for Sending a TCP Message

The TCP sending path is a sequence of kernel subroutine calls that traverses down the software layers. The calling sequence and overhead breakdown in the upper software layers are shown in Table 6. The overhead breakdown in the protocol layer differs from the UDP case in two ways. First, the division of a large message into units that fit the path MTU is done in TCP instead of in IP. Second, there is also overhead for receiving the TCP acknowledgments (ACKs) associated with the message sent.

Table 6. Overhead Breakdown in the TCP Sending Path

  Layer           Calling Sequence  Processing Overhead
  Socket Layer    write()
                  sosend()          Data copy from user space to kernel space
  Protocol Layer  tcp_output()      TCP checksum (message sent in units no larger than the MSS)
                  ip_output()       IP header checksum

Before we derive the total overhead for sending a TCP message, we first consider the cost of sending one TCP segment and receiving one ACK. Following an analysis similar to that for UDP, the overhead for sending a TCP segment of size MSS is:

$$SEG_{send} = TCP_{out} + CS(MSS+32) + IP_{out} + ETH_{out} + M_{ka}(MTU+14) + I_s + Tx(MTU+14)$$

Here the checksum computation covers the TCP header (20 bytes) and the IP pseudo header (12 bytes) in addition to the data. Similarly, the overhead for receiving an ACK is:

$$ACK_{recv} = Rx(54) + M_{ak}(54) + I_r + IP_{in} + TCP_{in} + CS(32)$$

where an ACK packet is 54 bytes long (20-byte TCP header, 20-byte IP header, and 14-byte Ethernet header).

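For concreteness (our own byte accounting, under the usual Ethernet assumptions): with an MSS of 1460 bytes, each full segment's checksum covers 1460 + 32 = 1492 bytes, while the frame on the wire carries 1460 + 20 (TCP) + 20 (IP) + 14 (Ethernet) = 1514 bytes, i.e. MTU + 14.
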
To simplify our analysis, we assume there is no packet loss (hence no retransmission) and that there is an acknowledgment for each segment sent (no ACK compression). The total overhead for sending a TCP message of size m is then equal to:

$$TCP_{send}(m) = M_{uk}(m) + \left\lceil \frac{m}{MSS} \right\rceil \left\{SEG_{send} + ACK_{recv}\right\}$$

If we let the segmentation factor be g = ⌈m/MSS⌉ and expand, we get:

$$TCP_{send}(m) = M_{mm}(m) + CS(m+64g) + M_{ka}(m+54g) + M_{ak}(54g) + Tx(m+108g) + g\,(TCP_{in} + TCP_{out} + IP_{in} + IP_{out} + ETH_{out} + I_r + I_s)$$

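As a worked example of the segmentation factor (our own arithmetic, with the common Ethernet MSS of 1460 bytes): sending an 8 KB message gives

$$g = \left\lceil \frac{8192}{1460} \right\rceil = 6,$$

so the model charges six segment transmissions and six ACK receptions on top of the single user-to-kernel copy Mmm(8192).
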
3.5 Processing Overhead for Receiving a TCP Message

For receiving a TCP message, the calling sequence and overhead breakdown in the upper software layers are shown in Table 7.

Table 7. Overhead Breakdown in the TCP Receiving Path

  Layer           Calling Sequence  Processing Overhead
  Socket Layer    read()
                  soreceive()       Data copy from kernel space to user space
  Protocol Layer  tcp_input()       TCP checksum
                  ipintr()          IP header checksum

Following an approach similar to that for the TCP sending path, and letting SEGrecv and ACKsend denote the receive-side counterparts of SEGsend and ACKrecv, we derive the total overhead for receiving a TCP message of size m:

$$TCP_{recv}(m) = C + M_{ku}(m) + \left\lceil \frac{m}{MSS} \right\rceil \left\{SEG_{recv} + ACK_{send}\right\}$$

Applying the same segmentation factor, we get:

$$TCP_{recv}(m) = C + M_{mm}(m) + CS(m+64g) + M_{ak}(m+54g) + M_{ka}(54g) + Tx(m+108g) + g\,(TCP_{in} + TCP_{out} + IP_{in} + IP_{out} + ETH_{out} + I_r + I_s)$$

3.6 Scheduling Issues in Communication Subsystems

One important characteristic of communication subsystems is that packet reception receives a higher priority than packet transmission. This imbalance arises because incoming packet processing is interrupt-driven, and interrupts have a higher priority than other kernel activities. There is a similar imbalance within input processing itself: the MAC layer has a higher priority than the protocol layer. For server machines under a heavy network load, the throughput of outgoing packets is likely to suffer from this imbalance in the scheduling of packet processing.

Another important characteristic of communication subsystems is that the OS has no way to control the load offered to it, because it has no control over the number of clients or over their aggressiveness [10]. Although flow control can restrict the data that arrives over an existing connection, it cannot control the rate of requests for new connections.

For an overloaded server, these two characteristics cause the OS to spend more time on incoming packet processing than on the rest of its request services. The direct consequences for application performance are a drop in throughput and increased response time. The overload behavior of communication subsystems therefore deserves more investigation to achieve better application performance.

4 Case Studies

In this section, we present two case studies, the Internet web server and the DCE/RPC server, to analyze communication overhead with the analytic models. Both applications require intensive network communication to handle enormous numbers of request and reply messages. As a result, a careful analysis of the communication overhead can help us investigate the performance problems caused by heavy network load.

4.1 The Internet Web Server

The World-Wide Web (WWW) provides quick and easy access to a large variety of information across the Internet. Busy Internet web servers generally experience performance problems in the form of long response times and short-term service unavailability. A common solution to these problems is to off-load the enormous volume of requests from a single machine onto replicated servers [11]. Researchers have also identified inefficiency in the HyperText Transfer Protocol (HTTP)¹ itself, which uses a separate TCP connection for each request; an enhanced HTTP, which uses a single TCP connection for all data exchange, reduces the response time caused by round-trip network latency [12]. In this subsection, we apply the analytic models to examine the communication overhead caused by HTTP requests.

¹HTTP is an application-layer protocol for web clients and web servers to exchange data.

HTTP relies on TCP to provide reliable message delivery. The protocol itself is quite simple: a TCP connection is established for retrieving a remote document and torn down after the document is received.

We can divide an HTTP transaction into five steps:

 1. the client establishes a TCP connection to the server,
 2. the client sends a request message,
 3. the server retrieves the document named in the request,
 4. the server sends the document in a reply message, and
 5. the server tears down the TCP connection.

A schematic diagram of an HTTP transaction, drawn from live data obtained with tcpdump, is shown in Figure 2. The first three packets perform TCP connection establishment. The request message is carried in three TCP segments and the reply message in four TCP segments. The last four packets tear the connection down; the rest of the packets are acknowledgments.

[Figure 2. A Schematic HTTP Transaction. Connection establishment: SYN, SYN+ACK, ACK. Request (client to server): segments 1:537(536), 537:1073(536), 1073:1147(74), answered by ACK(537). Reply (server to client): segments 1:513(512)+ACK(1147), 513:1025(512), 1025:1537(512), 1537:1853(316)+FIN, answered by ACK(1025) and ACK(1854). Teardown: FIN, ACK.]

At the server machine, the communication overhead caused by this HTTP transaction is approximately:

$$HTTP_{req} = SYN_{recv} + ACK_{send} + ACK_{recv} + TCP_{recv}(1146) + TCP_{send}(1852) + FIN_{send} + ACK_{recv} + FIN_{recv} + ACK_{send}$$

By applying the analytic models for sending and receiving a TCP message (Sections 3.4 and 3.5), with the segment counts taken from the trace, we get:

$$HTTP_{req} = SYN_{recv} + ACK_{send} + ACK_{recv} + 3\,(SEG_{recv} + ACK_{send}) + C + M_{ku}(1146) + 4\,(SEG_{send} + ACK_{recv}) + M_{uk}(1852) + FIN_{send} + ACK_{recv} + FIN_{recv} + ACK_{send}$$

Our models overestimate the number of ACKs sent and received because of the delayed-acknowledgment effect (ACK compression) present in most TCP implementations. We estimate that servicing this HTTP transaction requires 15 hardware interrupts (8 Ir and 7 Is). A busy web server with a peak request rate of about 60 requests per second [11] will therefore take approximately 900 interrupts per second. Research has shown that the cost of interrupt service is not trivial in modern computer systems. Therefore, avoiding congestive collapse, or "livelock,"¹ is an important issue in communication-subsystem design [10,13].

¹A livelocked server spends most of its resources on non-productive operations, such as rejecting new connections or aborting partially-completed ones [10].

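The interrupt count can be cross-checked against Figure 2 (our own tally): the server receives eight packets, SYN, the connection ACK, the three request segments, ACK(1025), ACK(1854), and the client's FIN, each costing one Ir; it transmits seven, SYN+ACK, ACK(537), the four reply segments (the last carrying its FIN), and the final ACK, each costing one Is.
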
4.2 The DCE/RPC

The Open Software Foundation's Distributed Computing Environment (OSF/DCE) is a platform that facilitates interoperability of distributed applications in a heterogeneous environment. DCE relies on the remote procedure call (RPC) as its communication paradigm for constructing distributed applications on a client/server architecture. Detailed analysis of the DCE/RPC has shown significant communication overhead for transporting request and reply messages between client and server machines [7]. In this subsection, we apply the analytic models to analyze the communication overhead of a null RPC.

The software layering of the DCE/RPC is shown in Figure 3. DCE/RPC is built above UDP/IP. Since UDP provides neither flow control nor reliable message delivery, an extra RPC layer is needed to supply the transport-layer functions required. The protocol software is implemented as a dynamically-linked library, the RPC runtime. The flow-control mechanism in the RPC layer is a combined window scheme similar to TCP's: the initial window size is set to 4 KB and is doubled after each acknowledgment received, up to a maximum of 32 KB.

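To illustrate the stated doubling rule with our own numbers: a 100 KB transfer would be granted windows of 4, 8, 16, and then 32 KB per acknowledgment, so 4 + 8 + 16 + 32 + 32 + 8 = 100 KB moves in six window grants.
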
[Figure 3. Software Layering of the DCE/RPC. On the client side, the client application calls through the client stub into the RPC runtime; on the server side, the server application sits above the server stub and its RPC runtime. The two runtimes form the RPC layer and carry the RPC request and reply paths over UDP/IP.]

The major overhead intro