The following paper was originally published in the Proceedings of the USENIX 1996 Annual Technical Conference, San Diego, California, January 1996.

Eliminating Receive Livelock in an Interrupt-driven Kernel

Jeffrey Mogul, DEC Western Research Laboratory
K. K. Ramakrishnan, AT&T Bell Laboratories

For more information about the USENIX Association contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: http://www.usenix.org

Eliminating Receive Livelock in an Interrupt-driven Kernel

Jeffrey C. Mogul
Digital Equipment Corporation Western Research Laboratory

K. K. Ramakrishnan
AT&T Bell Laboratories

Abstract

Most operating systems use interface interrupts to schedule network tasks. Interrupt-driven systems can provide low overhead and good latency at low offered load, but degrade significantly at higher arrival rates unless care is taken to prevent several pathologies. These are various forms of receive livelock, in which the system spends all its time processing interrupts, to the exclusion of other necessary tasks. Under extreme conditions, no packets are delivered to the user application or the output of the system.

To avoid livelock and related problems, an operating system must schedule network interrupt handling as carefully as it schedules process execution. We modified an interrupt-driven networking implementation to do so; this eliminates receive livelock without degrading other aspects of system performance. We present measurements demonstrating the success of our approach.

1. Introduction

Most operating systems use interrupts to internally schedule the performance of tasks related to I/O events, and particularly the invocation of network protocol software. Interrupts are useful because they allow the CPU to spend most of its time doing useful processing, yet respond quickly to events without constantly having to poll for event arrivals.

Polling is expensive, especially when I/O events are relatively rare, as is the case with disks, which seldom interrupt more than a few hundred times per second. Polling can also increase the latency of response to an event. Modern systems can respond to an interrupt in a few tens of microseconds; to achieve the same latency using polling, the system would have to poll tens of thousands of times per second, which would create excessive overhead. For a general-purpose system, an interrupt-driven design works best.

Most extant operating systems were designed to handle I/O devices that interrupt every few milliseconds. Disks tended to issue events on the order of once per revolution; first-generation LAN environments tend to generate a few hundred packets per second for any single end-system. Although people understood the need to reduce the cost of taking an interrupt, in general this cost was low enough that any normal system would spend only a fraction of its CPU time handling interrupts.

The world has changed. Operating systems typically use the same interrupt mechanisms to control both network processing and traditional I/O devices, yet many new applications can generate packets several orders of magnitude more often than a disk can generate seeks. Multimedia and other real-time applications will become widespread. Client-server applications, such as NFS, running on fast clients and servers can generate heavy RPC loads. Multicast and broadcast protocols subject innocent-bystander hosts to loads that do not interest them at all. As a result, network implementations must now deal with significantly higher event rates.

Many multimedia and client-server applications share another unpleasant property: unlike traditional network applications (Telnet, FTP, electronic mail), they are not flow-controlled. Some multimedia applications want constant-rate, low-latency service; RPC-based client-server applications often use datagram-style transports, instead of reliable, flow-controlled protocols. Note that whereas I/O devices such as disks generate interrupts only as a result of requests from the operating system, and so are inherently flow-controlled, network interfaces generate unsolicited receive interrupts.

The shift to higher event rates and non-flow-controlled protocols can subject a host to congestive collapse: once the event rate saturates the system, without a negative feedback loop to control the sources, there is no way to gracefully shed load. If the host runs at full throughput under these conditions, and gives fair service to all sources, this at least preserves the possibility of stability. But if throughput decreases as the offered load increases, the overall system becomes unstable.

Interrupt-driven systems tend to perform badly under overload.
Tasks performed at interrupt level, by definition, have absolute priority over all other tasks. If the event rate is high enough to cause the system to spend all of its time responding to interrupts, then nothing else will happen, and the system throughput will drop to zero. We call this condition receive livelock: the system is not deadlocked, but it makes no progress on any of its tasks.

Any purely interrupt-driven system using fixed interrupt priorities will suffer from receive livelock under input overload conditions. Once the input rate exceeds the reciprocal of the CPU cost of processing one input event, any task scheduled at a lower priority will not get a chance to run. (For example, if handling one input event costs 50 microseconds of CPU time, then an input rate above 20,000 events per second leaves no CPU time for anything else.)

Yet we do not want to lightly discard the obvious benefits of an interrupt-driven design. Instead, we should integrate control of the network interrupt handling sub-system into the operating system's scheduling mechanisms and policies. In this paper, we present a number of simple modifications to the purely interrupt-driven model, and show that they guarantee throughput and improve latency under overload, while preserving the desirable qualities of an interrupt-driven system under light load.

2. Motivating applications

We were led to our investigations by a number of specific applications that can suffer from livelock. Such applications could be built on dedicated single-purpose systems, but are often built using a general-purpose system such as UNIX, and we wanted to find a general solution to the livelock problem. The applications include:

• Host-based routing: Although inter-network routing is traditionally done using special-purpose (usually non-interrupt-driven) router systems, routing is often done using more conventional hosts. Virtually all Internet "firewall" products use UNIX or Windows NT systems for routing [7, 13]. Much experimentation with new routing algorithms is done on UNIX [2], especially for IP multicasting.

• Passive network monitoring: network managers, developers, and researchers commonly use UNIX systems, with their network interfaces in "promiscuous mode," to monitor traffic on a LAN for debugging or statistics gathering [8].

• Network file service: servers for protocols such as NFS are commonly built from UNIX systems.

These applications (and others like them, such as Web servers) are all potentially exposed to heavy, non-flow-controlled loads. We have encountered livelock in all three of these applications, have solved or mitigated the problem, and have shipped the solutions to customers. The rest of this paper concentrates on host-based routing, since this simplifies the context of the problem and allows easy performance measurement.

3. Requirements for scheduling network tasks

Performance problems generally arise when a system is subjected to transient or long-term input overload. Ideally, the communication subsystem could handle the worst-case input load without saturating, but cost considerations often prevent us from building such powerful systems. Systems are usually sized to support a specified design-center load, and under overload the best we can ask for is controlled and graceful degradation.

When an end-system is involved in processing considerable network traffic, its performance depends critically on how its tasks are scheduled. The mechanisms and policies that schedule packet processing and other tasks should guarantee acceptable system throughput, reasonable latency and jitter (variance in delay), fair allocation of resources, and overall system stability, without imposing excessive overheads, especially when the system is overloaded.

We can define throughput as the rate at which the system delivers packets to their ultimate consumers. A consumer could be an application running on the receiving host, or the host could be acting as a router and forwarding packets to consumers on other hosts. We expect the throughput of a well-designed system to keep up with the offered load up to a point called the Maximum Loss Free Receive Rate (MLFRR), and at higher loads throughput should not drop below this rate.

Of course, useful throughput depends not just on successful reception of packets; the system must also transmit packets. Because packet reception and packet transmission often compete for the same resources, under input overload conditions the scheduling subsystem must ensure that packet transmission continues at an adequate rate.

Many applications, such as distributed systems and interactive multimedia, often depend more on low-latency, low-jitter communications than on high throughput. Even during overload, we want to avoid long queues, which increase latency, and bursty scheduling, which increases jitter.

When a host is overloaded with incoming network packets, it must also continue to process other tasks, so as to keep the system responsive to management and control requests, and to allow applications to make use of the arriving packets.
The scheduling subsystem must fairly allocate CPU resources among packet reception, packet transmission, protocol processing, other I/O processing, system housekeeping, and application processing.

A host that behaves badly when overloaded can also harm other systems on the network. Livelock in a router, for example, may cause the loss of control messages, or delay their processing. This can lead other routers to incorrectly infer link failure, causing incorrect routing information to propagate over the entire wide-area network. Worse, loss or delay of control messages can lead to network instability, by causing positive feedback in the generation of control traffic [10].

4. Interrupt-driven scheduling and its consequences

Scheduling policies and mechanisms significantly affect the throughput and latency of a system under overload. In an interrupt-driven operating system, the interrupt subsystem must be viewed as a component of the scheduling system, since it has a major role in determining what code runs when. We have observed that interrupt-driven systems have trouble meeting the requirements discussed in section 3.

In this section, we first describe the characteristics of an interrupt-driven system, and then identify three kinds of problems caused by network input overload in interrupt-driven systems:

• Receive livelock under overload: delivered throughput drops to zero while the input overload persists.

• Increased latency for packet delivery or forwarding: the system delays the delivery of one packet while it processes the interrupts for subsequent packets, possibly of a burst.

• Starvation of packet transmission: even if the CPU keeps up with the input load, strict priority assignments may prevent it from transmitting any packets.

4.1. Description of an interrupt-driven system

An interrupt-driven system performs badly under network input overload because of the way in which it prioritizes the tasks executed as the result of network input. We begin by describing a typical operating system's structure for processing and prioritizing network tasks. We use the 4.2BSD [5] model for our example, but we have observed that other operating systems, such as VMS, DOS, and Windows NT, and even several Ethernet chips, have similar characteristics and hence similar problems.

When a packet arrives, the network interface signals this event by interrupting the CPU. Device interrupts normally have a fixed Interrupt Priority Level (IPL), and preempt all tasks running at a lower IPL; interrupts do not preempt tasks running at the same IPL. The interrupt causes entry into the associated network device driver, which does some initial processing of the packet. In 4.2BSD, only buffer management and data-link layer processing happens at "device IPL." The device driver then places the packet on a queue, and generates a software interrupt to cause further processing of the packet. The software interrupt is taken at a lower IPL, and so this protocol processing can be preempted by subsequent interrupts. (We avoid lengthy periods at high IPL, to reduce latency for handling certain other events.)

The queues between steps executed at different IPLs provide some insulation against packet losses due to transient overloads, but typically they have fixed length limits. When a packet should be queued but the queue is full, the system must drop the packet. The selection of proper queue limits, and thus the allocation of buffering among layers in the system, is critical to good performance, but beyond the scope of this paper.

Note that the operating system's scheduler does not participate in any of this activity, and in fact is entirely ignorant of it.

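As a concrete illustration, the following is a minimal C sketch of this structure; it is not the actual 4.2BSD source. The queue macros (IF_QFULL, IF_DROP, IF_ENQUEUE, IF_DEQUEUE), schednetisr(), and the spl*() priority calls follow BSD conventions, while dev_get_packet() is a hypothetical driver helper standing in for device-specific code.

    /* Sketch of the 4.2BSD-style receive path (illustrative, not BSD source). */

    struct ifqueue ipintrq;          /* queue between driver and IP layer */

    void
    rxintr(struct softc *sc)         /* interrupt handler, runs at device IPL */
    {
        struct mbuf *m;

        /* Interrupt batching: drain everything the interface has buffered. */
        while ((m = dev_get_packet(sc)) != NULL) {  /* hypothetical helper */
            /* ...buffer management and data-link processing happen here... */
            if (IF_QFULL(&ipintrq)) {
                IF_DROP(&ipintrq);   /* queue full: drop after work was done */
                m_freem(m);
                continue;
            }
            IF_ENQUEUE(&ipintrq, m);
            schednetisr(NETISR_IP);  /* post software interrupt for IP layer */
        }
    }

    void
    ipintr(void)                     /* runs later, at the lower software IPL */
    {
        struct mbuf *m;

        for (;;) {
            int s = splimp();        /* briefly block device interrupts */
            IF_DEQUEUE(&ipintrq, m);
            splx(s);
            if (m == NULL)
                break;
            ip_input(m);             /* preemptable by further device interrupts */
        }
    }

The key point of the sketch is that rxintr() always outranks ipintr(): under a sustained input load, the driver loop runs continuously, ipintrq fills, and the IP-level code never gets a chance to run.
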
As a consequence of this structure, a heavy load of incoming packets could generate a high rate of interrupts at device IPL. Dispatching an interrupt is a costly operation, so to avoid this overhead, the network device driver attempts to batch interrupts. That is, if packets arrive in a burst, the interrupt handler attempts to process as many packets as possible before returning from the interrupt. This amortizes the cost of processing an interrupt over several packets.

Even with batching, a system overloaded with input packets will spend most of its time in the code that runs at device IPL. That is, the design gives absolute priority to processing incoming packets. At the time that 4.2BSD was developed, in the early 1980s, the rationale for this was that network adapters had little buffer memory, and so if the system failed to move a received packet promptly into main memory, a subsequent packet might be lost. (This is still a problem with low-cost interfaces.) Thus, systems derived from 4.2BSD do minimal processing at device IPL, and give this processing priority over all other network tasks.

Modern network adapters can receive many back-to-back packets without host intervention, either through the use of copious buffering or highly autonomous DMA engines. This insulates the system from the network, and eliminates much of the rationale for giving absolute priority to the first few steps of processing a received packet.

4.2. Receive livelock

In an interrupt-driven system, receiver interrupts take priority over all other activity. If packets arrive too fast, the system will spend all of its time processing receiver interrupts. It will therefore have no resources left to support delivery of the arriving packets to applications (or, in the case of a router, to forwarding and transmitting these packets). The useful throughput of the system will drop to zero.

Following [11], we refer to this condition as receive livelock: a state of the system where no useful progress is being made, because some necessary resource is entirely consumed with processing receiver interrupts. When the input load drops sufficiently, the system leaves this state, and is again able to make forward progress. This is not a deadlock state, from which the system would not recover even when the input rate drops to zero.

A system could behave in one of three ways as the input load increases. In an ideal system, the delivered throughput always matches the offered load. In a realizable system, the delivered throughput keeps up with the offered load up to the Maximum Loss Free Receive Rate (MLFRR), and then is relatively constant after that. At loads above the MLFRR, the system is still making progress, but it is dropping some of the offered input; typically, packets are dropped at a queue between processing steps that occur at different priorities.

In a system prone to receive livelock, however, throughput decreases with increasing offered load, for input rates above the MLFRR. Receive livelock occurs at the point where the throughput falls to zero. A livelocked system wastes all of the effort it puts into partially processing received packets, since they are all discarded.

Receiver-interrupt batching complicates the situation slightly. By improving system efficiency under heavy load, batching can increase the MLFRR. Batching can shift the livelock point but cannot, by itself, prevent livelock.

In section 6.2, we present measurements showing how livelock occurs in a practical situation. Additional measurements, and a more detailed discussion of the problem, are given in [11].

4.3. Receive latency under overload

Although interrupt-driven designs are normally thought of as a way to reduce latency, they can actually increase the latency of packet delivery. If a burst of packets arrives too rapidly, the system will do link-level processing of the entire burst before doing any higher-layer processing of the first packet, because link-level processing is done at a higher priority. As a result, the first packet of the burst is not delivered to the user until link-level processing has been completed for all the packets in the burst. The latency to deliver the first packet in a burst is increased almost by the time it takes to receive the entire burst. If the burst is made up of several independent NFS RPC requests, for example, this means that the server's disk sits idle when it could be doing useful work.

One of the authors has previously described experiments demonstrating this effect [12].

4.4. Starvation of transmits under overload

In most systems, the packet transmission process consists of selecting packets from an output queue, handing them to the interface, waiting until the interface has sent the packet, and then releasing the associated buffer.

Packet transmission is often done at a lower priority than packet reception. This policy is superficially sound, because it minimizes the probability of packet loss when a burst of arriving packets exceeds the available buffer space. Reasonable operation of higher level protocols and applications, however, requires that transmit processing makes sufficient progress.

When the system is overloaded for long periods, use of a fixed lower priority for transmission leads to reduced throughput, or even complete cessation of packet transmission. Packets may be awaiting transmission, but the transmitting interface is idle. We call this transmit starvation.

Transmit starvation may occur if the transmitter interrupts at a lower priority than the receiver; or if they interrupt at the same priority, but the receiver's events are processed first by the driver; or if transmission completions are detected by polling, and the polling is done at a lower priority than receiver event processing.

This effect has also been described previously [12].

5. Avoiding livelock through better scheduling

In this section, we discuss several techniques to avoid receive livelock. These include mechanisms to control the rate of incoming interrupts, polling-based mechanisms to ensure fair allocation of resources, and techniques to avoid unnecessary preemption.

5.1. Limiting the interrupt arrival rate

We can avoid or defer receive livelock by limiting the rate at which interrupts are imposed on the system. The system checks to see if interrupt processing is taking more than its share of resources, and if so, disables interrupts temporarily.

The system may infer impending livelock because it is discarding packets due to queue overflow, or because high-layer protocol processing or user code are making no progress, or by measuring the fraction of CPU cycles used for packet processing. Once the system has invested enough work in an incoming packet that it is about to be queued, it makes more sense to process that packet to completion than to drop it and rescue a subsequently-arriving packet from being dropped at the receiving interface, a cycle that could repeat ad infinitum.

When the system is about to drop a received packet because an internal queue is full, this strongly suggests that it should disable input interrupts. The host can then make progress on the packets already queued for higher-level processing, which has the side-effect of freeing buffers to use for subsequent received packets. Meanwhile, if the receiving interface has sufficient buffering of its own, additional incoming packets may accumulate there for a while.

We also need a trigger for re-enabling input interrupts, to prevent unnecessary packet loss. Interrupts may be re-enabled when internal buffer space becomes available, or upon expiration of a timer.

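Below is a minimal sketch of this queue-overflow trigger and its two re-enable triggers. All of the names (disable_rx_interrupts(), enable_rx_interrupts(), and the queue helpers) are hypothetical, standing in for whatever primitives a given kernel provides.

    /* Sketch: drop-triggered interrupt disabling with two re-enable triggers.
     * All names are illustrative, not from any particular kernel.
     */
    static int rx_intr_enabled = 1;

    void
    deliver_to_queue(struct pktqueue *q, struct pkt *p)
    {
        if (queue_full(q)) {
            drop_pkt(p);
            /* About to drop: strong hint of impending livelock.  Stop
             * taking receive interrupts so queued packets can drain. */
            if (rx_intr_enabled) {
                rx_intr_enabled = 0;
                disable_rx_interrupts();
                start_reenable_timer();     /* bound the disabled period */
            }
            return;
        }
        enqueue_pkt(q, p);
    }

    /* Called when higher-level code dequeues a packet, and from the timer,
     * so that input interrupts cannot stay disabled indefinitely. */
    void
    maybe_reenable_rx(struct pktqueue *q)
    {
        if (!rx_intr_enabled && queue_has_room(q)) {
            rx_intr_enabled = 1;
            enable_rx_interrupts();
        }
    }
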
We may also want the system to guarantee some progress for user-level code. The system can observe that, over some interval, it has spent too much time processing packet input and output events, and temporarily disable interrupts to give higher protocol layers and user processes time to run. On a processor with a fine-grained clock register, the packet-input code can record the clock value on entry, subtract that from the clock value seen on exit, and keep a sum of the deltas. If this sum (or a running average) exceeds a specified fraction of the total elapsed time, the kernel disables input interrupts. (Digital's GIGAswitch system uses a similar mechanism [15].)

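A sketch of this accounting follows, assuming a hypothetical read_cycle_counter() primitive (on Alpha this would correspond to the processor cycle counter) and an illustrative 50% threshold; disable_rx_interrupts_briefly() is likewise a stand-in name.

    /* Sketch: bound the fraction of CPU time spent on packet processing.
     * read_cycle_counter(), disable_rx_interrupts_briefly(), and the 50%
     * limit are illustrative assumptions.
     */
    #include <stdint.h>

    #define PKT_CPU_LIMIT_PCT 50

    static uint64_t pkt_cycles;       /* cycles charged to packet code */
    static uint64_t interval_start;   /* start of the current interval */

    void
    packet_work(void)
    {
        uint64_t t0 = read_cycle_counter();
        process_pending_packets();    /* the instrumented packet-input code */
        pkt_cycles += read_cycle_counter() - t0;
    }

    /* Invoked periodically, e.g. from the clock interrupt. */
    void
    check_packet_cpu_share(void)
    {
        uint64_t now = read_cycle_counter();
        uint64_t elapsed = now - interval_start;

        /* If packet processing used more than its allowed share of the
         * elapsed time, give user code a chance to run. */
        if (elapsed > 0 && pkt_cycles * 100 > elapsed * PKT_CPU_LIMIT_PCT)
            disable_rx_interrupts_briefly();

        pkt_cycles = 0;               /* start a fresh interval */
        interval_start = now;
    }
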
On a system without a fine-grained clock, one can crudely simulate this approach by sampling the CPU state on every clock interrupt (clock interrupts typically preempt device interrupt processing). If the system finds itself in the midst of processing interrupts for a series of such samples, it can disable interrupts for a few clock ticks.

5.2. Use of polling

Limiting the interrupt rate prevents system saturation but might not guarantee progress; the system must also fairly allocate packet-handling resources between input and output processing, and between multiple interfaces. We can provide fairness by carefully polling all sources of packet events, using a round-robin schedule.

In a pure polling system, the scheduler would invoke the device driver to "listen" for incoming packets and for transmit completion events. This would control the amount of device-level processing, and could also fairly allocate resources among event sources, thus avoiding livelock. Simply polling at fixed intervals, however, adds unacceptable latency to packet reception and transmission.

Polling designs and interrupt-driven designs differ in their placement of policy decisions. When the behavior of tasks cannot be predicted, we rely on the scheduler and the interrupt system to dynamically allocate CPU resources. When tasks can be expected to behave in a predictable manner, the tasks themselves are better able to make the scheduling decisions, and polling depends on voluntary cooperation among the tasks.

Since a purely interrupt-driven system leads to livelock, and a purely polling system adds unnecessary latency, we employ a hybrid design, in which the system polls only when triggered by an interrupt, and interrupts happen only while polling is suspended. During low loads, packet arrivals are unpredictable and we use interrupts to avoid latency. During high loads, we know that packets are arriving at or near the system's saturation rate, so we use polling to ensure progress and fairness, and only re-enable interrupts when no more work is pending.

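The following sketch renders this hybrid scheme in C. The structure (mask interrupts, poll with a bounded per-device quota, re-enable when idle) follows the design described here; the names, the quota value, and the MORE_PENDING return code are illustrative assumptions.

    /* Sketch of interrupt-initiated polling with round-robin quotas. */
    #define POLL_QUOTA 8                 /* illustrative per-device batch size */

    void
    rx_interrupt(struct netdev *dev)     /* minimal interrupt handler */
    {
        dev->service_needed = 1;
        mask_device_interrupts(dev);     /* interrupts stay off while we poll */
        wake_poll_thread();
    }

    void
    poll_thread(void)                    /* runs at low IPL, outside interrupts */
    {
        for (;;) {
            int more_work = 0;
            struct netdev *dev;

            /* Round-robin over all interfaces, bounded work per visit,
             * so one busy interface cannot starve the others. */
            for (dev = netdev_list; dev != NULL; dev = dev->next) {
                if (!dev->service_needed)
                    continue;
                if (poll_device(dev, POLL_QUOTA) == MORE_PENDING) {
                    more_work = 1;       /* revisit on the next round */
                } else {
                    dev->service_needed = 0;
                    unmask_device_interrupts(dev);  /* idle: interrupts again */
                }
            }
            if (!more_work)
                sleep_until_interrupt(); /* all quiet; wait for next interrupt */
        }
    }

Under light load, each packet still costs one interrupt and is handled promptly; under heavy load, interrupts stay masked and the polling loop runs at the system's saturation rate.
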
5.3. Avoiding preemption

As we showed in section 4.2, receive livelock occurs because interrupt processing preempts all other packet processing. We can solve this problem by making higher-level packet processing non-preemptable. We observe that this can be done following one of two general approaches: do (almost) everything at high IPL, or do (almost) nothing at high IPL.

Following the first approach, we can modify the 4.2BSD design (see section 4.1) by eliminating the software interrupt, polling interfaces for events, and processing received packets to completion at device IPL. Because higher-level processing occurs at device IPL, it cannot be preempted by another packet arrival, and so we guarantee that livelock does not occur within the kernel's protocol stack.
We still need to use a rate-control mechanism to ensure progress by user-level applications.

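A sketch of this first approach, reusing the illustrative names from the earlier examples: the driver processes each packet to completion at device IPL, with no ipintrq and no software interrupt.

    /* Sketch: "do (almost) everything at high IPL."  Illustrative names. */
    void
    rxintr_to_completion(struct softc *sc)   /* runs entirely at device IPL */
    {
        struct mbuf *m;

        while ((m = dev_get_packet(sc)) != NULL) {
            /* Higher-level processing cannot be preempted by another
             * packet arrival, since we already hold device IPL. */
            ip_input(m);    /* in a router, this ends with the packet on
                               an output interface queue */
        }
        /* A separate rate-control mechanism (section 5.1) must still bound
         * the time spent here, so user-level code makes progress. */
    }
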
In a system following the second approach, the interrupt handler runs only long enough to set a "service needed" flag, and to schedule the polling thread if it is not already running. The polling thread runs at zero IPL, checking the flags to decide which devices need service. Only when the polling thread is done does it re-enable the device interrupt. The polling thread can be interrupted at most once by each device, and so it progresses at full speed without interference.

Either approach eliminates the need to queue packets between the device driver and the higher-level protocol software, although if the protocol stack must block, the incoming packet must be queued at a later point. (For example, this would happen when the data is ready for delivery to a user process, or when an IP fragment is received and its companion fragments are not yet available.)

5.4. Summary of techniques

In summary, we avoid livelock by:

• Using interrupts only to initiate polling.

• Using round-robin polling to fairly allocate resources among event sources.

• Temporarily disabling input when feedback from a full queue, or a limit on CPU usage, indicates that other important tasks are pending.

• Dropping packets early, rather than late, to avoid wasted work. Once we decide to receive a packet, we try to process it to completion.

We maintain high performance by:

• Re-enabling interrupts when no work is pending, to avoid polling overhead and to keep latency low.

• Letting the receiving interface buffer bursts, to avoid dropping packets.

• Eliminating the IP input queue, and associated overhead.

We observe, in passing, that inefficient code tends to exacerbate receive livelock, by lowering the MLFRR of the system and hence increasing the likelihood that livelock will occur. Aggressive optimization, "fast-path" designs, and removal of unnecessary steps all help to postpone arrival of livelock.

6. Livelock in BSD-based routers

In this section, we consider the specific example of an IP packet router built using Digital UNIX (formerly DEC OSF/1). We chose this application because routing performance is easily measured. Also, since firewalls typically use UNIX-based routers, they must be livelock-proof in order to prevent denial-of-service attacks.

Our goals were to (1) obtain the highest possible maximum throughput; (2) maintain high throughput even when overloaded; (3) allocate sufficient CPU cycles to user-mode tasks; (4) minimize latency; and (5) avoid degrading performance in other applications.

6.1. Measurement methodology

Our test configuration consisted of a router-under-test connecting two otherwise unloaded Ethernets. A source host generated IP/UDP packets at a variety of rates, and sent them via the router to a destination address. (The destination host did not exist; we fooled the router by inserting a phantom entry into its ARP table.) We measured router performance by counting the number of packets successfully forwarded in a given period, yielding an average forwarding rate.

The router-under-test was a DECstation 3000/300 Alpha-based system running Digital UNIX V3.2, with a SPECint92 rating of 66.2. We chose the slowest available Alpha host, to make the livelock problem more evident. The source host was a DECstation 3000/400, with a SPECint92 rating of 74.7. We slightly modified its kernel to allow more efficient generation of output packets, so that we could stress the router-under-test as much as possible.

In all the trials reported on here, the packet generator sent 10000 UDP packets carrying 4 bytes of data. This system does not generate a precisely paced stream of packets; the packet rates reported are averaged over several seconds, and the short-term rates varied somewhat from the mean. We calculated the delivered packet rate by using the "netstat" program (on the router machine) to sample the output interface count ("Opkts") before and after each trial. We checked, using a network analyzer on the stub Ethernet, that this count exactly reports the number of packets transmitted on the output interface.

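For concreteness, here is a minimal user-level sketch of such a generator: an ordinary sockets program sending 10000 UDP packets of 4 data bytes each. It is not the authors' modified kernel generator; the destination address and port are placeholders, and no pacing is attempted, so the rate is whatever the host can sustain.

    /* Minimal UDP load generator sketch (illustrative). */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct sockaddr_in dst;
        char payload[4] = { 0 };             /* 4 bytes of UDP data */
        int i, s = socket(AF_INET, SOCK_DGRAM, 0);

        if (s < 0) {
            perror("socket");
            return 1;
        }
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9);             /* "discard" port; placeholder */
        /* Phantom destination: the host need not exist if the router has
         * a static ARP entry for this address. */
        inet_pton(AF_INET, "192.0.2.1", &dst.sin_addr);

        for (i = 0; i < 10000; i++)
            sendto(s, payload, sizeof(payload), 0,
                   (struct sockaddr *)&dst, sizeof(dst));

        close(s);
        return 0;
    }
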
6.2. Measurements of an unmodified kernel

We started by measuring the performance of the unmodified operating system, as shown in figure 6-1. Each mark represents one trial. The filled circles show kernel-based forwarding performance, and the open squares show performance using the screend program [7], used in some firewalls to screen out unwanted packets. This user-mode program does one system call per packet; the packet-forwarding path includes both kernel and user-mode code. In this case, screend was configured to accept all packets.

[Figure 6-1: Forwarding performance of unmodified kernel. X axis: input packet rate, 0 to 12000 pkts/sec; Y axis: output packet rate, 0 to 5000 pkts/sec; curves shown with and without screend.]

From these tests, it was clear that with screend running, the router suffered from poor overload behavior at rates above 2000 packets/sec., and complete livelock set in at about 6000 packets/sec. Even without screend, the router peaked at 4700 packets/sec., and would probably livelock somewhat below the maximum Ethernet packet rate of about 14,880 packets/second.

6.3. Why livelock occurs in the 4.2BSD model

4.2BSD follows the model described in section 4.1, and depicted in figure 6-2. The device driver runs at interrupt priority level (IPL) = SPLIMP, and the IP layer runs via a software interrupt at IPL = SPLNET, which is lower than SPLIMP. The queue between the driver and the IP code is named "ipintrq," and each output interface is buffered by a queue of its own. All queues have length limits; excess packets are dropped. Device drivers in this system implement interrupt batching, so at high input rates very few interrupts are actually taken.

Digital UNIX follows a similar model, with the IP layer running as a separately scheduled thread at IPL = 0, instead of as a software interrupt handler.

It is now quite obvious why the system suffers from receive livelock. Once the input rate exceeds the rate at which the device driver can pull new packets out of the interface and add them to the IP input queue, the IP code never runs. Thus, it never removes packets from its queue (ipintrq), which fills up, and all subsequent received packets are dropped.

The system's CPU resources are saturated because it discards each packet after a lot of CPU time has been invested in it at elevated IPL. This is foolish; once a packet has made its way through the device driver, it represents an investment and should be processed to completion if at all possible. In a router, this means that the packet should be transmitted on the output interface. When the system is overloaded, it should discard packets as early as possible (i.e., in the receiving interface), so that discarded packets do not waste any resources.

6.4. Fixing the livelock problem

We solved the livelock problem by doing as much work as possible in a kernel thread, rather than in the interrupt handler, and by eliminating the IP input queue and its associated queue manipulations and software interrupt (or thread dispatch). Once we decide to take a packet from the receiving interface, we try not to discard it later on, since this would represent wasted effort.

We also try to carefully "schedule" the work done in this thread. It is probably not possible to use the system's real scheduler to control the handling …
