OPERATING SYSTEMS REVIEW

A Publication of the
Association for Computing Machinery
Special Interest Group on Operating Systems

Volume 30

Special Issue

1996

Proceedings of the

Second USENIX Symposium
on
Operating Systems Design and
Implementation (OSDI)

Seattle, Washington
October 28-31, 1996
OPERATING SYSTEMS REVIEW
A Publication of the ACM Special Interest Group on Operating Systems

Chairperson
Carla Ellis
Dept. of Computer Science
D308 Levine Science Research Center
Duke University, Box 90129
Durham, NC 27708-0129 USA
(919) 660-6523
(carla@cs.duke.edu)

Secretary-Treasurer
Paul J. Leach
Microsoft
1 Microsoft Way
Redmond, WA 98052 USA
(206) 936-4765
(paulle@microsoft.com)

Vice Chairperson
Marc Shapiro
INRIA Rocquencourt
Domaine de Voluceau
B.P. 105
78153 Le Chesnay Cedex, France
+33 (1) 39-63-53-25
(Marc.Shapiro@inria.fr)

Newsletter Editor
William M. Waite
Dept. of Electrical and Computer Engineering 425
University of Colorado
Boulder, CO 80309-0425 USA
(303) 492-7204
(waite@cs.colorado.edu)

OPERATING SYSTEMS REVIEW is an informal publication of the ACM Special Interest Group on Operating Systems (SIGOPS), whose scope of interest includes: computer operating systems and architecture for multiprogramming, multiprocessing, and time sharing; resource management; evaluation and simulation; reliability, integrity, and security of data; communications among computing processors; and computer system modeling analysis.

Membership in SIGOPS (at $15 per annum) is open to ACM members and associate members, and student members (at $8 per annum); non-ACM members may join also (at $42 per annum). Non-members of SIGOPS may subscribe to OPERATING SYSTEMS REVIEW at $30 per year. All SIGOPS members receive OPERATING SYSTEMS REVIEW.

SIGOPS membership application forms are available from ACM Headquarters (or see back cover of this issue), 1515 Broadway, 17th Floor, New York, New York 10036, telephone (212) 869-7440. Changes of address and other matters pertaining to the SIGOPS mailing list should be directed to ACM Headquarters, not to any of the SIGOPS officers.
Contributions to OSR are sent to the editor and should be single spaced and printed on one side of the paper only. Text and diagrams must be contained within a 6½ inch wide by 8½ inch high (165mm by 216mm) area with at least one inch (25mm) top and side margins, 1½ inch (38mm) bottom margin, and be camera-ready (i.e., be an original having sufficient contrast and density for reproduction). All letters to the editor will be considered for publication unless accompanied by a request to the contrary. Except for editorial items, all sources of material appearing in OPERATING SYSTEMS REVIEW will be clearly identified. Items attributed to individuals will ordinarily be interpreted as personal rather than organizational opinions. The technical material in this newsletter is not refereed. Deadlines for submitting material are the 15th of February, May, August, and November. The SIGOPS conference proceedings will be mailed as the December issue in odd numbered calendar years, and the ASPLOS conference proceedings will be mailed in even numbered calendar years.
OPERATING SYSTEMS REVIEW (ISSN 0163-5980) is published five times a year (January, April, July, October and December) by the Association for Computing Machinery Inc., 1515 Broadway, New York, NY 10036. Periodicals postage paid at New York, NY 10001, and at additional mailing offices.

POSTMASTER: Send address changes to OPERATING SYSTEMS REVIEW, ACM, 1515 Broadway, New York, NY 10036.

Subscriptions: Annual subscription cost of $12.53 is included in the member dues of $15.00 (for students cost is included in $8.00); the non-member annual subscription is $42.00.
The USENIX Association

Proceedings of the
Second Symposium on Operating Systems
Design and Implementation
(OSDI '96)

Co-sponsored by ACM SIGOPS and IEEE TCOS

October 28-31, 1996
Seattle, Washington
For additional copies of these proceedings contact:

USENIX Association
2560 Ninth Street, Suite 215
Berkeley, CA 94710 USA
Phone: 510 528 8649
FAX: 510 548 5738
Email: office@usenix.org
URL: http://www.usenix.org

The price is $20 for members and $27 for nonmembers. Outside the U.S.A. and Canada, please add $11 per copy for postage (via air printed matter).

Past OSDI Proceedings

OSDI '94 (First)    November, 1994    Monterey, California    $20/$27

1996 © Copyright by The USENIX Association
All Rights Reserved.

ISBN 1-880446-82-0

Printed in the United States of America on 50% recycled paper, 10-15% post consumer waste.

This volume is published as a collective work. Rights to individual papers remain with the author or the author's employer. Permission is granted for the noncommercial reproduction of the complete work for educational or research purposes. USENIX acknowledges all trademarks herein.
CONTENTS

Message from the Program Chairs ............................................... v
External Referees ............................................................. vii
Author Index .................................................................. viii

Tuesday, October 29

Opening Remarks and Awards Presentation
Karin Petersen, Xerox PARC; Willy Zwaenepoel, Rice University

Invited Talk: Java OS: Back to the Future ..................................... 1
Jim Mitchell, Sun Fellow, Vice President of Technology and Architectures, JavaSoft

Caching and Prefetching in I/O Systems

Automatic Compiler-Inserted I/O Prefetching for Out-Of-Core Applications ...... 3
Todd C. Mowry, Angela K. Demke and Orran Krieger, University of Toronto

A Trace-Driven Comparison of Algorithms for Parallel Prefetching and Caching .. 19
Tracy Kimbrel, University of Washington; Andrew Tomkins and R. Hugo Patterson, Carnegie Mellon
University; Brian Bershad, University of Washington; Pei Cao, University of Wisconsin; Edward W.
Felten, Princeton University; Garth Gibson, Carnegie Mellon University; Anna R. Karlin, University
of Washington; and Kai Li, Princeton University

Efficient Cooperative Caching Using Hints ..................................... 35
Prasenjit Sarkar and John Hartman, University of Arizona

Distributed Shared Memory

Online Data-Race Detection via Coherency Guarantees ........................... 47
Dejan Perkovic and Pete Keleher, University of Maryland

Lightweight Logging for Lazy Release Consistent Distributed Shared Memory ..... 59
Manuel Costa, Paulo Guedes, Manuel Sequeira, Nuno Neves and Miguel Castro, IST-INESC

Performance Evaluation of Two Home-Based Lazy Release Consistency Protocols for
Shared Virtual Memory Systems ................................................. 75
Yuanyuan Zhou, Liviu Iftode and Kai Li, Princeton University

Wednesday, October 30

Invited Talk: Active Networks ................................................. 89
David Tennenhouse, Telemedia, Networks and Systems Group, Laboratory for Computer Science, MIT

Scheduling and Synchronization

CPU Inheritance Scheduling .................................................... 91
Bryan Ford and Sai R. Susarla, University of Utah

A Hierarchical CPU Scheduler for Multimedia Operating Systems ................. 107
Pawan Goyal, Xingang Guo and Harrick M. Vin, University of Texas, Austin

The Synergy Between Non-blocking Synchronization and Operating System Structure 123
Michael Greenwald and David Cheriton, Stanford University

OS Abstractions
Session Chair: Richard Draves, Microsoft Research

Microkernels Meet Recursive Virtual Machines .................................. 137
Bryan Ford, Mike Hibler, Jay Lepreau, Patrick Tullman, Godmar Back, Steven Clawson, University of Utah

Making Paths Explicit in the Scout Operating System ........................... 153
David Mosberger and Larry L. Peterson, University of Arizona

Performance Measurements

Studies of Windows NT Performance Using Dynamic Execution Traces .............. 169
Sharon E. Perl and Richard L. Sites, DEC SRC

Using Latency to Evaluate Interactive System Performance ...................... 185
Yasuhiro Endo, Zheng Wang, J. Bradley Chen and Margo I. Seltzer, Harvard University

Thursday, October 31

Extensibility and Safety

Dynamic Binding for an Extensible System ...................................... 201
Przemyslaw Pardyak and Brian Bershad, University of Washington

Dealing With Disaster: Surviving Misbehaved Kernel Extensions ................. 213
Margo I. Seltzer, Yasuhiro Endo, Christopher Small and Keith A. Smith, Harvard University

Safe Kernel Extensions Without Run-Time Checking .............................. 229
George C. Necula and Peter Lee, Carnegie Mellon University

Network Interfaces and Protocols

An Implementation of the Hamlyn Sender-Managed Interface Architecture ......... 245
Greg Buzzard, David Jacobson, Milon Mackey, Scott Marovich and John Wilkes, Hewlett-Packard Laboratories

Lazy Receiver Processing (LRP): A Network Subsystem Architecture for Server Systems 261
Peter Druschel and Gaurav Banga, Rice University

Effects of Buffering Semantics on I/O Performance ............................. 277
Jose Carlos Brustoloni and Peter Steenkiste, Carnegie Mellon University
Message from the Program Chairs

Jay Lepreau successfully bore his baby two years ago, and, as he said himself, now it was up to us to raise OSDI well. Our first step in doing so was to select a strong program committee for OSDI '96. The researchers on the committee, eight from academia and six from industry, did a tremendous amount of work in an incredibly timely manner, each reading at least 34 of the 110 papers that were submitted. Our sincere thanks to all of them!

For this second symposium, the paper submission process differed from the previous one in two ways: we requested full paper submissions instead of abstracts to allow for more thoroughly presented submissions, and we extended the review period from six to twelve weeks to avoid the frenzy we endured last time.

On the other hand, we succeeded in maintaining the large number of, often lengthy, reviews for each of the submissions. The review process again consisted of two rounds. During the first round, we received an average of 4.7 reviews for each paper. The results of this first round were then used to select fifty-two papers for the next round. In this second round, we asked for additional program committee and external reviews. In the end, papers in the second round got an average of 8.9 reviews, all of which were sent to the whole program committee before our meeting. We would like to thank all of our external reviewers for making this task possible and as painless as it was!

The program committee met for a full day at the Microsoft Bay Area Research Center in San Francisco. By then, most program committee members had digested a large number of the reviews, and each paper to be discussed had been read by at least five committee members. This led to often lively, but foremost informed, discussions. The result: 19 high quality papers were selected for publication in these proceedings, out of 110 submissions. Two papers clearly stood out and have therefore been selected as the best papers of the conference. The best paper awards go to:

Automatic Compiler-Inserted I/O Prefetching for Out-Of-Core Applications
Todd C. Mowry, Angela K. Demke, and Orran Krieger, University of Toronto

Safe Kernel Extensions Without Run-Time Checking
George C. Necula and Peter Lee, Carnegie Mellon University
Three of the accepted papers are co-authored by program committee members, out of eight such submissions. A significantly higher standard was applied to these submissions. Furthermore, as is customary by now, program committee authors did not see the reviews, nor participate in the discussions for their papers. Finally, all papers were shepherded by one program committee member before final publication. The shepherding process, in general, resulted in much improved versions of all papers, including those that were strong from the beginning.
This year, two invited talks complement the technical program. The talks focus on current technology changes, one coming from industry, JavaOS, and the other being proposed in academia, Active Networks. Both technologies seek to address the rapidly changing nature of devices connected to an ever-growing global network. It is our hope that these talks will foster fruitful discussions in new research areas for Operating Systems.

Of course, after putting together a technical program there is always the realization of things you could have done differently. We wish we had taken a stronger stand on accepting experience papers, and we wish we had accepted some of the less worked out, but more controversial idea-papers. However, we are happy with the result, and believe that the papers in these proceedings represent some of the best work being done in the field.
Many people were instrumental in helping us do our job. Early on, we could not have managed the incoming stream of submissions without the help of Ivy Jorgensen. Tim Miller helped with the software that handled submissions and notifications to the authors. On the committee, Jay Lepreau helped out with all his experience from running the show last time, and Jim Gray was generous enough to host the committee meeting and then take us out to sail on the San Francisco Bay. The USENIX staff was professional and supportive throughout. Ellie Young deserves special thanks for always quickly responding to every single one of our requests. Judy DesHarnais organized the conference; Toni Veglia and Zanna Knight helped with the promotion; Pennfield Jensen dealt with the final publication; and finally, Dan Klein hung in there while we were pushing his tutorial deadlines as much as we possibly could. Many thanks to each of you!
Above all, we thank all the authors that submitted their work to continue making OSDI a world-class conference! The authors of accepted papers endured the committee's shepherding, and we truly appreciate your effort in producing the papers we believed would make this such a strong program.

Karin Petersen
Willy Zwaenepoel

October 1996
Lazy Receiver Processing (LRP): A Network Subsystem
Architecture for Server Systems

Peter Druschel and Gaurav Banga*

Department of Computer Science
Rice University
Houston, TX 77005
Abstract

The explosive growth of the Internet, the widespread use of WWW-related applications, and the increased reliance on client-server architectures places interesting new demands on network servers. In particular, the operating system running on such systems needs to manage the machine's resources in a manner that maximizes and maintains throughput under conditions of high load. We propose and evaluate a new network subsystem architecture that provides improved fairness, stability, and increased throughput under high network load. The architecture is hardware independent and does not degrade network latency or bandwidth under normal load conditions.
1 Introduction

Most work on operating system support for high-speed networks to date has focused on improving message latency and on delivering the network's full bandwidth to application programs [1, 5, 7, 21]. More recently, researchers have started to look at resource management issues in network servers such as LAN servers, firewall gateways, and WWW servers [16, 17]. This paper proposes a new network subsystem architecture based on lazy receiver processing (LRP), which provides stable overload behavior, fair resource allocation, and increased throughput under heavy load from the network.

State of the art operating systems use sophisticated means of controlling the resources consumed by application processes. Policies for dynamic scheduling, main memory allocation and swapping are designed to ensure graceful behavior of a timeshared system under various load conditions. Resources consumed during the processing of network traffic, on the other hand, are generally not controlled and accounted for in the same manner. This poses a problem for network servers that face a large volume of network traffic, and potentially spend considerable amounts of resources on processing that traffic.

In particular, UNIX based operating systems and many non-UNIX operating systems use an interrupt-driven network subsystem architecture that gives strictly highest priority to the processing of incoming network packets. This leads to scheduling anomalies, decreased throughput, and potential resource starvation of applications. Furthermore, the system becomes unstable in the face of overload from the network. This problem is serious even with the relatively slow current network technology and will grow worse as networks increase in speed.

We propose a network subsystem architecture that integrates network processing into the system's global resource management. Under this system, resources spent in processing network traffic are associated with and charged to the application process that causes the traffic. Incoming network traffic is scheduled at the priority of the process that receives the traffic, and excess traffic is discarded early. This allows the system to maintain fair allocation of resources while handling high volumes of network traffic, and achieves system stability under overload.

Experiments show that a prototype system based on LRP maintains its throughput and remains responsive even when faced with excessive network traffic on a 155 Mbit/s ATM network. In comparison, a conventional UNIX system collapses under network traffic conditions that can easily arise on a 10 Mbit/s Ethernet. Further results show increased fairness in resource allocation, traffic separation, and increased throughput under high load.

*This work supported in part by National Science Foundation Grant CCR-9503098.
The rest of this paper is organized as follows. Section 2 gives a brief overview of the network subsystem found in BSD UNIX-derived systems [13] and identifies problems that arise when a system of this type is used as a network server. The design of the LRP network architecture is presented in Section 3. Section 4 gives a quantitative performance evaluation of our prototype implementation. Finally, Section 5 covers related work and Section 6 offers some conclusions.

[Figure 1: BSD Architecture. Application processes sit above stream (TCP) and datagram (UDP) sockets; below them, TCP/UDP and IP lead down to the interface queue at the network interface.]
2 UNIX Network Processing

This section starts with a brief overview of network processing in UNIX operating systems. It then points out problems that arise when a system of this type faces large volumes of network traffic. Finally, we argue that these problems are important by discussing common sources of high network traffic.

To simplify the discussion, we focus on the TCP/UDP/IP protocol suite, and on BSD-derived UNIX systems [13]. Similar problems arise with other protocol suites, in System V-derived UNIX systems, and in many commercial non-UNIX operating systems. Figure 1 illustrates the BSD networking architecture.

2.1 Overview

On the receiving side, the arrival of a network packet is signaled by an interrupt. The interrupt handler, which is part of the network interface device driver, encapsulates the packet in an mbuf, queues the packet in the IP queue, and posts a software interrupt. In the context of this software interrupt, the packet is processed by IP. After potential reassembly of multiple IP fragments, UDP's or TCP's input function is called, as appropriate. Finally, still in the context of the software interrupt, the packet is queued on the socket queue of the socket that is bound to the packet's destination port. The software interrupt has higher priority than any user process; therefore, whenever a user process is interrupted by a packet arrival, the protocol processing for that packet occurs before control returns to the user process. On the other hand, software interrupts have lower priority than hardware interrupts; thus, the reception of subsequent packets can interrupt the protocol processing of earlier packets.

When an application process performs a receive system call¹ on the socket, the packet's data is copied from the mbufs into the application's address space. The mbufs are then dequeued and deallocated. This final processing step occurs in the context of the user process performing a system call.

¹We use the term receive system call to refer to any of the five system calls available to read data from a socket. The term send system call is used analogously to refer to system calls that write data to a socket.
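To make this control flow concrete, here is a minimal user-space model of the receive path just described. This is a sketch only: the function names (rx_interrupt, ip_softint, recv_syscall), queue types, and queue limits are illustrative inventions, not the actual BSD kernel symbols. It shows the two stages that run before the application is ever involved: the hardware interrupt capturing the packet onto the shared IP queue, and the software interrupt doing protocol processing and moving the packet to the destination socket's queue.

/* Sketch of the interrupt-driven BSD receive path described above.
 * All names and sizes are illustrative, not real kernel symbols.   */
#include <stdio.h>

#define QLEN 8

struct pkt   { int dst_port; int len; };
struct queue { struct pkt buf[QLEN]; int head, count; long drops; };

static struct queue ip_queue;          /* shared by all sockets      */
static struct queue sock_queue[2];     /* one per socket (two here)  */

static int enqueue(struct queue *q, struct pkt p) {
    if (q->count == QLEN) { q->drops++; return -1; }  /* full: drop */
    q->buf[(q->head + q->count++) % QLEN] = p;
    return 0;
}

static int dequeue(struct queue *q, struct pkt *p) {
    if (q->count == 0) return -1;
    *p = q->buf[q->head]; q->head = (q->head + 1) % QLEN; q->count--;
    return 0;
}

/* Stage 1: device interrupt. Highest priority; only captures the
 * packet (the "mbuf") and queues it on the shared IP queue.        */
void rx_interrupt(struct pkt p) {
    enqueue(&ip_queue, p);   /* a drop here wastes only capture work */
    /* the software interrupt would be posted here                   */
}

/* Stage 2: software interrupt. Lower priority than rx_interrupt but
 * higher than any user process: IP/UDP processing runs here, then
 * the packet moves to the receiving socket's queue.                 */
void ip_softint(void) {
    struct pkt p;
    while (dequeue(&ip_queue, &p) == 0) {
        /* ... IP reassembly and UDP input would happen here ...     */
        enqueue(&sock_queue[p.dst_port % 2], p); /* drop here wastes it all */
    }
}

/* Stage 3: receive system call, in the context of the application. */
int recv_syscall(int s, struct pkt *p) { return dequeue(&sock_queue[s], p); }

int main(void) {
    for (int i = 0; i < 12; i++)       /* a burst of twelve packets  */
        rx_interrupt((struct pkt){ .dst_port = 0, .len = 64 });
    ip_softint();                      /* runs before the app does   */
    struct pkt p;
    int n = 0;
    while (recv_syscall(0, &p) == 0) n++;
    printf("delivered %d, dropped at IP queue %ld\n", n, ip_queue.drops);
    return 0;
}

Note where packets can be dropped in this model: the hardware interrupt drops only when the shared IP queue overflows, while a drop at the socket queue in ip_softint happens after all protocol work has already been done. Section 2.2 builds on exactly this observation.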
On the sending side, data written to a socket by an application is copied into newly allocated mbufs. For datagram sockets (UDP), the mbufs are then handed to UDP and IP for transmission. After potential fragmentation, the resulting IP packets are then transmitted, or, if the interface is currently busy, placed in the driver's interface queue. All of these actions are executed in the context of the user process that performed the send system call on the socket. Packets queued in the interface queue are removed and transmitted in the context of the network interface's interrupt handler.

For stream sockets (TCP), the mbufs are queued in the socket's outgoing socket queue, and TCP's output function is called. Depending on the state of the TCP connection and the arguments to the send call, TCP makes a logical copy of all, some, or none of the queued mbufs, processes them for transmission, and calls IP's output function. The resulting IP packets are then transmitted or queued on the interface queue. Again, this processing occurs in the context of the application process performing a system call. As for UDP packets, data is removed from the interface queue and transmitted in the context of the network interface's interrupt handler.

Processing of any remaining data in the socket queue typically occurs in the context of a software interrupt. If TCP receives an acknowledgment, more data from the socket queue may be sent in the context of the software interrupt that was posted to process the incoming acknowledgment. Or, data may be sent in the context of a software interrupt that was scheduled by TCP to indicate a timeout. Data is not removed from the socket queue until its reception has been acknowledged by the remote receiver.

CPU time consumed during the processing of network I/O is accounted for as follows. Any processing that occurs in the context of a user process performing a system call is charged to that process as system time. CPU time spent in software or hardware interrupt handlers is charged to the user process that was interrupted. Note that, in general, the interrupted process may be unrelated to the network communication that caused the interrupt.
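The send side can be sketched in the same style, again under the caveat that udp_send, if_queue, and tx_interrupt are assumed names rather than the real BSD symbols. What matters is the execution context of each step: header processing and queuing are charged to the sending process as system time, while the final transmission out of the interface queue happens in interrupt context.

/* Sketch of the UDP send path: system-call context fills the
 * interface queue; interrupt context drains it. Names are invented. */
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

struct mbuf { char data[128]; int len; struct mbuf *next; };

static struct mbuf *if_queue;   /* driver's interface queue          */
static int          if_busy;    /* interface currently transmitting  */

/* Runs in the context of the user process performing the send call. */
void udp_send(const void *buf, int len) {
    struct mbuf *m = calloc(1, sizeof *m);    /* copy into a new mbuf */
    memcpy(m->data, buf, len > 128 ? 128 : len);
    m->len = len;
    /* ... UDP/IP header construction and fragmentation here ...     */
    if (!if_busy) {
        if_busy = 1;             /* start transmission immediately    */
        printf("tx %d bytes (syscall context)\n", m->len);
        free(m);
    } else {
        m->next = if_queue;      /* interface busy: queue for later   */
        if_queue = m;
    }
}

/* Runs in the context of the network interface's interrupt handler. */
void tx_interrupt(void) {
    struct mbuf *m = if_queue;
    if (m) {
        if_queue = m->next;
        printf("tx %d bytes (interrupt context)\n", m->len);
        free(m);
    } else {
        if_busy = 0;             /* queue drained: interface idle     */
    }
}

int main(void) {
    udp_send("ping", 4);         /* interface idle: goes out directly */
    udp_send("pong", 4);         /* interface busy: queued            */
    tx_interrupt();              /* drains the queued packet          */
    tx_interrupt();              /* nothing left: mark interface idle */
    return 0;
}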
2.2 Problems

We now turn to describe several problems that can arise when a system with a conventional network architecture faces high volumes of network traffic. Problems arise because of four aspects of the network subsystem:

Eager receiver processing: Processing of received packets is strictly interrupt-driven, with highest priority given to the capture and storage of packets in main memory; second highest priority is given to the protocol processing of packets; and, lowest priority is given to the applications that consume the messages.

Lack of effective load shedding: Packet dropping as a means to resolve receiver overload occurs only after significant host CPU resources have already been invested in the dropped packet.

Lack of traffic separation: Incoming traffic destined for one application (socket) can lead to delay and loss of packets destined for another application (socket).

Inappropriate resource accounting: CPU time spent in interrupt context during the reception of packets is charged to the application that happens to execute when a packet arrives. Since CPU usage, as maintained by the system, influences a process's future scheduling priority, this is unfair.
Eager receiver processing has significant disadvantages when used in a network server. It gives highest priority to the processing of incoming network packets, regardless of the state or the scheduling priority of the receiving application. A packet arrival will always interrupt a presently executing application, even if any of the following conditions hold true: (1) the currently executing application is not the receiver of the packet; (2) the receiving application is not blocked waiting on the packet; or, (3) the receiving application has lower or equal priority than the currently executing process. As a result, overheads associated with dispatching and handling of interrupts and increased context switching can limit the throughput of a server under load.

Under high load from the network, the system can enter a state known as receiver livelock [20]. In this state, the system spends all of its resources processing incoming network packets, only to discard them later because no CPU time is left to service the receiving application programs. For instance, consider the behavior of the system under increasing load from incoming UDP packets². Since hardware interface interrupts and software interrupts have higher priority than user processes, the socket queues will eventually fill because the receiving application no longer gets enough CPU time to consume the packets. At that point, packets are discarded when they reach the socket queue. As the load increases further, the software interrupts will eventually no longer keep up with the protocol processing, causing the IP queue to fill. The problem is that early stages of receiver processing have strictly higher priority than later stages. Under overload, this causes packets to be dropped only after resources have been invested in them. As a result, the throughput of the system drops as the offered load increases until the system finally spends all its time processing packets only to discard them.

Bursts of packets arriving from the network can cause scheduling anomalies. In particular, the delivery of an incoming message to the receiving application can be delayed by a burst of subsequently arriving packets. This is because the network processing of the entire burst of packets must complete before any application process can regain control of the CPU. Also, since all incoming IP traffic is placed in the shared IP queue, aggregate traffic bursts can exceed the IP queue limit and/or exhaust the mbuf pool. Thus, traffic bursts destined for one server process can lead to the delay and/or loss of packets destined for other sockets. This type of traffic interference is generally unfair and undesirable.

²Similar problems can arise under load from TCP connection establishment request packets.
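The livelock progression can be illustrated with a back-of-the-envelope model. This is a sketch only; the per-packet costs below are assumed for illustration and are not measurements from this paper. Because interrupt-level work strictly preempts the application, the application can use only the CPU share left over; once the offered load alone saturates the CPU at interrupt level, delivered throughput falls to zero.

/* Toy model of receiver livelock under eager receiver processing.  */
#include <stdio.h>

int main(void) {
    /* Assumed per-packet costs (illustrative, not measured):        */
    const double t_intr = 20e-6;  /* interrupt + protocol work (s)   */
    const double t_app  = 50e-6;  /* application-level work (s)      */

    for (double load = 0; load <= 60000; load += 5000) {
        double cpu_intr = load * t_intr;   /* CPU share taken at interrupt level */
        double cpu_left = cpu_intr < 1 ? 1 - cpu_intr : 0;
        double app_rate = cpu_left / t_app;  /* packets/s the app can consume    */
        double goodput  = load < app_rate ? load : app_rate;
        printf("offered %6.0f pkt/s -> delivered %6.0f pkt/s\n", load, goodput);
    }
    return 0;
}

With these assumed costs, delivered throughput peaks around 14,000 packets per second and then declines to zero at 50,000 packets per second, where load * t_intr reaches 1: the rise-then-collapse curve described above.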
2.3 Sources of High Network Load

Network protocols and distributed application programs use flow control mechanisms to prevent a sender process from generating more traffic than the receiver process can handle. Unfortunately, flow control does not necessarily prevent overload of network server machines. Some reasons for this are:

• simultaneous requests from a large number of clients

• misbehaved distributed applications

• incorrect client protocol implementations

• malicious denial-of-service attacks

• broadcast and multicast traffic
TCP connection establishment requests (TCP SYN packets) from a large number of clients can flood a WWW server. This is true despite TCP's flow control mechanism (which regulates traffic on established connections) and TCP's exponential backoff strategy for connection establishment requests (which can only limit the rate of retries). The maximal rate of SYN packets is only bounded by the capacity of the network. Similar arguments apply for any server that serves a virtually unlimited client community such as the Internet.
Distributed applications built on top of a simple datagram service such as UDP must implement their own flow and congestion control mechanisms. When these mechanisms are deficient, excessive network traffic can result. Incorrect implementations of flow-controlled protocols such as TCP, not uncommon in the PC market, can have the same effect. The vulnerability of network servers to network traffic overload can be and has been exploited for security attacks³. Thus, current network servers have a protection and security problem, since untrusted application programs running on clients can cause the failure of the shared server.

There are many examples of real-world systems that are prone to the problems discussed above. A packet filtering application-level gateway, such as a firewall, establishes a new TCP connection for every flow that passes through it. An excessive flow establishment rate can overwhelm the gateway. Moreover, a misbehaving flow can get an unfair share of the gateway's resources and interfere with other flows that pass through it. Similar problems can occur in systems that run several server processes, such as Web servers that use a process per connection; or, single process servers that use a kernel thread per connection. Scheduling anomalies, such as those related to bursty data, can be ill-afforded by systems that run multimedia applications. Apart from the above examples, any system that uses eager network processing can be livelocked by an excess of network traffic; this need not always be part of a denial of service attack, and can simply be because of a program error.

These problems make it imperative that a network server be able to control its resources in a manner that ensures efficiency and stability under conditions of high network load. The conventional, interrupt-driven network subsystem architecture does not satisfy this criterion.

³Often, a denial-of-service attack is used as part of a more elaborate security attack.
3 Design of the LRP Architecture

In this section, we present the design of our network subsystem architecture based on lazy receiver processing (LRP). We start with an overview, and then focus on details of protocol processing for UDP and TCP.

[Figure 2: LRP Architecture. Application processes sit above stream (TCP) and datagram (UDP) sockets; the network interface demultiplexes incoming packets directly into per-socket queues.]
The proposed architecture overcomes the problems discussed in the previous section through a combination of techniques: (1) The IP queue is replaced with a per-socket queue that is shared with the network interface (NI). (2) The network interface demultiplexes incoming packets according to their destination socket, and places the packet directly on the appropriate receive queue⁴. Packets destined for a socket with a full receive queue are silently discarded (early packet discard). (3) Receiver protocol processing is performed at the priority of the receiving process⁵. (4) Whenever the protocol semantics allow it, protocol processing is performed lazily, in the context of the user process performing a receive system call. Figure 2 illustrates the LRP architecture.

⁴The present discussion assumes that the network interface has an embedded CPU that can be programmed to perform this task. Section 3.2 discusses how LRP can be implemented with an uncooperative network interface.

⁵For a shared or multicast socket, this is the highest of the participating processes' priorities.
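A minimal user-space sketch of techniques (1), (2), and (4) may help. Here ni_receive, lrp_recv, and the queue sizes are illustrative assumptions; a real implementation would block in the receive call, wake the receiver, and run actual IP/UDP code. Compare it with the sketch of the BSD path in Section 2.1: here demultiplexing and discard happen before any protocol processing has been invested, and the protocol processing itself runs in, and is charged to, the receiving process.

/* Sketch of the LRP receive path: per-socket queues filled by the
 * NI, early packet discard, and lazy protocol processing in the
 * receive call. All names and sizes are illustrative.              */
#include <stdio.h>
#include <string.h>

#define NSOCK 2
#define QLEN  8

struct pkt  { int dst_port; char data[64]; };
struct sock {
    struct pkt q[QLEN];
    int head, count;
    long early_drops;
} socks[NSOCK];

/* Runs on the NI (or at the lowest level of the driver): demultiplex
 * by destination socket; if that socket's queue is full, drop now,
 * before any host protocol processing has been invested.           */
void ni_receive(struct pkt p) {
    struct sock *s = &socks[p.dst_port % NSOCK];
    if (s->count == QLEN) { s->early_drops++; return; } /* early discard */
    s->q[(s->head + s->count++) % QLEN] = p;
    /* No software interrupt is posted: at most, the receiver is
     * made runnable, and it runs at its own scheduling priority.   */
}

/* Runs in the context of the receiving process, charged to it:
 * protocol processing happens here, lazily, on demand.             */
int lrp_recv(int port, char *buf, int buflen) {
    struct sock *s = &socks[port % NSOCK];
    if (s->count == 0) return 0;        /* a real call would block   */
    struct pkt *p = &s->q[s->head];
    /* ... IP and UDP input processing would run here ...           */
    int n = buflen < (int)sizeof p->data ? buflen : (int)sizeof p->data;
    memcpy(buf, p->data, n);
    s->head = (s->head + 1) % QLEN;
    s->count--;
    return n;
}

int main(void) {
    /* A burst for socket 0 overflows only socket 0's queue; the
     * packet for socket 1 is unaffected (traffic separation).      */
    for (int i = 0; i < 12; i++)
        ni_receive((struct pkt){ .dst_port = 0, .data = "to-0" });
    ni_receive((struct pkt){ .dst_port = 1, .data = "to-1" });

    char buf[64];
    int n0 = 0, n1 = 0;
    while (lrp_recv(0, buf, sizeof buf) > 0) n0++;
    while (lrp_recv(1, buf, sizeof buf) > 0) n1++;
    printf("socket 0: %d delivered, %ld discarded early\n",
           n0, socks[0].early_drops);
    printf("socket 1: %d delivered, %ld discarded early\n",
           n1, socks[1].early_drops);
    return 0;
}

Running the example, the burst for socket 0 overflows only socket 0's queue and the excess is discarded without any host protocol work, while the packet for socket 1 is delivered untouched, illustrating both early packet discard and traffic separation.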
There are several things to note about the behavior of this architecture. First, protocol processing for a packet in many cases does not occur until the application requests the packet in a receive system call. Packet processing no longer interrupts the running process at the time of the packet's arrival, unless the receiver has higher scheduling priority than the currently executing process. This avoids inappropriate context switches and can increase performance.

Second, the network interface separates (demultiplexes) incoming traffic by destination socket and places packets directly into per-socket receive queues. Combined with the receiver protocol processing at application priority, this provides feedback to the network interface about application processes' ability to keep up with the traffic arriving at a socket. This feedback is used as follows: Once a socket's receive queue fills, the NI discards further packets destined for the socket until applications have consumed some of the queued packets. Thus, the NI can effectively shed load without consuming significant host resources. As a result, the system has stable overload behavior and increased throughput under high load.

Third, the network interface's separation of received traffic,
This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket