throbber
PERFORMANCE EVALUATION REVIEW
`
`SPECIAL ISSUE
`
`VOLUME 28 No. 1
`
`JUNE 2000
`
`Proceedings
`ACM SIGMETRICS '2000
`International Conference on Measurement
`and Modeling of Computer Systems
`
`Sponsored
`
`by ACM SIGMETRICS
`
`EVAS COLLECT\U^
`UCV.^
`
`•Mi
`
`Santa Clara, CA, USA
`
`' U
`
`BUNGIE - EXHIBIT 1040
`
`

`

`The Association for Computing Machinery
`1515 Broadway
`New York, N.Y. 10036
`
`Copyright © 2000 by Association for Computing Machinery, Inc. (ACM). Permission to make
`digital or hard copies of portions of this work for personal or classroom use is granted without fee
`provided that the copies are not made or distributed for profit or commercial advantage and that
`copies bear this notice and the full citation on the first page. Copyright for components of this
`work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy
`otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific
`permission and/or a fee. Request permission to republish from: Publications Dept. ACM, Inc.
`Fax +1 (212) 869-0481 or E-mail permissions(a),acm.org
`
`For other copying of articles that carry a code at the bottom of the first or last page, copying is
`permitted provided that the per-copy fee indicated in the code is paid through the Copyright
`Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, +1-978-750-8400, +1-978-750­
`4470 (fax).
`
`Notice to Contributing Authors to SIG Newsletters:
`
`By submitting your article for distribution in this Special Interest Group publication,
`you hereby grant to ACM the following non-exclusive, perpetual, worldwide rights:
`
`-to publish in print on condition of acceptance by the editor
`-to digitize and post your article in the electronic version of this publication
`-to include the article in the ACM Digital Library
`-to allow users to copy and distribute the article for noncommercial, educational or research
`purposes
`
`However, as a contributing author, you retain copyright to your article and
`ACM will make every effort to refer requests for commercial use directly to
`you.
`
`ACM ISBN: 1-58113-194-1
`
`Additional copies may be ordered prepaid from:
`
`ACM Order Department
`P.O. Box 11405
`New York, N.Y. 10286-1405
`
`Phone: 1-800-342-6626
`(U.S.A. and Canada)
`+1-212-626-0500
`(All other countries)
`Fax:+1-212-944-1318
`E-mail: acmhelp@acm.org
`
`ACM Order Number: 488000
`Printed in the U.S.A.
`
`

`

`TABLE OF CONTENTS
`
`Organizing Committee
`Referees
`Message from General Chair
`Message from Program Co-Chairs
`Message from Sigmetrics Chair ...
`
`Keynote Address: The Internet and Its Future
`Leonard Kleinrock, Chairman and Founder, Nomadic Inc., Professor, Department of Computer
`Science, UCLA
`
`Session 1: Network Architecture and Protocols
`
`A Case for End System Multicast....:
`Yang-hua Chu, Sanjay Rao, Hui Zhang, Carnegie Mellon University
`
`PLM: Fast Convergence for Cumulative Layered Multicast Transmission Schemes
`Amaud Legout, Ernst W. Biersack, Institut EURECOM
`
`On Achievable Service Differentiation with Token Bucket Marking for TCP
`Sambit Sahu, University of Massachusetts, Philippe Nain, INRIA Sophia Antipolis,
`Don Towsley, University of Massachusetts, Christophe Diot, Sprint ATL, Victor Firiou, Nortel
`Networks
`
`in
`IV
`v
`VI
`vn
`
`1
`
`13
`
`23
`
`Session 2: File and Storage Systems
`
`Feasibility of a Serverless Distributed File System Deployed on an Existing Set of
`34
`Desktop PCs
`William J. Bolosky, John R. Douceur, Microsoft Research, David Ely, University of Washington,
`Marvin Theimer, Microsoft Research
`
`44
`Comparing Random Data Allocation and Data Striping in Multimedia Servers
`Jose Renato Santos, Richard Muntz, UCLA, Berthier Ribeiro-Neto, Universidade Federal de Minas
`Gerais
`
`56
`Modeling and Performance of MEMS-Based Storage Devices
`John Linwood Griffin, Steven W. Schlosser, Gregory R. Ganger, David F.Nagle, Carnegie Mellon
`University.
`
`VIM
`
`

`

`Session 3: Web and Multimedia Servers
`
`Implications of Proxy Caching for Provisioning Networks and Servers
`Mohammad S. Raunak, Prashant Shenoy, University of Massachusetts, Pawan Goyal, Ensim
`Corporation, Krithi Ramamritham, University of Massachusetts
`
`Collaborative Web Caching Based on Proxy Affinities,
`Jiong Yang, Wei Wang, IBM, Richard Muntz, UCLA
`
`66
`
`78
`
`Cluster Reserves: A Mechanism for Resource Management in Cluster-Based Network Servers... 90
`Mohit Aron, Peter Druschel, Willy Zwaenepoel, Rice University
`
`Poster Session
`
`Analysis of the Phenomenon of Several Consecutive Slow Start Phases in TCP,
`Chadi Barakat, Eitan Altman, INR1A Sophia Antipolis
`.
`
`Providing Guaranteed Quality of Service for Interactive Visualization Applications
`Wai-Man R. Wong, Richard R. Muntz, UCLA
`
`IP Multicast Fault Recovery in PIM over OSPF
`Xin Wang, Chienming Yu, Henning Schulzrinne, Columbia University,
`Paul Stirpe, Wei Wu, Reuters
`
`102
`
`104
`
`106
`
`Cell-based Multicast Grouping in Large-Scale Virtual Environments
`Emmanuel Lety, Thierry Turletti, INRIA Sophia Antipolis, Francois Baccelli, ENRIA/ENS
`
`108
`
`Temporal Locality in Web Request Streams: Sources, Characteristics, and Caching
`Implications
`Azer Bestavros, Shudong Jin, Boston University
`
`Automated Disk Drive Characterization
`Jiri Schindler, Gregory Ganger, Carnegie Mellon University
`
`Online Superpage Promotion Revisited
`Zhen Fang, Lixin Zhang, John Carter, Sally McKee, Wilson C. Hsieh University of Utah
`
`110
`
`112
`
`114
`
`An Inherently Loss-Less and Bandwidth Efficient Periodic Broadcast Scheme for VBR Video.... 116
`loanis Nikolaidis, Fulu Li, Ailan Hu University of Alberta
`
`An Analysis of Short-Term Fairness in Wireless Media Access Protocols
`Can Emre Koksal, Hisham Kassab, Hari Balakrishnan, MIT
`
`RESCU: Dynamic Hybrid Packet Loss Recovery for Video Transmission over the Internet
`Srinath R. Joshi, Injong Rhee, North Carolina State University
`
`118
`
`120
`
`IX
`
`

`

`The Content and Access Dynamics of a Busy Web Server
`Venkata N. Padmanabhan, Microsoft Research, Lili Qiu, Cornell University
`
`Session 4: Networking: Congestion Control
`
`TCP in presence of Bursty Losses
`Eitan Altman, Konstantin Avrachenkov, Chadi Barakat, INRIA Sophia Antipolis
`
`122
`
`124
`
`The Incremental Deployability of RTT-Based Congestion Avoidance for High Speed TCP Internet
`134
`Connections
`Jim Martin, GartnerGroup, Arne Nilsson, Injong Rhee, North Carolina State University
`
`Detecting Shared Congestion of Flows Via End-to-end Measurement
`Dan Rubenstein, Jim Kurose, Don Towsley, University of Massachusetts
`Best Student Paper Award
`
`145
`
`Session 5: Network Measurement and Performance Modeling
`
`156
`Measurement and Analysis ofLDAP Performance
`Xin Wang, Henning Schulzrinne, Columbia University, Dilip Kandlur, Dinesh Verma, IBM T.J.
`Watson Research Center
`
`IP Packet Generation: Statistical Models for TCP Start Times Based on Connection-Rate
`Superposition
`William S. Cleveland, Dong Lin, Don X. Sun, Bell Laboratories Lucent Technologies
`
`On the Impact of Soft Hand-off in Cellular Systems
`Nidhi Hegde, Khosrow Sohraby, University of Missouri-Kansas City
`
`Session 6: Queueing and Performance Evaluation Techniques
`
`Delay Asymptotics for a Priority Queueing System
`Sanjay Shakkottai, R. Srikant, University of Illinois at Urbana-Champaign
`
`166
`
`178
`
`188
`
`A Fast and Accurate Iterative Solution of a Multi-Class Threshold-based Queueing System with
`196
`Hysteresis
`Leana Golubchik, University of Maryland, John C.S. Lui, The Chinese University of Hong Kong
`
`Using the Exact State Space of a Markov Model to Compute Approximate Stationary Measures 207
`Andrew S Miner, Gianfranco Ciardo, College of William and Mary,
`Susanna Donatelli, Universita di Torino
`
`x
`
`

`

`AMVA Techniques for High Service Time Variability.
`217
`Derek L. Eager, Univ. of Saskatchewan, Daniel J. Sorin, Mary K. Vernon, University of Wisconsin
`
`Session 7: Tools and Benchmarking
`
`Eff icient Performance Prediction for Modem Microprocessors
`David Ofelt, Juniper Networks, John Hennessy, Stanford University
`
`Improving Interactive System Performance Using TIPME
`Yasuhiro Endo, Network Appliance, Margo Seltzer, Harvard University
`
`Quantifying the Energy Consumption of a Pocket Computer and a Java Virtual Machine.
`Keith Farkas, Compaq Western Research Lab, Jason Flinn, Carnegie Mellon University,
`Godmar Back, University of Utah, Dirk Grunwald, University of Colorado,
`Jennifer Anderson, VMware
`
`Session 8: Memory Management and Databases
`
`Memory System Behavior of Java Programs: Methodology and Analysis
`Jin-Soo Kim, Seoul National University, Yarsun Hsu, National Tsing Hua University
`Best Systems Paper Award
`
`229
`
`240
`
`252
`
`264
`
`An Analytical Model of the Working-Set Sizes in Decision-Support Systems
`Magnus Karlsson, HP Labs, Fredrik Dahlgren, Ericsson, Per Stenstroem, Chalmers University
`
`275
`
`Towards Application/File-level Characterization of Block References:
`A Case for Fine-Grained Buffer Management
`Jongmoo Choi, Seoul National University, Sam H. Noh, Hong-Ik University, Sang Lyul Min,
`Yookun Cho, Seoul National University
`
`286
`
`Session 9: Network Routing
`
`Multicast Routing with Bandwidth Guarantees: A New Approach Using Multicast Network
`Flow
`Murali S. Kodialam, T. V. Lakshman, Bell Laboratories Lucent Technologies,
`Sudipta Sengupta, MIT
`
`Stable Internet Routing Without Global Coordination
`Lixin Gao, Smith College, Jennifer Rexford, AT&T Labs Research
`
`296
`
`307
`
`An Efficient Algorithm for Finding a Path Subject to Two Additive Constraints
`318
`Turgay Korkmaz, Marwan Krunz, University of Arizona, Spyros Tragoudas, Southern Illinois Univ
`Author Index
`
`328
`
`XI
`
`

`

`A Case for End System Multicast *
`
`Yang-hua Chu, Sanjay G. Rao, and Hui Zhang
`Carnegie Mellon University
`{yhchu, sanjay, hzhang}@cs.cmu.edu
`
`ABSTRACT
`The conventional wisdom has been that IP is the natural
`protocol layer for implementing multicast related function­
`ality. However, ten years after its initial proposal, IP Multi­
`cast is still plagued with concerns pertaining to scalability,
`network management, deployment and support for higher
`layer functionality such as error, flow and congestion con­
`trol. In this paper, we explore an alternative architecture for
`small and sparse groups, where end systems implement all
`multicast related functionality including membership man­
`agement and packet replication. We call such a scheme End
`System Multicast. This shifting of multicast support from
`routers to end systems has the potential to address most
`problems associated with IP Multicast. However, the key
`concern is the performance penalty associated with such a
`model. In particular, End System Multicast introduces du­
`plicate packets on physical links and incurs larger end-to-
`end delay than IP Multicast. In this paper, we study this
`question in the context of the Narada protocol. In Narada,
`end systems self-organize into an overlay structure using a
`fully distributed protocol. In addition, Narada attempts to
`optimize the efficiency of the overlay based on end-to-end
`measurements. We present details of Narada and evaluate
`it using both simulation and Internet experiments. Prelimi­
`nary results are encouraging. In most simulations and Inter­
`net experiments, the delay and bandwidth penalty are low.
`We believe the potential benefits of repartitioning multicast
`functionality between end systems and routers significantly
`outweigh the performance penalty incurred.
`
`1. INTRODUCTION
`Traditional network architectures distinguish between two
`types of entities: end systems (hosts) and the network (switches
`
`*This research was sponsored by DARPA under contract
`numbers N66001-96-C-8528, F30602-99-1-0518, and E30602-
`97-2-0287, and by NSF under grant numbers Career Award
`NCR-9624979 ANI-9730105, and ANI-9814929. Additional
`support was provided by Intel, Lucent, and Ericsson. Views
`and conclusions contained in this document are those of the
`authors and should not be interpreted as representing
`the of-
`ficial policies, either expressed or implied, of DARPA, NSF,
`Intel, Lucent, Ericsson or the U.S. government.
`
`Perrriission to make digital or hard copies of all or part of this work for
`personal or classroom use is granted without fee provided that
`copies are not made or distributed for profit or commercial advant
`-age and that copies bear this r»otice and the full citation on the first page.
`To copy otherwise, to republish, to post on servers or to
`redistribute to lists, requires prior specific permission and/or a fee.
`SIGMETRICS 2000 6/00 Santa Clara, California, USA
`© 2000 ACM 1-58113-194-1/00/0006...$5.00
`
`and routers). One of the most important architectural deci­
`sions is then the division of functionality between end sys­
`tems and networks.
`
`In the Internet architecture, the internetworking layer, or IP,
`implements a minimal functionality — a best-effort unicast
`datagram service, and end systems implement all other im­
`portant functionality such as error, congestion, and flow con­
`trol. Such a minimalist approach is probably the single most
`important technical reason for the Internet's growth from
`a small research network into a global, commercial infras­
`tructure with heterogeneous technologies, applications, and
`administrative authorities. The growth of the network in
`turn unleashes the development of new applications, which
`require richer network functionality.
`
`The key architecture question is: what new features should
`be added to the IP layer? Multicast and QoS are the two
`most important features that have been or are being added
`to the IP layer. While QoS is a functionality that cannot be
`provided by end systems alone and thus has to be supported
`at the IP layer, this is not the case for multicast. In partic­
`ular, it is possible for end systems to implement multicast
`services on top of the IP unicast service.
`
`In deciding whether to implement multicast services at the
`IP layer or at end systems, there are two conflicting consid­
`erations that we need to reconcile. According to the end-to-
`end arguments [17], a functionality should be (a) pushed to
`higher layers if possible, (b) unless implementing it at the
`lower layer can achieve large performance benefit that out­
`weighs the cost of additional complexity at the lower layer.
`
`In his seminal work in 1989 [4], Deering argues that this sec­
`ond consideration should prevail and multicast should be im­
`plemented at the IP layer. This view so far has been widely
`accepted. IP Multicast is the first significant feature that
`has been added to the IP layer since its original design and
`most routers today implement IP Multicast. Despite this, IP
`Multicast has several drawbacks that have so far prevented
`the service from being widely deployed. First, IP Multicast
`requires routers to maintain per group state, which not only
`violates the "stateless'' architectural principle of the origi­
`nal design, but also introduces high complexity and serious
`scaling constraints at the IP layer. Second, the current IP
`Multicast model allows for an arbitrary source to send data
`to an arbitrary group. This makes the network vulnerable
`to flooding attacks by malicious sources, and complicates
`network management and provisioning. Third, IP Multi­
`cast requires every group to dynamically obtain a globally
`unique address from the multicast address space and it is
`
`1
`
`

`

`difficult to ensure this in a scalable, distributed and consis­
`tent fashion. Fourth, IP Multicast is a best effort service.
`Providing higher level features such as reliability, congestion
`control, flow control, and security has been shown to be more
`difficult, than in the unicast case. Finally, IP Multicast calls
`for changes at the infrastructural level, and this slows down
`the pace of deployment. While there have been attempts
`to partially address some of the issues at the IP layer [9,
`15, 21], fundamental concerns pertaining to the "stateful"
`architecture of IP Multicast and support for higher layer
`functionality have remained unresolved.
`
`In this paper, we revisit the issue of whether multicast re­
`lated functionality should be implemented at the IP layer or
`at the end systems. In particular, we consider a model in
`which multicast related features, such as group membership,
`multicast routing and packet duplication, are implemented
`at end systems, assuming only unicast IP service. We call
`the scheme End System Multicast. Here, end systems par­
`ticipating in the multicast group communicate via an overlay
`structure. The structure is an overlay in the sense that each
`of its edges corresponds to a unicast path between two end
`systems in the underlying Internet.
`
`We believe that End System Multicast has the potential to
`address most problems associated with IP Multicast. Since
`all packets are transmitted as unicast packets, network pro­
`visioning is not affected and deployment may be acceler­
`ated. End System Multicast maintains the stateless nature
`of the network by requiring end systems, which subscribe
`only to a small number of groups, to perform additional
`complex processing for any given group. In addition, we be­
`lieve that solutions for supporting higher layer features such
`as error, flow, and congestion control can be significantly
`simplified by leveraging well understood unicast solutions
`for these problems. We hope to demonstrate this in future
`works. Finally, an end system based architecture no longer
`requires globed consistency in naming of groups and allows
`for application specific naming.
`
`While End System Multicast has many advantages, several
`issues need to be resolved before it become a practical alter­
`native to IP Multicast. In particular, an overlay approach
`to multicast, however efficient, cannot perform as well as
`IP Multicast. It is impossible to completely prevent mul­
`tiple overlay edges from traversing the same physical link
`and thus some redundant traffic on physical links is unavoid­
`able. Further, communication between end systems involves
`traversing other end systems, potentially increasing latency.
`In this paper, we focus on two fundamental questions per­
`taining to the End System Multicast architecture: (i) what
`are the performance implications of using an overlay archi­
`tecture for multicast? and (ii) how do end systems with
`limited topological information cooperate to construct good
`overlay structures?
`
`In this paper, we seek to answer these questions in the con­
`text of a protocol that we have developed called Narada.
`Narada constructs an overlay structure among participat­
`ing end systems in a self-organizing and fuliy distributed
`manner. Narada is robust to the failure of end systems and
`to dynamic changes in group membership. End systems
`begin with no knowledge of the underlying physical topol­
`ogy, and they determine latencies to other end systems by
`probing them in a controlled fashion. Narada continually
`refines the overlay structure as more probe information is
`available. Narada may be distinguished from many other
`
`self-organizing protocols in that it does not require a native
`multicast medium. We present details in Section 3.
`
`We evaluate the performance penalty of the overlay Narada
`produces using simulations. In a group of 128 members, the
`delay between at least 90% of pairs of members increases
`by a factor of at most 4 compared to the unicast delay be­
`tween them. Further, no physical link carries more than
`9 identical copies of a given packet. We have also imple­
`mented Narada and conducted preliminary Internet exper­
`iments. For a group of 13 members, the delay between at
`least 90% of pairs of members increases by a factor of at
`most 1.5 compared to the unicast delay between them.
`
`While we refer to end systems in this paper, it is with the
`understanding that this may easily be generalized to nodes
`at the edge of the network such as campus wide proxies and
`edge routers. Proxies can exploit native multicast available
`at the LAN level, have larger processing power than hosts,
`are better connected to the Internet and allow for organi­
`zation level agreements. Further, they could potentially ex­
`ploit temporal and spatial locality in group formations, At
`the other end of the spectrum, tree building mechanisms
`might be built into actual applications running on hosts,
`and could exploit application specific requirements and pe­
`culiarities. Irrespective of these differences, common to all
`these architectures is the design and evaluation of an actual
`self-organization protocol for construction of overlay struc­
`tures for data delivery. An end system in our paper refers to
`the entity that actually takes part in the self-organization
`protocol, and could be a host, or an application or a proxy.
`
`We believe End System Multicast is more appropriate for
`small size and sparse groups such as audio/video conferenc­
`ing, virtual classroom and multiparty network games, but is
`not appropriate for handling applications such as Internet
`TV that involve a single source streaming high bandwidth
`and real time data to several hundred thousand recipients.
`Regardless, we believe End System Multicast is of interest
`as we hypothesize that there will be an explosion of small
`sized and sparse groups in the near future.
`
`2. END SYSTEM MULTICAST
`In this section, we contrast IP Multicast, End System Mul­
`ticast and naive unicast with an example, and outline fun­
`damental issues in the design of an End System Multicast
`protocol.
`
`Consider Figure 1(a) which depicts an example physical
`topology, /il and R2 are routers, while A, B, C and D are
`end systems. Link delays are as indicated. Thus Rl — R2
`may be imagined to be a costly transcontinental link, while
`all other links are cheaper local links: Further, let us assume
`A wishes to send data to all other nodes.
`
`Figure 1(b) depicts the IP Multicast tree constructed by
`DVMRP [4]. DVMRP is the classical IP Multicast protocol,
`where data is delivered from the source to recipients using
`an IP Multicast tree composed of the shortest paths from
`each recipient to the source. R\ and R2 receive a single
`copy of the packet but forward it along multiple interfaces.
`At most one copy of a packet is sent over any physical link.
`Each recipient receives data with the same delay as though
`A were sending to it directly by unicast.
`
`End System Multicast does not rely on router support for
`
`2
`
`

`

`27
`
`27
`
`28
`
`28
`(c)
`
`27
`
`A
`
`B
`
`I
`
`2
`
`r1
`
`25
`
`r2
`
`c
`
`a
`
`1
`
`3
`
`1
`
`d
`
`b
`
`(a)
`
`a
`
`1
`
`|r1
`2
`
`25
`
`b
`
`c
`
`a
`
`3
`
`1
`r2
`
`27
`
`1 (d
`
`b
`
`fa
`
`T
`
`25
`
`(e)
`
`27
`
`c
`
`a
`
`2
`
`d
`
`b
`
`1
`
`2
`
`Mc
`
`a
`
`3
`
`D
`
`b
`
`(b)
`
`(d)
`
`a
`
`c
`1
`
`1 (D
`
`b
`
`1
`
`r2
`
`1
`R1
`
`2
`
`25
`
`c
`
`1 (D
`
`•Kc
`
`2
`
`d
`
`3
`
`g>
`
`(g)
`
`27
`
`28
`(h)
`
`i
`
`D
`
`(0
`Figure 1: Examples to illustrate IP Multicast, naive unicast and End System Multicast
`3. NARADA DESIGN
`In this section, we present Narada, a protocol we designed
`that implements End System Multicast. In designing Narada,
`we have the following objectives in mind:
`
`multicast. It abstracts the physical topology sis a Complete
`Virtual Graph (CVG) as shown in Figure 1(c). Further, it
`tries to construct a spanning-tree of the CVG along which
`A could send data to other recipients. In particular, this
`scheme could degenerate to naive unicast transmission as
`shown in Figure 1(d). Figure 1(e) depicts how naive unicast
`transmission maps onto the underlying physical network. It
`is seen that links Rl — R2 and A — Rl carry 2 and 3 copies
`of a transmission by A respectively. We refer to the number
`of identical copies of a packet carried by a physical link as
`the stress of a physical link. Thus, unicast in general leads
`to a high stress on the link nearest the source.
`
`We could build smarter spanning trees of the CVG such
`as the one shown in Figure 1(f). Figure 1(g) depicts how
`this tree maps onto the underlying physical network. It is
`seen that all physical links have a stress of at most 2. Not
`only does this tree reduce the worst case physical link stress,
`but it also reduces the stress of the costlier link Rl — R2
`to 1. In order to capture this, we introduce the notion of
`resource usage. We define resource usage as
`* 3,,
`where, L is the number of links active in data transmission,
`di is the delay of link i and s; is the stress of link i. The
`resource usage is a metric of the network resources consumed
`in the process of data delivery to all receivers. Implicit here
`is the assumption that links with higher delay tend to be
`associated with higher cost. The resource usage might be
`computed to be 30 in the case of transmission by DVMRP,
`57 for naive unicast and 32 for the smarter tree , shown in
`Figure 1(b), Figure 1(d) and Figure 1(f) respectively.
`
`While the spanning tree of Figure 1(f) improves link stress
`and resource usage as compared to naive unicast transmis­
`sion, it increases delay from the source to some of the recip­
`ients. Thus, the delay from A to D has increased to 29 from
`27. We refer to the ratio of the delay between two members
`along the overlay to the unicast delay between them as the
`Relative Delay Penalty (RDP). Thus, <A,D> has an RDP
`of
`while <A,B> and <A,C> have an RDP of 1.
`
`The out-degree oi a node in the overlay tree is its number of
`children in the tree. Thus, the out-degree of A in the smarter
`tree is 2, while it is 3 in naive unicast. The out-degree of a
`node in the overlay spanning tree directly impacts the stress
`of the links close to the node.
`
`3
`
`• Self-organizing: The construction of the end system over­
`lay must take place in a fully distributed fashion and must
`be robust to dynamic changes in group membership.
`
`• Overlay efficiency: The tree constructed must ideally have
`low stress, low RDP and low resource usage. Rirther, the
`out-degree of each member in the overlay must reflect the
`bandwidth of its connection to the Internet.
`
`• Self-improving in an incremental fashion: The overlay con­
`struction must include mechanisms by which end systems
`gather network information in a scalable fashion. The pro­
`tocol must allow for the overlay to incrementally evolve into
`a better structure as more information becomes available.
`
`There are two basic methods for construction of overlay
`spanning trees for data delivery. A first approach is to
`construct the tree directly - that is, members explicitly se­
`lect their parents from among the members they know [8].
`Narada however constructs trees in a two-step process. First
`it constructs a richer connected graph that we term mesh.
`The mesh could in general be an arbitrary connected sub­
`graph of the CVG (though Narada tries to ensure the mesh
`has desirable performance properties as discussed later.) In
`the second step, Narada constructs (reverse) shortest path
`spanning trees of the mesh, each tree rooted at the corre­
`sponding source using well known routing algorithms. Fig­
`ure 1(h) presents an example mesh that Narada constructs
`for the physical topology shown in Figure 1(a), along with
`the shortest path spanning tree rooted at A. We have several
`reasons for this two step process. First, group management
`functions are abstracted out and handled at the mesh rather
`than replicated across multiple (per source) trees. Second,
`distributed heuristics for repairing mesh partition and mesh
`optimization are greatly simplified as loop avoidance is no
`longer a constraint. Third, we may leverage standard rout­
`ing algorithms for construction of data delivery trees. Fi­
`nally, a mesh is more resilient to the failure of members
`than a tree and heavy weight partition repair mechanisms
`are invoked less frequently.
`
`

`

`In our approach, there is no control over the resulting span­
`ning trees for a given mesh. Hence, it becomes important
`to construct a good mesh so that good quality trees may
`be produced. In particular, we attempt to ensure the fol­
`lowing properties: (i) the shortest path delay between any
`pair of members along the mesh is at most K times the uni-
`cast delay between them, where K is a small constant and
`(ii) each member has a limited number of neighbors in the
`mesh which does not exceed a given (per-member) bound
`chosen to reflect the bandwidth of the member's connection
`to the Internet.1 Limiting the number of neighbors regu­
`lates the fanout of members in the spanning trees. Second,
`it controls the overhead of running routing algorithms on
`the mesh. The extreme case where the mesh is chosen to be
`the Complete Virtual Graph incurs all the overhead of rout­
`ing with none of its benefits as the resulting shortest path
`spanning trees degenerates to naive unicast transmission.
`
`Narad a has striking differences from self-configuring proto­
`cols developed in other contexts. First, Narada distinguishes
`itself from normal routing protocols in that it changes the
`very topology over which routing is performed. Second,
`most existing self-configuring protocols [12, 13, 14, 22] as­
`sume native IP Multicast support. Narada attempts self-
`configuration in the absence of a lower level multicast ser­
`vice, and this is fundamentally more challenging.
`
`We explain the distributed algorithms that Narada uses to
`construct and maintain the mesh in Section 3.1. We present
`heuristics that Narada uses to improve mesh quality in Sec­
`tion 3.2. Narada runs a variant of standard distance vector
`algorithms on top of the mesh and uses well known algo­
`rithms to construct per-source (reverse) shortest path span­
`ning trees for data delivery. We discuss this in Section 3.3.
`
`3.1 Group Management
`We have seen that Narada tries to construct a mesh among
`end systems participating in the multicast group. In this sec­
`tion, we present mechanisms Narada uses to keep the mesh
`connected, to incorporate new members into the mesh and
`to repair possible partitions that may be caused by members
`leaving the group or by member failure.
`
`As we do not wish to rely on a single non-failing entity to
`keep track of group membership, the burden of group main­
`tenance is shared jointly by all members. To achieve a high
`degree of robustness, our approach is to have every member
`maintain a list of all other members in the group. Since
`Narada is targeted towards small sized groups, maintaining
`the complete group membership list is not a major over­
`head. Every member's list needs to be updated when a new
`member joins or an existing member leaves. The challenge
`is to disseminate changes in group membership efficiently,
`especially in the absence of a multicast service provided by
`the lower layer. We tackle this by exploiting the mesh to
`propagate such information. However, this strategy is com­
`plicated by the fact that the mesh might itself become par­
`titioned when a member leaves. To handle this, we require
`that each member periodically generate a refresh message
`with monotonically increasing sequence number, which is
`disseminated along the mesh. Each member % keeps track of
`
`1An ideal mesh is a "Degree-Bounded K-spanner" [11] of
`the Complete Virtual Graph. The problem of construct­
`ing Degree-Bounded K-spanners of a graph has been widely
`studied in centralized settings that assume complete infor-
`mation and is NP-complete even in such scenarios [11].
`
`Let i receive refresh message from neighbor j at t ' s loca]
`time t. Let < k, s^j > be an entry in j s refresh message.
`* if i does not have em entry for k , then i inserts the
`entry < k, S k j, t > into its table
`* else if t's entry for k is < k , S k i ^ k i >» then
`• if s ^ i > sk j * ignores the entry pertaining to k
`• else i updates its entry for k to <
`>
`
`Figure 2: Actions taken by a member i on receiving
`a refresh message from member j.
`
`the following information for every other member k in the
`group: (i) member address k; (ii) last sequence number Ski
`that i knows k has issued; and (iii) local time at i when i
`first received information that k issued
`If member i has
`not received an update from member k for Tm time, then,
`i assumes that k is either dead or potentially partitioned
`from t. It then initiates a set of actions to determine the
`existence of a partition and repair it if present as discussed
`in Section 3.1.3.
`
`Propagation of refresh messages from every member along
`the mesh could potentially be quite expensive. Instead, we
`require that each member periodically exchange its knowl­
`edge of group membership with its neighbors in the mesh.
`A message from member i to a neighbor j contains a list of
`entries, one entry for each member k that i knows is part of
`the group. Each entry has the following fields: (i) membe

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket