`
`Computer
`Communication
`Review
`a quarterly publication of the special interest group on data communication
`
`Chairman:
`A. Lyman Chapin
`BBN Communications
`150 Cambridge Park Drive
`Cambridge, MA 02140
`+1617873 3133
`FAX: +1 617 873 4086
`lyman@bbn.com
`
`Editor:
`David Oran
`Digital Equipment Corporation
`Mail Stop LKG1-2/A19
`550 King Street
`Littleton, MA 01460
`+ 1 508 486 7377
`FAX: + 1 508 486 5279
`Oran@sneezy.!kg.dee.com
`
`Associate Editor:
`Lixia Zhang
`Xerox PARC
`3333 Coyote Hi II Road
`Palo Alto, CA 94304
`+1415812 4415
`lixia@parc.xerox.com
`
`Vice Chairman:
`Raj Jain
`Digital Equipment Corporation
`Mail Stop LKG1-2/A19
`550 King Street
`Littleton, MA 01460
`+ 1 508 486 7642
`FAX: +1 508 486 5279
`Jain@Erlang.enet.dec.com
`
`Executive Committe:
`A. Lyman Chapin
`Raj Jain
`Chris Edmondson-Yurkanan
`David Oran
`Vinton G. Cerf
`
`Conference Coordinator:
`Jose Joaquin Garcia-Luna
`SRI International
`Network Information Systems
`Center
`333 Ravenswood Avenue,
`EJ201 Menlo Park, CA 94025
`+1415859 5647
`FAX: +1 415 859 6028
`garcia@nisc.sri.com
`
`COMPUTER COMMUNICATION
`REVIEW is a quarterly publication of
`the ACM Special Interest Group on
`Data Communication. Its scope of
`interest includes : data communica(cid:173)
`tion systems for computers; data
`communication technology for com(cid:173)
`puters; reliability, security and
`integrity of data in data communica(cid:173)
`tion systems; problems of interfacing
`
`communication systems and computer
`systems; computer communication
`system modelling and analysis.
`Items attributed to persons will
`ordinarily be interpreted as personal
`rather than organizational opinions.
`Technical papers appearing in
`Computer Communication Review
`are informally reviewed .
`
`Secretary-Treasurer:
`Chris Edmondson-Yurkanan
`University of Texas
`Computer Sciences Department
`Austin, TX 78712-1188
`+1512471 9546
`FAX: +1512471 8885
`dragon@cs.utexas.edu
`
`SIG Program Director:
`Pat Mccarren
`Assoc. Computing ~achinery
`1515 Broadway, 17t Floor
`New York, NY 10036
`+1212626 0611
`FAX: +1 212 302 5826
`mccarren@acmvm .bitnet
`
`ACM SIGCOMM Lecturers:
`Raj Jain
`Deepinder Sidhu
`
`Award Committee Chairman:
`Franklin F. Kuo
`SRI International
`. Computer & Information Sciences
`, 333 Ravenswood Avenue
`Menlo Park, CA 94025
`+ 1 415 859 4116
`FAX: +1 415 859 6171
`kuo@nisc.sri.com
`
`A SIGCOMM membership application
`can be found on the last page of this
`issue
`
`COMPUTER COMMUNICATION REVIEW (ISSN 0146-4833) is published five times a year (January, April, July, and two issues in October)
`by the Association tor Computing Machinery, Inc., 1515 Broadway, New York, NY 10036. Second-class postage paid at New York, NY 10001,
`and at additional mailing offices.
`Postmaster:Send address changes to ACM COMPUTER COMMUNICATION REVIEW, 1515 Broadway, New York, NY 10036.
`
`
`
`•
`
`• • • • • • • • • • • • • • •
`
`acm (~:~, S i g C O m m
`
`Computer
`Communication
`Review
`.~.
`-.~
`
`iHt:,'LJN
`OF · . . IVERsrry
`M,CH;GAN
`
`JAN 12 7994_
`
`ENGJNEER·~'G
`LIBRAfi.,V
`
`Special Interest
`Group on
`Data
`Communication
`
`The
`SIGCOMM
`Quarterly
`Publication
`
`Volume 23
`Number 4
`
`October, 1993
`
`S1GCOMM'93
`
`CONFERENCE PROCEEDIN,GS
`
`Communications Arch itectures, Protocols and Applications
`
`September 13-17, 1993,
`
`San Francisco, California, USA
`
`LA~
`
`....
`
`
`
`The Association for Computing Machinery
`1515 Broadway
`New York, N.Y. 10036
`
`Copyright © 1993 by the Association for Computing Machinery, Inc. Copying without fee
`is permitted provided that the copies are not made or distributed for direct commercial
`advantage,and credit to the source is given. Abstracting with credit is permitted. For other
`copying of articles that carry a code at the bottom of the first page, copying is permitted
`provided that the per-copy fee indicated in the code is paid through the Copyright Clearance
`Center, 27 Congress Street, Salem, MA 01970. For permission to republish write to:
`Director of Publications, Association for Computing Machinery. To copy otherwise or
`republish, requires a fee and/or specific permission.
`
`Additional copies may be ordered prepaid from:
`
`ISBN: 0-89791-619-0
`
`ACM Order Department
`P.O. Box 12114
`Church Street Station
`New York, N.Y. 10257
`
`Phone: 1-800-342-6626.'
`(U.S.A. and Canada)
`1-212-626-0500
`(All other countries)
`Fax: 1-212-944-1318;
`E-mail: acmpubs@acm.org
`
`ACM Order Number: 533930
`
`Printed in the U.S.A.
`
`
`
`TABLE OF CONTENTS
`
`Session 1: Keynote Session
`Chair: Imrich Chlamtac, University of Massachussets
`
`Session 2: Real Time Communications
`Chair: Lyman Chapin! BBN Communications
`
`On Per-Session End-to-End Delay Distributions and the Call
`Admission Problem for Real-Time Applications with QOS
`Requirements
`David Yates, James Kurose, Don Towsley (University of Massachussets) and
`Michael G. Hluchyj (Motorola Codex)
`
`Analysis of Burstiness and Jitter in Real-Time Communications
`Zheng Wang and Jon Crowcroft (University College London)
`
`An Adaptive Congestion Control Scheme for Real-Time Packet
`Video Transport
`Hemant Kanakia (AT&T ,Bell Laboratories), Partho P. Misra (University
`of Maryland) and Amy Reibman (AT&T Bell Laboratories)
`
`Session 3: Routing
`Chair: Martha Steenstrup! BBN Systems and Technologies
`
`J
`
`The Synchronization of Periodic Routing Messages
`Sally Floyd and Van Jacobson (Lawerence Berkeley Laboratory)
`
`Dynamics of Internet Routing Information
`Bilal Chinoy (San Diego Supercomputer Center)
`
`Open Shortest Path First (OSPF) Routing Protocol Simulation
`Deepinder Sidhu, Tayang Fu, Shukri Abdallah, Raj Nair
`(University of Maryland Baltimore County) and Rob Col tun (Consultant)
`
`Session 4: Protocol Implementation Issues
`Chair: Craig Partridge! BBN
`
`Implementing Network Protocols at User Level
`Chandramohan A. Thekkath, Thu D. Nguyen, Evelyn Moy and
`Edward D. Lazowska (University of Washington)
`
`2
`
`13
`
`20
`
`33
`
`45
`
`53
`
`64
`
`ix
`
`
`
`Locking Effects in Multiprocessor Implementation of Protocols
`Mats Bjorkman (Uppsala University) and
`Per Gunningberg (Swedish Institute of Computer Science)
`
`74
`
`Session 5: Multicast
`Chair: David R. Cheriton! Stanford University
`
`Core Based Trees ( CBT) An Architecture for Scalable
`Inter-Domain Multicast Routing
`Tony Ballardie (University College London), Paul Francis (Bellcore) and
`Jon Crowcroft (University College London)
`
`Routing Reserved Bandwidth Multi-point Connections
`Dinesh C. Verma, P. M. Gopal (IBM TJ Watson Research Center)
`
`C~al Ordering in Reliable Group Communications
`Rosario Aiello, Elena Pagani and Gian Paolo Rossi
`(Universita' degli Studi di Milano)
`
`Session 6: Congestion and Admission Control, Scheduling
`Chair: Raj Jain! Digital Equipment Corporation
`
`Optimizing File Transfer Response Time Using the Loss-Load
`Curve Congestion Control Mechanism
`Carey L. Williamson (University of Saskatchewan)
`
`An Adaptive Framework for Dynamic Access to Bandwidth
`at High Speed
`Kerry W. Fendick and Manoel A. Rodrigues (AT&T Bell Laboratories)
`
`Warp Control: A Dynamically Stable Congestion Protocol
`and its Analysis
`Kihong Park (Boston University)
`
`Session 7: Protocol Description
`Chair: Jim Kurose, University of Massachusetts
`
`85
`
`96
`
`106
`
` 117
`
`127
`
`137
`
`Control Handling in Real-Time Communication Protocols
`Atsushi Shionozaki and Mario Tokoro (Keio University)
`
`149
`
`Structural Complexity and Execution Efficiency of Distributed
`
`160
`
`X
`
`
`
`Application Protocols
`K. Ravindran and X. T. Lin (Kansas State University)
`
`A Data Labelling Technique for High-Performance Protocol
`Processing and Its Consequences
`David C. Feldmeier (Bellcore)
`
`Session 8: Traffic Characteristics
`Chair: Peter B. Danzig, University of Southern California
`
`On the Self-Similar Nature of Ethernet Traffic
`Will E. Leland (Bellcore), Murad S. Taqqu (Boston University),
`Walter Willinger and Daniel V. Wilson (Bellcore)
`
`Application of Sampling Methodologies to Network Traffic
`Characterization
`Kimberly C. Claffy, George C. Polyzos (University of California,
`San Diego) and Hans-Werner Braun (San Diego Supercomputer Center)
`
`Session 9: Scheduling and Management
`Chair: Greg Wetzel, ATBT Bell Laboratories
`
`ATM Scheduling with Queueing Delay Predictions
`Daniel B. Schwartz (GTE Laboratories)
`
`HAP: A New Model for Packet Arrivals
`Ying-Dar Jason Lin, Tzu-Chieh Tsai, San-Chiao Huang and
`Mario Gerla (University of California, Los Angeles)
`
`Management of Virtual Private Networks for lnte,grated
`Broadband Communication
`J. M. Schneider (IBM European Networking Center),
`T. PreuB (Technical University of Cottbus)
`and P. S. Nielsen (L. M. Ericsson A/S)
`
`Session 10: Networking Issues
`Chair: Lawrence H. Landweber, University of Wisconsin
`
`170
`
`183
`
`194
`
`205
`
`212
`
`224
`
`A Case for Caching File Objects Inside Internetworks
`Peter B. Danzig (University of Southern California), Richard S. Hall and
`Michael F. Schwartz (University of Colorado)
`
`239
`
`xi
`
`
`
`Linear Recursive Networks and Their Applications in
`Topological Design and Data Routing
`Hsu Wen Jing, Amitabha Das (Nanyang Technological University) and
`Moon Jung Chung (Michigan State University)
`
`The Importance of Non-Data Touching Processing Overheads
`in TCP/IP
`Jonathon Kay and Joseph Pasquale (University of California, San Diego)
`
`Session 11: Practical Protocols
`Chair: Biswanath Mukherjee} University of California} Davis
`
`A Distributed Queueing Random Access Protocol for a
`Broadcast Channel
`Wenxin Xu (Reuters Information Technology Inc.) and
`Graham Campbell (Illinois Institute of Technology)
`
`Fault Detection in an Ethernet Network Using Anomaly
`Signature Matching
`Frank Feather (Hewlett-Packard Company), Dan Siewiorek and
`Roy Maxion (Carnegie Mellon University)
`
`End-to-End Packet Delay and Loss Behavior in the Internet
`Jean-Chrysostome Bolot (INRIA)
`
`249
`
`259
`
`270
`
`279
`
`289
`
`xii
`
`
`
`Core Based Trees (CBT)
`An Architecture for Scalable Inter-Domain Multicast Routing
`
`Tony Ballardie*(University College London)
`e-mail: A.Ballardie@cs.ucl.ac.uk
`Paul Francist(Bellcore, N.J., U.S.A.)
`e-mail: francis@thumper. bellcore. com
`Jon Crowcroft (University College London)
`e-mail: J. Crowcrojt@cs. ucl. ac. uk
`
`Abstract
`
`One of the central problems in one-to-many wide-area
`communications is forming the delivery tree - the collec(cid:173)
`tion of nodes and links that a multicast packet traverses.
`Significant problems remain to be solved in the area of
`multicast tree formation, the problem of scaling being
`paramount among these.
`In this paper we show how the current IP multicast
`architecture scales poorly (by scale poorly, we mean con(cid:173)
`sume too much memory, bandwidth, or too many pro(cid:173)
`cessing resources), and subsequently present a multicast
`protocol based on a new scalable architecture that is
`low-cost, relatively simple, and efficient. 0 We also show
`how this architecture is decoupled from (though depen(cid:173)
`dent on) unicast routing, and is therefore easy to install
`in an internet that comprises multiple heterogeneous
`unicast routing algorithms.
`
`1
`
`Introduction
`
`Multicast group communcation is an increasingly im(cid:173)
`portant capability in many of today's data networks.
`Most LANs and more recent wide-area network tech(cid:173)
`nologies such as SMDS (12) and ATM [7) specify mul(cid:173)
`ticast as part of their service, but perhaps the most
`apparent and widespread growth in multicast applica(cid:173)
`tions is being experienced in the IP Internet. We can
`see evidence of this growth in the MBONE, the set of
`routers and networks with multicast capapility.
`In order
`to cater
`to a very
`large number of
`internetwork-wide multicast applications, examples of
`
`•Principal author
`f Previously published under the name Paul Tsuchiya
`
`Permission to copy without fee all or part of this material is
`granted provided that the copies are not made or distributed for
`direct commercial advantage, the ACM copyright notice and the
`title of the publication and its date appear, and notice is given
`that copying is by permission of the Association for Computing
`Machinery. To copy otherwise, or to republish, requires e fee
`and/or specific permission.
`S1GCOMM'93 - Ithaca, N.Y., USA /9/93
`© 1993 ACM 0-89791-619-0/93/0009/0085 ...
`
`which include audio and video conferencing [15], repli(cid:173)
`cated database updating and querying, software up(cid:173)
`date distribution, stock market information services,
`and more recently, resource discovery (11), it is impor(cid:173)
`tant that the multicast routing protocol used be first
`and foremost scalable with respect to a network of very
`large size, and low-cost in terms of computational over(cid:173)
`head and storage requirements - properties lacking in
`current IP multicasting techniques. The protocol should
`also be designed to operate "invisibly" across domain
`boundaries, i.e. independent of the underlying unicast
`routing algorithm, so that it can evolve independently.
`This paper describes a new multicast routing architec(cid:173)
`ture which is applicable to any datagram network whose
`switches have multicast forwarding capability. We will
`present a multicast routing protocol ( CBT) for IP net(cid:173)
`works based on this new architecture that not only sat(cid:173)
`isfies the above criteria; but is also relatively simple in
`design.
`·
`In the following section we discuss the existing mul(cid:173)
`ticast architecture. Section 3 describes the current IP
`multicast environm,ent and goes on to briefly describe
`two IP multicast routing protocols. Section 4 presents
`a comprehensive critique of the existing architecture
`showing how it is inherently non-scalable and bound to
`particular underlying unicast routing algorithms. This
`leads us to the ne~ architecture in section 5 followed by
`a description of a protocol built on this new architecture
`in section 6. Sections 7 and 8 offer some thoughts on
`future work and an overall summary, respectively.
`
`J
`
`2 Existing Multicast Architec(cid:173)
`ture
`
`The existing multicast architecture is not restricted to
`IP networks, but is being accepted as the solution to
`multicasting in many different kinds of networks and
`environments.
`For each ~ulticast group, the current architecture
`builds a shortest-path source-based delivery tree be-
`
`85
`
`
`
`tween each sender and the corresponding multicast re(cid:173)
`cipients. The multicast tree-building algorithms are
`tightly-coupled to particular unicast algorithms. At do(cid:173)
`main boundaries, where differing multicast algorithms
`may interface, various ad hoc means are used to estab(cid:173)
`lish the tree. This is further discussed in section 4.2.
`Routers on a multicast tree store (source, group) pair
`information.
`
`2.1 Existing Properties
`
`Several properties, originally conceived for the LAN
`multicast environment, and later extended as desirable
`properties for internetwork multicasting, include:
`
`• Host Group Model conformance. The Host Group
`Model is a multicast service model for datagram in(cid:173)
`ternetworks, developed in the mid-1980s by Deer(cid:173)
`ing [9]. It defines what the multicast service looks
`like to users of the internetwork service interface
`within a host; it does not define how that service is
`implemented. Further, it lists a set of properties a
`multicast routing protocol should exhibit, that con(cid:173)
`tribute to its flexibility and generality; for example,
`a sender to a group need not know the location or
`identities of any of the group members, and the
`sender itself need not be a member of the group.
`
`• High probability of delivery. The probability of suc(cid:173)
`cessful delivery of multicast packets decreases when
`sending those packets over the wide-area. However,
`the successful delivery rate should remain high
`enough to allow for the recovery of lost/damaged
`packets by end-to-end protocols [8].
`
`• Low delay. Low delay is an important property
`for many multicast applicat~ons, for example, au(cid:173)
`dio conferencing. LANs impose very little delay
`on the delivery of multicast packets, but the delays
`over the wide-area are, inevitably, higher due to the
`greater geographic extent, and the greater number
`of links and switches packets must traverse. There(cid:173)
`fore, optimizing multicast routes can be an impor(cid:173)
`tant factor in minimizing delay exacerbation.
`
`2.2 Proposed Properties
`
`With the advent of multicasting in an internet of ever in(cid:173)
`creasing size and heterogeneity (with respect to routing
`and addressing) we feel that the list of desirable proper(cid:173)
`ties a multicast routing algorithm should exhibit, should
`be extended to include:
`
`• Scalability. With the internet growing at its current
`rate, we can expect to see a large increase in the
`number of wide-area multicasts. These can vary
`considerably in their characteristics. Clearly, any
`routing algorithm/protocol that does not exhibit
`
`good scaling properties across the full range of
`plications will have both limited usefulness and
`restricted lifetime in the internet.
`
`Any multicast routing
`• Robustness.
`should include features that provide robustness
`in terms of maintaining/repairing connectivity
`tween group members.
`
`• Information hiding. Information hiding is an im(cid:173)
`portant aspect of scaling. Routers/bridges, whose
`subnetwork(s) have no members with respect to a.
`particular group, should not have to know any
`formation as to the existence of that group, even if
`they need to forward multicast packets.
`
`• Routing Algorithm independence. It is highly desir~
`able that a multicast routing algorithm be"'"'""'''"' ...
`independent of any unicast routing algorithm, re(cid:173)
`sulting in much simplified multicast tree formation
`across domain boundaries.
`
`• Multicast tree fie-r,ibility. Multicast applications
`vary considerably in nature, according to sender
`population, receiver population, traffic character(cid:173)
`istics, and membership dynamics. Three example
`multicast applications with such differing charac(cid:173)
`teristics are video-broadcasting, audio/video con(cid:173)
`ferencing, and resource discovery. Therefore, a mul(cid:173)
`ticast delivery tree should be built so as to reflect
`the nature of the application.
`
`3 Existing IP Multicast Algo(cid:173)
`rithms
`
`The essentiar aim of wide-area multicast routing is es(cid:173)
`tablishing a reasonably optimal path between a mul(cid:173)
`ticast source and the other members of the group. A
`multicast packet should only ever need to be replicated
`when a shared path diverges into disjoint paths. This
`has the result of incurring the least packet processing
`and forwarding overhead per multicast router in the
`path, and the least bandwidth consumption by mul(cid:173)
`ticast packets between the source and destination(s).
`To summarise, multicasting optimizes bandwidth con(cid:173)
`sumption and transmitter costs.
`In the following subsection we briefly describe some
`features of the current IP multicast environment. Sub(cid:173)
`sequent subsections outline current IP multicast algo(cid:173)
`rithms, namely the Distance-Vector Multicast Routing
`Protocol (DVMRP) and the Link-State Multicast Rout(cid:173)
`ing Protocol, repectively. A more comprehensive de(cid:173)
`scription of these algorithms can be found in [9] and
`[8].
`
`86
`
`
`
`3.1
`
`IP Multicast Environment
`
`Most broadcast LANs, such as Ethernet, FDDI, and
`ATM, intrinsically support multicast addressing. That
`is, most end-systems and routers on these types of LAN
`are able to distinguish multicast packets from other
`types of traffic by means of address type; IP supports
`class D addressing. A class D address is an address
`taken from a portion of the IP address space set aside
`for multicasting. Each class D address uniquely identi(cid:173)
`fies a single host group.
`Routers normally set their network interfaces to
`promiscuously receive all multicast packets, but a host
`only does so after a higher-layer application explicitly
`requests it to do so. A single router, called the member(cid:173)
`ship interrogator, or designated router, polls the LAN
`for host memberships at intervals, to which only group
`member hosts reply, once for each group. This group(cid:173)
`query / group-reporting mechanism is implemented on
`LANs in hosts and routers by means of the Internet
`Group Management Protocol (IGMP ).
`
`3.2 DVMRP
`
`DVMRP [5] is based primarily on Reverse Path For(cid:173)
`warding (RPF) - an algorithm devised by Dalal and
`Metcalfe [6] for internetwork broadcasting. DVMRP
`uses a modified RPF algorithm to allow members (and
`non-member senders) of a group to build a shortest-path
`sender-based multicast delivery tree. The first few 1 mul(cid:173)
`ticast packets transmitted from a source are truncated
`broadcast? throughout the internetwork.
`Once the first packet has reached those routers that
`have neither child subnets nor leaves with members on
`them, those routers are each responsible for sending .a
`special message called a "prune" back one hop on the
`reverse-path tree. If the one-hop-back router receives
`prune messages from all of its subordinate routers, AND
`if its child subnets also have no members of the desti(cid:173)
`nation group, it in turn sends a prune message back to
`its predecessor.
`In this way, information about the absence of group
`members propogates back up the tree towards the
`source along all branches that do not lead to group
`members. Subsequent packets from the same source to
`the same group are blocked from travelling down the un(cid:173)
`necessary branches by the routers at the heads of those
`branches.
`A mechanism was also designed for quickly "grafting"
`a pruned branch back onto a multicast tree; a router
`
`1 How many depends on how long it talccs the special "prune"
`message to reach the source.
`2 A broadcast to all subnetworks throughout the internet ex(cid:173)
`cept those which arc "leaf" subnets with respect to the source. A
`"leaf" subnet of the reverse-path tree for a particular source, S,
`is a child subnet that no other router uses to reach S. In the case
`of DVMRP multicast packets only reach "leaf" subnetworks that
`have at least one member.
`
`can cancel a previously sent "prune" message by send(cid:173)
`ing a "graft" message to the same router. The "graft"
`message is propogated as far as necessary to rejoin the
`sending router to the multicast tree.
`
`3.3 Link-State Multicast Routing Pro(cid:173)
`tocol
`
`The link-state routing algorithm was extended in [9] to
`support shortest-path multicast routing by simply hav(cid:173)
`ing routers include, as part of the "state" of a link, a list
`of groups that have members on that link. Whenever a
`new group appears or an old group disappears from a
`link, the designated router on that link floods the new
`state to all other routers in the internetwork. Given
`full knowledge of which groups have members on which
`links, any router can compute the shortest-path multi(cid:173)
`cast tree from any source to any group using Dijkstra's
`algorithm [1]. If the router doing the computation falls
`within the tree computed, it can determine which links
`it must use to forward copies of multicast packets from
`the given source to the given group.
`
`4 A Critique of the Existing
`Multicast Architecture
`
`In this section we present a critique of. the existing
`source-based multicast architecture in light of the fact
`that the internet is a fast-growing, enormously complex,
`heterogeneous struqture. In the not too distant future
`we expect to see huge numbers of multicast groups in
`existence at any one time.
`
`4.1 Scaling ; Characteristics of Source(cid:173)
`Based Trees
`
`Poor scaling properties are inherent in multicast routing
`algorithms that build source-based delivery trees. The
`multicast algorithms we have discussed store per source
`information, which, if Sis the number of active sources
`per multicast group, and N is the number of multicast
`groups present, results in a scaling factor of S x N. This
`has serious consequences for routers in terms of storage
`and packet forwarding overheads.
`DVMRP exhibits another scaling characteristic that
`is both interesting and alarming, namely, that routers
`not on a mu'lticast tree are "charged" for staying off
`it, i.e. those routers not interested in sending/receiving
`multicast packets to/from a group are involved in the
`reception, generation, and interpretation of prune and
`graft messages, and additionally the storage of prunes,
`per (source, group) pair.
`
`87
`
`
`
`4.2 Unicast Routing Algorithm Depen-
`dence
`
`The multicast routing algorithms we have so far pre(cid:173)
`sented are based on a flat internetwork consisting of one
`large autonomous system in which all routers are run(cid:173)
`ning the same multicast/unicast algorithms. In reality
`the internet is a complex, heterogeneous environment
`with ASs running internal routing protocols of their
`choice. Tight coupling between multicast and unicast
`algorithms complicates the development of unicast al(cid:173)
`gorithms, since they must be modified to take multi(cid:173)
`cast into consideration. This coupling also requires spe(cid:173)
`cialised solutions for multicasting between domains run(cid:173)
`ning different multicast algorithms3 • Indeed, such solu(cid:173)
`tions have yet to be developed for IP [3]. The MBONE
`encompasses only those networks and routers that have
`multicast forwarding capability and which are running
`the same multicast algorithm.
`
`4.3 More Algorithm Specifics
`
`A DVMRP router must invest a modest amount of pro(cid:173)
`cessing power to determine which of its attached sub(cid:173)
`nets are child and leaf subnets relative to a given source.
`This overhead is incurred whenever the router's distance
`or next-hop subnet for a given source changes, or when(cid:173)
`ever the distance to a given source reported by a router
`on an attached subnet changes. Therefore, the total
`overhead involved in determining child and leaf subnets
`depends on the stability or dynamicity of the internet.
`There are two implications involved in disemminat(cid:173)
`ing group information in link-state packets: firstly, link(cid:173)
`state packets are flooded throughout the internetwork
`by a LAN's designated router both as the result of nor(cid:173)
`mal topology changes, and group membership changes
`on any of its directly attached subnets; secondly, and
`more seriously, global group membership information
`is maintained by all routers, whether they form part
`of a multicast tree(s) or not. Whilst the bandwidth
`overhead due to more frequent generation of link state
`packets could be deemed as less significant as we see
`media bandwidths continually increasing, we consider
`more serious the overhead of having all routers in the
`internet store global group membership information as
`unacceptable.
`
`5 CBT - The New Architecture
`
`First of all, exactly what is a core-based tree (CBT)
`architecture? Core-based, or centre-based forwarding
`
`3 "Tunnelling" has been defined as a technique for transporting
`multicast packets between multicast-capable routers. A "tunnel"
`then, is a sequence of routers that do not have multicast capability.
`A "tunnel" is created using a technique based on loose source
`routing, encapsulation, or a combination of both.
`
`trees, were first described by Wall [14]. He used a sin,.
`gle centre-based forwarding tree to investigate low-delay
`broadcasting and selective-broadcasting. He noted: "we
`can't hope to minimize the delay for each broadcast if
`we use just one tree, but we may be able to do fairly ··
`well, and the simplicity of the scheme may well make'
`up for the fact that it is no longer optimal".
`A core-based tree then, involves having a single node,.·
`in our case a router (with additional routers for ro(cid:173)
`bustness ), known as the core of the tree, from which ··
`branches emmanate. These branches are made up of
`other routers, so-called non-core routers, which form
`a shortest path between a member-host's directly at(cid:173)
`tached router, and the core. A router at the end of
`a branch shall be known as a leaf router on the tree.
`Unlike Wall's trees, the core need not be topologically
`centred4 between the nodes on the tree, since multicasts
`vary in nature, and correspondingly, so can the form of
`a core-based tree.
`Why then, is a core-based tree (CBT) architecture
`so attractive compared with the source-based architec(cid:173)
`ture? The key architectural features which drive the
`CBT approach are listed below:
`
`• Scaling. This is the fundamental premise driving
`CBT. A core-based architecture allows us to signif(cid:173)
`icantly improve the overall scaling factor of S x N
`we have in the source-based tree architecture, to
`just N. This is the result of having just one mul(cid:173)
`ticast tree per group as opposed to one tree per
`( source, group) pair. Each router ,on the tree need
`only store incident link information per group (i.e.
`per tree) as opposed to incident link information
`per (source,, group) pair. This represents the min(cid:173)
`imum pos~ble any router need store with respect.
`to its membership of a particular group. Routers
`not on the tree require no knowledge of the tree
`whatsoever.
`
`• Tree creation. The formation of core-based trees
`is receiver-based, i.e. no router is involved in be(cid:173)
`coming part of a tree for a particular group unless
`that router is intent on becoming a member of that
`group ( or is on the path between a potential mem(cid:173)
`ber and the tree, in which case that router must5
`become part of the tree). This implies that a tree
`is not built from a sender - only one tree is ever
`created per group. This is of significant benefit
`to all routers on the shortest-path between a non(cid:173)
`receiver sender and the multicast tree, since they
`are incurred no tree-building overhead.
`
`• Unicast routing separation. Core-based tree forma(cid:173)
`tion and multicast packet flow are decoupled from,
`
`4 To find the topological centre of a dynamic network is NP(cid:173)
`complete
`5 A router has the option to refuse a request to become part of
`a multicast tree.
`
`88
`
`
`
`but take full advantage of, underlying unicast rout(cid:173)
`ing, irrespective of which underlying unicast algo(cid:173)
`rithm is operating. All of the multicast tree infor(cid:173)
`mation can be derived solely from a router's exist(cid:173)
`ing unicast forwarding tables, with no additional
`processing necessary. These factors result in the
`CBT architecture being as robust as the underly(cid:173)
`ing unicast routing algorithm - most of which are
`designed with robustness as a high priority.
`In this architecture we can identify two distinct
`routing phases which provide the architecture with
`its scalability: firstly, unicast routing is used to
`route multicast packets to a multicast tree, allow(cid:173)
`ing multicast groups and multicast packets to re(cid:173)
`main "invisible" to routers not on the tree. This is
`achieved by using the unicast address of the centre
`(core) of the multicast spanning tree in the destina(cid:173)
`tion field of multicast packets originating off-tree;
`secondly, once on the corresponding tree, multi(cid:173)
`cast packets span the tree based on the packet's
`group identifier, or group-id6 (similar to a class D
`IP address). We consider this two-phase routing
`approach an important advancement in multicast(cid:173)
`ing. It has only been possible as a result of recog(cid:173)
`nising the need for having just one tree per group.
`With respect to IP networks, CBT requires no par(cid:173)
`tition of the unicast address space.
`
`Incoming "multicast" packet
`containing CORE address
`and group-id.
`
`-----------/
`' ' y
`
`____ ._
`
`,
`
`/I
`
`•
`
`= CORE ROUTER
`
`- - - - - .,.. - Path taken by multicast packet
`
`0 = Non-core router
`
`A diagram showing a single-core CBT tree is shown
`in Figure 1.
`
`Figure 1: A single-core CBT tree
`
`5.1 Disadvantages of the CBT Architec(cid:173)
`ture
`
`The following weaknesses can be identified through hav(cid:173)
`ing one core-based multicast tree per group, namely:
`
`• Core placement and shortest-path trees. Core-based
`trees may not provide the most optimal paths be(cid:173)
`tween members of a group. This is especially true
`for small, localised groups that have a non-local
`core. A dynamic core placement mechanism (see
`section 7) should prevent this from happening. In
`general, however, we feel that manual "best guess"
`placement will be aceptable for most situations.
`
`• The Core as a Point of Failure. The most obvi(cid:173)
`ous point of vulnerability of a core-based tree is its
`core, whose failure can result in a tree becoming
`parti