US006839321B1

(10) Patent No.: US 6,839,321 B1
(45) Date of Patent: Jan. 4, 2005

Chiruvolu
Fair Bandwidth Sharing Among Adaptive and Non-Adaptive Flows in the Internet; Farooq M. Anjum, et al.; XP-000878257; pp. 1412-1420.

The FB-RED Algorithm for TCP over ATM; Woo-June Kim, et al.; XP-000894360; pp. 551-555.

Supporting Differentiated Services Using ATM ABR Service; Richard Rabbat, et al.; pp. 210-213.

Service Differentiation Through End-to-End Rate Control in Low Bandwidth Wireless Packet Networks; Thyagarajan Nandagopal, et al.; pp. 211-220.

A Simple Packet Scheduling and Buffer Management Scheme for Scalable Support of QoS in the Internet; K.J. Loh, et al.; pp. 276-281.

Randomized Token Buckets: Reducing the Buffers Required in Multiplexors; J. Andrew Fingerhut, et al.; IEEE 1997, pp. 215-219.
`
`Primary Examiner—Douglas Olms
`Assistant Examiner—Van Kim T. Nguyen
`(74) Attorney, Agent, or Firm—Sughrue, Mion, PLLC;
`Jessica W. Smith; V. Lawrence Sewell
`
`(57)
`
`ABSTRACT
`
The Domain-based Congestion Management method and
apparatus detects and regulates congestion in a Diff-serv
network. It uses an improvRED method for congestion detec-
tion at the core routers and token bucket filters for traffic
regulation at the ingress nodes. In addition, improvRED also
provides feedback control. ImprovRED uses three thresh-
olds for detecting congestion: a minth, a maxth and a
FeedbackThreshold, which takes a value between the minth
and the maxth thresholds. Whenever the average queue size
is greater than minth and less than FeedbackThreshold, all
outgoing packets are marked appropriately to indicate a
potential onset of a congestion period. When the average
queue size is greater than FeedbackThreshold (but less than
maxth), packets are dropped probabilistically and all the
outgoing packets are marked appropriately to denote the
dropping phase. When the average queue size is greater than
the maximum threshold, all incoming packets are dropped.
`
(54) DOMAIN BASED CONGESTION MANAGEMENT

(75) Inventor: Girish Vsr Chiruvolu, Richardson, TX (US)

(73) Assignee: Alcatel, Paris (FR)

( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 918 days.

(21) Appl. No.: 09/618,196

(22) Filed: Jul. 18, 2000

(51) Int. Cl.7 ................................................ H04L 12/26
(52) U.S. Cl. .......................... 370/230.1; 370/231; 370/235.1
(58) Field of Search ................................. 370/229-236, 370/400, 401, 410, 412-416, 352-354; 710/52-57

(56) References Cited

U.S. PATENT DOCUMENTS
`
4,736,363 A    4/1988   Aubin et al.
4,901,277 A    2/1990   Soloway et al.
5,088,032 A    2/1992   Bosack

(List continued on next page.)
`FOREIGN PATENT DOCUMENTS
`
DE    43 16 872 A1    11/1994
EP    0 503 469 A2     9/1992
WO    WO 00/41365      7/2000
`
`OTHER PUBLICATIONS
`
F.M. Anjum et al., Fair Bandwidth Sharing Among Adaptive and Non-Adaptive Flows in the Internet, Proceedings IEEE Infocom 1999, The Conference on Computer Communications, 18th Annual Joint Conference of the IEEE Computer and Communications Societies, New York, New York, Mar. 21-25, 1999, Proceedings IEEE Infocom, The Computer Communica, vol. 3, Mar. 12, 1999, pp. 1412-1420.

W-J Kim et al., The FB-RED Algorithm for TCP over ATM, IEEE Globecom 1998, Globecom 1998, The Bridge to Global Integration, Sydney, Nov. 8-12, 1998, IEEE Global Telecommunications Conference, New York, New York, IEEE, US, vol. 1, Nov. 8, 1998, pp. 551-555.
`
`1 100
`'
`
`1 --* ROUNDTRIPESTtMAIION
`BETWEENAGNEN PAIROF
`INGRESSAND EGRESS NODES.
`
`42 Claims, 16 Drawing Sheets
`
`IHE LOCAL
`CONGESIIONCLEARANCE
`NOIIFICANONOCCURS HERE
`
`THE LOCAL
`DONGESTION NOTIFICATION
`OCCURS HERE
`
`
`
`
`
`PACKEIWEIGHT
`
`
`
`
`
`
`AVERAGEMUESIZEANIEIIGZESSTBF
`
`
`
`THRESHOLDVALUE
`OFAVE. QUEUE SIZE
`
`VARYING 0F PKT_WT WITH DEMAND AND LON MESSAGES
`
`EX1014
`
`Palo Alto Networks v. Sable Networks
`
`IPR2020-01712
`
`
`
`
`
U.S. PATENT DOCUMENTS

5,191,650 A     3/1993   Kramer et al.
5,377,327 A    12/1994   Jain et al.
5,404,565 A     4/1995   Gould et al.
5,408,562 A     4/1995   Yoshizawa et al.
5,426,674 A     6/1995   Nemirovsky et al.
5,430,729 A     7/1995   Rahnema
5,521,972 A     5/1996   Iki
5,596,722 A     1/1997   Rahnema
5,644,713 A     7/1997   Makishima
5,719,853 A     2/1998   Ikeda
5,870,564 A     2/1999   Jensen et al.
5,881,241 A     3/1999   Corbin
5,884,043 A     3/1999   Teplitsky .............. 370/230
5,914,936 A     6/1999   Hatono et al.
5,918,017 A     6/1999   Attanasio et al.
5,996,021 A    11/1999   Civanlar et al.
6,147,970 A *  11/2000   Troxel ................. 370/235
6,252,848 B1 *  6/2001   Skirmont ............... 370/229
6,333,917 B1 * 12/2001   Lyon et al. ............ 370/412
6,404,735 B1 *  6/2002   Beshai et al. .......... 370/230
6,424,624 B1 *  7/2002   Galand et al. .......... 370/231
6,487,170 B1 * 11/2002   Chen et al. ............ 370/231
6,510,160 B1 *  1/2003   Nikuie et al. .......... 370/412
6,535,482 B1 *  3/2003   Hadi Salim et al. ...... 370/229
6,556,578 B1 *  4/2003   Silberschatz et al. .... 370/412
6,614,756 B1 *  9/2003   Morgenstern et al. ..... 370/230
6,618,378 B1 *  9/2003   Giroux et al. .......... 370/395.1
6,643,260 B1 * 11/2003   Kloth et al. ........... 370/235
6,646,988 B1 * 11/2003   Nandy et al. ........... 370/235
6,657,962 B1 * 12/2003   Barri et al. ........... 370/235
6,675,220 B1 *  1/2004   McCloghrie et al. ...... 709/232
6,680,906 B1 *  1/2004   Nguyen ................. 370/229
6,680,907 B1 *  1/2004   Bonaventure ............ 370/229
6,690,645 B1 *  2/2004   Aweya et al. ........... 370/230

* cited by examiner
`
`
U.S. Patent          Jan. 4, 2005          Sheet 1 of 16          US 6,839,321 B1

[FIG. 1: DIFF-SERV DOMAIN (PRIOR ART); hosts attach through edge nodes to a DS domain core network, which connects to a neighboring DS domain.]
`
`
`
[Sheet 2 of 16, FIG. 2: the improvRED method (pseudocode, unreadable in this scan); on each packet arrival the core node computes the average queue size and applies the marking and dropping steps described in the specification.]
`
[Sheet 3 of 16, FIG. 3: A SIMPLE TWO-BIT SCHEME FOR DETECTING LOCAL DS DOMAIN CONGESTION; a table of (bit1, bit2) states covering no congestion detected, local congestion notification, and local congestion clearance notification, as set by the core nodes.]
`
[Sheet 4 of 16, FIG. 4: THE DSCP BYTE; the leftmost six bits form the Diff-serv codepoint and the remaining two bits carry congestion notification.]
`
[Sheet 5 of 16, FIG. 5A: TOKEN BUCKET FILTER CONNECTED TO DIFF-SERV DOMAIN; FIG. 5B: COMPONENTS OF A TOKEN BUCKET FILTER (TBF).]
`
[Sheet 6 of 16, FIG. 6A: FLOWCHART FOR TBF-BASED RATE CONTROL METHOD]

    DATA PACKET (200)
         |
         v
    Vary the number of tokens consumed by data of unit size,
    PktWt_j, based on demand and state.
         |
         v
    Is Pkt_size * PktWt_j <= available tokens in bucket?
         |                          |
        yes                         no
         |                          |
         v                          v
    TRANSMIT PACKET          DON'T TRANSMIT PACKET
`
[Sheet 7 of 16, FIG. 6B: THE TBF-BASED CONGESTION MANAGEMENT ALGORITHM AT INGRESS NODES]

    Initialize: PktWt_j <- 1.0
    PktWt_j is always within [minPktWt_j, maxPktWt_j].
    MD is a monotonically decreasing function that takes a value in (0,1).
    MI is a monotonically increasing function that takes a positive value.
    j denotes the label corresponding to the fixed route between a given
    pair of ingress/egress nodes.

    For every i-th round trip time (between ingress and egress nodes):

      During congestion-free periods:
        if (average TBF queue size at ingress node >= DemandThresh_j)
          PktWt_j^i <- max[minPktWt_j, PktWt_j^(i-1) * MD(PktWt_j^(i-1))]   (230)
          /* decrease PktWt_j during congestion-free periods, based on
             demand at TBF */
        else {
          if (PktWt_j^(i-1) > 1) PktWt_j^i <- max[1, PktWt_j^(i-1) * MD(PktWt_j^(i-1))]
          if (PktWt_j^(i-1) < 1) PktWt_j^i <- min[1, PktWt_j^(i-1) * MI(PktWt_j^(i-1))]
        }
        /* restore PktWt_j close to 1.0 */

      At congestion notification time:
        PktWt_j^i <- (maxPktWt_j - 1)(1 - PktWt_j^(i-1)) / (1 - minPktWt_j) + 1,
                     if PktWt_j^(i-1) < 1   (250)
        /* The smaller the PktWt_j just before LCN, the bigger it will be
           during the congestion period. A uniform mapping of
           [minPktWt_j, 1) on to (1, maxPktWt_j] intervals */

      During congestion period:
        PktWt_j^i <- PktWt_j^(i-1) * MI(PktWt_j^(i-1))   (240)

      On receipt of congestion clearance notification:
        Select a random time less than RTT and,
        PktWt_j^i <- PktWt_j^(i-1) * MD(PktWt_j^(i-1))   (220)
`
`
[Sheet 8 of 16, FIG. 7: VARYING OF PKT_WT WITH DEMAND AND LCN MESSAGES; average queue size at the ingress TBF and packet weight plotted over time, marking the threshold value of the average queue size, the points where the local congestion notification and the local congestion clearance notification occur, and the round-trip estimation between a given pair of ingress and egress nodes.]
`
[Sheet 9 of 16, FIG. 8: STATE DIAGRAM OF PKT_WT DYNAMICS; a discrete-state implementation of the algorithm of FIG. 7, with packet-weight states bounded by MINPKTWT and MAXPKTWT.]
`
[Sheet 10 of 16, FIG. 9: THE SIMULATION SETUP.]
`
[Sheet 11 of 16, FIG. 10: PERFORMANCE OF THE PROPOSED DCM SCHEME; table of percent packet loss and percent overall improvement with the DCM scheme versus a non-DCM scheme at several utilizations (numeric entries unreadable in this scan).]
`
[Sheet 12 of 16, FIG. 11: DELAY PERFORMANCE OF THE DCM SCHEME; table of average delay versus average queue size at an ingress node (numeric entries unreadable in this scan).]
`
[Sheet 13 of 16, FIGS. 12A and 12B: sample average queue size and distribution of packet weight at an ingress TBF, for a utilization factor of 0.8.]
`
[Sheet 14 of 16, FIGS. 13A and 13B: frequency histograms of the packet-drop-phase duration at the core nodes with a non-DCM scheme.]
`
`
[Sheet 15 of 16, FIGS. 14A and 14B: frequency histograms of the packet-drop-phase duration at the core nodes with the DCM scheme, at utilization factors 0.8 and 0.9.]
`
`
[Sheet 16 of 16, FIG. 15: PERFORMANCE OF THE DCM SCHEME WITH DOMAIN-RTT VARIATION. (a) packet loss % at core nodes; (b) total packet loss in the system (core + TBF). Both panels plot loss against domain-RTT (30-100 ms) for utilization levels from 0.5 up to 1.2.]
`
`
`DOMAIN BASED CONGESTION
`MANAGEMENT
`
`FIELD OF INVENTION
`
`This invention is related to the field of congestion man-
`agement schemes for controlling the flow of packets over the
`Internet.
`
`BACKGROUND OF INVENTION
`
`Potential congestion periods can occur for a number of
`possible reasons such as i) burstiness that is inherent in
`nodes and generated due to statistical multiplexing at nodes
`along a given path; and ii) non-adaptive greedy applications
`that may often cause (potential) congestion periods leading
`to severe packet loss conditions which affect other sessions
`that share network resources. Congestion avoidance and
`management schemes are essential for a better utilization of
`network resources.
`
`Generally, congestion control schemes have two phases
`viz. i) early congestion detection and avoidance; and ii) a
`congestion management scheme that begins to operate once
`a congestion period occurs. Several congestion management
`schemes have been proposed so far. For example, binary
`feedback-based congestion management schemes rely on
`end sources to react to the congestion messages. Similarly,
`the current Internet relies on end-to-end congestion control
`mechanisms through either packet dropping or explicit con-
`gestion notification (ECN) by marking packets of a session.
`However, the end-to-end reaction to congestion is critically
`dependent on round trip time (RTT) between the end hosts.
`Explicit rate management algorithms have also been
`proposed in the context of ATM. However, the explicit rate
`notification schemes that indicate the rates to which the
`
`individual traffic sources have to adapt are too complex to be
`implemented and require extensive processing at the core
`switches (or routers). They also need to generate periodic
`resource management (RM) cells that cause extra traffic.
`Furthermore, these schemes, in particular, require per-flow
`state maintenance that cannot be tailored easily to suit the
`heterogeneous Internet.
`Core state-less fair queuing (CSFQ) has been proposed in
`order to eliminate the book keeping of each flow state at core
`routers. However, the key focus of CSFQ is on achieving fair
`bandwidth allocation. It relies on end hosts (traffic sources)
`to detect available bandwidth or congestion at the bottleneck
`nodes. The long round trip time between a given pair of
`source and destination nodes can lead to late reaction by the
`sources to the congestion notification. As a result, CSFQ
`may not be adequate in reducing packet loss.
`Differentiated services (Diff-serv) over the Internet Pro-
`tocol (IP) have been proposed to avoid maintaining state
`information of large number of flows at the core routers. In
`addition, Diff-serv moves the complexity of per-flow band-
`width management to intelligent edge routers. A Diff-serv
`cloud comprises i) a set of edge nodes known as ingress or
`egress nodes depending on the traffic flow direction that may
`maintain per-flow state and ii) a set of core nodes that do not
`maintain per-flow state information and carry a large number
`of aggregated flows (see FIG. 1).
`Overview of Diff-serv architecture
`
`The crux of Differentiated Services (DS) is that packets
`get different levels of service based on Type of Service
`(TOS) bits. These include i) traffic policing that leads to
`marking of the packets that are out of profile (violation of
`some traffic parameter as specified, e.g., peak-rate);
`ii)
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`2
`packet dropping and buffering strategies at various routers,
`also known as Per-Hop-Behaviors (PHBs); and iii) choice of
`an appropriate queue that maps to the type of service that
`was chosen by the application as indicated by the TOS bits.
`The flow or flow-aggregate information is maintained only
`at a few selected routers, such as edge routers. Thus,
per-flow/aggregate monitoring is avoided at core routers.
`The PHBs that run on core routers can be adaptively tuned
`to compensate for the loose admission control at the edges
where traffic of various classes is injected into the network
`with a goal of predictable QoS. However, best-effort service
`still constitutes a considerable amount of net traffic. The
`
`allocation of the bandwidth available for best effort depends
`on the policy of individual Internet Service Providers (ISPs)
`and the service level agreements with other neighboring DS
`domains.
`
`Currently there are two classes of services defined in the
`context of Diff-serv viz.: i) the Assured service (AS) and ii)
`Premium service (PS). They are respectively mapped onto
`Assured forwarding (AF) and Expedited forwarding (EF)
`per-hop-behaviors (PHBs). The AF PHB forwards packets
`according to their priorities. Thus,
`in the event of
`congestion, high priority packets receive better service than
`low priority packets. The EF PHB aims at reducing the
`queuing delays and emulates a leased line service from the
`end user’s perspective.
`Nevertheless, congestion management schemes are essen-
`tial for good network utilization, even with priority-based
`packet handling schemes. Potential congestion periods can
`arise and it is difficult to assess the available bandwidth
`unless the core routers are enhanced with robust resource
`
`management schemes. Thus, each of the ingress nodes
`(unaware of an onset congestion period) can potentially
`inject more traffic into the core network of a Diff-serv
`domain. ECN has been proposed, however,
`the ECN
`requires end-hosts to interact and respond to the congestion
`notification.
`Red
`
`Active queue management algorithms, such as Random
`Early Detection (RED), can be employed in order to detect
`and avoid any potential network collapse due to congestion.
`Congestion detection can be based on buffer monitoring by
`setting a threshold value for buffer occupancy. However,
`simple buffer occupancy-based techniques may not be suf-
`ficient to handle bursty traffic because bursty traffic may
`temporarily lead to a buffer occupancy greater than the
`threshold value. This leads to frequent congestion
`avoidance/management triggering mechanisms. In contrast
`to simple buffer monitoring, the RED algorithm calculates
`an average queue size by using a low-pass filter with an
`exponential weighted moving average (EWMA). With a
`constant wq (0<wq<1), with the arrival of nth packet, the
`average queue size is given as follows
`
avgQsize_n = (1 - w_q) * avgQsize_{n-1} + w_q * currentQsize_n        (1)
`
`The allowed range of the average queue size before
`packets are dropped determines the allowed burst sizes.
`Thus RED can accommodate traffic bursts unlike drop-tail
`FIFO-based queue thresholds, as the average queue size
`does not solely depend on the current queue size.
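The EWMA update of equation (1) can be rendered as a short sketch; the weight value and the queue samples below are illustrative, not ones prescribed by the patent:

```python
def update_avg_qsize(avg_qsize, current_qsize, w_q=0.002):
    """EWMA low-pass filter over the instantaneous queue size (equation 1).

    Requires 0 < w_q < 1; a small w_q makes the average track slowly, so
    short bursts do not push it over the drop thresholds.
    """
    return (1.0 - w_q) * avg_qsize + w_q * current_qsize

# A short burst barely moves the average, unlike an instantaneous
# drop-tail comparison against the same samples:
avg = 0.0
for sample in [5, 50, 50, 5]:      # hypothetical queue-length samples
    avg = update_avg_qsize(avg, sample)
```

With w_q = 0.002 the average after this burst stays far below the burst peak of 50, which is the burst-absorbing behavior described above.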
RED employs two queue thresholds, i.e., minth and
maxth. Whenever the average queue size is between the minth
`threshold value and the maxth threshold, the RED algorithm
`drops (or marks) packets randomly with certain probability
`Pdrop indicating an incipient congestion. If the average queue
`size exceeds the maxth, it drops all the packets until the
`average queue size falls below the maxth threshold. The
`
`
`
`probability of dropping is a function of average queue size
`and is given by
`
P_drop = P_max * (avgQsize - minth) / (maxth - minth)        (2)
`
`where Pmax is the maximum probability of a packet drop. It
`is shown that
`the average queue size is substantially
`decreased with random packet dropping. This mitigates the
`tail-dropping effects and the associated synchronization of
`various TCP (application) back-offs (reduction in traffic
`transmission rate).
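The full RED dropping rule around equation (2) (no drops below minth, certain drop at or above maxth, linear in between) can be sketched as follows; the threshold and P_max values in the usage line are illustrative:

```python
def red_drop_probability(avg_qsize, minth, maxth, p_max):
    """Drop probability as a function of the average queue size.

    Below minth no packet is dropped; from minth to maxth the probability
    rises linearly to p_max (equation 2); at or above maxth every packet
    is dropped.
    """
    if avg_qsize < minth:
        return 0.0
    if avg_qsize >= maxth:
        return 1.0
    return p_max * (avg_qsize - minth) / (maxth - minth)

# Halfway between the thresholds the probability is half of p_max:
p = red_drop_probability(150, 100, 200, 0.1)   # 0.05
```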
`SUMMARY OF THE INVENTION
`
`The present invention is a method and apparatus that uses
`thresholds for regulating congestion.
`It deterministically
`marks outgoing packets by setting a LCN bit(s) when an
`average queue size of a token bucket filter is between a
`minimum threshold and a feedback threshold. In addition, it
`probabilistically drops incoming packets and marks all out-
`going packets when the average queue size is between a
`feedback threshold and a maximum threshold. Finally, all
`incoming packets are dropped when the average queue size
`equals or exceeds said maximum threshold.
`In another preferred embodiment, the present invention is
`an apparatus and method for regulating traffic flow in a
differentiated services network between nodes. First, a core
node detects congestion. Next, an egress node sends a
`node detects congestion. Next, and egress node sends a
`congestion feedback notification message to at least one
`ingress node. In response, the ingress node reduces its traffic
`rate in proportion to the amount of traffic that it was injecting
`into the network when congestion was detected.
`In still another preferred embodiment, the present inven-
`tion comprises a method and apparatus for regulating the
`traffic rate at an ingress node by varying the number of
`tokens consumed by a data packet and transmitting the data
`packet if the number of tokens consumed by the packet is
`less than the available tokens.
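The transmit test just stated (a packet of size s consumes s * PktWt tokens and is sent only if the bucket holds at least that many) can be sketched as follows; the class shape and parameter names are illustrative, not taken from the patent:

```python
class TokenBucketFilter:
    """Sketch of TBF-based rate control at an ingress node: a packet of
    size s consumes s * pkt_wt tokens, so raising pkt_wt throttles the
    flow without changing the token generation rate."""

    def __init__(self, rate, depth):
        self.rate = rate          # tokens generated per time unit
        self.depth = depth        # bucket capacity
        self.tokens = depth       # bucket starts full
        self.pkt_wt = 1.0         # tokens consumed per unit of data

    def refill(self, elapsed):
        """Add tokens for the elapsed time, capped at the bucket depth."""
        self.tokens = min(self.depth, self.tokens + self.rate * elapsed)

    def try_send(self, pkt_size):
        """Transmit only if the weighted cost fits in the bucket."""
        needed = pkt_size * self.pkt_wt
        if needed <= self.tokens:
            self.tokens -= needed
            return True           # transmit packet
        return False              # hold (don't transmit) packet
```

Raising pkt_wt above 1.0 throttles the flow without touching the token generator, which is how an ingress node can cut its injection rate in proportion to demand and congestion state.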
`
`In still another preferred embodiment, the present inven-
`tion is an apparatus for controlling traffic flow in a differ-
`entiated services domain. It is comprised of a plurality of
`three types of nodes,
`ingress, egress and core nodes.
`Furthermore, each ingress node has a corresponding token
`bucket filter which is used to regulate the flow of data
`packets. The token bucket filter is comprised of a token
`generator and a bucket to hold at least one generated token.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`FIG. 1 contains the architecture of a Diff-serv domain.
`
`FIG. 2 illustrates the improvRED method.
`FIG. 3 illustrates a simple 2-bit scheme used to indicate
`the onset of a congestion period at the core nodes.
`FIG. 4 is the definition of DSCP byte.
`FIG. 5(a) illustrates a token bucket filter connected to a
`Diff-serv domain.
`
`FIG. 5(b) illustrates the components of a token bucket
`filter.
`
`FIG. 6(a) is a flowchart for the Token Bucket Filter-based
`rate control method.
`
`FIG. 6(b) illustrates a Token Bucket Filter-based conges-
`tion management method used with ingress nodes.
`FIG. 7 illustrates the varying of packet weight with
`demand and LCN messages.
`FIG. 8 depicts a possible discrete state implementation of
`the algorithm in FIG. 7.
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`4
`FIG. 9 depicts a simulation setup.
`FIG. 10 shows the performance of the DCM method vs.
`non-feedback based congestion control.
`FIG. 11 illustrates the delay performance of the DCM
`method.
`
`FIGS. 12(a) and 12(b) depicts a sample average queue
`size and the distribution of packet weight at an ingress Token
`Bucket Filter for a utilization factor of 0.8.
`
`FIGS. 13(a) and 13(b) illustrate the distribution of con-
`gestion periods for a non-DCM method at the core node.
`FIGS. 14(a) and 14(b) illustrate the drop phase duration
`for the DCM method for the utilization factors 0.8 and 0.9
`
`respectively.
`FIG. 15 illustrates the performance of the DCM method
`with domain-RTT variation.
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`
The present invention is a feedback-based congestion
control for a Diff-serv domain called Domain-based
Congestion Management (DCM). One improvement it has over
`existing congestion control schemes is the advantage of
`shorter RTTs between a given pair of ingress/egress nodes of
`a Diff-serv domain. This is in contrast to the long end-to-end
`RTTs of existing congestion control schemes that invariably
`result in large latency of reaction to transient congestion
`periods. In addition, the present invention is not complex
`and requires no flow state maintenance at the core routers.
`Therefore, it can react quickly to transient congestion peri-
`ods that occur locally within a Diff-serv cloud. Furthermore,
`shorter RTTs between a given pair of ingress/egress nodes
`can lead to faster detection and better utilization of the
`transient available bandwidth in the core Diff-serv domain.
`
`The present invention improves upon the Random Early
`Detection (RED) and explicit congestion notification
`mechanisms to handle best-effort traffic. The DCM scheme
`
is based on an improvement to the RED scheme
(improvRED) of low complexity running on the core routers
`and an adaptive Token Bucket Filter (TBF)-based traffic
`regulation at the ingress nodes. It is a method and apparatus
`that allows all
`the ingress nodes (I) to share available
`bandwidth and adjust their rates of traffic injection such that
`the average congestion periods are reduced inside the Diff-
serv domain. This leads to an overall improvement in
utilization.
`
`The DCM scheme is a distributed congestion control
`algorithm that runs on each of the ingress nodes. On the one
`hand, it helps the ingress nodes to avoid packet loss during
`congestion periods and, on the other hand, detects available
`bandwidth during congestion-free periods. In addition, the
`RED mechanism is improved to distinguish between an
`onset of a likely congestion period (marking phase) and a
`persistent congestion period that invariably incurs packet
`drops (dropping phase).
`Feedback, in the form of a Local Congestion Notification
`(LCN) message (or message), is used to notify the ingress
`(or input) nodes (I) of a likely onset of a congestion (free)
`period. (In a preferred embodiment, the Local Congestion
`Notification message is an LCN bit(s) set in a data packet).
`In addition, an associated feedback control loop is intro-
`duced into the DCM scheme. Upon detecting the LCN bit set
`by any of the congested core nodes (C), egress (or output)
`nodes (E) shall report to corresponding ingress nodes (I)
`about the onset of a congestion period. The ingress nodes, as
`a result, shall respond to LCN messages. The LCN messages
`
`
`
`are used to indicate the congestion state at the relevant core
`routers, based on the average queue size at their TBFs.
`The average queue size at the end of an adaptation interval
`(set to RTT) associated with a given pair of ingress/egress
`nodes (I/E) is used as a measure of demand for network
`bandwidth at the underlying ingress nodes (I). However,
`traffic rates associated with each ingress node (I) shall be
`varied in proportion to the amount of traffic each ingress
`node (I) is injecting at the onset of congestion. During the
congestion management period, the ingress node (I) that
injected more traffic at the onset of congestion shall
`be responsible for a greater reduction in transmission rates
`during congestion recovery period. This leads to a fair
`resource utilization among ingress nodes (I).
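As a sketch of this proportional back-off, the packet weight held just before a Local Congestion Notification can be mapped uniformly onto the congestion-period range, so a node holding a smaller weight (i.e., injecting more traffic) receives a larger weight and thus a stronger rate reduction. The function below, including its bound values, is an illustrative reading of the weight adaptation of FIG. 6(b), not a verbatim transcription:

```python
def lcn_weight(pkt_wt, min_wt=0.5, max_wt=4.0):
    """Map the pre-notification packet weight in [min_wt, 1) uniformly
    onto (1, max_wt]: the smaller the weight before the LCN (the higher
    the injection rate), the larger the weight during the congestion
    period. min_wt and max_wt are illustrative bounds, not patent values."""
    if pkt_wt >= 1.0:
        return pkt_wt             # already throttled at or above nominal
    return (max_wt - 1.0) * (1.0 - pkt_wt) / (1.0 - min_wt) + 1.0
```

A node at the lower bound (pkt_wt = min_wt) jumps to max_wt, while a node already near the nominal weight of 1.0 is barely penalized.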
`The Domain-based Congestion Management Method
`The DCM scheme comprises three main features: use of
an improvRED method and apparatus for congestion detec-
`tion at the core nodes or routers (C), congestion feedback in
`the form of LCN bits and use of token bucket filters (TBF)
`to regulate traffic at ingress nodes (I).
`Improved RED
`Random probabilistic dropping/marking of packets
`affects individual sessions proportional to their traffic rates.
`In existing congestion control schemes, egress nodes can not
`estimate the degree of congestion at a bottleneck core router
`from the ECN bits of the packets. Yet, it would be beneficial
`if the egress nodes (E) are able to detect a potential onset of
`congestion period so as to minimize packet losses and to
`delay or prevent a corresponding congestion period.
`Therefore, the present invention provides an early feedback
`so as to minimize packet losses and to delay or prevent a
`corresponding congestion period. The DCM scheme intro-
`duces an improved RED, called improvRED, that basically
`improves the original RED to provide feedback control.
`In addition, an additional threshold, FeedbackThreshold
`(or Feedback), is introduced in the present invention. It takes
`a value between the minth (or minimum) and the maxth (or
maximum) thresholds. The improvRED enters a
deterministic, rather than probabilistic, marking phase
whenever the average queue size is greater than minth and
less than FeedbackThreshold. During this phase, all outgo-
ing packets are marked appropriately to indicate a potential
onset of a congestion period, and no packets are dropped
130.
`
`When the average queue size is equal to or greater than
`FeedbackThreshold, but
`less than maxth, packets are
`dropped probabilistically 140 and all the outgoing packets
`are marked appropriately to denote the dropping phase 150.
`Both the marking and dropping phases are considered as
`indicators of potential congestion state at core nodes. These
`phases are experienced by the core nodes (C) to varying
`degrees as a result of the following condition (for an onset
`of a congestion period)
SUM_j IR_j(t) > R_Cong        (3)
`
`where IRj (t) denotes the incoming traffic rate at the con-
`gested node associated with an ingress node (I)j at time t and
R_Cong is the service rate (link capacity) of the corresponding
`congested core node (C). Condition 3 is a necessary condi-
`tion to drive the core node to a potential congested state
`through queue build-ups. As a result of the above condition
`the underlying core node (C) shall be in either of the
`drop/mark phase for short durations until the DCM scheme
`regulates the traffic such that the congested node is brought
`back to a congestion-free state. (The state where the average
`queue size is less than minth is referred to as a congestion
`free state). When the average queue size is greater than
`maxth, all incoming packets are dropped 160.
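The four operating regions just described (congestion-free below minth, deterministic marking up to FeedbackThreshold, probabilistic dropping with marking up to maxth, and dropping of all incoming packets beyond that) can be summarized in a small sketch; the function and phase names are illustrative:

```python
def improv_red_phase(avg_qsize, minth, feedback_th, maxth):
    """Classify the improvRED state for the current average queue size.

    Below minth the node is congestion-free; between minth and
    FeedbackThreshold all outgoing packets are marked with no drops;
    between FeedbackThreshold and maxth packets are dropped
    probabilistically and outgoing packets are marked as dropping;
    at or above maxth every incoming packet is dropped.
    """
    assert minth < feedback_th < maxth
    if avg_qsize < minth:
        return "congestion-free"
    if avg_qsize < feedback_th:
        return "marking"
    if avg_qsize < maxth:
        return "dropping"
    return "drop-all"
```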
`
`10
`
`15
`
`20
`
`25
`
`30
`
`35
`
`40
`
`45
`
`50
`
`55
`
`60
`
`65
`
`6
The idea behind the introduction of a FeedbackThreshold
is to avoid packet drops before any congestion control
scheme comes into play. The improvRED method is illus-
trated in FIG. 2.
`
`In order to indicate the onset of a congestion period at the
`core nodes (C), feedback in the form of a simple 2-bit
`scheme is proposed that is depicted in FIG. 3. The bits (bitl,
`bit2) serve as a notification to the egress nodes (E) of a
`specific congestion state in the Diff-serv domain. (The bits
`are set by the core nodes (C)). The egress nodes (E) shall
`appropriately notify corresponding ingress nodes (I) of the
`potential congestion at the core nodes (C).
`The details of the integration of the 2 bit-scheme with the
`last two bits of the differentiated service code point (DSCP)
`byte follows. (It is assumed that an LCN bit is available that
`is reset at every ingress node (I) and core nodes (C) can set
`it whenever they are congested).
`LCN Message
`As discussed above, feedback, in the form of a Local
Congestion Notification (LCN) bit or bits, is used to notify the
ingress nodes (I) of a likely onset of a congestion period. The
`IP (Internet protocol) provides the Type of Service (TOS)
`byte in the IP packets that can be used for explicit classifi-
`cation and the type of treatment (priority) the packet should
`receive at the intermediate routers. The TOS byte has been
`redefined as differentiated services code point (DSCP) byte
`in the context of Diff-serv. The definition of DSCP byte is
`described in K. Nichols, S. Blake, F. Baker, and D. Black,
`Definition of the Differentiated Services Field (DS Field) in
the IPv4 and IPv6 Headers (RFC 2474), work in progress,
`1998, hereby incorporated by reference, and summarized in
FIG. 4. The leftmost 6 bits of the DSCP byte are
`intended to define various PHBs and their associated ser-
`
`vices. Bits 7 and 8 are used for explicit congestion notifi-
`cation (ECN).
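As an illustration of this layout, the sketch below treats the DSCP/TOS byte as six codepoint bits followed by two notification bits. The mask names, and the assignment of one of the two low bits to the LCN role, are assumptions made for illustration rather than definitions from RFC 2474:

```python
# Illustrative bit masks for the 8-bit DSCP/TOS byte described above.
DSCP_MASK = 0b11111100   # bits 1-6: the Diff-serv codepoint (PHB selection)
LCN_BIT   = 0b00000010   # bit 7: local congestion notification (assumed here)
ECN_BIT   = 0b00000001   # bit 8: explicit congestion notification

def set_ecn(tos_byte):
    """A congested router sets the ECN bit and leaves the codepoint intact."""
    return tos_byte | ECN_BIT

def codepoint(tos_byte):
    """Extract the six-bit Diff-serv codepoint."""
    return (tos_byte & DSCP_MASK) >> 2
```

Setting the ECN bit never disturbs the codepoint: codepoint(set_ecn(b)) equals codepoint(b) for any byte b.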
`Whenever a router detects congestion, it sets the ECN bit
`(bit 8) of the DSCP byte so that the receiver can alert the
`source of the network congestion at an intermediate router.
`The source node, in turn, may respond to the ECN bit by
`adaptively decreasing the traffic transmission rate. This
`mechanism is incorporated into many transportation proto-
`cols such as TCP and contributes to a healthy network that
`can avoid congestion-collapse.
`Benefits of ECN include: i) avoidance of collapse of the
`network, and ii) flexibility of adapting to network conditions
`by the end applications. The Internet can provide an indi-
`cation of an incipient congestion when using an active queue
`management scheme such as RED.
`In a preferred
`embodiment, the response to the ECN bit set packet by the
`sender is essentially the same as the response to a dropped
`packet, i.e., the sending node lowers its transmission rate. In
`addition, ECN can be incrementally deployed in both end-
`systems and routers.
`When an ECN bit set packet is received by a router, the
ECN bit is left unchanged and the packet is transmitted. With
`existing congestion control schemes, when severe conges-
`tion has occurred and the router’s queue is full, then the
`router drops a packet when a new packet arrives. However,
`such packet losses will become relatively infrequent under
`the improvRED congestion control mechanism because the
`explicit notification can also be implemented through mark-
`ing packets rather than dr