`Dyckerhoff et al.
`
(10) Patent No.: US 7,706,357 B1
(45) Date of Patent: Apr. 27, 2010
`
`
`(54) BANDWIDTH DIVISION FOR PACKET
`PROCESSING
`(75) Inventors: Stefan Dyckerhoff, Palo Alto, CA (US);
`Pankaj Patel, Cupertino, CA (US);
`Pradeep Sindhu, Los Altos Hills, CA
`(US); Ashok Krishnamurthi, San Jose,
CA (US); Hann-Hwan Ju, San Jose, CA
(US); Ramalingam K. Anand, San Jose,
CA (US)
`
(73) Assignee: Juniper Networks, Inc., Sunnyvale, CA
(US)

(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 837 days.
`(21) Appl. No.: 11/470,040
(22) Filed: Sep. 5, 2006
`Related U.S. Application Data
`(63) Continuation of application No. 09/534,838, filed on
`Mar. 24, 2000, now Pat. No. 7,139,282.
`
`(56)
`
`(51) Int. Cl.
`2006.O1
`H04L 2/66
`(
`)
`(52) U.S. Cl. ....................................... 370/352; 370/468
`(58) Field of Classification Search ................. 370/352,
`370/351,389, 401, 399, 468, 400, 498,474,
`370/465,397,392; 709/238
`See application file for complete search history.
`Ref
`Cited
`eerees e
`U.S. PATENT DOCUMENTS
5,335,325 A    8/1994  Frank et al. ................. 711/163
5,506,841 A    4/1996  Sandquist .................. 370/60.1
5,710,650 A    1/1998  Dugan ...................... 359/133
5,757,771 A    5/1998  Li et al. .................. 370/235
5,905,725 A    5/1999  Sindhu et al. .............. 370/389
5,909,440 A    6/1999  Ferguson et al. ............ 370/389
5,953,314 A    9/1999  Ganmukhi et al. ............ 370/220
6,009,075 A   12/1999  Roberts et al. ............. 370/219
6,092,178 A    7/2000  Jindal et al. .............. 712/27
6,122,281 A    9/2000  Donovan et al. ............. 370/401
6,263,368 B1   7/2001  Martin ..................... 709/224
6,272,522 B1   8/2001  Lin et al. ................. 709/200
6,324,580 B1  11/2001  Jindal et al. .............. 709/228
6,327,622 B1  12/2001  Jindal et al. .............. 709/228
6,359,900 B1   3/2002  Dinakar et al. ............. 370/458
`
`(Continued)
`OTHER PUBLICATIONS
`
U.S. Appl. No. 09/752,827, filed Jan. 3, 2001; entitled: “High-Speed
Line Interface for Networking Devices”; 28 pages.
`(Continued)
Primary Examiner—Chi H Pham
Assistant Examiner—Alexander Boakye
(74) Attorney, Agent, or Firm—Harrity & Harrity, LLP
`(57)
`ABSTRACT
`
`A bandwidth divider and method for allocating bandwidth
`between a plurality of packet processors. The bandwidth
`divider includes a plurality of counters for measuring the
`bandwidth of data packets transferred from the bandwidth
`divider to a respective packet processor; and a controller for
analyzing the plurality of counters and transferring a data
packet to a selected packet processor based on the contents of
`the counters. The method monitors the bandwidth consumed
`by the packet processors; determines, based on the bandwidth
`consumed by the packet processors, which packet processor
`has consumed the least amount of bandwidth; and allocates a
`next data packet to the packet processor which has consumed
`the least amount of bandwidth.
`
`22 Claims, 10 Drawing Sheets
`
`- - 30
`sandwidth divider chip ()
`23
`xclu?ter
`Corter
`
`B
`8)
`raute-controller
`Cuaua
`O
`
`0.
`Curtf
`
`-100
`
`Fack Pressor
`
`o
`
`2
`8
`
`2300
`240-0
`235-
`...--
`
`
`
`22
`
`8srxwidth Dwider Chip 1
`
`2-20
`
`sandwidth iwider chip 7
`
`|
`
`akat Processor
`3
`
`-
`Bandwich wider
`
`Packet Intelligence Ex. 2039 Page 1 of 19
`
`
`
`
`U.S. PATENT DOCUMENTS
`
6,385,209 B1   5/2002  Skirmont et al. ............ 370/419
6,404,752 B1   6/2002  Allen, Jr. et al. .......... 370/335
6,424,621 B1   7/2002  Ramaswamy et al. ........... 370/230
6,446,146 B1   9/2002  Yamaguchi et al. ........... 710/100
6,567,902 B1   5/2003  Padmanabhan et al. ......... 711/165
6,587,469 B1*  7/2003  Bragg ...................... 370/401
6,601,084 B1   7/2003  Bhaskaran et al. ........... 709/105
6,625,150 B1*  9/2003  Yu ......................... 370/389
6,636,515 B1  10/2003  Roy et al. ................. 370/395.1
6,643,719 B1  11/2003  Baker ...................... 710/57
6,646,983 B1  11/2003  Roy et al. ................. 370/218
6,650,641 B1  11/2003  Albert et al. .............. 370/392
6,728,492 B1   4/2004  Baroncelli ................. 398/154
6,741,615 B1   5/2004  Patwardhan et al. .......... 370/514
6,751,743 B1   6/2004  Theodoras, II et al. ....... 713/400
6,754,174 B1   6/2004  Ben-Zur et al. ............. 370/225
6,754,217 B1   6/2004  Ahn ........................ 370/395.6
6,791,947 B2   9/2004  Oskouy et al. .............. 370/238
6,834,049 B1  12/2004  Tomar et al. ............... 370/369
6,895,018 B1   5/2005  Klish ...................... 370/471
6,907,541 B1   6/2005  Padmanabhan et al. ......... 713/503
7,016,367 B1   3/2006  Dyckerhoff et al. .......... 370/429
`
`
`
`OTHER PUBLICATIONS
`U.S. Appl. No. 09/534,838, filed Mar. 24, 2000; entitled: “Bandwidth
`Division for Packet Processing: 27 pageS.
`U.S. Appl. No. 1 1/332.402, filed Jan. 17, 2006; entitled: “Systems
`and Methods for Allocating Bandwidth for Processing of Packets”;
`61 pages.
`
`* cited by examiner
`
`
`
`
[Sheet 1 of 10 — FIG. 1A (Prior Art): conventional router with input ports 2 (each with an input buffer/memory 4), a switching device 6, and output ports 8. FIG. 1B (Prior Art): conventional non-blocking router in which each output port 8 also has an output buffer/memory 9.]
`
`
`
[Sheet 2 of 10 — FIG. 2A: Bandwidth Divider 200 of data routing system 100, with Bandwidth Divider Chips 0-7 (each containing a BD input queue 235, an IO counter 230 with counter controller, and a controller 240) connected to Packet Processors 0-3 (210-213).]
`
`
`
[Sheet 3 of 10 — FIG. 2B (flow chart): the bandwidth divider receives data packets into its BD input queue (255); when a data packet is ready for transmission (260), the bandwidth divider sends a "data packet ready" signal to its controller (265); the controller reads the values in the IO counter (270), determines which packet processor has the lowest counter value (275), and allocates the data packet to the packet processor with the lowest counter value (280).]
`
`
`
[Sheet 4 of 10 — FIG. 2B, cont.: the data packet length is read by the IO counter (285); the IO counter increments the appropriate counter by the length of the data packet (290); the data packet is transferred to the packet processor to which it has been allocated.]
`
`
`
[Sheet 5 of 10 — FIG. 2C (Prior Art): graphical representation of a half-life function.]
`
`
`
[Sheet 6 of 10 — FIG. 3: the bandwidth divider connected to four PFEs and eight output crossbars.]
`
`
`
[Sheet 7 of 10 — FIG. 4 (flow chart 400): receive a stream of data (401); transfer a "data packet ready" indication to the controller (403); receive a command from the controller to transfer the data packet to a certain packet processor (404); transfer the data packet to the packet processor (405).]
`
`
`
[Sheet 8 of 10 — FIG. 5 (flow chart): receive data packet length (501); add length of data packet to counter (502).]
`
`
`
[Sheet 9 of 10 — FIG. 6A (flow chart): receive "data packet ready" indication (601); analyze counters to determine which packet processor has the lowest counter value; if two or more packet processors have the identical lowest counter value, allocate the data packet randomly to a packet processor with the lowest counter value, otherwise allocate it to the packet processor with the lowest counter value; decrement counters to compensate for bandwidth pulled from the packet processors (610); update counters by incrementing the counter associated with the packet processor to which the packet was transferred (611); transfer the data packet to the packet processor to which it has been allocated (613).]
`
`
`
[Sheet 10 of 10 — FIG. 6B (flow chart): read each counter; compare counter values to determine the lowest counter value.]
`
`
`
`BANDWIDTH DIVISION FOR PACKET
`PROCESSING
`
`RELATED APPLICATION
`
This application is a continuation of U.S. patent application Ser. No. 09/534,838, filed Mar. 24, 2000, which is incorporated herein by reference.
`
`TECHNICAL FIELD
`
This invention relates generally to data routing systems, and more particularly to a method and apparatus for forwarding data packets to processors.
`
BACKGROUND

A data packet is a variable size unit of communication in a network. A router is a switching device that receives packets containing data or control information on one port, and based on destination or other information contained within the packet, routes the packet out another port to the destination (or intermediary destination). Conventional routers perform this switching function by evaluating header information contained within a first data block in the packet in order to determine the proper output port for a particular packet.

Referring now to FIG. 1a, one type of conventional router includes a plurality of input ports 2 each including an input buffer (memory) 4, a switching device 6 and a plurality of output ports 8. Data packets received at an input port 2 are stored at least temporarily in input buffer 4 while destination information associated with each packet is decoded to determine the appropriate switching through the switching device 6.

Another type of conventional router is referred to as a "non-blocking" router. Referring now to FIG. 1b, a conventional "non-blocking" router includes a plurality of input ports 2 each including an input buffer (memory) 4, a switching device 6 and a plurality of output ports 8 each having an output buffer (memory) 9. In order to avoid blocking conditions, each output port 8 is configured to include an output buffer 9. Each output port can simultaneously be outputting packets as well as receiving new packets for output at a later time. Typically the output buffer 9 is sized to be sufficiently large such that no data packets are dropped.

Conventional routers, including the routers of FIGS. 1a and 1b, include buffers that are sized to support a particular bandwidth (B). If the input bandwidth is too high, the router will drop data. The amount of input bandwidth is dependent on a number of factors including: the line input rate, the speed of the look-up process, and the blocking characteristics for the switching device. Input bandwidth also relates to the processing power of the packet processor, where the processing power is related to: (1) the delay bandwidth memory (i.e., more memory is required for bigger and faster systems); and (2) the packet lookup power (i.e., the ability to determine where to route packets).

A key problem in designing routers is to make them scale to large aggregate bandwidth. Building larger monolithic systems is made difficult by hard technology limits on the integrated circuits in these systems. In addition, long development times for the redesign of a whole system prohibit internet service providers from keeping up with the growth of bandwidth demand. To process a larger amount of bandwidth in a single system (i.e., a bandwidth of an amount N*B where N is a positive integer), the size and configuration of a conventional router typically has to be modified or redesigned. The process of modifying a router to increase bandwidth capability entails tedious design processes involving risk that the new design will not perform as intended (or at all), outlay of resources (both monetary and human), as well as time delays.

SUMMARY

In one aspect the invention provides a bandwidth divider for allocating bandwidth between a plurality of packet processors. The bandwidth divider comprises a plurality of counters for measuring the bandwidth of data packets transferred from the bandwidth divider to a respective packet processor, and a controller for analyzing the plurality of counters and transferring a data packet to a selected packet processor based on the contents of the counters.

The bandwidth divider may also include a plurality of interfaces, each coupled to an input and output stream, where there is a counter and/or queues for each input stream/packet processor combination. The packet processor may be a packet forwarding engine. The counter may indicate the level of bandwidth consumption of a packet processor, and such indication of bandwidth consumption may be decremented over time in accordance with a decrement engine employing a half-life decay or other function. The indication of level of bandwidth consumption may also be normalized after each packet is processed using a normalizing engine, and may be normalized such that the lowest indication for all counters is 0. In a system measuring bandwidth consumption, the controller may transfer a data packet to the packet processor with the lowest bandwidth consumption or, if the controller determines that a plurality of packet processors have an identical, lowest bandwidth consumption, the controller may use a random selector, transferring the data packet randomly (using for example a Linear Feedback Shift Register function) to one of the plurality of packet processors having the lowest bandwidth consumption.

In another aspect, the invention provides a router, which in turn comprises: a plurality of bandwidth dividers for receiving a first set of input streams and providing a first set of output streams; a plurality of packet processors for receiving the first set of output streams from the bandwidth dividers and providing a second set of input streams; a plurality of counters for monitoring the flow of data from the bandwidth dividers to the packet processors; a controller for monitoring the counters and allocating the streams of data between the packet processors; and a plurality of cross-bars for receiving the second set of input streams from the packet processors, multiplexing the second set of input streams, and providing a second set of output streams.

In another aspect, the invention provides a method of directing data packets to a plurality of packet processors. The method comprises the steps of monitoring the bandwidth consumed by the packet processors; determining, based on the bandwidth consumed by the packet processors, which packet processor has consumed the least amount of bandwidth; and allocating a next data packet to the packet processor which has consumed the least amount of bandwidth.

Aspects of the invention include one or more of the following features. The method of directing data packets may include the step of incrementing counters to track the bandwidth consumed by the packet processors, wherein such step may include incrementing one counter for each input and output pair to track the bandwidth consumed by the packet processors. The determining step may include (1) comparing the counters to ascertain the counter with the lowest value and/or (2) determining if two or more counters have the
identical, lowest value; and if two or more counters have the identical, lowest value, allocating the data packet randomly as between the packet processors with the identical, lowest value. The method may include decrementing the counters over time, using for example a half-life decay function. The method may also include the step of normalizing the counters, for example by subtracting the value of the lowest counter from all counter values.

Aspects of the invention may include one or more of the following advantages. A system is provided for processing B*N amount of bandwidth in a single router without altering the size of the packet processors, where a packet processor is a receiver and processor of data packets. Each packet processor can be of the form of a packet forwarding engine (PFE). A PFE is a routing core with delay bandwidth memory and a route lookup function.

A distributed system is provided where each bandwidth divider runs the same algorithm. As such, the bandwidth consumed by the packet processors is balanced. Because the system is distributed and is not controlled by a centralized controller, the system is more scalable and is more fault tolerant (i.e., a failure of any given bandwidth divider will only affect the input streams directly connected to that bandwidth divider).

The invention allows multiple packet processors to be seamlessly integrated into a system such that they perform the function of a single router. The invention supports the interconnection of multiple packet processors, increasing system bandwidth without expending resources to develop a new router or incurring the risks associated with designing and developing a higher capacity router. The invention allows a designer to build a system that supports N times the bandwidth of a single packet processor by using numerous currently existing packet processors combined with a single bandwidth divider.

The invention further allows for all the benefits of a common pool of memory to be achieved while maintaining a queue organization that does not exhibit head-of-line (HOL) blocking problems, i.e., the condition defined by the inability to send a packet which is ready for transmission because the device is waiting for another packet to become ready for transmission.

The details of one or more implementations of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.
`
`4
`FIGS. 6a and 6b are flow charts indicating the functions
`performed by a controller of FIG.2a.
`Like reference symbols in the various drawings indicate
`like elements.
`
`DETAILED DESCRIPTION
`
`Referring to FIG. 2a, data routing system 100 includes a
`bandwidth divider (BD) 200 for spraying packets amongst a
`plurality of packet processors, 210-213. In one implementa
`tion, BD 200 includes a plurality of bandwidth divider inte
`grated circuits (bandwidth divider chips BD0-BD7), 201
`208, each of which include a controller 240-0 to 240-7 and an
`IO counter 230-0 to 230-7. Bandwidth divider chip BD0, 201,
`is connected to input port 0 of each of packet processors
`(210-213). Similarly, bandwidth divider chip BD1 202 is
`connected to input port 1 of each of packet processors 0-4
`(210-213). Bandwidth divider chips BD2-BD7 (203-208) are
`similarly connected. The input to each bandwidth divider
`chip (BD0-BD7), 201-208, is received by and stored in BD
`input queues 235-0 to 235-7. In one implementation, the BD
`input queues 235-0 to 235-7 buffer the data packets until they
`have been completely received and until a decision is made as
`to where to route the packets. In this implementation, the BD
`chips 201-208 act as store-and-forward devices, i.e. the BD
`chips 201-208 will store a whole packet before routing that
`packet to a packet processor. Packet length is determined
`when the end of a packet is received. As such, the decision
`about where a packet is to be sent is made after the whole
`packet has been received. One advantage of the BD200 being
`a store-and-forward device is that by postponing a forwarding
`decision until the packet length is known, the load is opti
`mally balanced amongst the packet processors. In one imple
`mentation, the memory in a BD chip 201-208 is sized to hold
`a maximum sized packet in all of its BD input queues, 235-0
`to 235-7.
`In one implementation separate queues are maintained for
`each input stream/packet processor pair, e.g. if the number of
`input streams is 16 and the number of packet processors to
`which the BD chip 201-208 is connected is 4, the total number
`of queues will be 64. This configuration eliminates HOL
`blocking.
`When a packet has been received in its entirety, the packet
`is assigned to a queue in the BD 200 based on its input stream
`and packet processor destination, e.g. all packets from stream
`3, which are routed to packet processor 0, 210, are routed to
`the same queue. Thus, in one implementation, the minimum
`total memory size will be: (Number of queues)*MTU, where
`MTU is the maximum transfer unit defined as the maximum
`packet size plus some Smaller amount of extra storage pro
`portional to the delay associated with starting to send a packet
`after the packet has been received in its entirety.
`Since input streams can be of varying speeds, in one imple
`mentation it would be efficient to have a common pool of
`memory for all data, rather than dedicated memory for each
`queue. A common pool of memory means that any data cell in
`memory can be used by any input stream. Since the total
`bandwidth of all input streams combined is fixed, this allows
`the BD 200 to allocate memory in the common memory pool
`based on the actual speed of the stream rather than the worst
`case maximum (which would be one stream using all of the
`input bandwidth).
`Thus, the common memory pool can be organized as fixed
`sized data quantities (32 bytes per cell) and queues can be
`organized as linked lists of data. In a linked list each data cell
`includes the address or pointer to the next data cell of the
`
`40
`
`DESCRIPTION OF DRAWINGS
`
`FIGS. 1a and 1b are block diagrams of conventional router
`devices.
`FIG.2a is a schematic block diagram of an implementation
`of a data routing system.
`FIG.2b is a flow chart illustrating the flow of data through
`the data routing system.
`FIG.2c illustrates a graphical representation of a half-life
`function.
`FIG.3 is a schematic block diagram of abandwidth divider
`according to one implementation of the present invention,
`connected to four high speed Switching devices and eight
`output crossbars.
`FIG. 4 is a flow chart indicating the steps performed by a
`bandwidth divider of FIG. 2a.
`FIG. 5 is a flow chart indicating the steps performed by an
`IO counter of FIG. 2a.
`
`50
`
`55
`
`60
`
`65
`
`queue associated with the packet such that, when the packet is
`read out of memory, the reader knows the location of the next
`data cell.
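The cell-and-pointer queue organization described above might be sketched as follows. This is an illustrative model under stated assumptions: the class and method names are hypothetical, and the real hardware uses addresses into a shared memory pool rather than object references:

```python
class Cell:
    """A fixed-size memory cell (32 bytes in the described implementation)
    carrying a pointer to the next cell."""
    def __init__(self, data: bytes):
        self.data = data
        self.next = None

def write_packet(payload: bytes, cell_size: int = 32) -> Cell:
    """Write a packet into a chain of linked cells; return the saved head."""
    head = Cell(payload[:cell_size])
    tail = head
    for off in range(cell_size, len(payload), cell_size):
        cell = Cell(payload[off:off + cell_size])
        tail.next = cell
        tail = cell
    return head

class PacketQueue:
    """A per stream/processor queue kept as one consecutive linked list."""
    def __init__(self):
        self.head = None
        self.tail = None

    def link_packet(self, pkt_head: Cell):
        """Deferred assignment: once the queue is chosen, link the fully
        received packet by writing its saved head into the queue's last cell."""
        if self.tail is None:
            self.head = pkt_head
        else:
            self.tail.next = pkt_head
        tail = pkt_head
        while tail.next is not None:   # advance to the packet's last cell
            tail = tail.next
        self.tail = tail
```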
`One problem with making a forwarding decision after the
`entire packet has been received is that the BD 200 does not
`know to which queue a packet is assigned until after the
`packet has been received. In one implementation, the BD 200
solves this problem without using extra memory. In one
`implementation, the global memory is organized into cells,
`each of which can hold 32-bytes of data and each of which
`stores a pointer to the next cell. A packet (which is typically
`greater than 32 bytes) may occupy multiple cells that are
`linked as described above. Each queue consists of a linked list
`of packets that are linked together. The linked lists of the
`packets in the queue create a consecutive linked list. At the
`time that a packet is received, the packet is written into
`memory, the cells of the packet are linked together (not yet
`assigned to a queue), and the address of the first cell of the
`packet is saved. When the packet has been entirely received,
`the BD makes a decision as to which queue (i.e., as to which
`packet processor) the packet should be assigned. The BD then
`takes the saved address of the first cell of the packet and writes
`the saved address into the link information field of the last cell
`of the last packet of the appropriate queue. The packet is then
`linked to the queue and the packet reader follows the linked
list of the queue.

Referring again to FIG. 2a, controllers 240-0
`to 240-7 read data from the BD input queues 235-0 to 235-7.
`Within each bandwidth divider chip 201-208, each controller,
`240-0 to 240-7, is connected to an IO counter, 230-0 to 230-7.
`The IO counter 230-0 to 230-7 includes an array of counters
`and a counter controller 231-0 to 231-7. In one implementa
`tion, the number of counters can be equal to the number of
`input streams to the BD chips 201-208 multiplied by the
`number of packet processors. In the configuration shown
where BD0 includes 8 input streams and four packet proces-
sors, IO counter 230 includes an array of 32 counters. The
`counter controllers 231-0 to 231-7 monitor each stream of
`data. Each counter's count reflects the flow of data between
`the respective BD chip data stream and a packet processor.
`For example, the first counter in IO counter 230-0 tracks the
`flow of data from the first stream into BD0, 201, to packet
`processor 0, 210, and the last counter in IO counter 230-7
`tracks the flow of data from the last stream in BD7, 208, to
`packet processor 3, 213.
`In one implementation, the size of each counter is slightly
`larger than the largest packet potentially sent to any BD chip
`201-208. For example, the size of each counter could be
`chosen to be log(MTU-4) bits, where MTU is the maximum
transfer unit defined above. This counter size provides a
comfortable margin for the transfer of packets through the system.
IO counters 230-0 to 230-7 and controllers 240-0 to 240-7
are interconnected, allowing controllers 240-0 to 240-7 to read
`and manipulate the counters.
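One way to model the counter array described above is shown below. The flat stream-by-processor indexing is an assumption for illustration only; the patent does not specify the layout, and the class name is hypothetical:

```python
class IOCounter:
    """Array of per (input stream, packet processor) byte counters, as in
    the 32-counter arrangement described for 8 streams and 4 processors."""
    def __init__(self, n_streams: int, n_processors: int):
        self.n_processors = n_processors
        self.counts = [0] * (n_streams * n_processors)

    def index(self, stream: int, processor: int) -> int:
        # one counter per input stream/packet processor pair
        return stream * self.n_processors + processor

    def add(self, stream: int, processor: int, nbytes: int):
        """Increment the counter for this stream/processor pair."""
        self.counts[self.index(stream, processor)] += nbytes

    def processor_load(self, processor: int) -> int:
        """Total bytes sent to one processor across all streams."""
        return sum(self.counts[processor::self.n_processors])
```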
FIG. 2b illustrates the flow of data. Bandwidth divider 200
`receives data packets into the BD input queues 235-0 to 235-7
`(255). When each packet is ready for transmission (260), the
`bandwidth divider chip 201-208 receiving the data packet
`sends a data packet ready signal to its respective controller
`240-0 to 240-7 (265). The controller 240-0 to 240-7 reads the
`values of the counters in the respective IO counter 230-0 to
`230-7, (270) and determines which packet processor has the
`lowest counter value (indicating that the packet processor
`associated with that counter has consumed the least amount
`of bandwidth) (275). The controller 240-0 to 240-7 then allo
`cates the data packet to the packet processor with the lowest
`counter value (280).
`
`6
`After the data packet is allocated, two events occur (in no
`particular order). First, the data packet length is read by the IO
counter's counter controller 231-0 to 231-7 (285). The appro-
`priate counter is incremented by the length of the data packet
`(measured in bytes) (290). Second, the data packet is trans
`ferred to the packet processor to which the data packet was
`allocated (295).
`In one implementation, the counters are then decremented
`using a decrement engine employing a decay function. This
`decrementation is performed to ensure that the counters
`approximately reflect the amount of bandwidth currently
`being processed by the packet processors.
For example, consider a large sized packet (X bytes) sent to
a first packet processor by BD0, 201, a long time ago, such
that the packet has already been forwarded out of the packet
`processor. If IO counter 230-0 was not adjusted to reflect the
`bandwidth currently being processed by the packet proces
`sors, packet processor 0 would appear more loaded than the
other packet processors. Now, suppose a burst of minimum
`sized packets is received by the bandwidth divider. If the
`packet processor 0 counter was not adjusted, the BD 200
`would not send any of the new packets to the first packet
`processor until the number of bytes received by the other
`packet processors reached the number of bytes originally
`received and processed by the first packet processor (i.e., X
`bytes). This would mean that only N-1 packet processors
`would be available to absorb the burst of small packets. The
`load on the packet processors would not actually be balanced,
`resulting in performance degradation of the system.
`When the first packet processor is out of parity with the
`others, undesirable packet reordering can occur once the
`packet processors achieve parity. Those packet processors
`most recently used, 211-213 in this example, will behave
`sluggishly compared to the first packet processor 210.
Although any type of decay function can be used, one decay function is the half-life function shown in FIG. 2c. The half-life function is defined mathematically by:

T1/2 = (ln 2)/λ

where T1/2 is the half-life, ln is the natural logarithm, and λ is the decay constant. The half-life function approximates a decay in which, over each period of time T1/2, the number of bytes in a counter decreases by half. In a second time interval T1/2, the bandwidth again decreases by half, such that the number of bytes remaining after successive intervals is 1/2, 1/4, 1/8, and so forth. The decay constant, λ, which dictates the rate of decay, can be chosen based on the properties of the packet processors. The decay constant can be implemented in the IO counters 230-0 to 230-7 using a programmable register.
The value of the half-life interval, T1/2, can be proportional to the delay through a PFE. A decay constant should be selected that is not so small that information regarding the state of the packet processors or PFE is lost too quickly. Similarly, the decay constant should not be so large as to eviscerate the role of the decay function.
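The half-life decrement can be sketched as follows, assuming continuous exponential decay applied when a counter is read; the hardware implementation via a programmable register may differ, and the function name is hypothetical:

```python
import math

def decayed(count: float, dt: float, half_life: float) -> float:
    """Apply the half-life decay T1/2 = ln(2)/lambda to a counter value:
    after each interval of length half_life, the counted bytes halve."""
    lam = math.log(2) / half_life    # decay constant, lambda
    return count * math.exp(-lam * dt)
```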
In another implementation, the counters are normalized using a normalization engine by subtracting the lowest counter value from all counter values such that the lowest counter value is zero. Normalizing prevents wrap-around errors, i.e., errors that occur when a counter reaches its maximum value and "wraps around" to 0. Comparisons between counters can still be made even if the counters are allowed to
`
wrap around. A comparison routine would, however, need to
compensate for the wrap-arounds.
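The normalization step might be sketched as (a minimal illustration; the function name is hypothetical):

```python
def normalize(counters: list[int]) -> list[int]:
    """Subtract the lowest counter value from every counter so the minimum
    becomes zero, bounding counter growth and avoiding wrap-around."""
    low = min(counters)
    return [c - low for c in counters]
```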
The implementation of FIG. 2a operates to balance the load between the various packet processors. For example, in a system with two packet processors, if the first packet processor receives a large packet, the next few smaller packets are forwarded to the second packet processor.
FIG. 3 illustrates the bandwidth divider 200 of FIG. 2a connected to four PFEs, 20-0 to 20-3, which in turn are connected to output crossbars X0-X7, 320-327. The bandwidth divider 200 operates in the same manner as described with respect to FIG. 2a.

The output ports 0-7 of each of the PFEs, 20-0 to 20-3, are connected to the respective output crossbars X0-X7, 320-327 (such that each PFE output port 0 is connected to output crossbar X0 320, each output port 1 is connected to X1 321, and so forth). The output crossbars send data out in the order in which the crossbars receive the data (typically using a First In, First Out ("FIFO") system).
FIG. 4 is a flow chart illustrating the steps performed by the bandwidth divider chips, 201-208. A BD chip 201-208 receives a data packet as an input (401). The BD chip 201-208 then sends a "data packet ready" signal to its respective controller 240-0 to 240-7 (403). At this stage, the BD chip is in a state of stasis until it receives a signal back from the controller 240-0 to 240-7. After a certain period of time, i.e., after the controllers 240-0 to 240-7 allocate the data packet, the BD chip receives a command from the controller to transfer the data packet to a certain one of the packet processors 210-213 (404). The BD chip then transfers the data packet as instructed (405).
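The handshake of FIG. 4 might be modeled as follows. The names notify_ready and wait_for_command are hypothetical, and the stub controller (which always allocates to processor 2) exists only to make the sketch self-contained:

```python
def bd_chip_forward(bd_queue, controller, processors):
    """One handshake of FIG. 4: signal that a packet is ready (403), wait
    for the controller's allocation command (404), then transfer (405)."""
    packet = bd_queue.pop(0)              # packet received into BD input queue (401)
    controller.notify_ready(len(packet))  # "data packet ready" signal (403)
    dest = controller.wait_for_command()  # allocation decision arrives (404)
    processors[dest].append(packet)       # transfer as instructed (405)
    return dest

class StubController:
    """Hypothetical controller stand-in for illustration only."""
    def notify_ready(self, nbytes):
        self.ready_bytes = nbytes
    def wait_for_command(self):
        return 2
```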
`FIG. 5 is a flow chart illustrating the operation of the IO
`counters 230-0 to 230-7. Once controller 240-0 to 240-7
`allocates the data packet to a certain one of the packet pro
`cessors 210-213, the size of the data packet is determined and
`stored in the appropriate one of