(12) United States Patent
Dyckerhoff et al.

(10) Patent No.: US 7,706,357 B1
(45) Date of Patent: Apr. 27, 2010
(54) BANDWIDTH DIVISION FOR PACKET PROCESSING

(75) Inventors: Stefan Dyckerhoff, Palo Alto, CA (US); Pankaj Patel, Cupertino, CA (US); Pradeep Sindhu, Los Altos Hills, CA (US); Ashok Krishnamurthi, San Jose, CA (US); Hann-Hwan Ju, San Jose, CA (US); Ramalingam K. Anand, San Jose, CA (US)

(73) Assignee: Juniper Networks, Inc., Sunnyvale, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 837 days.

(21) Appl. No.: 11/470,040

(22) Filed: Sep. 5, 2006

Related U.S. Application Data

(63) Continuation of application No. 09/534,838, filed on Mar. 24, 2000, now Pat. No. 7,139,282.
(51) Int. Cl.
     H04L 12/66 (2006.01)
(52) U.S. Cl. ........ 370/352; 370/468
(58) Field of Classification Search ........ 370/352, 370/351, 389, 401, 399, 468, 400, 498, 474, 370/465, 397, 392; 709/238
     See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,335,325      8/1994   Frank et al. ............ 711/163
5,506,841      4/1996   Sandquist ............... 370/60.1
5,710,650      1/1998   Dugan ................... 359/133
5,757,771      5/1998   Li et al. ............... 370/235
5,905,725      5/1999   Sindhu et al. ........... 370/389
5,909,440      6/1999   Ferguson et al. ......... 370/389
5,953,314      9/1999   Ganmukhi et al. ......... 370/220
6,009,075 A   12/1999   Roberts et al. .......... 370/219
6,092,178 A    7/2000   Jindal et al. ........... 712/27
6,122,281 A    9/2000   Donovan et al. .......... 370/401
6,263,368 B1   7/2001   Martin .................. 709/224
6,272,522 B1   8/2001   Lin et al. .............. 709/200
6,324,580 B1  11/2001   Jindal et al. ........... 709/228
6,327,622 B1  12/2001   Jindal et al. ........... 709/228
6,359,900 B1   3/2002   Dinakar et al. .......... 370/458
(Continued)

OTHER PUBLICATIONS

U.S. Appl. No. 09/752,827, filed Jan. 3, 2001; entitled: "High-Speed Line Interface for Networking Devices," 28 pages.

(Continued)

Primary Examiner Chi H Pham
Assistant Examiner—Alexander Boakye
(74) Attorney, Agent, or Firm Harrity & Harrity, LLP
(57) ABSTRACT

A bandwidth divider and method for allocating bandwidth between a plurality of packet processors. The bandwidth divider includes a plurality of counters for measuring the bandwidth of data packets transferred from the bandwidth divider to a respective packet processor; and a controller for analyzing the plurality of counters and transferring a data packet to a selected packet processor based on the contents of the counters. The method monitors the bandwidth consumed by the packet processors; determines, based on the bandwidth consumed by the packet processors, which packet processor has consumed the least amount of bandwidth; and allocates a next data packet to the packet processor which has consumed the least amount of bandwidth.

22 Claims, 10 Drawing Sheets
[Representative drawing: bandwidth divider chip 0 with its BD input queue, controller, and IO counter, feeding packet processors 0-3; bandwidth divider chips 1-7 connected in parallel.]
U.S. PATENT DOCUMENTS

6,385,209 B1    5/2002   Skirmont et al. ......... 370/419
6,404,752 B1    6/2002   Allen, Jr. et al. ....... 370/335
6,424,621 B1    7/2002   Ramaswamy et al. ........ 370/230
6,446,146 B1    9/2002   Yamaguchi et al. ........ 710/100
6,567,902 B1    5/2003   Padmanabhan et al. ...... 711/165
6,587,469 B1*   7/2003   Bragg ................... 370/401
6,601,084 B1    7/2003   Bhaskaran et al. ........ 709/105
6,625,150 B1*   9/2003   Yu ...................... 370/389
6,636,515 B1   10/2003   Roy et al. .............. 370/395.1
6,643,719 B1   11/2003   Baker ................... 710/57
6,646,983 B1   11/2003   Roy et al. .............. 370/218
6,650,641 B1   11/2003   Albert et al. ........... 370/392
6,728,492 B1    4/2004   Baroncelli .............. 398/154
6,741,615 B1    5/2004   Patwardhan et al. ....... 370/514
6,751,743 B1    6/2004   Theodoras, II et al. .... 713/400
6,754,174 B1    6/2004   Ben-Zur et al. .......... 370/225
6,754,217 B1    6/2004   Ahn ..................... 370/395.6
6,791,947 B2    9/2004   Oskouy et al. ........... 370/238
6,834,049 B1   12/2004   Tomar et al. ............ 370/369
6,895,018 B1    5/2005   Klish ................... 370/471
6,907,541 B1    6/2005   Padmanabhan et al. ...... 713/503
7,016,367 B1    3/2006   Dyckerhoff et al. ....... 370/429
OTHER PUBLICATIONS

U.S. Appl. No. 09/534,838, filed Mar. 24, 2000; entitled: "Bandwidth Division for Packet Processing," 27 pages.
U.S. Appl. No. 11/332,402, filed Jan. 17, 2006; entitled: "Systems and Methods for Allocating Bandwidth for Processing of Packets," 61 pages.

* cited by examiner
[Drawing sheets 1-10 of 10; only the captions and labels are recoverable from the original figures.]

[Sheet 1: FIG. 1A and FIG. 1B (Prior Art), block diagrams of conventional routers showing input ports, input buffers (memory), a switching device, and output ports; FIG. 1B adds output buffers.]
[Sheet 2: FIG. 2A, the bandwidth divider, with bandwidth divider chips 0-7, each containing a BD input queue, controller, and IO counter, connected to packet processors 0-3.]
[Sheets 3-4: FIG. 2B (with continuation), flow chart of the flow of data through the bandwidth divider, steps 255-295.]
[Sheet 5: FIG. 2C (Prior Art), graphical representation of a half-life function.]
[Sheet 6: FIG. 3, the bandwidth divider connected to packet forwarding engines and output crossbars.]
[Sheet 7: FIG. 4, flow chart of the steps performed by a bandwidth divider chip, steps 401-405.]
[Sheet 8: FIG. 5, flow chart of the steps performed by an IO counter, steps 501-502.]
[Sheets 9-10: FIGS. 6A and 6B, flow charts of the functions performed by a controller, including reading and comparing the counters, tie-breaking, decrementing, counter updates, and packet transfer.]
BANDWIDTH DIVISION FOR PACKET PROCESSING

RELATED APPLICATION

This application is a continuation of U.S. patent application Ser. No. 09/534,838, filed Mar. 24, 2000, which is incorporated herein by reference.

TECHNICAL FIELD

This invention relates generally to data routing systems, and more particularly to a method and apparatus for forwarding data packets to processors.

BACKGROUND

A data packet is a variable size unit of communication in a network. A router is a switching device that receives packets containing data or control information on one port, and based on destination or other information contained within the packet, routes the packet out another port to the destination (or intermediary destination). Conventional routers perform this switching function by evaluating header information contained within a first data block in the packet in order to determine the proper output port for a particular packet.

Referring now to FIG. 1a, one type of conventional router includes a plurality of input ports 2 each including an input buffer (memory) 4, a switching device 6 and a plurality of output ports 8. Data packets received at an input port 2 are stored at least temporarily in input buffer 4 while destination information associated with each packet is decoded to determine the appropriate switching through the switching device 6.

Another type of conventional router is referred to as a "non-blocking" router. Referring now to FIG. 1b, a conventional "non-blocking" router includes a plurality of input ports 2 each including an input buffer (memory) 4, a switching device 6 and a plurality of output ports 8 each having an output buffer (memory) 9. In order to avoid blocking conditions, each output port 8 is configured to include an output buffer 9. Each output port can simultaneously be outputting packets as well as receiving new packets for output at a later time. Typically the output buffer 9 is sized to be sufficiently large such that no data packets are dropped.

Conventional routers, including the routers of FIGS. 1a and 1b, include buffers that are sized to support a particular bandwidth (B). If the input bandwidth is too high, the router will drop data. The amount of input bandwidth is dependent on a number of factors including: the line input rate, the speed of the look-up process, and the blocking characteristics for the switching device. Input bandwidth also relates to the processing power of the packet processor, where the processing power is related to: (1) the delay bandwidth memory (i.e., more memory is required for bigger and faster systems); and (2) the packet lookup power (i.e., the ability to determine where to route packets).

A key problem in designing routers is to make them scale to large aggregate bandwidth. Building larger monolithic systems is made difficult by hard technology limits on the integrated circuits in these systems. In addition, long development times for the redesign of a whole system prohibit internet service providers from keeping up with the growth of bandwidth demand. To process a larger amount of bandwidth in a single system (i.e., a bandwidth of an amount N*B where N is a positive integer), the size and configuration of a conventional router typically has to be modified or redesigned. The process of modifying a router to increase bandwidth capability entails tedious design processes involving risk that the new design will not perform as intended (or at all), outlay of resources (both monetary and human), as well as time delays.

SUMMARY

In one aspect the invention provides a bandwidth divider for allocating bandwidth between a plurality of packet processors. The bandwidth divider comprises a plurality of counters for measuring the bandwidth of data packets transferred from the bandwidth divider to a respective packet processor, and a controller for analyzing the plurality of counters and transferring a data packet to a selected packet processor based on the contents of the counters.

The bandwidth divider may also include a plurality of interfaces, each coupled to an input and output stream where there is a counter and/or queues for each input stream/packet processor combination. The packet processor may be a packet forwarding engine. The counter may indicate the level of bandwidth consumption of a packet processor, and such indication of bandwidth consumption may be decremented over time in accordance with a decrement engine employing a half-life decay or other function. The indication of level of bandwidth consumption may also be normalized after each packet is processed using a normalizing engine, and may be normalized such that the lowest indication for all counters is 0. In a system measuring bandwidth consumption, the controller may transfer a data packet to the packet processor with the lowest bandwidth consumption or, if the controller determines that a plurality of packet processors have an identical, lowest bandwidth consumption, the controller may use a random selector, transferring the data packet randomly (using for example a Linear Feedback Shift Register function) to one of the plurality of packet processors having the lowest bandwidth consumption.

In another aspect, the invention provides a router, which in turn comprises: a plurality of bandwidth dividers for receiving a first set of input streams and providing a first set of output streams; a plurality of packet processors for receiving the first set of output streams from the bandwidth dividers and providing a second set of input streams; a plurality of counters for monitoring the flow of data from the bandwidth dividers to the packet processors; a controller for monitoring the counters and allocating the streams of data between the packet processors; and a plurality of cross-bars for receiving the second set of input streams from the packet processors, multiplexing the second set of input streams, and providing a second set of output streams.

In another aspect, the invention provides a method of directing data packets to a plurality of packet processors. The method comprises the steps of monitoring the bandwidth consumed by the packet processors; determining, based on the bandwidth consumed by the packet processors, which packet processor has consumed the least amount of bandwidth; and allocating a next data packet to the packet processor which has consumed the least amount of bandwidth.

Aspects of the invention include one or more of the following features. The method of directing data packets may include the step of incrementing counters to track the bandwidth consumed by the packet processors, wherein such step may include incrementing one counter for each input and output pair to track the bandwidth consumed by the packet processors. The determining step may include (1) comparing the counters to ascertain the counter with the lowest value and/or (2) determining if two or more counters have the
identical, lowest value; and if two or more counters have the identical, lowest value, allocating the data packet randomly as between the packet processors with the identical, lowest value. The method may include decrementing the counters over time, using for example a half-life decay function. The method may also include the step of normalizing the counters, for example by subtracting the value of the lowest counter from all counter values.
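Where two or more counters tie for the lowest value, the summary above calls for a random selector, for example a Linear Feedback Shift Register (LFSR) function. The sketch below is a minimal illustration of that idea only: the 16-bit Fibonacci LFSR taps, the class and function names, and the 8-bit draw are assumptions, not details taken from the patent.

```python
class Lfsr16:
    """Minimal 16-bit Fibonacci LFSR (taps 16, 14, 13, 11); illustrative only."""

    def __init__(self, seed=0xACE1):
        self.state = (seed & 0xFFFF) or 1  # state must be non-zero

    def next_bits(self, n):
        """Return an n-bit pseudo-random value."""
        value = 0
        for _ in range(n):
            bit = ((self.state >> 0) ^ (self.state >> 2) ^
                   (self.state >> 3) ^ (self.state >> 5)) & 1
            self.state = (self.state >> 1) | (bit << 15)
            value = (value << 1) | (self.state & 1)
        return value


def pick_least_loaded(counters, lfsr):
    """Index of the packet processor with the lowest counter value,
    breaking ties pseudo-randomly as the summary suggests."""
    lowest = min(counters)
    tied = [i for i, c in enumerate(counters) if c == lowest]
    if len(tied) == 1:
        return tied[0]
    return tied[lfsr.next_bits(8) % len(tied)]
```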
Aspects of the invention may include one or more of the following advantages. A system is provided for processing B*N amount of bandwidth in a single router without altering the size of the packet processors, where a packet processor is a receiver and processor of data packets. Each packet processor can be of the form of a packet forwarding engine (PFE). A PFE is a routing core with delay bandwidth memory and a route lookup function.

A distributed system is provided where each bandwidth divider runs the same algorithm. As such, the bandwidth consumed by the packet processors is balanced. Because the system is distributed and is not controlled by a centralized controller, the system is more scalable and is more fault tolerant (i.e., a failure of any given bandwidth divider will only affect the input streams directly connected to that bandwidth divider).

The invention allows multiple packet processors to be seamlessly integrated into a system such that they perform the function of a single router. The invention supports the interconnection of multiple packet processors, increasing system bandwidth without expending resources to develop a new router or incurring the risks associated with designing and developing a higher capacity router. The invention allows a designer to build a system that supports N times the bandwidth of a single packet processor by using numerous currently existing packet processors combined with a single bandwidth divider.

The invention further allows for all the benefits of a common pool of memory to be achieved while maintaining a queue organization that does not exhibit head-of-line (HOL) blocking problems, i.e., the condition defined by the inability to send a packet which is ready for transmission because the device is waiting for another packet to become ready for transmission.

The details of one or more implementations of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIGS. 1a and 1b are block diagrams of conventional router devices.
FIG. 2a is a schematic block diagram of an implementation of a data routing system.
FIG. 2b is a flow chart illustrating the flow of data through the data routing system.
FIG. 2c illustrates a graphical representation of a half-life function.
FIG. 3 is a schematic block diagram of a bandwidth divider according to one implementation of the present invention, connected to four high speed switching devices and eight output crossbars.
FIG. 4 is a flow chart indicating the steps performed by a bandwidth divider of FIG. 2a.
FIG. 5 is a flow chart indicating the steps performed by an IO counter of FIG. 2a.
FIGS. 6a and 6b are flow charts indicating the functions performed by a controller of FIG. 2a.
Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION
Referring to FIG. 2a, data routing system 100 includes a bandwidth divider (BD) 200 for spraying packets amongst a plurality of packet processors, 210-213. In one implementation, BD 200 includes a plurality of bandwidth divider integrated circuits (bandwidth divider chips BD0-BD7), 201-208, each of which includes a controller 240-0 to 240-7 and an IO counter 230-0 to 230-7. Bandwidth divider chip BD0, 201, is connected to input port 0 of each of packet processors (210-213). Similarly, bandwidth divider chip BD1, 202, is connected to input port 1 of each of packet processors 0-3 (210-213). Bandwidth divider chips BD2-BD7 (203-208) are similarly connected. The input to each bandwidth divider chip (BD0-BD7), 201-208, is received by and stored in BD input queues 235-0 to 235-7. In one implementation, the BD input queues 235-0 to 235-7 buffer the data packets until they have been completely received and until a decision is made as to where to route the packets. In this implementation, the BD chips 201-208 act as store-and-forward devices, i.e., the BD chips 201-208 will store a whole packet before routing that packet to a packet processor. Packet length is determined when the end of a packet is received. As such, the decision about where a packet is to be sent is made after the whole packet has been received. One advantage of the BD 200 being a store-and-forward device is that by postponing a forwarding decision until the packet length is known, the load is optimally balanced amongst the packet processors. In one implementation, the memory in a BD chip 201-208 is sized to hold a maximum sized packet in all of its BD input queues, 235-0 to 235-7.

In one implementation, separate queues are maintained for each input stream/packet processor pair, e.g., if the number of input streams is 16 and the number of packet processors to which the BD chip 201-208 is connected is 4, the total number of queues will be 64. This configuration eliminates HOL blocking.
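One way to read the per-pair queue organization described above is a flat index computed from the input stream and the destination packet processor. The sketch below assumes the 16-stream, 4-processor example from the text; the names and the flat indexing scheme are illustrative, not mandated by the patent.

```python
NUM_STREAMS = 16          # example from the text
NUM_PROCESSORS = 4        # example from the text


def queue_index(stream, processor):
    """Flat index of the queue holding packets from `stream` that are
    destined for `processor`; one queue per (stream, processor) pair
    avoids head-of-line blocking between destinations."""
    return stream * NUM_PROCESSORS + processor


queues = [[] for _ in range(NUM_STREAMS * NUM_PROCESSORS)]  # 64 queues
queues[queue_index(stream=3, processor=0)].append("packet from stream 3 to PP0")
```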
When a packet has been received in its entirety, the packet is assigned to a queue in the BD 200 based on its input stream and packet processor destination, e.g., all packets from stream 3 which are routed to packet processor 0, 210, are routed to the same queue. Thus, in one implementation, the minimum total memory size will be: (Number of queues)*MTU, where MTU is the maximum transfer unit, defined as the maximum packet size plus some smaller amount of extra storage proportional to the delay associated with starting to send a packet after the packet has been received in its entirety.
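As a rough illustration of the sizing rule (Number of queues)*MTU, the snippet below plugs in the 64-queue example above together with a hypothetical MTU of about 9 KB, a value the patent does not specify:

```python
NUM_QUEUES = 64            # 16 streams x 4 packet processors (example above)
MTU_BYTES = 9 * 1024       # hypothetical maximum transfer unit, not from the patent

min_total_memory = NUM_QUEUES * MTU_BYTES
print(min_total_memory)    # 589824 bytes, i.e. roughly 576 KB
```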
Since input streams can be of varying speeds, in one implementation it would be efficient to have a common pool of memory for all data, rather than dedicated memory for each queue. A common pool of memory means that any data cell in memory can be used by any input stream. Since the total bandwidth of all input streams combined is fixed, this allows the BD 200 to allocate memory in the common memory pool based on the actual speed of the stream rather than the worst case maximum (which would be one stream using all of the input bandwidth).

Thus, the common memory pool can be organized as fixed sized data quantities (32 bytes per cell) and queues can be organized as linked lists of data. In a linked list each data cell includes the address or pointer to the next data cell of the
queue associated with the packet such that, when the packet is read out of memory, the reader knows the location of the next data cell.

One problem with making a forwarding decision after the entire packet has been received is that the BD 200 does not know to which queue a packet is assigned until after the packet has been received. In one implementation, the BD 200 solves this problem without using extra memory. In one implementation, the global memory is organized into cells, each of which can hold 32 bytes of data and each of which stores a pointer to the next cell. A packet (which is typically greater than 32 bytes) may occupy multiple cells that are linked as described above. Each queue consists of a linked list of packets that are linked together. The linked lists of the packets in the queue create a consecutive linked list. At the time that a packet is received, the packet is written into memory, the cells of the packet are linked together (not yet assigned to a queue), and the address of the first cell of the packet is saved. When the packet has been entirely received, the BD makes a decision as to which queue (i.e., as to which packet processor) the packet should be assigned. The BD then takes the saved address of the first cell of the packet and writes the saved address into the link information field of the last cell of the last packet of the appropriate queue. The packet is then linked to the queue and the packet reader follows the linked list of the queue.
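A small software model of this deferred queue binding is sketched below. The cell and queue classes, their field names, and the non-empty payload assumption are all illustrative; in the patent this bookkeeping is performed in the BD chip's global cell memory.

```python
CELL_PAYLOAD = 32  # bytes per cell, as described above


class Cell:
    def __init__(self, data):
        self.data = data          # up to 32 bytes of packet data
        self.next = None          # pointer to the next cell, if any


class Queue:
    """A queue is a linked list of packets, themselves linked lists of cells."""

    def __init__(self):
        self.head = None
        self.tail = None          # last cell of the last packet in the queue

    def append_packet(self, first_cell, last_cell):
        # Late binding: the packet's cells were linked as data arrived; only
        # now is the saved first-cell address written into the link field of
        # the queue's current last cell.
        if self.tail is None:
            self.head = first_cell
        else:
            self.tail.next = first_cell
        self.tail = last_cell


def receive_packet(payload, queues, choose_queue):
    """Write a (non-empty) packet into memory as a chain of 32-byte cells,
    then bind it to a queue only after its full length is known."""
    chunks = [payload[i:i + CELL_PAYLOAD] for i in range(0, len(payload), CELL_PAYLOAD)]
    cells = [Cell(c) for c in chunks]
    for a, b in zip(cells, cells[1:]):
        a.next = b
    first, last = cells[0], cells[-1]
    # The destination queue (and hence packet processor) is chosen only now.
    queues[choose_queue(len(payload))].append_packet(first, last)
```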
Referring again to FIG. 2a, controllers 240-0 to 240-7 read data from the BD input queues 235-0 to 235-7. Within each bandwidth divider chip 201-208, each controller, 240-0 to 240-7, is connected to an IO counter, 230-0 to 230-7. The IO counter 230-0 to 230-7 includes an array of counters and a counter controller 231-0 to 231-7. In one implementation, the number of counters can be equal to the number of input streams to the BD chips 201-208 multiplied by the number of packet processors. In the configuration shown, where BD0 includes 8 input streams and four packet processors, IO counter 230 includes an array of 32 counters. The counter controllers 231-0 to 231-7 monitor each stream of data. Each counter's count reflects the flow of data between the respective BD chip data stream and a packet processor. For example, the first counter in IO counter 230-0 tracks the flow of data from the first stream into BD0, 201, to packet processor 0, 210, and the last counter in IO counter 230-7 tracks the flow of data from the last stream in BD7, 208, to packet processor 3, 213.

In one implementation, the size of each counter is slightly larger than the largest packet potentially sent to any BD chip 201-208. For example, the size of each counter could be chosen to be log2(MTU*4) bits, where MTU is the maximum transfer unit defined above. This counter size provides a comfortable margin for the transfer of packets through the system.
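Continuing the earlier hypothetical MTU of about 9 KB, this counter-width rule works out to 16-bit counters. Both the MTU value and the helper below are assumptions for illustration:

```python
import math

MTU_BYTES = 9 * 1024  # hypothetical MTU, not specified in the patent


def counter_width_bits(mtu):
    """Width of a per-(stream, processor) counter: log2(MTU * 4), rounded
    up to whole bits, giving headroom above the largest packet."""
    return math.ceil(math.log2(mtu * 4))


print(counter_width_bits(MTU_BYTES))  # 16 bits for a 9 KB MTU
```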
IO counters 230-0 to 230-7 and controllers 240-0 to 240-7 are interconnected, allowing controllers 240-0 to 240-7 to read and manipulate the counters.

FIG. 2b illustrates the flow of data. Bandwidth divider 200 receives data packets into the BD input queues 235-0 to 235-7 (255). When each packet is ready for transmission (260), the bandwidth divider chip 201-208 receiving the data packet sends a data packet ready signal to its respective controller 240-0 to 240-7 (265). The controller 240-0 to 240-7 reads the values of the counters in the respective IO counter 230-0 to 230-7 (270) and determines which packet processor has the lowest counter value (indicating that the packet processor associated with that counter has consumed the least amount of bandwidth) (275). The controller 240-0 to 240-7 then allocates the data packet to the packet processor with the lowest counter value (280).
After the data packet is allocated, two events occur (in no particular order). First, the data packet length is read by the IO counter's counter controller 231-0 to 231-7 (285). The appropriate counter is incremented by the length of the data packet (measured in bytes) (290). Second, the data packet is transferred to the packet processor to which the data packet was allocated (295).
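Putting the steps of FIG. 2b together, the per-packet control flow can be sketched roughly as follows. The counter layout (one row of counters per input stream) and the function names are assumptions made for illustration; in the patent the work is split between the BD chip, its controller, and the IO counter.

```python
def handle_ready_packet(stream, packet, counters, transfer):
    """Sketch of steps 270-295 of FIG. 2b for one "data packet ready" signal.

    `counters[stream][pp]` tracks bytes sent from `stream` to packet
    processor `pp`; `transfer(packet, pp)` hands the packet off.
    """
    # (270)/(275): read the counters and find the least-loaded processor.
    per_processor = counters[stream]
    target = min(range(len(per_processor)), key=per_processor.__getitem__)

    # (280): allocate the packet to that processor.
    # (285)/(290): the IO counter adds the packet length, in bytes.
    per_processor[target] += len(packet)

    # (295): the packet is transferred to the processor it was allocated to.
    transfer(packet, target)
    return target
```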
In one implementation, the counters are then decremented using a decrement engine employing a decay function. This decrementation is performed to ensure that the counters approximately reflect the amount of bandwidth currently being processed by the packet processors.

For example, consider a large sized packet (X bytes) sent to a first packet processor by BD0, 201, a long time ago, such that the packet has already been forwarded out of the packet processor. If IO counter 230-0 was not adjusted to reflect the bandwidth currently being processed by the packet processors, packet processor 0 would appear more loaded than the other packet processors. Now, suppose a burst of minimum sized packets is received by the bandwidth divider. If the packet processor 0 counter was not adjusted, the BD 200 would not send any of the new packets to the first packet processor until the number of bytes received by the other packet processors reached the number of bytes originally received and processed by the first packet processor (i.e., X bytes). This would mean that only N-1 packet processors would be available to absorb the burst of small packets. The load on the packet processors would not actually be balanced, resulting in performance degradation of the system. When the first packet processor is out of parity with the others, undesirable packet reordering can occur once the packet processors achieve parity. Those packet processors most recently used, 211-213 in this example, will behave sluggishly compared to the first packet processor 210.

Although any type of decay function can be used, one decay function is a half-life function shown in FIG. 2c. The half-life function is defined mathematically by:

    T1/2 = ln(2)/λ

where T1/2 is the half life, ln is the natural logarithm, and λ is the decay constant. The half-life function approximates a decay in which, over each period of time T1/2, the number of bytes in a counter decreases by half. In a second time interval T1/2, the bandwidth again decreases by half, such that the number of bytes remaining after successive intervals is 1/2, 1/4, 1/8, and so forth. The decay constant, λ, which dictates the rate of decay, can be chosen based on the properties of the packet processors. The decay constant can be implemented in the IO counters 230-0 to 230-7 using a programmable register.

The value of the half-life interval, T1/2, can be proportional to the delay through a PFE. A decay constant should be selected that is not so small that information regarding the state of the packet processors or PFE is lost too quickly. Similarly, the decay constant should not be so large as to eviscerate the role of the decay function.
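A software rendering of such a decrement engine is sketched below: every counter is scaled by exp(-λ·Δt) on each tick, which halves it once per half-life T1/2 = ln(2)/λ. The tick interval and half-life values are placeholders; in the patent the decay constant lives in a programmable register in the IO counter.

```python
import math


class DecrementEngine:
    """Applies a half-life decay to the bandwidth counters; a sketch only."""

    def __init__(self, half_life_s):
        self.decay_constant = math.log(2) / half_life_s  # lambda = ln(2) / T_half

    def decay(self, counters, dt_s):
        """Scale every counter by exp(-lambda * dt); after one half-life each
        counter has dropped to half its previous value."""
        factor = math.exp(-self.decay_constant * dt_s)
        for i, value in enumerate(counters):
            counters[i] = value * factor


# Example: a 100 microsecond half-life (placeholder value), ticked once per half-life.
engine = DecrementEngine(half_life_s=100e-6)
counters = [4096.0, 0.0, 1500.0, 64.0]
engine.decay(counters, dt_s=100e-6)   # counters are now roughly halved
```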
In another implementation, the counters are normalized using a normalization engine by subtracting the lowest counter value from all counter values such that the lowest counter value is zero. Normalizing prevents wrap-around errors, i.e., errors that occur when a counter reaches its maximum value and "wraps around" to 0. Comparisons between counters can still be made even if the counters are allowed to wrap around. A comparison routine would, however, need to compensate for the wrap-arounds.
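A minimal sketch of the normalizing engine described above follows: after each packet the smallest counter value is subtracted from every counter, so the values stay near zero and never reach the wrap-around point. The fixed 16-bit width used to illustrate the wrap-compensating alternative is an assumption carried over from the earlier counter-width example.

```python
COUNTER_BITS = 16                     # assumed width, see the sizing example above
COUNTER_MASK = (1 << COUNTER_BITS) - 1


def normalize(counters):
    """Shift all counters down so the lowest one reads 0, preventing the
    counters from ever reaching their maximum value and wrapping."""
    lowest = min(counters)
    for i in range(len(counters)):
        counters[i] -= lowest


def wrapped_difference(a, b):
    """If counters were instead allowed to wrap modulo 2**COUNTER_BITS, a
    comparison routine would have to compensate, for example by comparing
    differences modulo the counter range."""
    return (a - b) & COUNTER_MASK
```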
The implementation of FIG. 2a operates to balance the load between the various packet processors. For example, in a system with two packet processors, if the first packet processor receives a large packet, the next few smaller packets are forwarded to the second packet processor.
FIG. 3 illustrates the bandwidth divider 200 of FIG. 2a connected to four PFEs, 20-0 to 20-3, which in turn are connected to output crossbars X0-X7, 320-327. The bandwidth divider 200 operates in the same manner as described with respect to FIG. 2a.

The output ports 0-7 of each of the PFEs, 20-0 to 20-3, are connected to the respective output crossbars X0-X7, 320-327 (such that each PFE output port 0 is connected to output crossbar X0 320, each output port 1 is connected to X1 321, and so forth). The output crossbars send data out in the order in which the crossbars receive the data (typically using a First In, First Out ("FIFO") system).
FIG. 4 is a flow chart illustrating the steps performed by the bandwidth divider chips, 201-208. A BD chip 201-208 receives a data packet as an input (401). The BD chip 201-208 then sends a "data packet ready" signal to its respective controller 240-0 to 240-7 (403). At this stage, the BD chip is in a state of stasis until it receives a signal back from the controller 240-0 to 240-7. After a certain period of time, i.e., after the controllers 240-0 to 240-7 allocate the data packet, the BD chip receives a command from the controller to transfer the data packet to a certain one of the packet processors 210-213 (404). The BD chip then transfers the data packet as instructed (405).
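The BD-chip side of this handshake can be summarized in a few lines. The controller interface and packet processor objects below are hypothetical stand-ins for the on-chip signals of FIG. 4.

```python
def bd_chip_handle_packet(packet, controller, packet_processors):
    """Sketch of FIG. 4: receive (401), signal ready (403), wait for the
    controller's command (404), then transfer the packet (405).
    `controller` and `packet_processors` are illustrative stand-ins."""
    controller.notify_packet_ready(len(packet))        # (403)
    target = controller.wait_for_transfer_command()    # (404) blocks until allocated
    packet_processors[target].receive(packet)          # (405)
```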
FIG. 5 is a flow chart illustrating the operation of the IO counters 230-0 to 230-7. Once controller 240-0 to 240-7 allocates the data packet to a certain one of the packet processors 210-213, the size of the data packet is determined and stored in the appropriate one of
