`
`1111111111111111111111111111I112,1111°1111o111jF111111111111111
`
`(19) United States
`(12) Patent Application Publication (10) Pub. No.: US 2002/0023168 Al
`Feb. 21, 2002
`Bass et al.
`(43) Pub. Date:
`
`(54) METHOD AND SYSTEM FOR NETWORK
`PROCESSOR SCHEDULING BASED ON
`SERVICE LEVELS
`
`(75)
`
`Inventors: Brian Mitchell Bass, Apex, NC (US);
`Jean Louis Calvignac, Cary, NC (US);
`Marco C. Heddes, Cary, NC (US);
`Michael Steven Siegel, Raleigh, NC
`(US); Fabrice Jean Verplanken, La
`Gaude (FR)
`
`Correspondence Address:
`Joscelyn G. Cockburn
`IBM Corporation 2Y7/B656
`PO Box 12195
`Research Triangle Park, NC 27709 (US)
`
`(73) Assignee: International Business Machines Cor-
`poration, Armonk, NY
`
`(21) Appl. No.:
`
`09/834,141
`
`(22) Filed:
`
`Apr. 12, 2001
`
`Related U.S. Application Data
`
`(63) Non-provisional of provisional application No.
`60/196,831, filed on Apr. 13, 2000.
`
`Publication Classification
`
`(51) Int. C1.7
`
` GO6F 15/16
`
`(52) U.S. Cl.
`
`709/232
`
`(57)
`
`ABSTRACT
`
`A system and method of moving information units from an
`output flow control toward a data transmission network in a
`prioritized sequence which accommodates several different
`levels of service. The present invention includes a method
`and system for scheduling the egress of processed informa-
`tion units (or frames) from a network processing unit
`according to service based on a weighted fair queue where
`position in the queue is adjusted after each service based on
`a weight factor and the length of frame, a process which
`provides a method for and system of interaction between
`different calendar types is used to provide minimum band-
`width, best effort bandwidth, weighted fair queuing service,
`best effort peak bandwidth, and maximum burst size speci-
`fications. The present invention permits different combina-
`tions of service that can be used to create different QoS
`specifications. The "base" services which are offered to a
`customer in the example described in this patent application
`are minimum bandwidth, best effort, peak and maximum
`burst size (or MBS), which may be combined as desired. For
`example, a user could specify minimum bandwidth plus best
`effort additional bandwidth and the system would provide
`this capability by putting the flow queue in both the NLS and
`WFQ calendar. The system includes tests when a flow queue
`is in multiple calendars to determine when it must come out.
`
`Flow Flow
`0
`1
`
`511
`511
`
`511
`
`511
`511
`511
`511
`
`511
`
`FlowID
`
`241
`
`511
`
`Flow
`2047
`
`1210
`
`5
`51
`51
`
`FlowID
`
`251
`
`250
`
`255
`
`Wrap
`
`
`
`11
`
`0
`Peak Bandwidth
`Shaping
`3
`
`0
`WFQ
`Port 39
`
`4
`
`260
`
`232
`
`240/".
`
`0
`WFQ
`Port 0
`
`231
`
`FlowID
`
`FlowID
`
`0
`NLS
`
`
`12
`
`221
`FlowID
`
`7
`220
`
`230
`
`0
`LLS
`
`270
`
`260a
`
`Discard
`
`3
`
`j
`
`Port Port
`Port Port
`38
`39
`1
`0
`High Priority Target Port Queues
`1
`
`260b
`
`272
`
`Port Port
`Port Port
`39
`1
`35
`0
`LorPrionty Target Port Queues
`
`V_Vjap
`
`I
`
`Juniper - Exhibit 1006, page 1
`
`Cloudflare - Exhibit 1042, page 1
`
`
`
`Patent Application Publication
`
`IV 89I£Z00/ZOOZ Sfl
`
`4
`
`f
`22 — I-1 DASL A I DASL B ..„,24
`„ 20
`SIF
`SDM-UP
`SWITCH DATA
`MOVER
`
`16
`
`/
`
`EDS-UP
`
`ENQUEUE
`DEQUEUE
`SCHEDULING
`
`18/
`
`INTERNAL
`S-RAM's
`
`EPC
`
`INTERNAL
`S-RAM
`
`DATA
`STORE
`
`12
`
`EMBEDDED
`PROCESSORS
`COMPLEX
`
`D-RAMS's (2 to 7)
`S-RAM (1)
`
`4
`4 1,28
`26--I DASIL A I DASiL B r
` 30
`
`SIF
`SDM-DN
`SWITCH DATA
`MOVER
`
`34
`
`/
`
`EDS-DN
`
`FIG. 1
`
`/ 10
`
`32
`
`ENQUEUE
`DEQUEUE
`
`40\
`
`TRAFFIC MGT
`SCHEDULER
`
`PMM - UP A
`MULTIPLEXED
`B
`MAC'S
`A
`L
`
`I
`
`/
`14
`
`C
`
`
`D
`
`W
`
`PMM - DN
`MULTIPLEXED
`MAC'S
`
`36 -1--
`
`DMU BUS
`
`ENET PHY - ATM FRAMER
`
`DMU BUS
`38
`
`D-RAMS's (4)
`DATA
`STORE
`I
`
`Juniper - Exhibit 1006, page 2
`
`Cloudflare - Exhibit 1042, page 2
`
`
`
`Patent Application Publication
`
`CI .0 Z lamIS
`
`IV 89ICZOO/ZOOZ Sfl
`
`On-Chip Memories
`
`Off-Chip Memories
`
`HO
`-I -
`
`HI
`7
`
`H2
`7
`
`H3
`- 1-
`
`D2
`D1
`H4
`S DO
`7
`7
`1-
`77
`TSM Arbiter
`
`D3
`7
`
`D4
`7
`
`D5
`7
`
`D6
`7
`
`Up
`Enqueue
`
`Interrupts
`EPC Freeze
`Exception
`
`PowerPC
`A
`
`/ 120
`Completion Unit
`
`Label
`
`-Or DN Enqueue
`
`Debug, Interrupts &
`Single Step Control
`
`LUDefTable
`CompTable
`FreeQs
`110
`
`1
`
`V
`
`EPC WEB
`Control
`T
`i
`Internal
`EPC WEB
`
`V
`DN
`Scheduler
`40
`
`FIG. 2
`
`DN
`DS
`
`\ 116
`
`UP
`DS
`i/f
`(Rd+Wr)
`
`.4-8.-
`-42—..-
`
`UlD
`DS
`i/f
`
`GxH
`
`118
`
`..--...- Dn
`..---..-
`DS
`i/f
`
`Instruction
`Memory
`
`HW
`Classi
`fier
`
`WEB Arbiter,
`WEB Watch i/f
`..--112
`
`Dispatcher
`
`Up
`DS
`i/f
`(Rd)
`
`V
`36
`PMM-DN 1"---
`
`PHY
`
`F- 38
`
`Juniper - Exhibit 1006, page 3
`
`Cloudflare - Exhibit 1042, page 3
`
`
`
`Patent Application Publication
`
`CI Jo C 13311S
`
`IV 89ICZOO/ZOOZ Sfl
`
`Flow Flow
`0
`1
`
`FIG. 3
`
`Flow
`2047
`
`1210
`
`250
`
`FlowID
`
`251
`
`Wrap
`
`511
`511
`511
`511
`
`511
`
`FlowID
`
`—241
`
`511
`
`221
`FlowID
`
`7"
`220
`
`0
`LLS
`
`i
`
`270
`—/
`
`232
`
`FlowID ----
`
`FlowID
`
`7-
`240
`
`230/. 0
`NLS
`
`2
`
`-.
`
`231
`
`0
`WFQ
`Port 0
`
`0
`WFQ
`Port 39
`
`I
`4
`t
`
`260a
`
`260
`- b-
`
`r 5-
`
`Discard
`—1---
`3
`
`i'
`
` 1
`Port Port
`Port
`Port
`1
`38
`0
`39
`High Priority Target Port Queues
`1
`
`Port Port
`1
`0
`LoWi
`
`Juniper - Exhibit 1006, page 4
`
`0
`Peak Bandwidth
`Shaping
`3
`r
`
`260b
`
`272
`
`\-... I
`
`Port Port
`38
`39
`riority Target Port Quet.-We'S- I .
`
`Wrap
`
`Cloudflare - Exhibit 1042, page 4
`
`
`
`Patent Application Publication
`
`IV 89I£Z00/ZOOZ Sfl
`
`„...---308
`
`.7.-306
`
`Epoch 3
`Step=4096*150ns/512B
`
`Epoch 2
`Step=25615Ons/512B
`
`z .--304
`Epoch 1
`Step=16*15Ons/512B
`
`FIG. 4
`
`511
`
`511
`
`.7.-- 302
`
`Epoch 0
`Step=150ns/512B
`
`511
`
`511
`
`FIowID
`LIFO
`
`0
`
`FIowID
`LIFO
`
`0
`
`Ow
`
`FIowID
`LIFO
`
`0
`
`FIowID
`LIFO
`
`0
`
`312 ---.
`
`Current Pointer
`
`Current Pointer
`
`Current Pointer
`
`Current Pointer
`
`1
`
`2
`1
`
`7 „---320
`
`Current Time
`
`...--
`
`( -330
`Scheduler Tick
`
`1
`absolute
`priority
`selection
`
`1
`1
`
`0
`
`Juniper - Exhibit 1006, page 5
`
`Cloudflare - Exhibit 1042, page 5
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 5 of 13
`
`US 2002/0023168 Al
`
`FIG. 5
`
`Service for flow queue not in any calendar
`
`(Packet enqueue)
`
`to a flow queue
`i
`list
`management
`action:
`enqueue frame
`
`1109
`
`7
`
`'i
`
`1100
`
`Yes
`
`1110
`
`Yes
`
`1111
`
`1122
`
`
`
`Yes
`i
`
`Add pointer
`to NLS
`calendar at
`current time
`
`Add pointer
`to LLS
`calendar at
`current time
`
`1113
`N
`
`Configuration
`error
`
`Add pointer to WFQ at
`location specified by WFQ
`Distance Calculation for
`Flow Startup
`
`,1120
`
`Set
`NextGreenTime to
`current time
`
`1121
`/
`
`RR.V=O
`QinUse=1
`
`Juniper - Exhibit 1006, page 6
`
`Cloudflare - Exhibit 1042, page 6
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 6 of 13
`
`US 2002/0023168 Al
`
`No
`
`FIG. 6
`
`No
`
`Add pointer
`to LLS
`calendar at
`current time
`
`1215N
`
`Add pointer to
`LLS calendar at
`NextRedTime
`
`1211
`
`NextRedTime
`LATER current
`time
`i
`
`/ 1212
`
`No
`I
`
`Add pointer
`to NLS
`calendar at
`current time
`
`Add pointer to
`NLS calendar at
`NextRedTime
`\ 1213
`
`1217
`/
`
`RR V=0
`
`Update
`MBSCredit:
`Credit earned
`
`QinRed=1
`
`p218
`
`1219
`
`1100
`
`QinUse=1
`
`I
`Yes
`
`From FIG. 5
`
`1200
`
`SSD.V=0
`
`QinRed=1
`
`To FIG. 8
`
`1200a
`
`PSD.V=0
`
`Yes
`
`1203
`
`I
`No
`I
`
`QinBlue=1
`
`QinBlue=1
`
`No
`
`1201
`
`No
`
`1202
`
`PSD.V=O
`
`1
`No
`
`current time
`LATER
`NextGreenTime
`
`1221
`N
`
`Set
`NextGreenTime to
`current time
`
`1220
`
`No
`
`QinGrn=1
`
`1204
`
`Yes
`
`To FIG. 7
`
`1205
`
`/
`Add pointer to WFQ at
`location specified by
`WFQ Distance
`Calculation for Flow
`Startup
`
`Timestamps are valid, but the flow queue is not in all of the
`calendar's it needs to be in.
`
`Juniper - Exhibit 1006, page 7
`
`Cloudflare - Exhibit 1042, page 7
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 7 of 13
`
`US 2002/0023168 Al
`
`FIG. 7
`
`1202
`
`V
`Add pointer to WFQ at
`location specified by
`WFQ Distance
`Calculation for Flow
`Startup
`
`1305
`
`7
`
`1303
`
`Add pointer to
`PBS at
`NextGreenTime
`
`V
`
`7 1306
`
`QinBlue=1
`
`1304 -....,...
`
`I
`
`QinGrn=1
`
`Peak component is not in calendar, there is no
`minimum bandwidth component
`
`Juniper - Exhibit 1006, page 8
`
`Cloudflare - Exhibit 1042, page 8
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 8 of 13
`
`US 2002/0023168 Al
`
`FIG. 8
`
`Add pointer to WFQ at
`location specified by
`WFQ Distance
`Calculation for Flow
`Startup
`
`Add pointer to WFQ at
`location specified by
`WFQ Distance
`Calculation for Flow
`Startup
`
`Add pointer to
`PBS at
`NextGreenTime
`
`QinBlue=1
`
`QinGrn=1
`
`Peak is not in calendar; minimum bandwidth component is in calendar.
`
`Note that NextGreenTime stamp is examined to determine attachement to either WFQ or PBS
`
`If MBS is specified, the MBSCredit is examined and if positive, adding the flow queue to either the WFQ or
`the PBS is permitted.
`
`Juniper - Exhibit 1006, page 9
`
`Cloudflare - Exhibit 1042, page 9
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 9 of 13
`
`US 2002/0023168 Al
`
`515
`
`\a
`
`dd pointer to
`WFQ at
`location
`specified by
`WFQ
`distance
`calculation
`
`FIG. 9
`
`list
`management
`action:
`dequeue frame
`
`QinBlue=0
`
`(Flow Queue)
`
`Service
`
`Each calendar type is serviced
`(only one per scheduler tick).
`flow queues are reattached
`only if there are sufficient
`frames in the flow queue.
`flow queues added into WFQ
`with peak bandwidth components
`get the NextGreenTime
`stamp updated.
`
`520\
`
`update
`NextGreenTime
`using
`NextGreenTime
`calculation for
`flow not
`in violation
`
`add pointer to
`WFQ at
`location
`specified by
`WFQ
`distance
`calculation
`
`517
`add pointer to PBS at
`time specified by
`NextGreenTime
`calculation for flow in
`violation
`
`518\
`
`update
`NextGreenTime using
`NextGreenTime
`calculation for flow in
`violation
`
`Juniper - Exhibit 1006, page 10
`
`Cloudflare - Exhibit 1042, page 10
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 10 of 13 US 2002/0023168 Al
`
`FIG. 10
`
`Row queue service
`when QCnt = 0 at
`start of service
`
`543
`
`Yes
`
`548
`
`Yes
`
`From FIG. 9
`
`Yes
`
`541
`
`Yes
`
`542
`
`/
`
`QinRed=0
`FrameCount=0
`QinUse=O
`{NextRedValid=0}
`
`/ 544
`
`QinBlue=0
`FrameCount=0
`
`/549
`
`QinGreen=0
`FrameCount=0
`{NextGrnValid=0}
`
`545
`
`No
`
`547
`
`(NextGrnValid=0)
`
`Braced {} text indicates function
`necessary to support flow queue
`aging. NextRedValid and
`NextGrnValid are flags used only to
`support a flow queue aging design.
`(reference 542,545,546,547,549)
`
`Juniper - Exhibit 1006, page 11
`
`Cloudflare - Exhibit 1042, page 11
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 11 of 13 US 2002/0023168 Al
`
`LLS or NLS Service
`
`FIG. 11
`
`list
`management
`action:
`dequeue frame
`
`No
`
`add pointer to
`LLS at time
`specified by
`distance
`calculation
`
`add pointer to
`NLS at time
`specified by
`distance
`calculation
`
`1109
`
`QinRed=O r
`
`V
`update
`NextRedTime
`and RedResidue
`as specified from
`distance
`calculation
`
`Juniper - Exhibit 1006, page 12
`
`Cloudflare - Exhibit 1042, page 12
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 12 of 13 US 2002/0023168 Al
`
`FIG. 12
`
`WFQ Calendar
`When there is an assured bandwidth component,
`the flow queue is removed from the WFQ allowing
`the last frame to be serviced from the NLS/LSS
`calendar. If there is a peak bandwidth component,
`the NextGreenTime stamp is updated.
`
`601
`
`608
`\
`
`update
`NextGreetTime using
`NextGreenTime
`calculation for flow not
`in violation
`
`/ 607
`
`update
`NextGreetTime using
`NextGreenTime
`calculation for flow
`in violation
`
`Juniper - Exhibit 1006, page 13
`
`Cloudflare - Exhibit 1042, page 13
`
`
`
`Patent Application Publication Feb. 21, 2002 Sheet 13 of 13 US 2002/0023168 Al
`
`FIG. 13
`
`QinGreen=0 r
`
`1306
`
`list
`management
`action:
`dequeue
`frame
`
`update
`NextGreenTime using
`NextGreenTime
`calculation when flow
`is not in violation
`
`Add pointer to WFQ at
`location specified by
`WFQ Distance
`Calculation for Flow
`Startup
`
`Service PBS Calendar. flow queue is always removed from this
`calendar. NextGreenTime is always recalculated. flow queue is
`placed into WFQ only when there is sufficient frames in the
`queue. If MBS is specified, a test for available credit is performed.
`
`1302
`
`No
`I
`
`1300
`
`Yes
`
`1318
`\
`
`QinBlue=1
`
`1303\
`
`I
`Add pointer to WFQ at
`location specified by
`WFQ distance
`calculation
`
`Juniper - Exhibit 1006, page 14
`
`Cloudflare - Exhibit 1042, page 14
`
`
`
`US 2002/0023168 Al
`
`Feb. 21, 2002
`
`1
`
`METHOD AND SYSTEM FOR NETWORK
`PROCESSOR SCHEDULING BASED ON SERVICE
`LEVELS
`
`and entitled "Method and System for Minimizing Conges-
`tion in a Network". This patent is sometimes referred to
`herein as the Flow Control Patent.
`
`CROSS REFERENCE TO RELATED PATENTS
`
`[0001] This patent relates to and claims the benefit of
`Provisional Patent Application Ser. No. 60/196,831 filed
`Apr. 13, 2000.
`
`[0002] The present invention is related to the following
`documents, all of which are assigned to the assignee of the
`present invention and which are specifically incorporated
`herein by reference:
`
`[0003] Patent application Ser. No. 09/384,691, filed Aug.
`27, 1999 by Brian Bass et al., entitled "Network Processor
`Processing Complex and Methods", sometimes referred to
`herein as the Network Processing Unit Patent or NPU
`Patent.
`
`[0004] U.S. Pat. No. 5,724,348 entitled "Efficient Hard-
`ware/Software Interface for a Data Switch" issued Mar. 3,
`1998, which patent is sometimes referred to herein as the
`Interface Patent.
`
`[0005] Patent application Ser. No. 09/330,968 filed Jun.
`11, 1999 and entitled "High Speed Parallel/Serial Link for
`Data Communications", sometimes referred to as the Link
`Patent.
`
`[0006] Various patents and applications assigned to IBM
`for its multiprotocol switching services, sometimes referred
`to as "MSS", some of which include Cedric Alexander as an
`inventor, and are sometimes referred to as the MSS Patents.
`
`[0007] Patent application Ser. No. 09/548,907 (Docket
`RAL9-00-0010) filed Apr. 13, 2000 by Brian M. Bass et al.
`and entitled "Method and System for Network Processor
`Scheduler". This patent is sometimes referred to herein as
`the Scheduler Structure Patent.
`
`[0008] Patent application Ser. No. 09/548,910 (Docket
`RAL9-00-0014) filed Apr. 13, 2000 by Brian M. Bass et al.
`and entitled "Method and System for Network Processor
`Scheduling Outputs Based on Multiple Calendars". This
`patent is sometimes referred to herein as the Calendar
`Scheduling Patent.
`
`[0009] Patent application Ser. No. 09/548,911 (Docket
`RAL9-00-0015) filed Apr. 13, 2000 by Brian M. Bass et al.
`and entitled "Method and System for Network Processor
`Scheduling Based on Calculation". This patent is sometimes
`referred to herein as the Calculation Patent.
`
`[0010] Patent application Ser. No. 09/548,912 (Docket
`RAL9-00-0017) filed Apr. 13, 2000 by Brian M. Bass et al.
`and entitled "Method and System for Network Processor
`Scheduling Outputs Using Queuing". This patent is some-
`times referred to herein as the Queuing Patent.
`
`[0011] Patent application Ser. No. 09/548,913 (Docket
`RAL9-00-0018) filed Apr. 13, 2000 by Brian M. Bass et al.
`and entitled "Method and System for Network Processor
`Scheduling Outputs using Disconnect/Reconnect Flow
`Queues. This patent is sometimes referred to herein as the
`Reconnection Patent.
`
`[0012] Patent application Ser. No. 09/546,651 (Docket
`RAL9-00-0007) filed April, 2000 by Peter I. A. Barri et al.
`
`[0013] Patent application Ser. No. 09/547,280 (Docket
`RAL9-00-0004) filed Apr. 11, 2000 by M. Heddes et al. and
`entitled "Unified Method and System for Scheduling and
`Discarding Packets in Computer Networks". This patent is
`sometimes referred to herein as the Packet Discard Patent.
`
`BACKGROUND OF THE INVENTION
`[0014] 1. Field of the Invention
`
`[0015] The present invention relates to communication
`network apparatus such as is used to link together informa-
`tion handling systems or computers of various types and
`capabilities and to components and methods for data pro-
`cessing in such an apparatus. The present invention includes
`an improved system and method for scheduling the distri-
`bution of information units from a flow control system
`coupled to a plurality of network processing unit toward a
`data transmission network through a PMM and MAC. More
`particularly, the present invention involves scheduling using
`a plurality of algorithms to handle a plurality of users who
`are processing variable size information packets or frames,
`providing an order to the frames being provided from the
`flow control system (which may be of the type described in
`the referenced Flow Control Patent) toward the data trans-
`mission network. The present invention includes a system
`for establishing and enforcing different types of service
`levels for the flows of different users.
`
`[0016] 2. Background Art
`
`[0017] The description of the present invention which
`follows is based on a presupposition that the reader has a
`basic knowledge of network data communications and the
`routers and switches which are useful in such network
`communications. In particular, this description presupposes
`familiarity with the International Standards Organization
`("ISO") model of network architecture which divides net-
`work operation into layers. A typical architecture based on
`the ISO model extends from a Layer 1 (which is sometimes
`referred to a "Ll") being the physical pathway or media
`through which signals are passed upward through Layers 2
`(or "L2"), 3 (or "L3"), and so forth to Layer 7 which is the
`layer of application programming resident in a computer
`system linked to the network. Throughout this document,
`references to such layers as Ll, L2, L3 are intended to refer
`to the corresponding layer of the network architecture. The
`present description also is based on a fundamental under-
`standing of bit strings used in network communication
`known as packets and frames.
`
`[0018] Bandwidth considerations (or the amount of data
`which a system can handle in a unit of time) are becoming
`important in today's view of network operations. Traffic
`over networks is increasing, both in sheer volume and in the
`diversity of the traffic. At one time, some networks were
`used primarily for a certain type of communications traffic,
`such as voice on a telephone network and digital data over
`a data transmission network. Of course, in addition to the
`voice signals, a telephone network would also carry a
`limited amount of "data" (such as the calling number and the
`called number, for routing and billing purposes), but the
`primary use for some networks had, at one point in time,
`been substantially homogenous packets.
`
`Juniper - Exhibit 1006, page 15
`
`Cloudflare - Exhibit 1042, page 15
`
`
`
`US 2002/0023168 Al
`
`Feb. 21, 2002
`
`2
`
`[0019] A substantial increase in traffic has occurred as a
`result of the increasing popularity of the Internet (a public
`network of loosely linked computers sometimes referred to
`as the worldwide web or "www.") and internal analogs of it
`(sometimes referred to as intranets) found in private data
`transmission networks. The Internet and intranets involve
`transmission of large amounts of information between
`remote locations to satisfy an ever-growing need for remote
`access to information and emerging applications. The Inter-
`net has opened up to a large number of users in geographi-
`cally dispersed areas an exploding amount of remote infor-
`mation and enabled a variety of new applications, such as
`e-commerce, which has resulted in a greatly-increased load
`on networks. Other applications, such as e-mail, file transfer
`and database access further add load to networks, some of
`which are already under strain due to high levels of network
`traffic.
`
`[0020] Voice and data traffic are also converging onto
`networks at the present time. Data is currently transmitted
`over the Internet (through the Internet Protocol or IP) at no
`charge, and voice traffic typically follows the path of lowest
`cost. Technologies such as voice over IP (VOIP) and voice
`over asynchronous transfer mode or ATM (VoATM) or voice
`over frame relay (VoFR) are cost-effective alternatives for
`transmission of voice traffic in today's environment. As
`these services migrate, the industry will be addressing issues
`such as the changing cost structure and concerns over the
`trade off between cost of service and quality of service in the
`transmission of information between processors.
`
`[0021] Aspects of quality of service include the capacity
`or bandwidth (how much information can be accommodated
`in a period of time), the response time (how long does it take
`to process a frame) and how flexible is the processing (does
`it respond to different protocols and frame configurations,
`such as different encapsulation or frame header methods).
`Those using a resource will consider the quality of service
`as well as the cost of service, with the tradeoffs depending
`on the situation presented.
`
`[0022] Some prior art systems handle outgoing informa-
`tion units from a processing system in a variety of ways. One
`suggestion is to use a round robin scheduler which fairness
`amongst a set of queues. Another one employs several
`different levels of priorities and a queue for each. In such a
`system, you have an absolute priority where the highest
`priority work is processed first and the lowest priority work
`may never be processed.
`
`[0023] Still another method of scheduling outputs
`involves a plurality of prioritized lists of work to be pro-
`cessed.
`
`It is also known to use a hierarchical packet sched-
`[0024]
`uling system. There are even systems which use several
`different scheduling methods in determining the order in
`which information units are to be sent toward a data trans-
`mission network, using a combination of different schedul-
`ing techniques.
`
`[0025] Other systems have used a weighted priority tech-
`nique implemented in the form of a round robin—which
`serves all queues, with some queues served more frequently
`than other queues, based on an algorithm which defines the
`level of service. Even such a weighted priority system would
`provide service to a user who continually exceeds the
`
`service levels assigned to it, continuing to serve, albeit less
`often, even as it exceeds the assigned service level and
`making it difficult for the system to enforce a level of service
`policy.
`
`[0026] Considering the size of a transmission packet or
`frame in determining which customers to serve adds a
`measure of fairness to a service system, in that a user who
`is processing large frames takes up more of the system
`capacity and therefore should receive service less often than
`a user with small frames. Some of the prior art systems
`consider the size of the transmission packet or frame in
`allocating resources, while others do not. Some communi-
`cation systems use a uniform, fixed-size packet, making
`consideration of packet size unnecessary, but others do not
`consider the size of the packet in allocating resources.
`
`[0027] Other prior art system are directed to handling
`information units which are of a common size as in the
`so-called Asynchronous Transfer Mode (or ATM) system, so
`that size of the information unit is not considered in deter-
`mining the priority of the current or a future information
`unit. An ATM system with a weight-driven scheduler is one
`of the solutions which is known in the prior art to schedule
`outputs from an ATM system.
`
`In any such system which involves weighting and
`[0028]
`queueing, it is desirable to allow for different types of
`service—for example, minimum bandwidth, best effort
`bandwidth, weighted fair queuing service, best effort peak
`bandwidth, and maximum burst size. While each of these
`types of service level are well known and accommodated in
`the prior art, it is a challenge to allow for the use of the any
`or all of them in the same system. It is also desirable to
`implement the weighted fair queuing using a system which
`considers the size of the transmission packet in determining
`the priority to be assigned to the packet in the queue.
`
`[0029] Thus, the prior art systems for handling data pack-
`ets for transmission to a network have undesirable disad-
`vantages and limitations which have an effect on the per-
`ceived fairness of the system.
`
`SUMMARY OF THE INVENTION
`
`[0030] The present invention overcomes the disadvan-
`tages and limitations of the prior art systems by providing a
`simple, yet effective, way of handling information units or
`frames coming out of a processing system and directing
`frames to output ports for dispatch to an data transmission
`network while providing a variety of different type of service
`levels in the same system.
`
`[0031] The present invention allows a single processing
`system to accommodate users which have service level
`agreements which include characteristics such as minimum
`bandwidth, best effort bandwidth, weighted fair queuing
`service, best effort peak bandwidth, and maximum burst size
`specifications, and any combinations of these characteristics
`in the same agreement.
`
`[0032] The present invention has the advantage that it
`allows the efficient use of resources and requires a minimum
`overhead to accommodate the various types of service
`levels. The present system establishes the types of service
`level agreement characteristics (also referred to as QoS) and
`provides the mechanism for enforcing them
`through
`manipulation of flow queues within a combination of time
`
`Juniper - Exhibit 1006, page 16
`
`Cloudflare - Exhibit 1042, page 16
`
`
`
`US 2002/0023168 Al
`
`Feb. 21, 2002
`
`3
`
`based calendars and weighted fair queuing calendars. The
`present invention also uses a technique for enforcing a level
`of service characteristic (for example, a minimum band-
`width) by determining the earliest time for the next service
`as a result of the current service, then testing the next request
`for service to determine whether it is after the allowable time
`for the next service based on the bandwidth established for
`the user.
`[0033] The present invention also allows for the use of any
`unused bandwidth by others through the use of a weighted
`fair queuing system which allows for the individual users to
`compete on a weighted fair basis for bandwidth which is
`unused at any given time. That is, even if bandwidth has
`been established for a user (for example, a user with a
`minimum bandwidth), when that bandwidth is not being
`used for that user, it may be used by others.
`[0034] The system and method of the present invention
`allows for the fair use of the unused bandwidth by consid-
`ering the size of the packet when determining the service
`order. That is, a user who sends a large packet is serviced
`later in the queue for unused bandwidth than a user who
`sends a small packet.
`[0035] The method to accomplish different levels of ser-
`vice is accomplished by establishing different calendars,
`both time based and weighted fair queuing and assigning
`flow queues to locations in one or more calendars. The
`calendars selected are determined based on the service level
`agreement which has been requested and paid for. Then, a
`user who has paid for a minimum bandwidth receives
`priority over others while that user is operating within and
`does not exceed that bandwidth. To the extent that the user
`with a minimum bandwidth exceeds that bandwidth, then
`the user may compete with other users for weighted fair use
`bandwidth allocation according to his service level agree-
`ment using a method which considers the length of the
`transmission packet. Similarly, a user who has arranged for
`a best effort bandwidth is provided with that bandwidth to
`the extent that it is available and a user who has arranger for
`best effort peak bandwidth or maximum burst size service is
`accorded the services in the system for allocating bandwidth
`of the present invention.
`[0036] Other objects and advantages of the present inven-
`tion will be apparent to those skilled in the relevant art in
`view of the following description of the preferred embodi-
`ment, taken together with the accompanying drawings and
`the appended claims.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`[0037] Having thus set forth some of the limitations and
`disadvantages of the prior art and some objects and advan-
`tages of the present invention, other objects and advantages
`will be apparent to those skilled in the relevant art in view
`of the following description of the drawings illustrating the
`present invention of an improved routing system and method
`in which:
`[0038] FIG. 1 is a block diagram for an interface device
`including embedded processor complex which is described
`in the NPU Patent, showing a DN Enqueue system and
`scheduler useful in practicing the present invention;
`[0039] FIG. 2 is a block diagram of an embedded proces-
`sor complex of type shown in FIG. 1, with the DN Enqueue
`(and its included scheduler) useful in understanding the
`present invention;
`
`[0040] FIG. 3 illustrates the scheduler of FIGS. 1-2,
`illustrating a system for scheduling egress of variable length
`packets according to the preferred embodiment of the
`present invention, in an "egress scheduler";
`
`[0041] FIG. 4 illustrates timer base calendar according to
`the preferred embodiment;
`[0042] FIGS. 5-8 illustrates the method and system for
`enqueuing packets into the scheduler system; and
`[0043] FIGS. 9-13 are logic flow charts of the calculations
`performed in the egress scheduler of the present invention,
`illustrating the servicing of a selected flow queue and
`calendar using the system of the present invention to provide
`minimum bandwidth, best effort bandwidth, weighted fair
`queuing service, best effort peak bandwidth and maximum
`burst size specifications, as different method of sharing
`bandwidth among users.
`[0044] DETAILED DESCRIPTION OF THE PRE-
`FERRED EMBODIMENT
`[0045]
`In the following description of the preferred
`embodiment, the best implementations of practicing the
`invention presently known to the inventors will be described
`with some particularity. However, this description is
`intended as a broad, general teaching of the concepts of the
`present invention in a specific embodiment but is not
`intended to be limiting the present invention to that as shown
`in this embodiment, especially since those skilled in the
`relevant art will recognize many variations and changes to
`the specific structure and operation shown and described
`with respect to these figures.
`[0046] FIG. 1 shows a block diagram of the interface
`device chip that includes the substrate 10 and a plurality of
`subassemblies integrated on the substrate. The subassem-
`blies are arranged into an upside configuration and a down-
`side configuration, with the "upside" configuration (some-
`times also referred to as an "ingress") referring to those
`components relating to data inbound to the chip from a data
`transmission network (up to or into the chip) and "down-
`side" (sometimes referred to as an "egress") referring to
`those components whose function is to transmit data from
`the chip toward the data transmission network in an out-
`bound fashion (away from the chip or down and into the
`network). Data flows follow the respective arrangements of
`the upside and downside configurations; thus, there is a
`upside data flow and a downside data flow in the system of
`FIG. 1. The upside or ingress configuration elements
`include an Enqueue-Dequeue-Scheduling UP (EDS-UP)
`logic 16, multiple multiplexed MAC's-UP (PMM-UP) 14,
`Switch Data Mover-UP (SDM-UP) 18, System Interface
`(SIF) 20, Data Align Serial Link A (DASL-A) 22 and Data
`Align Serial Link B (DASL-B) 24. Data links are more fully
`described in the Link Patent referenced above, and reference
`should be made to that document for a greater understanding
`of this portion of the system. It should be understood that the
`preferred embodiment of the present invention uses the data
`links as more fully described in that patent, other systems
`can be used to advantage with the present invention, par-
`ticularly those which support relatively high data flows and
`system requirements, since the present invention is not
`limited to those specific auxiliary devices such as the data
`links which are employed in the preferred embodiment.
`[0047] The components depicted on the downside (or
`egress) of the system include data links DASL-A 26 and
`
`Juniper - Exhibit 1006, page 17
`
`Cloudflare - Exhibit 1042, page 17
`
`
`
`US 2002/0023168 Al
`
`Feb. 21, 2002
`
`4
`
`DASL-B 28, system interface SIF 30, switch data mover
`SDM-DN 32, enqueue-dequeue-scheduler EDS-DN 34 and
`multiple multiplexed MAC's for the egress PMM-DN 36.
`The substrate 10 also includes a plurality of internal static
`random access memory components (S-RAM's), a traffic
`management scheduler (TRAFFIC MGT SCHEDULER,