`United States Patent
`5,938,736
`[11] Patent Number:
`[45] Date of Patent: Aug. 17, 1999
`Muller et at.
`
`
`
`USOUS938736A
`
`[54] SEARCH ENGINE ARCHITECTURE FOR A
`HIGH PERFORMANCE MULTI-LAYER
`SWITCH ELEMENT
`
`[75]
`
`Inventors: Shimon Muller, Sunnyvale; Ariel
`Hendel, (Tupertino; Louise Yeung, San
`Carlos, all of Calif.
`
`[73] Assignee: Sun Mlcmsystems, Inc., Mountain
`View, Calif.
`
`[21] Appl. No.: 08/885,116
`
`[22
`
`Filed:
`
`Jun. 30, 1997
`
`Int. CL“ ............................. G06F 13/38; G06F 15117
`[51]
`[52] US. Cl.
`7091243;7091236;370/392
`[58]
`Field of Search ............................ 395120073, 200.7,
`395120068, 200.66, 200.75; 3701386, 389,
`428, 392
`
`[56]
`
`References Cited
`US. PA'I‘EN'I‘ DOCUMENTS
`
`4,052,874
`4,807,111
`4,811,337
`4,850,042
`4,899,333
`4.922.503
`
`311987 Loyer ................................. 340182505
`211989 Cohen et al.
`.
`....... 3641200
`311989 lIart
`..............
`370185
`711989 Petronio et al.
`4551606
`211990 Roediger .......
`3701427
`511990 [tone ................................... 370,485.13
`
`
`
`(List continued on next page.)
`OTHER PUBLICATIONS
`
`International Search Repon, PCT111598113202, 4 pages.
`International Search Report, PCT/US98113368, 5 pages.
`International Search Report, PCT/US98113364, 4 pages.
`International Search Report, PCT,’US98113365, 4 pages.
`International Search Report, PCT/U898113177, 4 pages.
`International Search Report, P('l”US9X/I3199, 5 pages.
`International Search Report, PCTUS98113015, 5 pages.
`Wang et 211., A Novel Message Switch for llighly Parallel
`Systems, 151115., pp. 150-155 1989.
`
`'l'obagi, Fast Packet SwitchArchitectures for Broadband
`Integrated Services Digital Networks, Proceedings of the
`IEEE, vol. 78, Issue 1, pp. 133—167, Jan. 1990.
`Fliesser et 111., Design of a Multicast ATM Packet Switch,
`Electrical and Computer Engineering, 1993 Canadian ("on-
`fereuce, pp. 779—783, 1993.
`Chang et al., An Overview of the Pipelined Common Bufler
`Architecture (PCBA)
`for Memory Based Packet/Cell
`Switching Systems, Local Computer Networks, 1994, pp.
`288—297, 19th Conference, IEEE.
`
`Agrawal et al.. A Scalable Shared Buffer ATM Switch
`Architecture, VLSI, 1995 5th Great Lakes Symposium,
`IEEE, pp. 256—261, 1994.
`
`(List continued on next page.)
`
`Primary Examiner—Mark H. Rinehart
`Attorney, Agent, or Finn—Blakely Sokololf Taylor &
`Zafman
`
`[57]
`
`ABSTRACT
`
`A multi-layer switch search engine architecture is provided.
`According to one aspect of the present invention, a switch
`fabric includes a search engine, and a packet header pro-
`cessing unit. The search engine may be coupled to a for-
`warding databasc memory and one or more input porLs. The
`search engine is configured to schedule and perform
`accesses to the forwarding database memory and to transfer
`forwarding decisions to the one or more input ports. The
`header processing unit is coupled to the search engine and
`includes an arbitrated interface for coupling to the one or
`more input ports. The header processing unit is configured to
`receive a packet header from one or more of the input ports
`and is further configured to construct a search key [or
`accessing the forwarding database memory based upon a
`predetermined portion of the packet header. 'l‘he predeter-
`mined portion of the packet header is selected based upon a
`packet class with which the packet header is associated.
`
`23 Claims, 9 Drawing Sheets
`
`SEflRCH ENGINE
`J70
`
`ADDRESS
`MOUMULATION
`
`ROCK
`
`
`
`”Md—m3}
`
`
`
`It'l-t—llll)
`
`
`L: "was
`cuss MA‘ICHING
`BLOCK
`..
`
`ARISTA-1008
`
`1
`
`ARISTA-1008
`
`
`
`5,938,736
`
`Page 2
`
`US. PAI‘LiN'l‘ DOCUMENTS
`
`.
`‘
`.
`370/8513
`6/1990 Sheehy ......................
`4,933,938
`
`3951120000
`6/1990 Yamamoto
`4,935,809
`
`9/1992 Pun] etal- ----------------- 370/468
`5,150,358
`»
`.
`50993 Mahel' el al-
`5,210,746
`
`7.
`699% Takada et a1.
`~ 370/401
`5,220,562
`
`..................... 370/941
`7/1993 Hluehyj et al.
`5,231,633
`.......................... 370/60
`10/1993 Callon et ul.
`5,251,205
`
`1/1994 Kudo ...........
`370/941
`5,278,830
`
`3/1994 McHarg 81; a1.
`........................ 370/413
`5,291,482
`370/474
`3/1994 Carr
`5,293,379
`
`~ 395/725
`47/1994 bee
`5,301,333
`
`- 340/827
`57/1994 Pcrlman ct at.
`5,309,437
`------
`370/8543
`8/1994 Cassagnol
`5,343,471
`
`395/200.73
`5,353,412 10/1994 Douglas Ct al.
`
`1/1995 McAuleYet al~ ,.
`54386413
`370/54
`2/1995 Fnglstad elal- -
`- 395/700
`5392,4132
`
`370/941
`5,394,402
`2/1995 Ross
`
`~ 370/390
`41/1995 Aflflet HL
`5,410,540
`
`. 395/800
`5,410,722
`41/1995 Corn/thy
`5,424,838
`9/1995 Lm
`363/49
`6/1995 Brillon et 'dl.
`..
`370/941
`5,425,028
`
`
`67/1995 Guinean, 111
`5,426,736
`- 395/250
`
`7/1995 Phage/9.941.
`5,432,907
`395/200
`370/601
`5,450,399
`9/1995 SHglta
`
`- 370/413
`5,455,820 10/1995 Yamada
`
`10/1995 Gaddis 01 a1.
`5,457,681
`370/402
`5,459,714 10/1995 Lo et al.
`................. 370/131
`
`. 370/351
`..
`5,459,717 10/1995 Mullan et‘. al.
`
`. 370/54
`10/1995 Drake, Jr. et a1.
`5,461,611
`5,461,624 10/1995 Mazzola ......
`370/402
`5,473,607 12/1995 Housman .
`370/8513
`
`5,477,537 12/1995 Dunkert et al.
`......... 370/60
`
`5,481,540
`1/1996 Huang .......................
`370/8513
`
`5,485,455
`1/1996 Dobbins et al.
`370/255
`1/1996 Sweazey ............................ 395/200.54
`5,485,578
`'
`(
`’
`5,490,139
`2/1996 Baker et a1.
`........................... 370/60
`
`3’190’252
`2/1996 Mace” Ct al‘ "
`3)5’ 20001
`
`5’300’860
`3/1996 Palm“ 7 al'
`370/85‘13
`5,509,123
`4/1996 Dobbins ctal.
`395,200.73
`
`5,515,376
`5/1996 Murthy et al.
`340/402
`7/1996 Kofldoh ,,,,,,,,,,,,,,
`5,535,202
`370/601
`5,555,405
`9/1996 Griesmaet’ et a],
`..................... 395/600
`
`. 370/434
`5,561,666
`10/1996 Christensen et a].
`
`395/290‘96
`22917.91 10’:1996. Menquon et 81‘
`.,5/0,365 10,1996 Yodlnda .:..................
`370/856
`
`5,572,522 11/1996 Calumvokls et al.
`. 370/395
`12/1996 Pleyer ..................................... 395/326
`5,583,981
`5,594,727
`1/1997 K011360500 et a1.
`.................... 370/468
`
`5,600,641
`2/1997 Duault et at.
`. 370/400
`2/1997 Lebizay et a1.
`.
`5,602,841
`. 370/413
`
`
`395/209”
`5’606’66?
`2/1997 Rom“ Ct a"
`5’610’903
`3/1997 Murthy at al'
`370/401
`5,617,421
`4/1997 Chin eta].
`..
`. 370/402
`5,619,500
`471997 Hiekali
`.370/414
`5,619,661
`4/1997 crews et a1,
`, 395/299
`5,633,865
`5/1997 Short
`.. 370/412
`
`
`
`6/1997 Yu ........................................... 395/500
`5,636,371
`6/1997 Johnson et al.
`......................... 395/881
`5,640,605
`
`7/1997 Griesmer et a1.
`395/200.”
`5,649,109
`.......... 370/392
`7/1997 Van Seters et a].
`5,651,002
`10/1997 Aggarwaletal.
`................. 370/20012
`5,675,741
`
`5,684,800 11/1997 Dobbins ct a].
`370/401
`
`...............
`5,691,984 11/1997 Gardner ct al.
`370/401
`5,706,472
`1/1998 Ruff Ct 81.
`..............
`. 395/49104
`
`395/2008
`5,720,032
`2/1998 Picazo, Jr. et al
`_______ 370/418
`5,724,358
`3/1998 Hendrick et a]_
`
`5,726,977
`3/1998 Lee
`370,235
`
`5,734,865
`3/1998 Y1.
`,395/500
`4/1998 Mztzzolu 61 211.
`5,740,171
`370/392
`5,740,175
`4/1998 Wukenlun el ill.
`...................... 395/422
`395/200.68
`5,740,375
`4/1998 Dunne eta].
`
`.
`5,742,604
`4/1998 bdsall et a1,
`37017401
`
`
`4/1998 Picazo, Jr. eta
`5,742,760
`370/351
`.
`5,745,048
`4/1998 Taguchi eta].
`340/87001
`
`..
`_ 395/20079
`5,748,905
`5/1998 Hauscrct al.
`395/20058
`5,751,967
`5/1998 Raab et al.
`
`.....
`5,754,540
`5/1998 L16 et a1.
`370/315
`
`5/1998 Iambrechtetal.
`5,754,801
`395/308
`
`5.7577771
`5/1998 Li et a1,
`370,235
`
`. 370/392
`5,757,795
`5/1998 Schnell
`6/1998 Fukudaet al.
`..
`395/20068
`5,761,435
`
`6/1998 Christensen 61 61.
`5,764,634
`370/401
`
`5,764,636
`6/1998 1311152111
`. 370/401
`
`
`5,781,549
`7/1998 Dai
`. 370/398
`7/1998 Szczepanek et a].
`. 395/2008
`5,784,573
`
`_____________________ 370/400
`5,790,546
`8/1998 Dobbins et a].
`5,790,303
`8/1998 Seaman ______________________________ 395/20053
`5,802,052
`9/1998 Vcnkataraman
`370,395
`
`OTHER PUBLICATIONS
`
`Sabaa et a1., Implementation of a Window—Based Scheduler
`in an ATM Switch, Electrical and Computer Engineering,
`.
`.
`.
`1995 Canadian Conference, IEEE, pp 3245, 1995
`NaraghkPour et al., A Mult1ple Shared Memory Swneh,
`S 'stern Theorv 1996 Southeastern S m os'urn IEEE
`_~V‘
`«’
`y p
`1
`’
`*9
`.
`.
`.
`,
`.
`39541999
`chngar ct al., SW1tclung Prlorltlzcd Packets, Globccom ’89:
`IEEE Global
`'l'elecornmunications Conference,
`pp.
`1181—1186, 1989.
`“Foundry Products”, downloaded from Website http://ww-
`W.foundrynet,con‘1/ on Jun. 19) 1997'
`.
`“
`.
`Anthony J-,MCA“16y _& Pm” FranCIS: Ffls‘ Routlng Table
`L001911) Usmg CAMS , IEEE: 1993,1311 1383-1390
`“Gigabit Ethernet”, Network Strategy Report, The Burton
`Group, V2, May 8, 1997 40 pages.
`“IP On Speed”, Erica Roberts, Internet—Draft, Data Com-
`munications on the Web Mar 1997 12 8 es
`'
`_
`’
`, j
`’
`p g
`“Multllayer Topology”, WhltelPaper, Internet—Draft, 13
`pages, downloaded from websne http:/’/Wwwbaynetwork-
`s.com on Apr. 18, 1997.
`
`2
`
`
`
`
`
`1119ch'S'fl
`
`
`
`6661‘L1“finv
`
`610lUNIS
`
`99L‘896‘s
`
`——--~
`
`
`é'//////////fl//////A W/l/l/flflfl/fl/Wz’l‘WW/ll/IMW’7/l/////////7///////////¢
`\
`
`_—
`
` \\‘\\\\\\\\\\\\
`flM/l/M/ll/AWWI/l/lflfl/lmW/é W
`
`
`
`//\
`
`
`
`
`Wfl/flll/IV/flfl/l/WW/l/M/t
`
`.—
`
`Swilching
`Element
`
`.\\\\\\\\\\\\\\
`
`
`
`Switching
`Element
`
`
`
`I
`I
`|
`|
`l
`I
`L
`
`Swi lching
`Element
`
`l
`I
`|
`|
`I
`I
`
`To nodes and end-stations
`FIG. 1
`
`3
`
`
`
` FORWARDING
`AND FILTERING
`DATABASE
`
`
`140
`
`
`
`SWITCH FABRIC
`
`210
`
`
`NETWORK
`
`
`INTERFACE
`
`
`
`CPU INTERFACE
`
`
`215
`
`
`CASCADLNG
`
`INTERFACE
`
`
`2’25
`
`
`
`'E‘O
`NETWORK
`
`TC) ONE
`OR MORE
`OTHER
`SWITCH
`ELLMIENTS
`
`
`
`mama'90
`
`
`
`6661‘L1'finv
`
`6JO'2Walls
`
`9£L‘8£6‘s
`
`4
`
`
`
`US. Patent
`
`Aug. 17,1999
`
`Sheet 3 0f9
`
`5,938,736
`
` FORWAR DING DATABASE
`
`MEMORY
`
`
` 140
`
` FORWAR DING DATABASE
`MEMORY INTERFACE
`
`
`CAM ADDR AND INS GEN
`
`SEARCH ENGINE
`FWD a. FILTER
`
`8
`r:
`&
`E
`c: E’—
`§ 3
`% 5
`
`
`‘___.,
`
`
`U)
`U)LI.|
`
`
`
`
`
`HEADER
`
`Pnocessme
`LOGIC
`
`
`
`LEARNIN
`LOGIC
`
`SOFTWARE COMMAND
`EXECUTION BLOCK
`
`
`
`
`
`
`5
`
`
`
`US. Patent
`
`Aug. 17,1999
`
`Sheet 4 0f 9
`
`5,938,736
`
`/499
`
`
`
`ENCAPSULATED INDEPENDENT
`
`L3 ADDRESS
`
`PORTION
`485
`
`L3 ADDRESS
`
`DEPENDENT
`
`PORTION
`49°
`
`LgogfiiAgfiR
`
`475
`
`HEADER
`PREPROCESS
`ARBITER
`
`360
`
`
`
`501
`
`SEARCH ENGINE
`370
`
`ADDRESS
`ACCUMULATION
`BLOCK 510
`
`A
`R
`
`B I
`
`T
`E
`R
`
`
`
`545
`
`L 2 K
`
`5
`
`ENCAPSULATION
`
`BLOCK 520
`
`
`
`A R B I
`
`T
`
`L3 HEADER
`CLASS MATCHING
`
`E
`
`A
`R
`a
`I
`T
`ER
`
`BLOCK 530
`
`CLASS
`
`504
`
`L3 ADDRESS
`DEPENDENT
`BLOCK 540
`
`FIG. 5
`
`L
`2
`K
`E
`Y
`
`555
`
`6
`
`
`
`US. Patent
`
`Aug. 17,1999
`
`Sheet 5 0f 9
`
`5,938,736
`
`SRAM ADDRESS l1.0| CONTROL
`
`FIG. 6
`
`7
`
`
`
`SEAR CH
`SUPERCYCLE
`PROCESSING
`
`
`
`704
`
`DISTRIBUTED
`FLOW
`
`7 2
`0
`
`Y
`
`»
`
`N
`
`708
`
`LAYER 2 (C2)
`
`LEARNING
`
`FIG. 7A
`
`712
`
`714
`
`L3 FLOW CLASS
`SEARCH;
`LAYER 3
`
`SEARCH
`0718
`
`722
`
`APPLY
`CLASS
`ACTION
`OPTIONS
`
`720
`
`724
`
`READ ASSCE.
`DATA:
`
`
`
`_SFAR_'H
`
`DASEARCH
`
`L3 SEARCH:
`
`728
`
`30
`
`
`
`mama'S'fl
`
`
`
`6661‘L1'finv
`
`6J09malls
`
`9£L‘8€6‘s
`
`8
`
`
`
`734
`
`my
`CLASS
`ACTION
`ownons
`
`READASSOC. DATA
`
`736
`
`$738
`APPLY L2
`DECISION
`ALGORITHM
`
`DA SEARCH
`
`742
`
`{744
`READ ASSOC,
`DATA
`
`® 748
`
`750
`
`DA SEARCH:
`
`
`
`Y
`
`746
`
`768
`
`N
`
`
`
`FOR
`
`[.2
`
`APPLY L2
`
`
`DECISION
`ALGORITHM
`
`
`ASSEMBLE FORWARDENG
`
`APPLY L2 DECISION
`ALGORITHM
`
`
`
`FIG. 7B
`
`DECISION
`
`7 80
`
`@
`
`”313dST]
`
`
`
`6661‘L1'an
`
`6J0LlaallS
`
`9sL‘896‘s
`
`9
`
`
`
`
`CAM SHORT SEARCH
`CAM SHORT SEARCH
`
`
`mm wREAD
`‘— LZ SA SEARCH >4 L2 DA SEARCH—b
`
`F'G- 8“
`
`CAM SHORT
`SEARCH
`
`
`
`
`
`CAM LONG SEARCH
`
`Fl
`
`G
`
`.88
`
`RAM
`
`CAM SHORT
`SEARCH
`
`
`
`RAM BURST
`
`
`
`RAM
`
`H2 SA SEARCH >4 L3 SEARCH >é L2 DA SEARCHb—b
`
`CAM SHORT SEARCH
`
`
`
`
`
`CAM LONG SEARCH
`
`
`
`'
`
`RAM
`
`FIG 8C
`
`“5"”
`‘-——L2 SASEARCH % L3 SEARCH -————-’
`
`RAM BURST
`
`“5"”
`
`
`
`1‘1919d'S'fl
`
`
`
`6661‘L1'finv
`
`6J08walls
`
`9£L‘8€6‘s
`
`10
`
`10
`
`
`
`US. Patent
`
`Aug. 17,1999
`
`Sheet 9 0f9
`
`5,938,736
`
`GENERAL COMMAND
`PROCESSING
`
`950
`
`960
`
`970
`
`980
`
`910
`
`cgpsggmgmo
`REGISTENS)
`
`
`
`
`
`SOFTWARE COMMAND
`
`
`EXECUTION BLOCK LOADS
`
`
`APPROPRIATE DATA FROM
`DATA REGISTER(S)
`
`
`SOFIWRAE COMMAND
`
`
`
`FORWARDING DB
`ACCESS(ES)
`
`
`
`SOFTWARE COMMAND
`
`COMMAND
`EXECUTION BLOCK
`COMPLETE
`UPDATES RESULTS [N
`
`
`
`APPROPRIATE REGISTER(S)
`
`
`
`SOFTWARE COMMAND
`
`
`EXECUTION BLOCK SETS
` CPU ACTS ON RESULT(S)
`
`
`THE COMMAND STATUS
`FLAG(S)
`
`
`
`FIG. 9
`
`11
`
`11
`
`
`
`5,938,736
`
`1
`SEARCH ENGINE ARCHITECTURE FOR A
`HIGH PERFORMANCE MULTI-LAYER
`SWITCH ELEMENT
`
`FIELD OF THE INVENTION
`
`The invention relates generally to the field of computer
`networking devices. More particularly, the invention relates
`to a multi-layer switch search engine architecture.
`
`BACKGROUND OF TIIE INVENTION
`
`Local area networks (LANs) have become quite sophis-
`ticated in architecture. Originally, LANs were thought of a
`single wire connecting a few computers. Today I.ANs are
`implemented in complicated configurations to enhance fune-
`tionality and llexibility. In such a network, packets are
`transmitted from a source device to a destination device; in
`more expansive networks, this packet can travel through one
`or more switches and/or routers. Standards have been set to
`
`define the packet structure and layers of functionality and
`sophistication of a network. For example, the TCP/IP pro-
`tocol stack defines four distinct multiple layers, e.g.
`the
`physical layer (layer 1), data link layer (layer 2), network
`layer (layer 3), transport layer (layer 4). A network device
`may be capable of supporting one or more of the layers and
`refer to particular fields of the header accordingly.
`Today, typical LANs utilize a combination of Layer 2
`(data link layer) and Layer 3 (network layer) network
`devices. In order to meet the ever increasing performance
`demands from the network, functionality that has been
`traditionally performed in software and/or in separate layer
`2 and layer 3 devices have migrated into one multi-layer
`device or switch that implements the performance critical
`functions in hardware.
`
`One of the critical aspects for achieving a cost-effective
`high-performance switch implementation is the architecture
`of the forwarding database search engine, which is the
`centerpiece of every switch design. Therefore, it is desirable
`to optimize partitioning of the functional modules, provide
`efficient interaction between the search engine and its “cli-
`ents” (e.g., switch input ports and the central processing
`unit), and optimize the execution order of events, all of
`which play a crucial role in the overall performance of the
`switching fabric. Also,
`it
`is desirable to support diverse
`traffic types and policies by providing flexibility to match
`different packet header fields. Ideally this architecture
`should also allow for a very high level of integration in
`silicon, and linearly scale in performance with the advances
`in silicon technology.
`
`SUMMARY OF THE 1N VENTION
`
`A multi-layer switch search engine architecture is
`described. According to one aspect of the present invention,
`a switch fabric includes a search engine, and a packet header
`processing unit. The search engine may be coupled to a
`forwarding database memory and one or more input ports.
`The search engine is configured to schedule and perform
`accesses to the forwarding database memory and to transfer
`forwarding decisions to the one or more input ports. The
`header processing unit is coupled to the search engine and
`includes an arbitrated interface for coupling to the one or
`more input ports. The header processing unit is conligured to
`receive a packet header from one or more of the input ports
`and is further configured to construct a search key for
`accessing the forwarding database memory based upon a
`predetermined portion of the packet header. The predeter-
`
`LA
`
`10
`
`20
`
`k)m
`
`30
`
`DJat
`
`40
`
`4sm
`
`50
`
`U: u.
`
`60
`
`2
`mined portion of the packet header is selected based upon a
`packet class with which the packet header is associated.
`Other features of the present invention will be apparent
`from the accompanying drawings and from the detailed
`description which follows.
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The present invention is illustrated by way of example,
`and not by way of limitation, in the figures of the accom-
`panying drawings and in which like reference numerals refer
`to similar elements and in which:
`
`FIG. 1 illustrates a switch according to one embodiment
`of the present invention.
`FIG. 2 is a simplified block diagram of an exemplary
`switch element that may be utilized in the switch of FIG. 1.
`FIG. 3 is a block diagram of the switch fabric of FIG. 2
`according to one embodiment of the present invention.
`FIG. 4 illustrates the portions of a generic packet header
`that are operated upon by the pipelined header preprocessing
`subblocks of PIG. 5 according to one embodiment of the
`present invention.
`FIG. 5 illustrates pipelined header preprocessing sub—
`blocks of the header processing logic of FIG. 3 according to
`one embodiment of the present invention.
`FIG. 6 illustrates a physical organization of the forward-
`ing memory of FIG. 2 according to one embodiment of the
`present invention.
`FIG. 7 is a flow diagram illustrating the forwarding
`database memory search supercycle decision logic accord-
`ing to one embodiment of the present invention.
`FIGS. 8A—C are timing diagrams illustrating three exem-
`plary forwarding database memory search supercycles.
`FIG. 9 is a flow diagram illustrating generalized com—
`mand processing for typical forwarding database memory
`access commands according to one embodiment of the
`present invention.
`DETAILED DESCRIPTION
`
`A search engine architecture for a high performance
`multi-layer switch element is described. In the following
`description, for the purposes of explanation, numerous spe-
`cific details are set forth in order to provide a thorough
`understanding of the present invention. It will be apparent,
`however, to one skilled in the art that the present invention
`may be practiced without some of these specific details. In
`other instances, well-known structures and devices are
`shown in block diagram form.
`The present invention includes various steps, which will
`be described below. While the steps of the present invention
`are preferably performed by the hardware components
`described below, the steps may alternatively be embodied in
`machine-executable instructions, which may be used to
`cause a general-purpose or special-purpose processor pro-
`grammed with the instructions to perform the steps. Further,
`embodiments of the present invention will be described with
`reference to a high speed Ethernet switch employing a
`combination of random access memory (RAM) and content
`addressable memories (CAMs). However, the method and
`apparatus described herein are equally applicable to other
`types of network devices such as repeaters, bridges, routers,
`brouters, and other network devices and also alternative
`memory types and arrangements.
`AN EXEMPIARY NETWORK ELEMENT
`An overview of one embodiment of a network element
`
`that operates in accordance with the teachings of the present
`
`12
`
`12
`
`
`
`5,938,736
`
`3
`invention is illustrated in FIG. 1. The network element is
`used to interconnect a number of nodes and end-stations in
`a variety of different ways. In particular, an application of
`the multi-layer distributed network element (MLDNE)
`would be to route packets according to predefined routing
`protocols over a homogenous data link layer such as the
`IEEE 802.3 standard, also known as the Ethernet. Other
`routing protocols can also be used.
`The MLDNE’s distributed architecture can be configured
`to route message traffic in accordance with a number of
`known or
`future routing algorithms.
`In a preferred
`embodiment, the MLDNE is configured to handle message
`traifle using the Internet suite of protocols, and more spe-
`cifically the Transmission Control Protocol (TCP) and the
`Internet Protocol (IP) over the Ethernet LAN standard and
`medium access control (MAC) data link layer. The TCP is
`also referred to here as a Layer 4 protocol, while the IP is
`referred to repeatedly as a Layer 3 protocol.
`In one embodiment of the MLDNE, a network element is
`configured to implement packet routing functions in a dis-
`tributed manner, i.e., different parts of a filnction are per-
`formed by different subsystems in the MLDNE, while the
`final result of the functions remains transparent
`to the
`external nodes and end-stations. As will be appreciated from
`the discussion below and the diagram in FIG. 1, the MLDNE
`has a scalable architecture which allows the designer to
`predictably increase the number of external connections by
`adding additional subsystems, thereby allowing greater flex-
`ibility in defining the MLDNE as a stand alone router.
`the
`As illustrated in block diagram form in FIG. 1,
`MIDNE [0| contains a number of subsystems “0 that are
`fully meshed and interconnected using a number of internal
`links 141 to create a larger switch. At least one internal link
`couples any two subsystems. Each subsystem 110 includes
`a switch element 100 coupled to a forwarding and filtering
`database 140, also referred to as a forwarding database. The
`forwarding and filtering database may include a forwarding
`memory 113 and an associated memory 114. The forwarding
`memory (or database) 113 stores an address table used for
`matching with the headers of received packets. The associ-
`ated memory (or database) stores data associated with each
`entry in the forwarding memory that
`is used to identify
`forwarding attributes for forwarding the packets through the
`MLDNE, A number of external ports (not shown) having
`input and output capability interface the external connec-
`tions 117. In one embodiment, each subsystem supports
`multiple. Gigabit Ethernet ports, Fast Ethernet ports and
`Ethernet ports. Internal ports (not shown) also having input
`and output capability in each subsystem couple the internal
`links 141. Using the internal links, the MLDNE can connect
`multiple switching elements together to form a multigigabit
`switch.
`
`The MLDNE 101 further includes a central processing
`system (CPS) 160 that is coupled to the individual sub-
`system 110 through a communication bus 151 such as the
`peripheral components interconnect (PCI). The CPS 160
`includes a central processing unit (CPU) 161 coupled to a
`central memory 163. Central memory 163 includes a copy of
`the entries contained in the individual forwarding memories
`113 of the various subsystems. The CPS has a direct control
`and communication interface to each subsystem 110 and
`provides some centralized communication and control
`between switch elements.
`AN EXEMPIARY SWITCH ELEMENT
`
`FIG. 2 is a simplified block diagram illustrating an
`exemplary architecture of the switch element of FIG. I. The
`
`engine that provides access to the forwarding and filtering
`
`13
`
`LA
`
`10
`
`20
`
`4
`switch element 100 depicted includes a central processing
`unit (CPU) interface 215, a switch fabric block 210, a
`network interface 205, a cascading interface 225, and a
`shared memory manager 220.
`Ethernet packets may enter or leave the network switch
`element 100 through any one of the three interfaces 205,
`215, or 225. In brief, the network interface 205 operates in
`accordance with a corresponding Ethernet protocol
`to
`receive Ethernet packets from a network (not shown) and to
`transmit Ethernet packets onto the network via one or more
`external ports (not shown). An optional cascading interface
`225 may include one or more internal links (not shown) for
`interconnecting switching elements to create.
`larger
`switches. For example, each switch element 100 may be
`connected together with other switch elements in a full mesh
`topology to form a multi-layer switch as described above.
`Alternatively, a switch may comprise a single switch ele-
`ment 100 with or without the cascading interface 225.
`The CPU 161 may transmit commands or packets to the
`network switch element 100 via the CPU interface 215. In
`this manner, one or more software processes running on the
`CPU 161 may manage entries in an external forwarding and
`filtering database I40, such as adding new entries and
`k)1|
`invalidating unwanted entries. In alternative embodiments,
`" however, the CPU 161 may be provided with direct access
`to the forwarding and filtering database 140. In any event,
`for purposes of packet forwarding, the CPU port of the CPU
`interface 215 resembles a generic input port into the switch
`element 100 and may be treated as if it were simply another
`external network interface port. However, since access to the
`CPU port occurs over a bus such as a peripheral components
`interconnect (PCI) bus,
`the CPU port does not need any
`media access control (MAC) functionality.
`the two main
`Returning to the network interface 205,
`tasks of input packet processing and output packet process-
`ing will now briefly be described. Input packet processing
`may be performed by one or more input ports of the network
`interface 205. Input packet processing includes the follow-
`ing: (1) receiving and verifying incoming Ethernet packets,
`(2) modifying packet headers when appropriate, (3) request-
`ing buffer pointers from the shared memory manager 220 for
`storage of incoming packets,
`(4)
`requesting forwarding
`decisions from the switch fabric block 2“), (5) transferring
`the incoming packet data to the shared memory manager 220
`for temporary storage in an external shared memory 230,
`and (5) upon receipt of a forwarding decision, forwarding
`the buffer pointer(_'s) to the output port(s) indicated by the
`forwarding decision. Output packet processing may be per-
`formed by one or more output ports of the network interface
`205. Output processing includes requesting packet data from
`the shared memory manager 220, transmitting packets onto
`the network, and requesting deallocation of buffer(s) after
`packets have been transmitted.
`The network interface 205, the CPU interface 215, and the
`cascading interface 225 are coupled to the shared memory
`manager 220 and the switch fabric block 210. Preferably,
`critical functions such as packet forwarding and packet
`buffering are centralized as shown in FIG. 2. The shared
`memory manager 220 provides an efficient centralized inter-
`face to the external shared memory 230 for buffering of
`incoming packets. The switch fabric block 210 includes a
`search engine and learning logic for searching and main-
`taining the forwarding and filtering database 140 with the
`- assistance of the CPU 161.
`The centralized switch fabric block 210 includes a search
`
`30
`
`DJat
`
`40
`
`4:.m
`
`50
`
`U: u.
`
`60
`
`13
`
`
`
`5,938,736
`
`6
`(3) HerBus[X:l][N:0]—The Dedicated Header Bus
`The header bus is a dedicated X-bit wide bus from each
`input port to the switch fabric 210. In one embodiment, X is
`16, thereby allowing the packet header to be transferred as
`double bytes.
`(4) FwdiAck[N:O]—Forwarding Decision Acknowledg-
`ment Signals
`These forwarding decision acknowledgment signals are
`generated by the switch fabric 210 in response to corre-
`sponding forwarding request signals from the input ports
`(see FwdiReq[N:0] above). These signals are deasserted
`while the forwarding decision is not ready. When a forward-
`ing decision acknowledgment signal does become asserted,
`the corresponding input port should assume the forwarding
`decision bus (see Fwd_DecisionW:O] below) has a valid
`forwarding decision. After detecting its forwarding decision
`acknowledgment, the corresponding input port may make
`another forwarding request, if needed.
`(5) Fwd Decision[Y:U]—Shared Forwarding Decision
`Bus
`
`'l'his forwarding decision bus is shared by all input ports.
`It indicates the output port number(s) on which to forward
`the packet. The forwarding decision may also include data
`indicative of the outgoing packet’s priority, VID insertion,
`DA replacement, and other information that may be useful
`to the input ports.
`SWITCH FABRIC OVERVIEW
`
`LA
`
`10
`
`20
`
`k)m
`
`5
`database 140 on behalf of the interfaces 205, 215, and 225.
`Packet header matching, Layer 2 based learning, Layer 2
`and Layer 3 packet forwarding, filtering, and aging are
`exemplary functions that may be performed by the switch
`fabric block 210. Each input port is coupled with the switch
`fabric block 210 to receive forwarding decisions for
`received packets. The forwarding decision indicates the
`outbound port(s) (e.g., external network port or internal
`cascading port) upon which the corresponding packet should
`be transmitted. Additional information may also be included
`in the forwarding decision to support hardware routing such
`as a new MAC destination address (DA) for MAC DA
`replacement. Further, a priority indication may also be
`included in the forwarding decision to facilitate prioritiza-
`tion of packet traffic through the switch element 100.
`In the present embodiment, Ethernet packets are centrally
`bulfered and managed by the shared memory manager 220.
`The shared memory manager 220 interfaces every input port
`and output port and performs dynamic memory allocation
`and deallocation on their behalf, respectively. During input
`packet processing, one or more bulfers are allocated in the
`external shared memory 230 and an incoming packet is
`stored by the shared memory manager 220 responsive to
`commands received from the network interface 205, for
`example. Subsequently, during output packet processing, the
`shared memory manager 220 retrieves the packet from the
`external shared memory 230 and deallocates buffers that are
`no longer in use. To assure no buffers are released until all
`output ports have completed transmission of the data stored
`therein, the shared memory manager 220 preferably also
`tracks bulIer ownership.
`INPUT PORT/SWITCH FABRIC INTERFACE
`
`Before describing the internal details of the switch fabric
`210, the interface between the input ports (e.g., any port on
`which packets may be received) and the switch fabric 210
`will nowbriefly be discussed. Input ports in each of the CPU
`interface 215, the network interface 205, and the cascading
`interface 225 request forwarding decisions for incoming
`packets from the switch fabric 210. According to one
`embodiment of the present invention, the following interface
`is employed:
`(1) Fwd Req[N:0]—Forward Request Signals
`These forward request signals are output by the input
`ports to the switch fabric 210. They have two purposes. First,
`they serve as an indication to the switch fabric 210 that the
`corresponding input port has received a valid packet header
`and is ready to stream the packet header to the switch fabric.
`A header transfer grant signal (see IIdrinriGnt[N:0]
`below) is expected to be asserted before transfer of the
`packet header will begin. Second, these signals serve as a
`request for a forwarding decision after the header transfer
`grant is detected. The forward request signals are deasserted
`in the clock period after a forwarding decision acknowledg-
`ment is detected from the switch fabric 210 (see FwdiAck
`[N20] below).
`(:2) HdrinriGnt[N:0]—Header Transfer Grant Signals
`These header transfer grant signals are output by the
`switch fabric 210 to the input ports. More specifically, these
`signals are output by the switch fabric’s header preprocess-
`ing logic that will be described further below. At any rate,
`the header transfer signal indicates the header preprocessing
`logic is ready to accept the packet header from the corre-
`sponding input port. Upon detecting the assertion of the
`header transfer grant,
`the corresponding input port will
`begin streaming continuous header fields to the switch fabric
`2“).
`
`30
`
`DJat
`
`40
`
`Having described the interface between the input ports
`and the switch fabric 210, the internal details of the switch
`fabric 210 will now be described. Referring to FIG. 3, a
`block diagram of an exemplary switch fabric 210 is
`depicted. In general, the switch fabric 210 is responsible for
`directing packets from an input port to an output port. The
`goal of the switch fabric 210 is to generate forwarding
`decisions to the input ports in the shortest time possible to
`keep the delay though the switch low and to achieve wire
`speed switching on all ports. The primary functions of the
`switch fabric are performing real-time packet header
`matching, Layer 2 (L2) based learning, L2 and Layer 3 (L3)
`aging, forming L2 and L3 search keys for searching and
`retrieving forwarding information from the forwarding data-
`base memory 140 on behalf of the input ports, and providing
`a command interface for software to efficiently manage
`entries in the forwarding database memory 140.
`Layer 2 based learning is the process of constantly
`updating the MAC address portion of the forwarding data-
`base 140 based on the traffic that passes through the switch-
`ing device. When a packet enters the switching device, an
`entry is created (or an existing entry is updated) in the
`database that correlates the MAC source address (SA) of the
`packet with the input port upon which the packet arrived. In
`this manner, a switching device “learns” on which subnet a
`U: u
`' node resides.
`
`4sm
`
`50
`
`Aging is carried out on both link and network layers. It is
`the process of time stamping entries and removing expired
`entries from the forwarding database memory 140. There are
`two types of aging: (l) aging based on MAC SA, and (2)
`aging based on MAC destination address (DA). The former
`is for Layer 2 aging and the latter aids in removal of inactive
`Layer 3 flows. Thus, aging helps re