[19] United States Patent                     [11] Patent Number: 5,978,951
Lawler et al.                                 [45] Date of Patent: Nov. 2, 1999

US005978951A

[54] HIGH SPEED CACHE MANAGEMENT UNIT FOR USE IN A BRIDGE/ROUTER

[75] Inventors: Christopher P. Lawler, Wellesley; Shannon Q. Hill,
     Westford; David Lipschutz, Lexington; Thomas A. Radogna,
     Westborough; John A. Flanders, Ashland; Robert M. France,
     Littleton; Stephen L. Van Seters, Stow, all of Mass.

[73] Assignee: 3Com Corporation, Santa Clara, Calif.

[21] Appl. No.: 08/927,336

[22] Filed: Sep. 11, 1997

[51] Int. Cl.6 .................................................. H03M 13/00

[52] U.S. Cl. ........................................... 714/738; 370/392

[58] Field of Search ..........................................................

[56] References Cited

U.S. PATENT DOCUMENTS

5,469,555  11/1995  Ghosh et al. ............................... 711/133
5,708,659   1/1998  Rostoker et al. ............................ 370/392
5,742,792   4/1998  Yanai et al. ............................... 711/162
5,802,278   9/1998  Isfeld et al. ........................... 395/200.79

Primary Examiner—Emmanuel L. Moise
Attorney, Agent, or Firm—Weingarten, Schurgin, Gagnebin & Hayes
`
[57]

ABSTRACT

A method and cache management unit for a bridge or bridge/router providing high-speed, flexible address cache management. The unit maintains a network address cache and an age table, searches the cache for layer 2 and layer 3 addresses from received frame headers, and returns address search results. The unit includes an interface permitting processor manipulation of the cache and age table, and supports a 4-way set associative cache to store the network addresses. A plurality of functions implemented in hardware enables software manipulation of the cache, including installing, removing and changing entries and diagnosing the memory. The unit identifies and selects destination ports within a Load Balanced Port Group for frame forwarding. The unit utilizes Virtual LAN identification in conjunction with a MAC address for lookup in the cache. A cyclic redundancy code for each address to be looked up in the cache is used as an index into the cache. If a cache thrash rate exceeds a predetermined threshold, CRC table values can be rewritten. Four time-sliced cache lookup units are provided, each consisting of a cache lookup controller for comparing a received network address to an address retrieved from an identified cache set.
`
`26 Claims, 14 Drawing Sheets
`
`
`
[Front-page figure: block diagram of the network device, showing the Motherboard 12, Frame Processor 30, Age Table 130, PIO Register Sets, Network Interface input and output register files, CRC Engine, LRU and Valid Tables, Cache Lookup Queues, Packetization Engine, Lookup Controllers, Output Packet Assembly, Address Cache, ACA 26, and Network Interface Chips (NICs).]
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 1 of 14          5,978,951

[FIG. 1 — pictorial diagram of the network device: Network Interface Modules and a Motherboard mounted to a backplane.]
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 2 of 14          5,978,951

[FIG. 2 — block diagram of the network device: one Network Interface Module coupled to the Motherboard via the backplane.]
`
`
`
`
`
`
`
`
`
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 3 of 14          5,978,951

[FIG. 3 — block diagram of the Address Cache ASIC (ACA) interfaced to the Address Cache, Frame Processor, and Network Interface Modules.]
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 4 of 14          5,978,951

[FIGS. 4A and 4B — organization of the Address Cache configured as a Bridge Cache and as a Route Cache; FIG. 5 — format of words used to address the Address Cache.]
`
`
`
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 5 of 14          5,978,951

[FIG. 6A — Cache configured with 1 bank of 128K×8 SRAMs (Bank A, 11×128K×8). ACA signals: aca_ac_ram_adr_a(16-0), aca_ac_ram_oe_a_l, aca_ac_ram_we_a_l, aca_ac_ram_data(87-40), aca_ac_ram_adr_b(16-0), aca_ac_ram_oe_b_l, aca_ac_ram_we_b_l, aca_ac_ram_data(39-0).]

[FIG. 6B — Cache configured with 2 banks of 64K×16 SRAMs (Bank A and Bank B, each 6×64K×16). ACA signals: aca_ac_ram_adr_a(15-0), aca_ac_ram_oe_a_l, aca_ac_ram_we_a_l, aca_ac_ram_data(87-0), aca_ac_ram_adr_b(15-0), aca_ac_ram_oe_b_l, aca_ac_ram_we_b_l.]

[FIG. 6C — Cache configured with 1 bank of 64K×16 SRAMs (Bank A, 6×64K×16). ACA signals: aca_ac_ram_adr_a(15-0), aca_ac_ram_oe_a_l, aca_ac_ram_we_a_l, aca_ac_ram_data(87-0), aca_ac_ram_adr_b(15-0), aca_ac_ram_oe_b_l, aca_ac_ram_we_b_l.]
`
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 6 of 14          5,978,951

[FIG. 7A — flow diagram:
200 — Receive packet of header addresses from FHP in input register file.
202 — MAC SA received? (No: wait; Yes: continue.)
204 — CRC engine receives cache size and mode from PIO register sets loaded via FP.
206 — CRC performs hash on MAC SA using cache configuration info and protocol ID.
208 — Packetization engine uses hash to retrieve values from LRU and valid tables.
210 — Packetization engine puts hash, MAC SA, protocol ID in packet and loads cache lookup queue.
212 — Receive packet of header addresses from FHP in input register file; MAC or network DA received?; CRC performs hash on DA (MAC or network) using cache configuration info and protocol ID; packetization engine uses hash to retrieve values from LRU and valid tables; packetization engine puts hash, DA, protocol ID in packet and loads cache lookup queue.]
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 7 of 14          5,978,951

[FIG. 7B — flow diagram (continued):
222 — Lookup unit controller strips packet from cache lookup queue and adds set # and valid bit.
— Lookup controller reads cache using hash concatenated w/ set #.
— Addr read = addr rec'd? (No: another set #? If so, select next set number, wait one clock cycle, and read again.)
230 — Read data value associated with this cache address.]
`
`
`
`US. Patent
`
`N0v.2,1999
`
`Sheet 8 0f 14
`
`5,978,951
`
`9
`
`236
`
`IDENTIFY PROPER OUTPUT
`
`PACKET ASSY FOR THIS LOOKUP
`
`238
`
`OUTPUT PACKET ASSY FILLS
`
`NETWORK INTERFACE OUTPUT
`
`REGISTER FILE W/FRAME ID INFO
`
`240
`
`
`
`
`OUTPUT PACKET ASSY FILLS
`
`
`
`
`NETWORK INTERFACE OUTPUT
`
`REGISTER FILE W/LOOKUP
`
`STATUS AND DATA
`
`242
`
`
`
`SA & DA
`DONE FOR
`
`No
`THIS FRAME
`
`ID?
`
`
`244
`
`Yes
`
`OUTPUT PACKET ASSY SIGNALS
`
`RHP AND RHP TO RETRIEIVE
`
`DONE
`
`FIG. 7C
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 9 of 14          5,978,951

[FIG. 8 — NIC to ACA data: packet format by nibbles, including a Protocol ID field in bits 127-120.]
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 10 of 14          5,978,951

[FIG. 9A — ACA to NIC data, bridged frame (unicast and multicast formats). Fields: Protocol ID, Frame control, DA VC handle, SA search status, SA address state, DA search status, DA address state, RHP vlan ID, Unused, Transmit virtual port, LBPG mask, DA adr group mask(31-0), DA port mask, DA learned port, Discard bit / TX encap, Receive virtual port, Sequence number, SA software tag, DA software tag, Unused 12 bits.]
`
`
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 11 of 14          5,978,951

[FIG. 9B — ACA to NIC data, routed frame (unicast and multicast formats). Fields: Protocol ID, DA VC handle, SA search status, SA address state, DA search status, DA address state, DA(47-24), Parent port mask, DA(23-16), Forward port mask(23-16), DA(15-0), Forward port mask(15-0), RHP vlan ID, QOS(39-32), QOS(31-8), ACA vlan ID, Transmit virtual port, LBPG mask, DA adr group mask(31-0), DA learned port mask, Puma multicast, Reserved, Discard bit / TX encap, QOS(7-0), SA software tag, DA software tag, Unused 12 bits.]
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 12 of 14          5,978,951

[FIG. 10 — bridge cache beat formats. Bridge cache address beat: MAC address. Bridge cache unicast data beat: Address port group, Learned port, Bridge misc, xmit encap, Address state, Software tag, Circuit handle. Bridge cache multicast data beat: Address port mask, Puma, Bridge misc, Address state, Software tag, Circuit handle.]

[FIG. 11 — route cache address beat formats. Beat 0: Network address(79-0). Beat 1 (bits 87-80): Network address(159-80).]
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 13 of 14          5,978,951

[FIG. 12 — route cache data beat formats. Unicast data beat 0: Learned port, Route misc, Xmit encap, Address state, Software tag, Circuit handle. Unicast route flow data beat 1: MAC destination address (bits 87-40), QOS flow data (bits 39-0). Unicast bridge flow data beat 1: Address port group, QOS flow data. Multicast data beat 0: Address state, Dest VLAN, mask, Route misc, Software tag, Circuit handle, Reserved, Puma. Multicast data beat 1: Forward port mask, QOS flow data.]

[FIG. 14 — ACA cache status bits (bit position / bit description): Cache hit; Load balanced port group equal; Port equal; Incomplete search; Soft path only; Lec id equal; Broadcast (bridge addresses only); Multicast (bridge addresses only).]
`
`
`
U.S. Patent          Nov. 2, 1999          Sheet 14 of 14          5,978,951

[FIG. 13 — field details. Network address state field bits (bit position / bit description): 39 — Multicast identifier; 37 — Quality of service loss; 36 — Queue select high; 35 — Queue select low; 34 — Internal address; 33 — Notification high; 32 — Notification low. QOS flow data (bit positions / bit description): 39-36 — Class of service (COS); 27-16 — Token bucket depth; 15-8 — Token bucket index.]
`
`
`
`HIGH SPEED CACHE MANAGEMENT UNIT
`FOR USE IN A BRIDGE/ROUTER
`
`RELATED APPLICATIONS
`
`Not Applicable
`
`STATEMENT REGARDING FEDERALLY
`SPONSORED RESEARCH
`
`Not Applicable
`
`BACKGROUND OF THE INVENTION
`
Network devices such as bridges and routers are employed in networks which are operating at relatively high speeds. Functions such as frame header parsing, address lookup, frame processing, and related functions have previously been performed primarily in software. Software processing of frames results in higher device latency, implicating a higher frame drop rate due to congestion and the necessity of providing expensive buffering.
`SUMMARY OF THE INVENTION
`
`The presently disclosed cache management unit is also
`referred to herein as an Address Cache ASIC (ACA), though
`the functions and features presently described are also
`realized,
`in alternative embodiments of the invention, via
`any other hardware structure. The ACA is preferably used as
`one element of a network device such as a bridge or
`bridge/router, and provides high-speed,
`flexible address
`cache management. The ACA is responsible for: maintain-
`ing a hardware address table and a hardware age table;
`searching the hardware address table for layer 2 and layer 3
`addresses received from a Received Header Processing
`(RHP) element which parses frame headers for address
`information; and returning address search results, including
`the destination port(s) for received frames,
`to a Received
`Frame Processor (RFP) element. The ACA includes an
`interface which permits a Frame Processor to manipulate the
`hardware address table and age table.
`A cache associated with the ACA stores data associated
`with each of plural network addresses. The ACA receives
packets of source and destination addresses from RHPs in
`the network device, and searches the associated cache for
`each address. The ACA responds to the appropriate RFP
`with a response packet which indicates whether the source
`and destination addresses were located (hit) in the cache and,
`for each located address, includes data associated with that
`address. If there was a hit,
`the RFP generates a transmit
`vector which contains information which permits the frame
`to be processed in hardware at speeds approximating frame
`reception rates. The ACA supports a 4-way set associative
`cache to store the network addresses. This hardware imple-
`mentation provides better performance than the binary
`search table employed in the prior art due to a significantly
`shorter search time in the cache.
`
`The ACA, through a plurality of discrete functions imple—
`mented by ACA hardware, enables software manipulation of
`the associated cache. Such manipulation includes installing
`entries, removing entries, changing entries and diagnosing
`the memory. Entry installation by a processor executing such
`software is the mechanism by which the cache learns new
addresses. One further function, the SEARCH function,
`enables automatic address processing and lookup in the
`cache by the ACA without software intervention.
The ACA provides four cache operating modes:
DISABLE, LEARN, BRIDGE ONLY, and BRIDGE/
ROUTE MODE. When in DISABLE mode, the cache is
accessible only through diagnostic read and write opera-
tions. In LEARN mode, address installation, as well as
diagnostic read and write operations, are enabled.
`In
`BRIDGE ONLY mode, the cache is employed solely for
`storing data associated with bridge addresses. Lastly,
`in
`BRIDGE/ROUTE mode,
`the cache is divided into equal
`portions for layer 2 (bridge) address processing and for layer
`3 (route) address processing.
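The mode gating can be sketched as a small permission table (an illustrative Python sketch; the operation names are paraphrased from this description and are not ACA register or function names):

```python
# Hypothetical permission table for the four ACA cache operating modes
# described above. Operation names are invented for illustration.
MODE_PERMISSIONS = {
    "DISABLE":      {"diagnostic_read", "diagnostic_write"},
    "LEARN":        {"diagnostic_read", "diagnostic_write", "address_install"},
    "BRIDGE_ONLY":  {"diagnostic_read", "diagnostic_write", "address_install",
                     "bridge_lookup"},
    "BRIDGE_ROUTE": {"diagnostic_read", "diagnostic_write", "address_install",
                     "bridge_lookup", "route_lookup"},
}

def allowed(mode: str, operation: str) -> bool:
    """Return True if `operation` is permitted in the given cache mode."""
    return operation in MODE_PERMISSIONS[mode]
```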
In a network element which supports Load Balanced Port
Groups (LBPGs), a group of ports treated as a single
high-bandwidth port, the presently disclosed ACA identifies
and selects destination ports within the identified LBPG for
respective frame forwarding.
`When functioning in a network bridge which supports
Virtual Local Area Networks (VLANs), the present ACA
`utilizes the VLAN identification in conjunction with the
`MAC for address lookup in the cache.
`A cyclic redundancy code is generated for each address to
`be looked up in the cache by the ACA. The CRC code is used
as the index for lookup within the cache. As noted, the cache
is 4-way associative, meaning that there are four sets of
`addresses and data for each CRC code. If the thrash rate
`exceeds a predetermined threshold, software can rewrite the
`CRC table values, thus causing alternative CRC codes to be
`generated, and avoiding the high thrash rate problem.
`Finally, owing to the fact that the cache itself is accessed
`at most once out of every four clock cycles for a single
`address search, the presently disclosed ACA provides four
`cache lookup units, each being provided with access to the
`cache every four clock cycles. Each cache lookup unit
`consists of a cache lookup queue for storing the CRC code
`of the address to be searched, and a cache lookup controller
`for pulling codes from the queue concatenated with a set
number from a least-recently-used table value for the code,
`and compares the network address associated with the code
`to that of the identified set. If the addresses match,
`the
`controller signals that the lookup is complete and moves on
`to the next code in the queue. Otherwise, when all sets are
`exhausted without a match, the controller reports the failure
`to match and moves on to the next code in the queue.
`Thus,
`the presently disclosed ACA enables network
`address lookup to permit frame processing at high speed
`while enabling flexible modification and diagnosis of the
`cache via software instituted functions.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The invention will be more fully understood by reference
`to the following Detailed Description of the Invention in
`conjunction with the drawings of which:
FIG. 1 is a pictorial diagram of a network device in accor-
dance with the present invention illustrating Network Inter-
face Modules and a Motherboard mounted to a backplane;
FIG. 2 is a block diagram of a network device in accor-
`dance with the present invention illustrating one Network
`Interface Module coupled to the Motherboard via a back-
`plane;
`FIG. 3 is a block diagram of an Address Cache ASIC
`(ACA) according to the present invention, interfaced to an
`Address Cache, Frame Processor, and plural Network Inter-
`face Modules;
`FIG. 4A illustrates the organization of at least a portion of
`the Address Cache of FIG. 3 configured as a Bridge Cache;
`FIG. 4B illustrates the organization of at least a portion of
`the Address Cache of FIG. 3 configured as a Route Cache;
`
`FIG. 5 illustrates the format of words used to address the
`Address Cache of FIG. 3;
FIGS. 6A, 6B and 6C are block diagrams of various
physical configurations of memory devices comprising the
Address Cache of FIG. 3 in conjunction with the ACA;
FIGS. 7A, 7B and 7C collectively comprise a flow
diagram of the method of operation of the ACA and Address
Cache according to the present invention;
`FIG. 8 illustrates the data content of a packet transmitted
`from a Network Interface Chip (NIC), associated with a
`Network Interface Module, to the ACA;
`FIGS. 9A and 9B illustrate the data content of bridged and
`routed frame packets transmitted from the ACA to a Net-
`work Interface Chip (NIC);
`FIG. 10 illustrates bridge cache beat formats;
`FIGS. 11 and 12 illustrate route cache beat formats;
FIG. 13 illustrates specifics of various fields of FIGS.
10-12; and
FIG. 14 provides a description of the status bits sent from
the ACA to a Network Interface Chip.
`DETAILED DESCRIPTION OF THE
`INVENTION
FIGS. 1 and 2 Overview
`Referring to FIGS. 1 and 2, a bridge/router network
`device 10 for use in a telecommunications network includes
`a motherboard 12 and at least one network interface module
`14. Each of the network interface modules 14 interfaces to
`the motherboard 12 through a backplane 16.
Each network interface module 14 includes at least one
`input port 18 through which data units such as frames,
`packets and cells are received and at least one output port 20
`through which data units are forwarded downstream for
`receipt by another network device. In particular, the ports
`provide connections via communication links to other
`devices in the network. Incoming data units may be bridged,
`routed, translationally routed or filtered.
`In one embodiment of the presently disclosed network
`device 10, four slots are provided for network interface
modules 14. Each slot may be populated with an Ethernet,
`FDDI or an ATM UNI interface module.
`In a preferred
embodiment, each 10/100 megabit Ethernet network inter-
`face module 14 includes six input/output ports, each FDDI
`network interface module 14 includes six input/output ports,
`and each gigabit Ethernet network interface module 14
`includes one input/output port. An ATM network interface
module 14 includes four OC3 input/output ports or one
`OC12 input/output port.
`Elements in the motherboard 12 and interface modules 14
`are responsible for data unit reception and transmission,
`parsing of data link and network layer headers within
`received frames, look—up of source and destination Media
`Access Control (“MAC") and network layer addresses and
`for making forwarding decisions regarding the received
`frames.
The motherboard 12 includes an address cache ASIC
`(“ACA”) 26 with an associated cache 28, a Frame Processor
`30, an application processor 31 and a Master Buffer ASIC
`(“MBA”) 32.
`The ACA 26 serves to perform look-ups on source and
destination addresses (SAs and DAs) passed to the ACA
from a Receive Header Processor ("RHP") within the
`respective network interface modules 14. The ACA 26 is
`capable of looking up MAC addresses for bridging support
`and Network Layer destination addresses for routing sup-
`port.
`
`The MBA 32, located on the motherboard 12, serves to
`provide global data buffer management of frames which
`reside in buffers in the Buffer RAM 22 disposed on respec-
`tive Network Interface Modules 14.
`Each network interface module 14 includes a receive
`ASIC and a transmit ASIC, both of which are specific to the
type of data traffic supported by the respective network
`interface module (such as Ethernet, ATM and FDDI).
Address Cache ASIC (ACA) Overview
The principal functions of the Address Cache ASIC
`(ACA) 26 include maintaining hardware address tables,
`maintaining a hardware age table, searching the hardware
`address table for layer 2 and layer 3 addresses provided by
`network interfaces, returning address search results includ—
`ing a destination port identifier for received frames to the
`network interfaces, and providing a mechanism for software
`to manipulate the hardware address table and age table. A
`4-way set associative cache 28 (discussed subsequently) is
`employed to store network addresses.
`The main function of the ACA 26 is to look up network
`addresses in the cache 28. The ACA receives packets con—
`taining two addresses, a source address and a destination
address, from the Network Interface Chips (NICs) 100. The
`ACA 26 searches the cache 28 for each of these addresses
and responds to the Network Interface Module (NIM) 14,
`and more specifically to the respective NIC that sent the
`packet after the searches are completed, by assembling a
`response packet and forwarding the packet to the respective
`NIC 100. The response packet indicates whether or not the
`addresses were found in the cache 28 and, for each found
`address, the packet includes data stored in association with
`the network address. A more detailed explanation of this
`process is provided subsequently.
`The ACA 26 also provides an interface for software to
`manipulate and diagnose the cache 28. A Frame Processor
(FP) 30 allows software to install and remove network
`addresses and associated data from the cache 28. Although
`the ACA does not
`learn addresses by itself,
`it provides
`hardware assistance,
`through Programmable Input/Output
`(PIO) register sets 102, in installing addresses in the cache
`at a rate of approximately 500K addresses per second. The
`PF 30 also provides a software path (soft path) accessible
`into an age RAM and hardware assistance during the aging
`process.
The ACA 26 runs at speeds approximating the frame
`reception rate (wire speed) to avoid a costly and complex
`queuing system for received frames. To achieve wire speed,
the ACA handles 4 million packets per second from the NICs
`100. In contrast, received frames handled by software can be
`processed at approximately 1/2 million packets per second.
`Since each packet from a NIC 100 contains two addresses
`(SA and DA), the ACA 26 must be able to search the entire
`cache 8 million times per second, or 4.5 cycles per search at
`an ACA clock rate of 37.5 MHz.
`The cache 28 stores data link and network layer addresses.
`As described later, the cache can be configured as a bridge
`only cache or as a split bridge/route cache. The bridge cache
stores both data link layer SAs and DAs, while the route
cache stores network layer DAs and link layer SAs.
`If a frame is intended for a destination address which is
`not
`in the cache 28, the frame is transmitted out all ports
`under software control. If a reply frame is received from the
`device having the uncached MAC address, the port associ—
`ated with the previously unknown DA address becomes
`known and the frame is forwarded via a high—speed hard-
`ware forwarding path within the device 10 since the reply
`frame destination address (the original source address) is
`
`
`
`known and included in the cache. Software updates the
`cache to include the address of the station having the
`original, unknown destination address (the subsequent
`source address). Software also updates the cache 28 if the
`ACA 26 determines that the source address data received
`does not agree with the source address data stored in the
`cache. This situation may arise in the event a device is
`moved such that frames it originates are received at a
`different port of the device 10. The ACA 26 also makes a
`determination of whether the destination address and the
`source address are on the same segment, and if so, constructs
`a packet indicating this situation to the RI'TI’ 46, which will
`inhibit retransmission of the frame.
In a preferred embodiment, the address cache 28 is 4-way
`set associative, can vary in depth up to a maximum of 16K
`rows, and is 88 bits wide. Each cache row contains four
`cache entries, one per set. The cache 28 itself is implemented
in SRAMs and the ACA 26 supports both X8 and X16
`devices. For a cache organized as bridge-only cache, there
`are two storage elements or “beats” per bridge cache set
`(FIG. 4A). For a bridge-route cache, there are four beats per
route cache set (FIG. 4B). The specific formats of these beats
`are found in FIGS. 10, 11 and 12. FIG. 13 provides details
`of what is stored in various fields in FIGS. 10, 11 and 12.
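The geometry just described can be modeled directly (an illustrative sketch; the names are invented, and treating the whole cache as bridge-only or route-only is a simplification, since a split bridge/route cache mixes two-beat and four-beat sets):

```python
# Illustrative model of the cache geometry described above: 4-way set
# associative, up to 16K rows, 88 bits wide, two beats (storage elements)
# per bridge cache set and four beats per route cache set.
SETS_PER_ROW = 4
BEAT_WIDTH_BITS = 88
MAX_ROWS = 16 * 1024

def total_beats(rows: int, bridge_only: bool) -> int:
    """Number of 88-bit beats a cache of `rows` rows occupies."""
    assert 0 < rows <= MAX_ROWS, "cache depth is at most 16K rows"
    beats_per_set = 2 if bridge_only else 4
    return rows * SETS_PER_ROW * beats_per_set

def total_bits(rows: int, bridge_only: bool) -> int:
    """Total storage in bits for the same configuration."""
    return total_beats(rows, bridge_only) * BEAT_WIDTH_BITS
```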
At a high level, a cache lookup occurs as follows. An
`address to be searched (MAC SA, MAC DA, or network
`DA) is received by the ACA 26. A Cyclic Redundancy Code
`(CRC) process is performed by a CRC engine 104 on the
`address to generate a CRC. This code is then used to identify
`a cache row. There being four sets associated with each
`cache row,
`the ACA uses a Least Recently Used (LRU)
`algorithm to identify a set order for address comparison, and
`a valid table is referenced to identify whether any of the sets
`are invalid. From the latter two values, a most likely, valid
`set in the identified cache row is chosen, and the address
`value stored therein is compared against the address from
`which the CRC was generated. If a match occurs, the data
`stored in conjunction with that address in the cache set is
`retrieved and returned to the respective NIC 100. If no match
`occurs, the next valid set in the row is selected and compared
`to the received address. When all valid sets have been
searched without a match, an indication thereof is forwarded
`to the frame processor 30 so that the received frame can be
`processed under software control.
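The high-level lookup above can be sketched as follows (illustrative only; the actual ACA performs these steps in hardware, one cache access per cycle, and the data structures here are invented stand-ins for the cache, LRU table 106 and valid table):

```python
# Sketch of the high-level lookup described above.
def lookup(cache, lru_order, valid, address, crc):
    """cache[row][s] -> (stored_address, data); lru_order[row] is a list of
    set numbers from most to least recently used; valid[row][s] is a bool.
    Returns the associated data on a hit, or None to signal that the frame
    must be processed under software control."""
    row = crc  # the CRC of the address identifies the cache row
    for s in lru_order[row]:           # most likely (most recently used) set first
        if not valid[row][s]:          # skip sets the valid table marks invalid
            continue
        stored_address, data = cache[row][s]
        if stored_address == address:  # address beat matches: a hit
            return data
        # no match: fall through to the next valid set in LRU order
    return None                        # all valid sets searched without a match
```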
`With reference to FIG. 5, the cache address generated by
`the ACA 26 for the purpose of addressing the cache 28 has
`three parts. A CRC generated from the address of interest is
`known as a “hash.” The hash serves as a row index into the
`cache, and identifies a row of four sets, i.e. sets 0—3 (see also
`FIGS. 4A and 4B). The set index identifies one of four cache
`sets in the row. A Cache Line Index (CLI) identifies a line
(or "beat") within a cache set;
`the CLI is one bit for the
`bridge cache and two bits for the route cache. For bridge/
`route addresses, each half of the cache 28 is addressed
differently. The most significant bit of the cache address
indicates whether the address is for bridge or route cache
searching.
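One plausible packing of this three-part cache address is sketched below. The field ordering is an assumption for illustration; the text fixes only that the most significant bit selects bridge versus route searching, that the hash indexes the row, that the set index selects one of four sets, and that the CLI is one bit for the bridge cache and two bits for the route cache.

```python
# Hypothetical bit packing of the three-part cache address described above:
# [route flag | hash (row index) | set index | CLI]. The ordering of fields
# below the MSB is assumed, not taken from the patent.
def cache_address(hash_row: int, set_index: int, cli: int, is_route: bool,
                  hash_bits: int = 14) -> int:
    cli_bits = 2 if is_route else 1     # CLI: 1 bit (bridge), 2 bits (route)
    assert 0 <= set_index < 4           # one of four sets per row
    assert 0 <= cli < (1 << cli_bits)
    word = 1 if is_route else 0         # MSB: bridge vs. route searching
    word = (word << hash_bits) | hash_row   # hash indexes up to 16K rows
    word = (word << 2) | set_index
    word = (word << cli_bits) | cli
    return word
```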
`The ACA 26 accesses the cache 28 once per cycle. In the
`worst case, the ACA must read and compare to the address
`beat of all
`four sets before a match (or "'hit”) occurs.
`Subsequent to a hit occurring, a cycle is needed to read the
`associated data beat, for a total of five cycles. This is more
`than the 4.5 cycle maximum required to operate at wire
`speed. However, if the hit occurs on the first, second or third
compare, the total time is up to 4 cycles, below the maximum.
On average, the search time will be below the 4.5
`cycle threshold.
`
However, the presently disclosed ACA 26 is provided
`with a fallback option. The number of addresses queued up
`to be searched is monitored, and if it reaches a given
`threshold, the ACA 26 operates in a degraded mode where
`the maximum number of sets searched is reduced. When the
`number of enqueued searches falls below the threshold
`again, the normal search mode involving the search of up to
`all four sets is resumed.
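The fallback can be sketched as a simple threshold check (the threshold value and the reduced search depth shown are illustrative assumptions, not figures from this description):

```python
# Sketch of the fallback described above: when the backlog of queued
# address searches reaches a threshold, the ACA limits how many sets it
# searches per row; full four-set searching resumes once the backlog
# falls below the threshold again. The threshold (8) and reduced depth (3)
# are illustrative assumptions.
def max_sets_to_search(queued_searches: int, threshold: int = 8) -> int:
    return 3 if queued_searches >= threshold else 4
```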
`While each route address set is provided with four beats,
`two for address and two for data, not every lookup of a route
`address will require comparison against both address beats.
`If the route network layer destination address is less than ten
`bytes in length, the second address beat is not used. Both
`data beats are used regardless of the length of the address
`compared.
`For a short route address (less than ten bytes), the worst
`case search requires four address compares, and two cycles
`for data retrieval. All route frames have a MAC layer SA
`which has a worst case search of five cycles, as above.
`Therefore,
`the ACA 26 processes route frames with short
`addresses in eleven cycles, worst case. In degraded mode,
`limiting the number of sets searched per row, the ACA 26
does three searches on the MAC layer SA and three searches
`on the network layer DA. The maximum search time is then
`reduced to an acceptable nine cycles.
`For long route addresses (eleven bytes or more), the worst
`case search involves two address beat reads per set (eight
`cycles) plus two data reads, for a total of ten cycles. Five
more cycles are added by the MAC SA address search for an
overall total of fifteen cycles. In degraded mode, three
`searches are performed at the MAC SA layer, and one search
`is performed for the network layer DA. The maximum
`search time, in degraded mode, for a long route address, is
`therefore eight cycles.
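These worst-case figures follow from a simple cycle model: one cycle per address-beat read per set searched, plus the data-beat reads on a hit. A sketch that reproduces the numbers above:

```python
# Cycle-count model reproducing the worst-case search times given above.
def search_cycles(sets_searched: int, address_beats: int, data_beats: int) -> int:
    """One cycle per address-beat read per set, plus the data-beat reads."""
    return sets_searched * address_beats + data_beats

# MAC SA: one address beat compared, one data beat read.
sa_normal, sa_degraded = search_cycles(4, 1, 1), search_cycles(3, 1, 1)
# Short route DA (< 10 bytes): second address beat unused, both data beats read.
short_da_normal, short_da_degraded = search_cycles(4, 1, 2), search_cycles(3, 1, 2)
# Long route DA: two address beats per set, two data beats.
long_da_normal, long_da_degraded = search_cycles(4, 2, 2), search_cycles(1, 2, 2)

print(sa_normal + short_da_normal)      # 11 cycles worst case, as above
print(sa_degraded + short_da_degraded)  # 9 cycles in degraded mode
print(sa_normal + long_da_normal)       # 15 cycles worst case
print(sa_degraded + long_da_degraded)   # 8 cycles in degraded mode
```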
The ACA 26 implements a Least Recently Used (LRU)
`policy for replacing and searching cache lines. An LRU table
`106, stored in internal RAM, provides entries for each cache
`line. Each LRU table 106 entry provides an ordered series of
`four numbers, zero through three,
`indicating the most
`recently used set
`in the respective line through the least
`recently used set of the same line. For instance, a value of
`“2013” for a given cache line indicates that set “2” was most
`recently used, and set “3” was least recently used. When all
`four sets in a row are full, and the frame processor 30 is
`going to install a new address in one of the sets, the set that
`was least recently used is chosen for replacement. In the
`foregoing example, the address and related data in set “3”
`would be overwritten. When a set, which is not the most
`recently used, is hit, the respective LRU bits are swapped
with the previous set in the order. For example, if the LRU
code is 2013 and a hit occurs on set 3, the set LRU order then
`becomes 2031. When a set is installed via frame processor
30 intervention, it becomes the most recently used set. So if
`the LRU for a cache line is initially “2013” and a new
`address and data are installed in set “3”, the LRU entry for
`this line becomes “3201". Finally, when a set is removed, it
`becomes the least recently used set.
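The LRU order maintenance described above can be sketched directly, reusing the "2013" examples from the text (illustrative Python; the ACA implements this in hardware on per-line LRU table entries):

```python
# Sketch of the LRU order updates described above. An order is a string of
# set numbers from most to least recently used, e.g. "2013".
def on_hit(order: str, s: str) -> str:
    """A hit on a set that is not already most recently used swaps it with
    the previous set in the order."""
    i = order.index(s)
    if i == 0:
        return order
    sets = list(order)
    sets[i - 1], sets[i] = sets[i], sets[i - 1]
    return "".join(sets)

def on_install(order: str, s: str) -> str:
    """A set installed via frame processor intervention becomes most recently used."""
    return s + order.replace(s, "")

def on_remove(order: str, s: str) -> str:
    """A removed set becomes the least recently used."""
    return order.replace(s, "") + s

print(on_hit("2013", "3"))      # "2031", as in the example above
print(on_install("2013", "3"))  # "3201", as in the example above
```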
`The LRU bits are also used to determine how the sets
within the cache are searched. The set that has been most
recently used is searched first. The least recently used set
is tested last. The ACA stores VALID bits for each of the
`cache 28 entries in the same internal RAM as the LRU table.
`The VALID bits are used to optimize the address lookups.
The VALID bits are read before the cache lookup begins, to
provide an indication of which sets are available for search-
`ing. After eliminating sets according to the VALID bits, sets
`are searched according to the LRU bit order.
`
`
`
`The size of the cache is variable in the number of cache
`rows only. The size of the cache 28 is conveyed to the ACA
`26 via the PIO register sets 102; the ACA 26 uses the cache
`size information to mask out hash bits from the CRC engine
104. The LRU and VALID tables 106, 108 in the ACA 26 are
`directly related to the cache 28 size.
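The masking of hash bits by cache size can be sketched as follows (an illustrative sketch; that the row count is a power of two is an assumption, though mask-based row indexing implies it):

```python
# Sketch of the row-index masking described above: the ACA keeps only as
# many hash bits from the CRC engine as the configured cache depth needs.
def row_index(crc_hash: int, cache_rows: int) -> int:
    """Mask the CRC hash down to an index into a cache of `cache_rows` rows.
    Assumes cache_rows is a power of two, at most 16K."""
    assert cache_rows & (cache_rows - 1) == 0 and cache_rows <= 16 * 1024
    return crc_hash & (cache_rows - 1)
```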
`The cache 28 is physically configured in one of three ways
`in the first embodiment, though other configurations are also
`envisaged. These three configurations are shown in FIGS.
`6A, 6B and 6C. Each configuration supports multiple SRAM
`depths. The cache 28 configuration information is required
`by the ACA 26 to properly generate SRAM ad