`
`TITLE
`
`
`A HIGH SPEED CACHE MANAGEMENT UNIT FOR USE IN’A BRIDGE/ROUTER
`
`RELATED APPLICATIONS
`
`Not Applicable
`
`STATEMENT REGARDING FEDERALLY
`
`SPONSORED RESEARCH
`
`Not Applicable
`
`BACKGROUND OF THE INVENTION
`
`Network devices such as bridges and routers are employed
`
`in networks which are operating at relatively high speeds.
`
`Functions such as frame header parsing, address lockup, frame
`
`processing,
`
`and related functions have previously been
`
`performed primarily in software.
`
`Software processing of
`
`frames results in higher device latency, implicating a higher
`
`frame drop rate due
`
`to congestion and the necessity of
`
`providing expensive buffering .
`
`SUMMARY OF THE INVENTION
`
`The presently disclosed cache management unit
`
`is also
`
`referred to herein as an Address Cache ASIC (ACA) ,
`
`though the
`
`functions and features presently described are also realized,
`
`in alternative embodiments of the invention, via any other
`”Mstructure. The ACA is preferably used as one element
`of a network device such as a bridge or bridge/router, and
`
`provides high—speed, flexible address cache management. The
`
`ACA is responsible for: maintaining a hardware address table
`
`5
`
`lO
`
`
`
`25
`
`_ 0x
`"
`
`30
`
`and a hardware age table;
`
`searching the hardware address
`
`table for layer 2 and 'layer 3 addresses received from a
`
`Received Header Processing (RHP) element which parses frame
`
`headers for address information; and returning address search
`
`results,
`
`including the destination port(s)
`
`for
`
`received
`
`35
`
`frames,
`
`to a Received Frame Processor (RFP) element. The ACA
`
`includes an interface which permits a Frame Processor to
`
`manipulate the hardware address table and age table.
`A cache associated with the ACA stores data associated
`
`GAGNEBIN 5 HAYES LLP
`WEINGARTEN, SCEU‘RGIN,
`«VTEL (517) 542-2293
`gm: (61.7) 451.0313
`
`EXpreSS MELJ'l Number
`s
`4
`.
`.
`EM'LSCp L? ”\z/Ol‘ W8
`
`UNIFIED 1014
`
`1
`
`UNIFIED 1014
`
`
`
`_2_
`
`3‘
`
`with each of plural network addresses.
`
`The ACA receives
`
`packets of source and destination_addresses from RHP’s in the
`
`network device, and searches the associated cache for each
`
`The ACA responds to the appropriate RFP with a
`address.
`response packet which indicates whether
`the
`source and
`
`in the cache and,
`destination addresses were located (hit)
`for each located address,
`includes data associated with that
`
`the RFP generates a transmit
`JIf there was a hit,
`address.
`vector which contains information which permits the frame to
`
`be processed in hardware at
`
`speeds approximating frame
`
`The ACA supports a 4—way set associative
`reception rates.
`cache
`to store the network addresses.
`This hardware
`
`implementation provides better performance than the binary
`
`search table employed in the prior art due to a significantly
`shorter search time in the cache.
`
`The ACA,
`
`through. a plurality of discrete functions
`
`implemented by ACA hardware, enables software manipulation
`
`of
`
`the
`
`associated cache.
`
`Such manipulation includes
`
`installing entries,
`
`removing entries, changing entries and
`
`diagnosing the memory.
`Entry installation by a processor
`ekecuting such software is the mechanism by which the cache
`learns new addresses.
`One
`further
`function,
`the SEARCH
`
`function, enables automatic address processing and lockup in
`the cache by the ACA without software intervention.
`
`The ACA provides four cache operating modes: DISABLE,
`
`LEARN, BRIDGE ONLY, and BRIDGE/ROUTE MODE. When in DISABLE
`
`mode,
`
`the cache is accessible only through diagnostic read
`
`and write operations.
`
`In LEARN mode, address installation,
`
`as well as diagnostic read and write operations, are enabled.
`
`In BRIDGE ONLY mode,
`
`the cache is employed solely for storing
`
`data
`
`associated with
`
`bridge
`
`addresses.
`
`Lastly,
`
`in
`
`BRIDGE/ROUTE mode,
`
`the cache is divided into equal portions
`
`for
`
`layer
`
`2
`
`(bridge)
`
`address processing and for
`
`layer
`
`3
`
`(route) address processing.
`
`In a network element which supports Load Balanced Port
`
`Groups
`
`(LBPGs),
`
`a group of ports treated as a single high-
`
`bandwidth port,
`
`the presently disclosed ACA identifies and
`
`lO
`
`
`
`
`
`25
`
`3O
`
`35
`
`WEIX‘GARTEN, SCHURGIN,
`GAGNEBIN & HAYES LL?
`TEL (61.7) 542-2290
`’EFAX (617! 45170313
`
`2
`
`
`
`-3-
`
`selects destination ports within the identified LBPG for
`
`respective frame forwarding.
`
`When functioning in a network bridge which supports
`
`Virtual Local Area Networks (VLANs),
`
`the present ACA utilizes
`
`5
`
`the VLAN identification.
`
`in conjunction. with.
`
`the MAC
`
`for
`
`address lockup in the cache.
`
`A cyclic redundancy code is generated for each address
`
`The CRC code is
`to be locked up in the cache by the ACA.
`used as the index for lockup within the cache. As noted,
`the
`
`cache is 4—way associative, meaning that there are four sets
`of addresses and data for each CRC code.
`If the thrash rate
`
`exceeds a predetermined threshold, software can rewrite the
`
`CRC table values,
`
`thus causing alternative CRC codes to be
`
`generated, and avoiding the high thrash rate problem.
`
`Finally, owing to the fact
`
`that
`
`the cache itself is
`
`accessed at most once out of every four clock cycles for a
`
`the presently disclosed ACA provides
`single address search,
`four cache lockup units, each being provided with access to
`
`the cache every four clock cycles.
`
`Each cache lockup unit
`
`consists of a cache lockup queue for storing the CRC code of
`
`the address to be searched, and a cache lockup controller for
`
`pulling codes from the queue concatenated with a set number
`from a least—recently-used table value for the code,
`and
`
`compares the network address associated with the code to that
`of
`the
`identified set.
`If
`the
`addresses match,
`the
`
`controller signals that the lookup is complete and moves on
`
`to the next code in the queue. Otherwise, when all sets are
`
`the controller reports the failure
`exhausted without a match,
`to match and moves on to the next code in the ueue.
`Thus,
`the presently disclosed ACA. ggggiés. network
`address lockup to permit frame processing atAhigh speed while
`enabling flexible modification_and diagnosis of the cache via
`software instituted functions.
`
`10
`
`
`
`25
`
`a/BO
`
`WEINGARTEN, SCHWGIN,
`GAGNEEIN & HAYES LLP
`TEL (617) 542-2290
`FAX (617) 451-0313
`
`3
`
`
`
`~4__
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
`The invention will be more fully understood by reference
`to the following Detailed Description of
`the Invention in
`
`conjunction with/the drawings of which:
`
`Fig.
`l
`s pictorial diagram of
`a network device in
`accordance/with the present
`invention illustrating Network
`Interface Modules and a Motherboard mounted to a backplane;
`Fig. i is a Vblock diagrani of
`a network device in
`accordance with the present
`invention illustrating one
`
`Network Interface Module coupled to the Motherboard via a
`backplane;flf/
`Fig.
`3 is a block diagram of an Address Cache ASIC (ACA)
`
`according to the present invention,
`
`interfaced to an Address
`
`Cache, Frame Processor, and plural Network Interface Modules;
`
`Fig.
`
`4A illustrates the organization of at
`
`least
`
`a
`
`portion of the Address Cache of Fig.
`Cache;
`I
`W
`Fig.
`4B illustrates the organization of at
`least
`a
`portion of the Address Cache of Fig.
`3 configured as a Route
`
`3 configured as a Bridge
`
`I
`Cache;
`5 illustrates the format of words used to address
`Fig.
`the Address Cache of Fig. 3;
`Figs.
`6A,
`6B and 6C are block diagrams of various
`
`yr
`
`physical configurations of nemory devices comprising the
`Address Cache of Fig.
`3
`in conjunction the ACA;
`Figs. 2A; 7B and 7C collectively comprise a flow diagram
`the nethod of operation of
`the ACA and Address Cache
`of
`according tO/the present
`invention;
`a packet
`content of
`Fig.
`g
`illustrates the data
`transmitted from a Network Interface Chip (NIC), associated
`with a Network Interface Module,
`to the ACA;
`Figs.
`9A and 9B illustrate the data content of bridged
`routed frame packets
`transmitted.
`fronl
`the LACA
`to a
`
`and.
`
`Network Interface Chip (NIC);
`Fig. £0 illustrates bridge cache beat formats;
`Figs. Li and 12 illustrate route cache beat formats;
`
`Fig.
`
`i/
`
`illustrates specifics of various fields of Figs.
`
`lO
`
`
`
`
`25
`
`3O
`
`35
`
`WEINGARTEN, scams IN,
`GAGNEBIN & HAYES LIP
`:-TEL (617) 542-2290
`FAX (617) 451-0313
`
`4
`
`
`
`5
`
`10
`
`
`
`10 — 12; and 1/
`
`Fig. 1, provides a description of the status bits sent
`/
`from the ACA to a Network Interface Chip.
`
`DETAILED DESCRIPTION OF THE INVENTION
`
`Figs.
`
`1 and 2 Overview
`
`a bridge/router network
`1 and 2,
`Referring to Figs.
`device lOQfor use in a telecommunications network includes
`a motherboard 12 and at least one network interface module
`
`Each of the network interface modules 14 interfaces to
`14.
`the motherboard 12 through a backplane 16.
`Each network interface module 10 includes at least one
`
`input port
`
`18
`
`through. which. data units such. as
`
`frames,
`
`packets and cells are received and at least one output port
`
`20 through which data units are forwarded downstream for
`
`receipt by another network device.
`
`In particular,
`
`the ports
`
`provide connections via communication links to other devices
`
`in the network.
`
`Incoming data units may be bridged, routed,
`
`translationally routed or filtered.
`
`In one embodiment of
`
`the presently disclosed network
`
`device 10,
`
`four slots are provided for network interface
`
`modules 14.
`Each slot may be populated with an Ethernet,
`FDDI or
`an ATM UNI
`interface module.
`In a
`'preferred
`embodiment, each 10/100 megabyte Ethernet network interface
`
`25
`
`module 14 includes six input/output ports, each FDDI network
`
`interface module 14 includes six input/output ports, and each
`
`gigabit Ethernet network interface module 14 includes one
`
`input/output port. An ATM network interface module
`
`14
`
`30
`
`includes four 0C3 input/output ports or one OC12 input/output
`port.
`
` Elements in the motherboard 12 and interface modules
`
`14 are responsible for data unit reception and transmission,
`
`parsing of data link and. network layer headers within
`
`received frames,
`
`look—up of source and destination Media
`
`35
`
`Access Control
`
`("MAC”) and network layer addresses and for
`
`making forwarding decisions regarding the received frames.
`
`WEINGARTEN, SCHU'RGIN,
`GAGNEBIN & HAYES LLP
`TEL (5177 542-2290
`FAX (617) 451—0313
`
`5
`
`
`
`_6_
`
`The motherboard 12
`
`includes
`
`an address
`
`cache ASIC
`
`("ACA") 26 with an associated cache 28, a Frame Processor 30,
`
`an application processor 31 and a Master Buffer ASIC (”MBA")
`32.
`
`5
`
`The ACA 26 serves to perform look-ups on source and
`
`destination addresses (SAs and DAs) passed to the ACA from
`
`("RHP") within the respective
`a Receive Header Processor
`network interface modules 14.
`The ACA 26
`is capable of
`looking up MAC addresses for bridging support and Network
`
`10
`
`nr-
`
`Layer destination addresses for routing support.
`The MBA 32,
`located on the motherboard l2,
`serves to
`provide global data buffer management of frames which reside
`in. buffers in the Buffer RAM 22 disposed on respective
`Network Interface Modules 14.
` Each network interface module 14 includes a receive ASIC
`
`and a transmit ASIC, both of which are specific to the type
`of data traffic supported by the respective network interface
`module (such as Ethernet, ATM and FDDI).
`
` a hardware age table, searching the hardware address table
`
`f
`Address Cache ASIC (ACA) Overview
`The principle functions of the Address Cache ASIC (ACA)
`
`26 include maintaining hardware address tables, maintaining
`
`for
`
`layer
`
`2
`
`and layer
`
`3 addresses provided by network
`
`interfaces,
`
`returning address
`
`search results including a
`
`25
`
`destination. port
`
`identifier
`
`for
`
`received frames
`
`to the
`
`network interfaces, and providing a mechanism for software
`
`to manipulate the hardware address table and age table.
`A 4—way set associative cache 28 (discussed subsequently)
`employed to store network addresses.
`
`is
`
`30
`
`The main function of the ACA 26 is to look up network
`
`addresses
`
`in the cache
`
`28.
`
`The ACA receives packets
`
`containing two addresses, a source address and a destination
`
`The
`(NICs) 100.
`from the Network Interface Chips
`address,
`ACA 26 searches the cache 28 for each of these addresses and
`
`35
`
`responds to the Network Interface Module (NIM)
`
`l4, and more
`
`specifically to the respective NIC that sent the packet after
`
`the searches are completed, by assembling a response packet
`
`WEINGARTEN, SCHURC-IN,
`GAGNEBIN & HAYES LLP
`,TEL (617) 542-2290
`FAX (517‘; 451-0313
`
`6
`
`
`
`-7-
`
`The
`to the respective NIC 100.
`and forwarding the packet
`response packet indicates whether or not the addresses were
`
`the packet
`found in the cache 28 and, for each found address,
`includes data stored in association with the network address.
`
`A more detailed explanation of
`subsequently.
`
`this process
`
`is provided
`
`The ACA 26 also provides an interface for software to
`manipulate and diagnose the cache 28. A Frame Processor (FP)
`30 allows software to install and remove network addresses
`
`and associated data from the cache 28. Although the ACA does
`
`not
`
`learn addresses
`
`by
`
`itself,
`
`it
`
`provides
`
`hardware
`
`(PIO) register
`through Programmable Input/Output
`assistance,
`sets 102,
`in installing addresses in the cache at a rate of
`
`approximately 500K addresses per second.
`
`The PF 30 also
`
`provides a software path (soft path) accessible into an age
`
`RAM and hardware assistance during the aging process.
`
`speeds approximately'
`the frame
`The ACA. 26
`runs at
`to avoid a costly and complex
`reception rate (wire speed)
`queuing system for received frames.
`To achieve wire speed,
`
`the ACA handles 4 million packets per second from the NICs
`
`100.
`
`In contrast,
`
`received frames handled by software can
`
`be processed at approximately 1/2 million packets per second.
`Since each packet
`from a NIC 100 contains two addresses (SA
`
`and DA),
`
`the ACA 26 must be able to search the entire cache
`
`8 million times per second, or 4.5 cycles per search at an
`ACA clock rate of 37.5 MHZ.
`
`The
`
`cache
`
`28
`
`stores data
`
`link and network layer
`
`addresses. As described later,
`
`the cache can be configured
`
`as a bridge only cache or as a split bridge/route cache. The
`
`bridge cache store both data link layer SAs and DAs, gaile
`the route cache store network layer DAs and link layer-BAs:
`If a frame is intended for a destination address which
`
`is not
`
`in the cache 28,
`
`the frame is transmitted out all
`
`ports under software control.
`
`If a reply frame is received
`
`from the device having the uncached MAC address,
`
`the port
`
`associated with the previously unknown DA address becomes
`known and the frame is forwarded via a high—speed hardware
`
`lO
`
`
`
`
`
`25
`
`3O
`
`35
`
`WEINSARTEN, SCHURC-IN,
`GAGNEBIN & HAYES LL?
`WTEL (6177 5é2-229C
`FAX (517} 45L-0313
`
`7
`
`
`
`-8—
`
`.
`
`forwarding path within the device 10 since the reply frame
`destination address
`(the original source address)
`is known
`
`and included in the cache.
`
`Software updates the cache to
`
`the station having the original,
`include the address of
`unknown destination address (the subsequent source address).
`Software also updates the cache 28 if the ACA 26 determines
`
`that the source address data received does not agree with
`the source'address data stored in the cache. This situation
`
`may arise in the event a device is moved such that frames it
`
`originates are received at a different port of the device 10.
`The ACA 26 also makes
`a determination of whether
`the
`
`destination address and the source address are on the same
`
`indicating this
`if so, constructs a packet
`and.
`segment,
`situation to the RFP 46, which will
`inhibit retransmission
`of the frame.
`
`In a preferred embodiment,
`the address cache 28 is 4—way
`set associative, can vary in depth up to a maximum of 16K
`rows, and is 88 bits wide.
`Each cache row contains four
`
`cache entries,
`
`one per set.
`
`The
`
`cache
`
`28
`
`itself
`
`is
`
`implemented in SRAMS and the ACA 26 supports both X8 and X16
`devices.
`For a cache organized as bridgeionly cache,
`there
`are two storage elements or "beats" per bridge cache set
`
`(Fig. 4A).
`
`For a bridge—route cache,
`
`there are four beats
`
`per route cache set
`
`(Fig. 4B).
`
`The specific format of these
`
`beats are found in Figs. 10, 11 and 12.
`
`Fig.
`
`13 provides
`
`details of what is stored in various fields in Figs. 10, 11
`and 12.
`
`At a high level, a cache lookup occurs as follows.
`address to be searched (MAC SA, MAC DA, or network DA)
`
`An
`is
`
`received by the ACA 26.
`
`A Cyclic Redundancy Code
`
`(CRC)
`
`process is performed by a CRC engine 104 on the address to
`
`generate a CRC. This code is then used to identify a cache
`
`row. There being four sets associated with each cache row,
`
`algorithm to
`the ACA uses
`a Least Recently‘ Used
`(LRU)
`and a valid
`identify a set order for address comparison,
`table is referenced to identify whether any of the sets are
`
`invalid.
`
`From the latter two values, a most likely, valid
`
`lO
`
`
`
`25
`
`30
`
`35
`
`WEINGARTEN , SCEURGIN,
`GAGXEBIN & HAYES LLP
`TEL (617) 542-2290
`FAX (617) 451-0313
`
`8
`
`
`
`-9-
`
`3‘
`
`set
`
`in the identified cache row is chosen, and the address
`
`value stored therein is compared against
`
`the address from
`
`which the CRC was generated.
`
`If a match occurs,
`
`the data
`
`stored in conjunction with that address in the cache set is
`
`5
`
`If no
`retrieved and returned to the respective NIC 100.
`match occurs,
`the next valid set in the row is selected and
`
`compared to the received address. When all valid sets have
`been searched without
`a match,
`an indication thereof
`is
`forwarded to the frame processor 30 so that
`the received
`
`10
`
`frame can be processed under software control.
`
`
`
`
`
`
`
`25
`
`With reference to Fig. 5,
`
`the cache address generated
`
`by the ACA 26 for the purpose of addressing the cache 28 has
`
`A CRC generated from the address of interest
`three parts.
`is known as a "hash."
`The hash serves as a row index into
`
`the cache, and identifies a row of four sets, i.e. sets 0—3
`
`(see also Figs. 4A and 4B).
`
`The set index identifies one of
`
`A. Cache Line Index (CLI)
`the row.
`four cache sets in.
`identifies a line (or "beat") withinra cache set; the CLI is
`one bit
`for the bridge cache and two bits for the route
`
`cache.
`
`For bridge/route addresses, each half of the cache
`
`28 is addressed differently. The most significant bit of the
`cache address indicates whether the address is for the bridge
`or route cache searching.
`
`The ACA.26 accesses the cache 28 once per cycle.
`
`In the
`
`the ACA must read and compare to the address beat
`worst case,
`of all
`four
`sets before
`a match
`(or
`"hit”)
`occurs.
`
`Subsequent to a hit occurring, a cycle is needed to read the
`
`associated data beat,
`
`for a total of five cycles.
`
`This is
`
`more than the 4.5 cycle maximum required to operate at wire
`
`30
`
`speed. However,
`
`if the hit occurs on the first, second or
`
`third compare,
`
`the total time is up to 4 cycles, below the
`
`maximum.
`
`On average,
`
`the search time will be below the 4.5
`
`cycle threshold.
`
`However,
`
`the presently disclosed ACA 26 is provided_with
`
`35
`
`The number of addresses queued up to be
`a fallback option.
`searched is monitored, and if it reaches a given threshold,
`
`the ACA 26 operates in a degraded mode where the maximum
`
`WEINGARTEN, SCHURGIN,
`GAGNEBIN & HAYES LLP
`TEL (617) 54272290
`FAX {617) 451-0313
`
`9
`
`
`
`5
`
`10
`
`
`
`
`
`
`
`
`25
`
`—lO—
`
`”
`
`number of sets searched is reduced.
`
`When the number of
`
`enqueued searches falls below the threshold again,
`
`the normal
`
`search mode involving the search of up to all four sets is
`resumed.
`
`While each route address set
`
`is provided with four
`
`beats,
`
`two for address and two for data, not every lookup of
`
`a route address will require comparison against both address
`
`If the route network layer destination address is
`beats.
`less than ten bytes in length,
`the second address beat is not
`
`used. Both data beats are used regardless of the length of
`the address compared.
`
`For a short route address
`
`(less than ten bytes),
`
`the
`
`worst case search requires four address compares,
`
`and two
`
`cycles for data retrieval. All route frames have a MAC layer
`
`SA which has a worst case search of five cycles, as above.
`
`Therefore,
`
`the ACA 26 processes route frames with. short
`
`addresses in eleven cycles, worst case.
`In degraded mode,
`limiting the number of sets searched per row,
`the ACA 26 does
`
`three searches on the MAC layer SA and three searches on the
`network layer DA.
`The maximum search time is then reduced
`
`to an acceptable nine cycles.
`
`For
`
`long route addresses
`
`(eleven bytes or more),
`
`the
`
`worst case search involves two address beat reads per set
`(eight cycles) plus two data reads,
`for
`a
`total of
`ten
`cycles.
`Five more cycles are added by the MAC SA address
`search for a overall total of fifteen cycles.
`In degraded
`
`mode,
`
`three searches are performed at the MAC SA layer, and
`
`one
`
`search is performed for
`
`the network layer DA.
`
`The
`
`maximum search time,
`
`in degraded mode,
`
`for a long route
`
`30
`
`address,
`
`is therefore eight cycles.
`
`The ACA 26 implements a LeaSt Recently Used (LRU) policy
`
`for replacing and searching cache lines.
`
`An LRU table 106,
`
`stored in internal RAM, provides entries for each cache line.
`
`Each LRU table 106 entry provides an ordered series of four
`
`35
`
`numbers, zero through three,
`
`indicating the most recently
`
`used set
`
`in the respective line through the least recently
`
`used set of the same line.
`
`For instance, a value of "2013"
`
`WEINGARTEN, SU-X'JRGIN,
`GAGNEBIN & HAYES LLP
`TEL (517) 542-2290
`FAX (617) 451-0313
`
`10
`
`10
`
`
`
`-11-
`
`for
`
`a given cache line indicates that set
`
`"2" was most
`
`recently used, and set "3" was least recently used. When all
`
`four sets in a row are full, and the frame processor 30 is
`
`going to install a new address in one of the sets,
`
`the set
`
`5
`
`that was least recently used is chosen for replacement.
`
`In
`
`10
`
`
`
`25
`
`the foregoing example,
`
`the address and related data in set
`
`"3" would be overwritten. When a set, which is not the most
`
`the respective LRU bits are swapped
`is hit,
`recently uSed,
`with the previous set in the order.
`For example,
`if the LRU
`code is 2013 and a hit occurs on set 3,
`the set LRU order
`then becomes 2031.
`When
`a set
`is installed. via frame
`
`processor 30 intervention, it becomes the most recently used
`
`set.
`
`So if the LRU for a cache line is initially ”2013” and
`
`a new address and data are installed in set ”3”,
`
`the LRU
`
`entry for this line becomes ”3201". Finally, when a set is
`
`removed, it becomes the least recently used set.
`The LRU bits are also used to determine how the sets
`
`within the cache are searched.
`
`The set
`
`that has been most
`
`recently seen is searched first. The least recently searched
`set is tested last.
`The ACA stores VALID bits for each of
`
`the cache 28 entries in the same internal RAM as the LRU
`
`table.
`
`The VALID bits are used to optimize the address
`
`lookups.
`
`The VALID bits are read before the cache lockup
`
`begins,
`
`to provide an indication of which sets are available
`
`for searching. After eliminating sets according to the VALID
`bits, sets are searched according to the LRU bit order.
`The size of the cache is variable in the number of cache
`
`rows only.
`
`The size of the cache 28 is conveyed to the ACA
`
`26 via the PIO register sets 102;
`
`the ACA 26 uses the cache
`
`30
`
`size information to mask out hash bits from the CRC engine
`
`104.
`
`The LRU and VALID tables 106, 108 in the ACA 26 are
`
`directly related to the cache 28 size.
`
`The cache 28 is physically configured in one of three
`
`ways in the first embodiment,
`
`though other configurations are
`
`35
`
`also envisaged.
`
`These three configurations are shown in
`
`Figs.
`
`6A,
`
`6B and 6C.
`
`Each configuration supports multiple
`
`SRAM depths.
`
`The cache 28 configuration information.
`
`is
`
`WEINGARTEN, SCHURGIN,
`GAGNEEIN 5‘ HAYES LL?
`TEL (517‘, 542*2293
`FAX (517) 451-0313
`
`11
`
`
`
`-12-
`
`8
`
`required by the ACA 26 to properly generate SRAM addresses
`
`The ACA performs this by recognizing
`and control signals.
`the number of SRAM banks and the total depth of the cache in
`
`the PIO register sets 102.
`
`5
`
`The ACA 26
`
`is implemented in VHDL code in a first
`
`embodiment,
`
`though.other hardware implementations of the same
`
`functions
`
`described herein are
`
`employable
`
`in
`
`further
`
`embodiments.
`
`A block diagram of the ACA 26 is presented in
`
`Fig. 3.
`
`Each of four Network Interface Modules
`
`(NIMs)
`
`14
`
`10
`
`provides
`
`up
`
`to two Network Interface Chips
`
`(NICs)
`
`100
`
`
`
`selected according to the network which will be connected
`
`thereto. Exemplary networks include Ethernet, FDDI, and ATM.
`
`In.a first embodiment of the present invention,
`
`the interface
`
`elements illustrated in Fig. 2 associated with the NIM 14 are
`
`collectively disposed on a respective NIC 100,
`
`and up to
`
`eight NICs
`
`100
`
`are
`
`interfaced to the ACA 26
`
`on
`
`the
`
`motherboard 12.
`
`Four‘ bit connections are employed for
`
`conveying received frame information to the ACA 26 from each
`NIC 100.
`The ACA 26 controls each of
`these interfaces
`
`separately such that
`
`the eight NICs 100 are able to send
`
`frames simultaneously.
`
`The ACA.26 itself is comprised of a number of functional
`
`illustrated
`3.4% The
`shown in Fig.
`as
`circuit elements,
`elements are with respect to a single NIC 100, except where
`
`25
`
`noted herein.
`
`These elements are responsible for receiving
`
`information taken
`
`from a
`
`received frame,
`
`using
`
`this
`
`information to lookup data in the cache 28, and for returning
`
`the looked up data to the source NIC 100.
`
`In conjunction with the block diagram of Fig.
`
`3 and the
`
`7A,
`flow diagrams of Fig.
`frame
`information through.
`
`7B and 7C,
`the ACA.
`26
`
`flow of
`the typical
`is now described.
`
`the
`the NIC 100 strips information out of
`Specifically,
`received frame and creates a packet to be sent to the ACA 26.
`
`These
`
`packets,
`
`illustrated in Fig.
`
`8,
`
`contain
`
`such
`
`information as Protocol ID (identifying the routing protocol
`of
`the received frame),
`source address
`and. destination
`
`address, all taken from the received frame header.
`
`The ACA
`
`30
`
`35
`
`WEINGARI‘EN, SCHURGIN,
`GAGNEBIN & HAYES LLP
`TEL (517) 542-2290
`FAX (617) 45170313
`
`12
`
`12
`
`
`
`-13..
`
`m
`
`26 monitors the four bit input interface 110 from the NIC 100
`waiting for a non—zero value indicative of the start of a
`
`transfer from the NIC 100 to the ACA 26
`
`(step 200).
`
`The
`
`input
`
`is received from a Receive Header Processor
`
`(RHP) 46
`
`5
`
`which is responsible for examining the frame header
`
`to
`
`identify the frame encapsulation and the routing protocol,
`
`:The RHP 46 also constructs a packet, discussed
`if any.
`subsequently,
`including the SA, DA and protocol
`ID derived
`
`from the received frame.
`
`The ACA 26 knows the exact length
`
`10
`
`of the transfer so it knows when the transfer is complete and
`
`the next transfer can follow immediately without intervening
`
`
`
`25
`
`30
`
`35
`
`idle cycles.
`
`The packet of
`
`information is transferred from the NIC
`
`100 to a network interface register file 112 in the ACA as
`
`it comes in.
`
`The format of such a packet is illustrated in
`
`Fig. 8.
`
`The ACA 26 does not wait for the entire packet
`
`to
`
`As shown
`arrive before starting to operate on the packet.
`in Fig. 8, each packet has two addresses,
`the first always
`being a MAC layer Source Address
`(SA) and the second being
`
`either a MAC layer Destination Address
`
`(DA) or a network
`
`layer DA. The ACA 26 splits the input packet into two cache
`
`lookup operations, one for the SA and one for the DA.
`As the SA is received (step 202), it is applied to a CRC
`
`generator
`
`(CRC ENG)
`
`104
`
`(step 204)
`
`and a 16—bit CRC is
`
`The CRC is done four bits at a time
`generated (step 206).
`to match the arrival rate of the address so that clock cycles
`
`are
`
`not wasted.
`
`The
`
`CRC
`
`result
`
`is
`
`conjoined with
`
`configuration information from the PIO register sets and the
`
`The
`ID from the NIC 100 to form a "hash" value.
`protocol
`hash is used to select the cache row in which to search for
`
`the SA.
`
`The SA,
`
`the hash,
`
`the LRU value for the hash from
`
`the LRU table (LRU TEL) 106, and the VALID value for the hash
`
`from the VALID table (VLD TEL)
`
`108 are packetized by a
`
`packetization engine 114 (step 208). This packet is placed
`in_a cache lookup queue 116 for further processing by a cache
`
`lookup controller 118 (step 210).
`The destination address next becomes available from the
`
`WEINGARTBN, SCHURGIN,
`GAGNEEIN & HAYES LL?
`”TEL (517) 542-2290
`FAX (517) 451-0313
`
`13
`
`13
`
`
`
`-14-
`
`NIC 100 in the network interface input register file 112
`
`(step 212).
`
`The CRC engine 102 operates on the DA.
`
`to
`
`generate its hash value (step 214) and another cache lockup
`
`packet is created (step 216) and stored in the cache lockup
`
`queue 116 (step 218), awaiting processing by the cache lockup
`controller 118.
`
`The
`
`cache
`
`lockup queue
`
`116
`
`and
`
`the
`
`cache
`
`lockup
`
`controller 118 collectively comprise the cache lockup unit
`120. The controller 118 is responsible for stripping packets
`
`off the cache lockup queue 116 (step 220) and carrying out
`
`the cache lockup. There are four cache lockup units 120 in
`
`the ACA 26, one for every two NICs 100, or in other words,
`
`one for every NIM 14.
`
`In Fig. 3,
`
`the input
`
`to the cache
`
`lockup
`cache
`to the
`the output
`and
`116
`lockup queue
`controller 118 are illustrated, and are connected to another
`
`packetization
`
`engine
`
`and
`
`output
`
`packet
`
`assembly,
`
`respectively.
`
`The cache lockup controllers 118 are time
`
`sliced into the cache 28 such that each gets access to the
`cache 28 every four clock cycles.
`Four cycles are required
`
`for each lockup.
`
`In the first clock cycle, a cache 28 read
`
`is performed by the cache lockup controller 118 based upon
`
`hashed address information retrieved from the cache lockup
`
`queue
`
`116.
`
`In the
`
`second
`
`clock cycle,
`
`the
`
`address
`
`information from that
`
`location is loaded into a register
`
`along with the address from the cache lockup queue 116.
`
`In
`
`the third clock cycle,
`
`these two addresses are compared. The
`
`results are processed by the cache lockup controller in the
`
`fourth clock cycle,
`
`including making the decision of whether
`
`to forward lockup results to the output packet assembly 122
`
`and whether there is another address enqueued in the cache
`
`lockup queue 116. Each of the four cache lockup controllers
`
`118 are 90 degrees out of phase. Thus,
`
`if all of the cache
`
`lockup queues
`
`116 have addresses
`
`to be
`
`looked. up,
`
`pipeline to the cache 28 will be
`
`fully’ utilized.
`
`the
`
`For
`
`purposes of simplicity and the maintenance of
`
`frame order,
`
`there is no sharing of unused clock cycles between cache
`
`lockup units 120 in the presently disclosed embodiment.
`
`lo
`
`
`
`
`25
`
`3O
`
`35
`
`WEINGARTKN, SCHU'RGIN,
`GAGNEEIN & HAYES LLP
`TEL (617) 542-2290
`FAX (617) 451-0313
`
`14
`
`14
`
`
`
`-15_
`
`The controller 118, during its allotted clock cycle
`
`(step 224),
`
`reads the cache (step 226) using the hash value
`
`stored in the queue 116 concatenated with the set number
`
`generated using the LRU table 106 and VALID tables 108 values
`
`(step 222)
`
`for the address to be searched.
`
`If the address
`
`parsed from the frame header and the address stored in the
`
`identified by the cache lookup queue 116 output match
`set
`(step 228),
`the data associated with that set is read out by
`the cache lookup controller 118 (step 230) and the controller
`
`118 signals that the lookup is complete.
`If the addresses
`do not match,
`the next set
`(step 232),
`identified by the LRU
`and VALID values,
`is searched. When all sets are exhausted
`
`without an address match,
`
`the cache lookup controller 118
`
`signals that the lookup is complete and moves on to the next
`
`address to be searched in the cache lookup queue 116 (step
`234).
`
`Similar to the input side,
`
`the ACA 26 has a four bit
`
`the eight NICs 100 for sending the
`connection to each of
`search results to the RHP 46 and REP 48. The ACA 26 controls
`
`each of these interfaces separately so that output packets
`
`can be sent to the NICs 100 simultaneously.
`
`”
`
`With respect again to Fig. 3, each cache lookup unit 120
`
`is bound to two NIC 100 output interfaces,
`
`the same two that
`
`are connected as inputs to the unit 120. This avoids frame
`
`re—ordering or the need for complex timerstamping to avoid
`
`re—ordering.
`The ACA collects the results from each of the SA and DA
`
`lookups, packages them in an output packet, and sends them
`to the appropriate NIC 100.
`When
`a 'cache lookup is
`completed,
`the cache lookup controller 118 signals one of the
`
`two output packet assemblies 122 bound to it (step 236). The
`
`proper assembly 122
`
`is identified by a bit
`
`in the cache
`
`lookup packet that identifies which input interface the frame
`arrived on.
`
`When
`
`the SA lookup is complete,
`
`the output packet
`
`assembly 122 starts filling the network interface output
`
`register file 124 with information that identifies the frame
`
`10
`
`
`
`
`
`
`25
`
`3O
`
`35
`
`WEZNGARTEN, SCEURC-IN,
`GAGNEBIN & HAYES LLP
`' TEL (617) 542-2290
`FAX (517‘, 45143313
`
`15
`
`15
`
`
`
`..16_
`
`to the NIC 100
`
`(step 238).
`
`The SA lockup status is then
`
`written to the register file 124 and,
`
`if the lookup was
`
`successful, associated data fron1the cache 28 is subsequently
`written to the register file 124 (step 240). When the source
`
`5
`
`address portion of the output packet is complete,
`
`the output
`
`10
`
`
`
`25
`
`packet assembly 122 waits for the DA lockup to complete (step
`242) .
`
`the output packet
`the DA