US006115378A

United States Patent [19]
Hendel et al.

[11] Patent Number: 6,115,378
[45] Date of Patent: Sep. 5, 2000

`
[54] MULTI-LAYER DISTRIBUTED NETWORK ELEMENT
`
OTHER PUBLICATIONS

International Search Report, PCT/US 98/13203.
Microsoft Press, "Microsoft Computer Dictionary, Fourth Edition", Microsoft Corporation, 1999, 4 pages.
International Standard ISO/IEC 10038, ANSI/IEEE Std 802.1D, First Edition, 1993.
"Load Balancing for Multiple Interfaces for Transmission Control Protocol/Internet Protocol for VM/MVS", IBM Technical Disclosure Bulletin, 38(9): 7-9 (Sep., 1995).
T. Nishizono et al., "Analysis on a Multilink Packet Transmission System", Electron. Commun. JPN 1, Commun., (USA), 68(9): 98-104 (Sep., 1985).
International Search Report, PCT/US 98/13380.
"IP On Speed", Erica Roberts, Internet-Draft, Data Communications on the Web, Mar. 1997, 12 pages.
"Multilayer Topology", White Paper, Internet-Draft, 13 pages, downloaded from website http://www.baynetworks.com on Apr. 18, 1997.

(List continued on next page.)

`
Primary Examiner—Huy D. Vu
Attorney, Agent, or Firm—Blakely Sokoloff Taylor & Zafman

[57] ABSTRACT

`
A distributed multi-layer network element delivering Layer 2 (data link layer) wire-speed performance within and across subnetworks, allowing queuing decisions to be based on Layer 3 (network layer) protocol and endstation information combined with Layer 2 topology information. The network element performs packet relay functions using multiple switching subsystems as building blocks coupled to each other to form a larger switch that acts as both a router and a bridge. Each switching subsystem includes a hardware forwarding search engine having a switching element coupled to a forwarding memory and an associated memory. The switching subsystems and their fully meshed interconnection allow the network element to scale easily without compromising packet forwarding speed and without significantly increasing the storage requirements of each forwarding memory.

`
[75] Inventors: Ariel Hendel, Cupertino; Shimon Muller, Sunnyvale, both of Calif.

[73] Assignee: Sun Microsystems, Inc., Mountain View, Calif.

[21] Appl. No.: 08/884,319

[22] Filed: Jun. 30, 1997

[51] Int. Cl.7 ... H04J 3/02; H04L 13/02
[52] U.S. Cl. ... 370/392; 370/400
[58] Field of Search ... 370/400, 401, 402, 403, 404, 405, 389, 392, 351, 410, 466, 467, 469, 409; 395/200.68, 200.72, 200.73, 200.74, 200.79, 200.8, 200.5

[56] References Cited

U.S. PATENT DOCUMENTS
`
4,539,637  9/1985  DeBruler ... 364/200
4,627,052  12/1986  Hoare et al. ... 370/402
4,641,302  2/1987  Miller ... 370/60
4,737,953  4/1988  Koch et al. ... 370/401
5,130,977  7/1992  May et al. ... 370/60
5,159,685  10/1992  Kung ... 395/575
5,163,046  11/1992  Hahne et al. ... 370/79
5,309,437  5/1994  Perlman et al. ... 340/827
5,365,514  11/1994  Hershey et al. ... 370/17
5,402,415  3/1995  Turner ... 370/60
5,420,862  5/1995  Perlman
5,425,026  6/1995  Mori ... 370/60
5,490,260  2/1996  Miller et al. ... 395/427
5,493,564  2/1996  Mullan ... 370/54
5,500,860  3/1996  Perlman et al. ... 370/85.13
5,509,123  4/1996  Dobbins et al. ... 395/200.51
5,517,488  5/1996  Miyazaki et al. ... 370/16
5,550,816  8/1996  Hardwick et al. ... 370/60
5,553,067  9/1996  Walker et al. ... 370/60
5,557,610  9/1996  Calamvokis et al. ... 370/60.1
5,563,878  10/1996  Blakeley et al. ... 370/60
5,566,170  10/1996  Bakke et al. ... 370/60
5,574,861  11/1996  Lorvig et al. ... 395/200.06

(List continued on next page.)

25 Claims, 4 Drawing Sheets

`
[Front-page drawing: MLDNE switching elements coupled to nodes and endstations]

ARISTA 1007

`

U.S. PATENT DOCUMENTS (continued)

5,615,340  3/1997  Dai et al. ... 395/200.17
5,619,497  4/1997  Gallagher et al. ... 370/394
5,623,489  4/1997  Cotton et al. ... 370/381
5,633,710  5/1997  Mandal et al. ... 364/514
5,689,506  11/1997  Chiussi et al. ... 370/388
5,689,518  11/1997  Galand et al. ... 371/37.1
5,724,348  3/1998  Basso et al. ... 370/384
5,734,651  3/1998  Blakeley et al. ... 370/392
5,748,631  5/1998  Bergantino et al. ... 370/398
5,751,971  5/1998  Dobbins et al. ... 395/200.68
5,754,774  5/1998  Bittinger et al. ... 395/200.33
5,784,559  7/1998  Frazier et al. ... 395/200.13
5,802,278  9/1998  Isfeld et al. ... 395/200.02
5,812,527  9/1998  Kline et al. ... 370/232
5,815,737  9/1998  Buckland ... 395/415
5,822,319  10/1998  Nagami et al. ... 370/409
5,825,767  10/1998  Mizukoshi et al. ... 370/395
5,825,772  10/1998  Dobbins et al. ... 370/396
5,835,491  11/1998  Davis et al. ... 370/386
5,838,677  11/1998  Kozaki et al. ... 370/389
5,838,681  11/1998  Bonomi et al. ... 370/395
5,852,607  12/1998  Chin ... 370/401
5,856,977  1/1999  Yang et al. ... 370/411
5,859,849  1/1999  Parks ... 370/395
5,867,677  2/1999  Tsukamoto ... 395/311
5,872,783  2/1999  Chin ... 370/392
5,872,904  2/1999  McMillen et al. ... 395/182.02
5,875,464  2/1999  Kirk ... 711/129
5,878,043  3/1999  Casey ... 370/397
5,878,232  3/1999  Marimuthu ... 395/200.79
5,892,912  4/1999  Suzuki et al. ... 395/200.43
5,898,687  4/1999  Harriman et al. ... 370/390
5,898,830  4/1999  Varghese et al. ... 370/395

OTHER PUBLICATIONS (continued)

"Foundry Products", downloaded from website http://www.foundrynet.com/ on Jun. 19, 1997.
Anthony J. McAuley & Paul Francis, "Fast Routing Table Lookup Using CAMs", IEEE, 1993, pp. 1382-1391.
"Gigabit Ethernet", Network Strategy Report, The Burton Group, v2, May 8, 1997, 40 pages.

`

U.S. Patent    Sep. 5, 2000    Sheet 1 of 4    6,115,378

[FIG. 1 (drawing)]


`
U.S. Patent    Sep. 5, 2000    Sheet 2 of 4    6,115,378

[FIG. 2 (drawing): switching elements and central network element; "To Nodes and Endstations"]


`
U.S. Patent    Sep. 5, 2000    Sheet 3 of 4    6,115,378

[FIG. 3 (drawing): forwarding memory and associated memory entry formats]


`
U.S. Patent    Sep. 5, 2000    Sheet 4 of 4    6,115,378

[FIG. 4 (drawing): MLDNE with two switching element subsystems]


`
MULTI-LAYER DISTRIBUTED NETWORK ELEMENT

BACKGROUND

1. Field of the Invention
This invention is generally related to communication between computers using a layered architecture and, more specifically, to a system and method for forwarding packets using multi-layer information.
2. Description of the Related Art

Communication between computers has become an important aspect of everyday life in both private and business environments. Computers converse with each other based upon a physical medium for transmitting the messages back and forth, and upon a set of rules implemented by electronic hardware attached to and programs running on the computers. These rules, often called protocols, define the orderly transmission and receipt of messages in a network of connected computers.

A local area network (LAN) is the most basic and simplest network that allows communication between a source computer and destination computer. The LAN can be envisioned as a cloud to which computers (also called end stations or endnodes) that wish to communicate with one another are attached. At least one network element will connect with all of the end stations in the LAN. An example of a simple network element is the repeater, which is a physical layer relay that forwards bits. The repeater may have a number of ports, each end station being attached to one port. The repeater receives bits that may form a packet of data that contains a message from a source end station, and blindly forwards the packet bit-by-bit. The bits are then received by all other end stations in the LAN, including the destination.
A single LAN, however, may be insufficient to meet the requirements of an organization that has many end stations, because of the limited number of physical connections available to and the limited message handling capability of a single repeater. Thus, because of these physical limitations, the repeater-based approach can support only a limited number of end stations over a limited geographical area.

The capability of computer networks, however, has been extended by connecting different subnetworks to form larger networks that contain thousands of endstations communicating with each other. These LANs can in turn be connected to each other to create even larger enterprise networks, including wide area network (WAN) links.
To facilitate communication between subnets in a larger network, more complex electronic hardware and software have been proposed and are currently used in conventional networks. Also, new sets of rules for reliable and orderly communication among those end stations have been defined by various standards based on the principle that the end stations interconnected by suitable network elements define a network hierarchy, where end stations within the same subnet have a common classification. A network is thus said to have a topology which defines the features and hierarchical position of nodes and end stations within the network.

The interconnection of end stations through packet switched networks has traditionally followed a peer-to-peer layered architectural abstract. In such a model, a given layer in a source computer communicates with the same layer of a peer end station (usually the destination) across the network. By attaching a header to the data unit received from a higher layer, a layer provides services to enable the operation of the layer above it. A received packet will
typically have several headers that were added to the original payload by the different layers operating at the source.

There are several layer partition schemes in the prior art, such as the Arpanet and the Open Systems Interconnect (OSI) models. The seven layer OSI model used here to describe the invention is a convenient model for mapping the functionality and detailed implementations of other models. Aspects of the Arpanet, however (now redefined by the Internet Engineering Task Force, or IETF), will also be used in specific implementations of the invention to be discussed below.
The relevant layers for background purposes here are Layer 1 (physical), Layer 2 (data link), and Layer 3 (network), and to a limited extent Layer 4 (transport). A brief summary of the functions associated with these layers follows.
The physical layer transmits unstructured bits of information across a communication link. The repeater is an example of a network element that operates in this layer. The physical layer concerns itself with such issues as the size and shape of connectors, conversion of bits to electrical signals, and bit-level synchronization.
Layer 2 provides for transmission of frames of data and error detection. More importantly, the data link layer as referred to in this invention is typically designed to "bridge," or carry a packet of information across a single hop, i.e., a hop being the journey taken by a packet in going from one node to another. By spending only minimal time processing a received packet before sending the packet to its next destination, the data link layer can forward a packet much faster than the layers above it, which are discussed next. The data link layer provides addressing that may be used to identify a source and a destination between any computers interconnected at or below the data link layer. Examples of Layer 2 bridging protocols include those defined in IEEE 802 such as CSMA/CD, token bus, and token ring (including Fiber Distributed Data Interface, or FDDI).
Similar to Layer 2, Layer 3 also includes the ability to provide addresses of computers that communicate with each other. The network layer, however, also works with topological information about the network hierarchy. The network layer may also be configured to "route" a packet from the source to a destination using the shortest path. Finally, the network layer can control congestion by simply dropping selected packets, which the source might recognize as a request to reduce the packet rate.
Finally, Layer 4, the transport layer, provides an application program such as an electronic mail program with a "port address" which the application can use to interface with Layer 3. A key difference between the transport layer and the lower layers is that a program on the source computer carries a conversation with a similar program on the destination computer, whereas in the lower layers, the protocols are between each computer and its immediate neighbors in the network, where the ultimate source and destination endstations may be separated by a number of intermediate nodes. Examples of Layer 4 and Layer 3 protocols include the Internet suite of protocols such as TCP (Transmission Control Protocol) and IP (Internet Protocol).
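For illustration only, the layered encapsulation described above, in which each layer prepends its own header to the data unit received from the layer above, can be sketched in software. The following Python sketch is not part of the patent's disclosure; the field layouts are hypothetical simplifications, not real TCP/IP header formats.

```python
# Illustrative sketch of peer-to-peer layering: each layer prepends its
# own (simplified, hypothetical) header to the unit from the layer above.

def add_layer4_header(payload: bytes, src_port: int, dst_port: int) -> bytes:
    # Transport layer: port addresses identify the conversing programs.
    return src_port.to_bytes(2, "big") + dst_port.to_bytes(2, "big") + payload

def add_layer3_header(segment: bytes, src_ip: bytes, dst_ip: bytes) -> bytes:
    # Network layer: addresses used with topological routing information.
    return src_ip + dst_ip + segment

def add_layer2_header(packet: bytes, src_mac: bytes, dst_mac: bytes) -> bytes:
    # Data link layer: MAC addresses carry the frame across a single hop.
    return dst_mac + src_mac + packet

message = b"hello"
frame = add_layer2_header(
    add_layer3_header(add_layer4_header(message, 1025, 80),
                      bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])),
    bytes(6), bytes(6))

# The received frame carries one header per layer, outermost (Layer 2) first.
assert len(frame) == 12 + 8 + 4 + len(message)
assert frame.endswith(b"hello")
```

A receiver reverses the process, with each layer stripping its own header before handing the payload upward.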
Endstations are the ultimate source and destination of a packet, whereas a node refers to an intermediate point between the endstations. A node will typically include a network element which has the capability to receive and forward messages on a packet-by-packet basis.
Generally speaking, the larger and more complex networks typically rely on nodes that have higher layer (Layers 3 and 4) functionalities. A very large network consisting of several smaller subnetworks must typically use a Layer 3 network element known as a router which has knowledge of the topology of the subnetworks.
A router can form and store a topological map of the network around it based upon exchanging information with its neighbors. If a LAN is designed with Layer 3 addressing capability, then routers can be used to forward packets between LANs by taking advantage of the hierarchical routing information available from the endstations. Once a table of endstation addresses and routes has been compiled by the router, packets received by the router can be forwarded after comparing the packet's Layer 3 destination address to an existing and matching entry in the memory.
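The table comparison described above can be sketched as follows. This is an illustrative software analogue only, not the patent's design; the table contents are hypothetical, and real routers use more elaborate structures for longest-prefix matching.

```python
import ipaddress

# Hypothetical routing table compiled by a router: subnet prefix -> port.
route_table = {
    ipaddress.ip_network("10.1.0.0/16"): "port-A",
    ipaddress.ip_network("10.2.0.0/16"): "port-B",
}

def route(dst: str):
    """Return the port for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, port in route_table.items():
        # Compare the packet's Layer 3 destination to each table entry.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return best[1] if best else None

assert route("10.2.3.4") == "port-B"      # matches 10.2.0.0/16
assert route("192.168.1.1") is None       # no entry: cannot forward
```

A packet whose destination matches no entry would typically be dropped or trigger further route discovery.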
In comparison to routers, bridges are network elements operating in the data link layer (Layer 2) rather than Layer 3. They have the ability to forward a packet based only on the Layer 2 address of the packet's destination, typically called the medium access control (MAC) address. Generally speaking, bridges do not modify the packets. Bridges forward packets in a flat network having no hierarchy without any cooperation by the endstations.
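The flat, Layer-2-only forwarding behavior of a bridge can be sketched as a learned table of MAC addresses. This Python sketch is illustrative only (it is not from the patent); the port and address names are hypothetical.

```python
# Sketch of transparent bridging: learn source MACs as packets arrive,
# forward on the destination MAC, and flood when it is unknown.

class Bridge:
    def __init__(self, ports):
        self.ports = ports
        self.mac_table = {}  # MAC address -> port it was learned on

    def receive(self, in_port, src_mac, dst_mac):
        # Learn without any cooperation from the endstations.
        self.mac_table[src_mac] = in_port
        out = self.mac_table.get(dst_mac)
        if out is None or out == in_port:
            # Unknown destination in a flat network: flood all other ports.
            return [p for p in self.ports if p != in_port]
        return [out]

b = Bridge(["p1", "p2", "p3"])
assert b.receive("p1", "A", "B") == ["p2", "p3"]  # B unknown: flood
b.receive("p2", "B", "A")                         # B learned on p2
assert b.receive("p1", "A", "B") == ["p2"]        # now forwarded directly
```

Note that nothing in this table reflects hierarchy or upper-layer policy, which is precisely the limitation discussed below.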
Hybrid forms of network elements also exist, such as brouters and switches. A brouter is a router which can also perform as a bridge. The term switch refers to a network element which is capable of forwarding packets at high speed with functions implemented in hardwired logic as opposed to a general purpose processor executing instructions. Switches come in many flavors, operating at both Layer 2 and Layer 3.
Having discussed the current technology of networking in general, the limitations of such conventional techniques will now be addressed. With an increasing number of users requiring increased bandwidth from existing networks due to multimedia applications to run on the modern day Internet, modern and future networks must be able to support a very high bandwidth and a large number of users. Furthermore, such networks should be able to support multiple traffic types such as data, voice, and video which typically require different bandwidths. Statistical studies show that the network domain, i.e., a group of interconnected LANs, as well as the number of individual endstations connected to each LAN, will grow at a faster rate in the future. Thus, more network bandwidth and more efficient use of resources is needed to meet these requirements.
Building networks using Layer 2 elements such as bridges provides fast packet forwarding between LANs but has no flexibility in traffic isolation, redundant topologies, and end-to-end policies for queuing and access control. For example, although endstations in a subnet can invoke conversations based on either Layer 3 or Layer 2 addressing, the higher layer functionalities are not supported by bridges. As bridges forward packets based on only Layer 2 parsing, they provide simple yet speedy forwarding services. However, the bridge does not support the use of high layer handling directives including queuing, priority, and forwarding constraints between endstations in the same subnet.
A prior art solution to enhancing bridge-like conversations within a subnet relies on a network element that uses a combination of Layer 2 and upper layer headers. In that system, the Layer 3 and Layer 4 information of an initial packet are examined, and a "flow" of packets is predicted and identified using a new Layer 2 entry in the forwarding memory, with a fixed quality of service (QOS). Thereafter, subsequent packets are forwarded at Layer 2 speed (with the fixed QOS) based upon a match of the Layer 2 header with the Layer 2 entry in the forwarding memory. Thus, no entries with Layer 3 and Layer 4 headers are placed in the forwarding memory to identify the flow.
However, consider the scenario where there are two or more programs communicating between the same pair of endstations, such as an electronic mail program and a video conferencing session. If the programs have dissimilar QOS needs, the prior art scheme just presented will not support different QOS characteristics between the same pair of endstations, because the prior art scheme does not consider information in Layer 3 and Layer 4 when forwarding. Thus, there is a need for a network element that is flexible enough to support independent priority requests from applications running on endstations connected to the same subnet.
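The limitation can be made concrete with a small sketch: a forwarding key built only from Layer 2 addresses cannot tell two applications between the same endstation pair apart, whereas a key that also includes Layer 3/Layer 4 fields can. The field values below are hypothetical, for illustration only.

```python
# Two hypothetical application flows between the SAME pair of endstations.
mail  = {"src_mac": "A", "dst_mac": "B", "src_port": 25,   "dst_port": 1025}
video = {"src_mac": "A", "dst_mac": "B", "src_port": 5004, "dst_port": 2001}

def layer2_key(pkt):
    # Key built only from Layer 2 addresses, as in the prior art scheme.
    return (pkt["src_mac"], pkt["dst_mac"])

def layer34_key(pkt):
    # Key that also carries Layer 3/Layer 4 (port) information.
    return (pkt["src_mac"], pkt["dst_mac"], pkt["src_port"], pkt["dst_port"])

# A Layer 2 key collapses both programs into one entry, forcing one QOS:
assert layer2_key(mail) == layer2_key(video)
# A Layer 3/4 key keeps the flows, and hence their QOS, distinct:
assert layer34_key(mail) != layer34_key(video)
```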
The latter attributes may be met using Layer 3 elements such as routers. But packet forwarding speed is sacrificed in return for the greater intelligence and decision making capability provided by the router. Therefore, networks are often built using a combination of Layer 2 and Layer 3 elements.

The role of the server has multiplied with browser-based applications that use the Internet, thus leading to increasing variation in traffic distribution. When the role of the server was narrowly limited to a file server, for example, the network was designed with the client and the file server in the same subnet to avoid router bottlenecks. However, more specialized servers like World Wide Web and video servers are typically not on the client's subnet, such that crossing routers is unavoidable. Therefore, the need for packets to traverse routers at higher speeds is crucial. The choice of bridge versus router typically results in a significant trade-off: lower functionality when using bridges, and lower speed when using routers. Furthermore, the service characteristics within a network are no longer homogenous, as the performance of a server becomes location dependent if its traffic patterns involve routers.
Therefore, there is a need for a network element that can handle changing network conditions such as topology and message traffic yet make efficient use of high performance hardware to switch packets based on their Layer 2, Layer 3, and Layer 4 headers. The network element should be able to operate at bridge-like speeds, yet be capable of routing packets across different subnetworks and provide upper layer functionalities such as quality of service.

SUMMARY

The invention lies in a multi-layer distributed network element (MLDNE) system that provides good packet forwarding performance regardless of its location and role in a network. More specifically, the invention uses a distributed architecture to build a larger network element system made up of smaller identical network element subsystems that remain transparent to neighboring network elements and endstations. The multi-layer distributed network element (MLDNE) delivers Layer 2 wire-speed performance within and across subnetworks, while allowing queuing decisions to be based on Layer 3 protocol and topology information, endstation information, and Layer 2 topology information.
The invention's MLDNE includes a plurality of network element subsystems fully meshed and interconnected by internal links. Each network element subsystem includes a hardware search engine included in a switching element coupled to a forwarding memory and an associated data memory. The switching element has a number of internal and external ports, the internal ports coupling the internal links and the external ports coupling a number of connections external to the MLDNE. Packets are received from and forwarded to neighboring nodes and end stations by the MLDNE through the external connections.
The forwarding and associated memories contain entries needed for forwarding the packets. The forwarding memory contains entries having header data obtained from Layer 2 headers of received packets. The forwarding memory also contains Layer 3 and 4 information configured by the CPS of the MLDNE to be matched with the headers of received packets. The associated memory identifies internal and external ports of the switching element that are associated with an entry in the forwarding memory, as well as quality of service (QOS) information. When forwarding, the headers of a received packet are compared to entries in the forwarding memory to find a matching entry, and the associated data of a matching entry is used to pass the packet towards its destination.
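The matching step just described can be pictured with a software analogue of the memory pair: a forwarding "memory" that maps header-derived keys to an index, and an associated memory that holds the ports and QOS data for that index. The patent implements this in hardware (a CAM indexing a RAM, as described later); the Python below and all its entry values are an illustrative sketch only.

```python
# Software analogue of the forwarding/associated memory pair
# (hypothetical entries; the real design is hardwired logic).

forwarding_memory = {
    ("mac", "00:aa"): 0,                      # Layer 2 entry
    ("flow", "10.0.0.1", "10.0.0.2", 80): 1,  # Layer 3/4 flow entry
}
associated_memory = [
    {"ports": ["ext1"], "qos": "normal"},
    {"ports": ["int2"], "qos": "high"},
]

def forward(packet_keys):
    # Compare each header-derived key of the received packet against
    # the forwarding memory; a hit yields the associated data.
    for key in packet_keys:
        idx = forwarding_memory.get(key)
        if idx is not None:
            return associated_memory[idx]
    return None  # no matching entry

hit = forward([("flow", "10.0.0.1", "10.0.0.2", 80), ("mac", "00:bb")])
assert hit == {"ports": ["int2"], "qos": "high"}
```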
The forwarding memory only contains entries given by the following three groups: MAC addresses directly connected to the external connections of the subsystems, Layer 2 bridged "conversations" between an external port of a subsystem and an internal link, and sequences of packets known as flows defined by the MLDNE as a Layer 3/Layer 4 end-to-end conversation (Layer 3 entries). The dominant contribution, however, comes from the MAC addresses that connect with the external connections. Therefore, in the MLDNE architecture, the required depth of the forwarding memory does not multiply with the number of subsystems.
The forwarding memory and associated memory designs attempt to minimize the number of forwarding memory entries that are replicated on more than one network element subsystem. This helps make more efficient use of the memory resources, and minimizes the number of places that a forwarding decision is made to yield faster packet relaying. Furthermore, the distributed architecture eliminates the need for one network element subsystem to know about the details of another network element subsystem, including details such as the number of external and internal ports in each switching element, and the specific external port or ports of another switching element through which a packet is to be forwarded outside the MLDNE.
The network element subsystems in MLDNE are fully interconnected and meshed by internal links coupling internal ports in each subsystem. In other words, each subsystem is directly connected to another subsystem via at least one internal link. In this way, a packet forwarded by MLDNE is delayed in no more than two locations, once at the inbound network element subsystem, and at most a second time in the outbound network element subsystem.

With a more centralized approach, increasing the number of external connections would be expected to increase storage requirements in a central high performance forwarding memory. However, in the invention, because the header classifications for forwarding the packets are primarily done in the inbound subsystem, the increase in required storage space due to additional subsystems is absorbed by the forwarding memory of each subsystem itself, and there is no need to significantly increase the depth of the other forwarding memories in the other subsystems.

Also, the additional external connections will increase the matching cycle search time in a system having a centralized forwarding memory. With the MLDNE, however, the additional matching cycle searches are only performed by the new subsystem itself.

The MLDNE also contains a central memory (CM) as part of a central processing subsystem (CPS). The CM is under control of and maintained by a central processing unit, and contains a copy of the individual forwarding memories. The communication between the CPS and the various network element subsystems occurs through a bus. The topology of the internal links and the hardware search engines in the various network element subsystems is known to the CPS, so that the CPS can optimally define a path through an internal link for a data packet to travel in order to achieve any desired static load balancing between multiple internal links coupling two network element subsystems.

When forwarding a packet through two subsystems, all forwarding attributes, such as queuing priority, tagging format, routing versus bridging, route and VLAN header replacement, except for the ports in the outbound subsystem, are determined by the header matching cycles of the inbound subsystem. In addition to being storage efficient with respect to the forwarding memory as discussed above, such a scheme can also accommodate a useful model of using Layer 3 and Layer 4 information for queuing, routing, and policy decisions, while using Layer 2 for topology decisions.

Another embodiment of the invention will support flows, where the outbound subsystem has the ability to forward the packet based on Layer 3 queuing, routing, and policy information, rather than the relatively rigid Layer 2 forwarding scheme. Because the Layer 3 forwarding capability, including quality of service mapping, of a subsystem is implemented in hardwired logic within each subsystem, forwarding based on a Layer 3 matching cycle should be comparable in speed to forwarding using Layer 2 matching cycles. Such an enhancement comes at the expense of using an additional Layer 3 entry in the outbound subsystem forwarding memory.

BRIEF DESCRIPTION OF THE DRAWINGS

The foregoing aspects and other features of the invention will be better understood by referring to the figures, detailed description, and claims where:

FIG. 1 is a high level view of an exemplary network application of a multi-layer distributed network element (MLDNE) of the invention.

FIG. 2 illustrates a block diagram of the MLDNE system according to an embodiment of the invention.

FIG. 3 illustrates exemplary forms of the entries in the forwarding and associated memories of a MLDNE subsystem in accordance with another embodiment of the invention.

FIG. 4 is a block diagram of an embodiment of the MLDNE having only two subsystems.
DETAILED DESCRIPTION OF THE INVENTION

As shown in the drawings by way of illustration, the invention helps define a device that can be used to interconnect a number of nodes and endstations in a variety of different ways. For example, an application of MLDNE would be switching packets over a homogeneous data link layer such as the IEEE 802.3 standard, also known as an Ethernet link. FIG. 1 illustrates the invention's use in a network where the MLDNE system is coupling a router and a number of different endstations, depicted as servers and desktop units, through external connections. The MLDNE system is capable of providing a high performance communication path between servers and desktop units as well as communications via conventional router or bridge. Thus, the invention's MLDNE is a multi-purpose network element.
In a preferred embodiment, the invention's distributed architecture is designed to handle message traffic in accordance with the Internet suite of protocols, more specifically TCP and IP (Layers 4 and 3, respectively), over the Ethernet LAN standard and MAC data link layer. However, one skilled in the art will recognize that other particular structures and methods to implement the invention's architecture can be developed using other protocols.
The invention's MLDNE has network element functions that are distributed, i.e., different parts of a function are performed by different MLDNE subsystems. These network element functions include forwarding, learning, queuing, and buffering. As will be appreciated from the discussion below and FIG. 2, MLDNE has a scalable architecture which allows for easily increasing the number of subsystems 210 as a way of increasing the number of external connections, thereby allowing greater flexibility in defining the surrounding network environment.
An embodiment of the MLDNE 201 is illustrated in block diagram form in FIG. 2. A number of MLDNE subsystems 210 are fully meshed and interconnected using a number of internal links 241 to create a larger network element. Each MLDNE subsystem 210 is preferably defined to be the largest non-blocking switching unit that is cost effective to produce with modern integrated circuit manufacturing techniques.
Each MLDNE subsystem 210 includes a forwarding memory 213 which will include selected header data arranged as type 2 and type 1 entries used to match with the header portion of packets received by the subsystem 210, as shown in FIG. 3. In the preferred embodiment shown in FIG. 3, type 2 entries 321 include Layer 3 and Layer 4 information, whereas the type 1 entries 301 include Layer 2 information. The forwarding memory 213 is preferably implemented as a content addressable memory (CAM) which indexes the associated memory being a random access memory (RAM). Of course, the forwarding memories 213 and/or the associated memories in the different subsystems may be implemented as a single hardware structure. A number of external ports (not shown) interfacing external connections 217 are used to connect with nodes and endstations outside MLDNE 201 such as those shown in FIG. 1, i.e., desktops, servers, and packet switching elements such as bridges and routers. Internal ports in the MLDNE subsystem couple the internal links, where any two subsystems share at least one internal link.
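The CAM-indexes-RAM arrangement described above can be modeled in software as a two-stage lookup. This is a toy sketch for illustration only: the entry keys and the associated-data fields shown are hypothetical and are not taken from the patent.

```python
# Toy model of a forwarding memory 213 (CAM) that indexes an
# associated memory (RAM). A type 1 entry keys on Layer 2 data
# (e.g., a MAC address and VLAN); a type 2 entry keys on Layer 3/4
# data (e.g., IP addresses and a port). Field names are hypothetical.
forwarding_memory = {
    ("00:a0:c9:11:22:33", 10): "assoc_0",        # type 1 (Layer 2) entry
    ("10.0.0.1", "10.0.0.2", 80): "assoc_1",     # type 2 (Layer 3/4) entry
}
associated_memory = {
    "assoc_0": {"out_port": 3, "priority": 0},
    "assoc_1": {"out_port": 7, "priority": 5},
}

def lookup(header_key):
    """CAM match on the header key, then RAM read of associated data."""
    index = forwarding_memory.get(header_key)
    return associated_memory[index] if index is not None else None

print(lookup(("10.0.0.1", "10.0.0.2", 80)))  # {'out_port': 7, 'priority': 5}
```

A miss in the forwarding memory (no CAM match) returns nothing here; in the patent's scheme such packets may instead involve the CPS, as described below.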

In its preferred embodiment, the external and internal ports lie within the switching element 211. The MLDNE 201 also includes a central processing system (CPS) 260 that is coupled to the individual subsystems 210, through a communication bus 251 such as a Peripheral Components Interconnect (PCI) bus. The communication between the CPS and the individual subsystems need not be as fast or reliable as the internal links between subsystems, because, as appreciated below, the CPS is not normally relied upon to forward the majority of traffic through the MLDNE. Rather, the CPS normally serves to add entries and associated data to the forwarding and associated memories, respectively.
The CPS 260 includes a central processing unit (CPU) 261 coupled to a CM 263 and other memory (not shown). CM 263 includes a copy of the entries contained in the individual forwarding memories 213 of the various subsystems. The CPS has a direct control and communication interface to each MLDNE subsystem 210. However, the role of the CPS 260 in packet processing includes setting up data path resources such as packet buffers inside each subsystem,
entering and managing type 2 entries in the forwarding memories, and some other special cases such as routing with options which cannot be routinely handled by and between the subsystems.
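The CPS role described above, maintaining a central copy of the entries while writing them into each subsystem's forwarding memory, can be sketched as follows. The class and method names are hypothetical and serve only to illustrate the mirroring relationship between the CM and the distributed forwarding memories.

```python
# Illustrative sketch (names hypothetical): the CPS 260 enters a
# type 2 entry by recording it in its central memory (CM 263) and
# writing it to the forwarding memory 213 of each subsystem 210.
class Subsystem:
    def __init__(self):
        self.forwarding_memory = {}

class CPS:
    def __init__(self, subsystems):
        self.cm = {}                 # CM 263: copy of all entries
        self.subsystems = subsystems

    def add_type2_entry(self, key, assoc_data):
        self.cm[key] = assoc_data
        for sub in self.subsystems:
            sub.forwarding_memory[key] = assoc_data

subs = [Subsystem() for _ in range(3)]
cps = CPS(subs)
cps.add_type2_entry(("10.0.0.1", "10.0.0.2", 80), {"out_port": 7})
print(all(("10.0.0.1", "10.0.0.2", 80) in s.forwarding_memory
          for s in subs))  # True
```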
Although the CM 263 will contain a copy of the data in the individual forwarding memories, the performance requirements for the CM are less stringent than those for the individual forwarding memories, because the CPS and CM need not be designed to forward the packets at the speeds obtainable by the hardwired switching logic in each subsystem.