US006266335B1

Bhaskaran

(10) Patent No.: US 6,266,335 B1
(45) Date of Patent: Jul. 24, 2001

(54) CROSS-PLATFORM SERVER CLUSTERING USING A NETWORK FLOW SWITCH

(75) Inventor: Sajit Bhaskaran, Sunnyvale, CA (US)

(73) Assignee: CyberIQ Systems, San Jose, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 08/994,709

(22) Filed: Dec. 19, 1997

(51) Int. Cl.7 .................... H04L 12/28; H04L 12/56
(52) U.S. Cl. ..................... 370/399; 370/389
(58) Field of Search ............. 370/399, 397, 402, 360, 372, 353, 389, 396, 400, 401, 409, 419, 420, 421, 423, 901, 902, 903, 908, 910, 912, 911, 392, 422; 395/115, 182.07, 200.3, 200.31, 200.32, 200.33, 200.48, 200.49, 200.57, 200.68

(56) References Cited

U.S. PATENT DOCUMENTS

5,283,897     2/1994  Georgiadis et al. ......... 395/650
5,301,226 *   4/1994  Olson et al. .............. 379/88.18
5,473,599    12/1995  Li et al. ................. 370/16
5,513,314     4/1996  Kandasamy et al. .......... 395/182.04
5,583,940    12/1996  Vidrascu et al. ........... 380/49
5,586,121    12/1996  Moura et al. .............. 370/404
5,608,447 *   3/1997  Parry et al. .............. 348/7
5,612,865 *   3/1997  Dasgupta .................. 364/184
5,652,892 *   7/1997
5,655,140     8/1997  Haddock ................... 395/200.76
5,666,487 *   9/1997  Goodman et al. ............ 709/246
5,687,369    11/1997  Li ........................ 395/619
5,740,375 *   4/1998  Dunne et al. .............. 709/238
5,754,752 *   5/1998  Sheh et al. ............... 709/227
5,764,895 *   6/1998  Chung ..................... 370/402
5,774,660 *   6/1998  Brendel et al. ............ 395/200.31
5,774,668 *   6/1998  Choquier et al. ........... 370/480
5,796,941     8/1998  Lita ...................... 395/187.01
5,805,804 *   9/1998
5,812,819 *   9/1998
5,815,668     9/1998  Hashimoto ................. 395/200.68
5,835,696    11/1998  Hess ...................... 395/182.08
5,835,710 *  11/1998  Nagami et al. ............. 709/250
5,862,338 *   1/1999  Walker et al. ............. 395/200.54
5,920,699 *   7/1999  Bare ...................... 709/225
5,936,936 *   8/1999  Alexander, Jr. et al. ..... 370/395
5,949,753 *   9/1999  Alexander, Jr. et al. ..... 370/216
5,999,536 *  12/1999  Kawafuji et al. ........... 370/401
6,006,264 *  12/1999  Colby et al. .............. 709/226
6,047,319 *   4/2000  Olson ..................... 709/223
6,097,882 *   8/2000  Mogul ..................... 709/201
6,101,616 *   8/2000  Joubert et al. ............ 714/11

FOREIGN PATENT DOCUMENTS

9-321789     12/1997  (JP) ...................... H04L/12/46
WO 99/32956   7/1999  (WO) ...................... G06F

OTHER PUBLICATIONS

Internet. "Quasi-Dynamic Load-Balancing (QDBL) Methods." Apr. 25, 1995, pp. 2 and 5.

* cited by examiner

Primary Examiner—Douglas Olms
Assistant Examiner—Phirin Sam
(74) Attorney, Agent, or Firm—Skjerven Morrill MacPherson LLP; Alan H. MacPherson; Pablo E. Marine

(57) ABSTRACT

A network flow switch is provided for connecting a pool of IP routers to a cluster of IP servers sharing a single IP address, without requiring translation of the IP address. Rather, all IP servers have the same IP address. The network flow switch routes packets to individual servers by writing the Data Link Layer address of the destination IP server in the destination Data Link Layer address field of the packet. However, no Data Link Layer address translation is required for packets transmitted from the IP servers to the IP routers. Since in a typical client-server environment the number of packets sent from the server to the client is much greater than the number of packets sent from the client to the server, the Data Link Layer address translation requires very little overall processing time.

35 Claims, 11 Drawing Sheets
[Drawing sheets 1 through 11 of 11 (U.S. Patent, Jul. 24, 2001, US 6,266,335 B1). The sheets are reproduced as images in the original; only the following is recoverable from the text layer:

Sheet 1: FIG. 1 — prior art cluster of IP servers, server load balancer and network routers.
Sheet 2: FIG. 2 — cluster of IP servers and network flow switch.
Sheet 3: FIGS. 3A and 3B — packet format and link field format.
Sheet 4: FIG. 4A — structure of the network flow switch.
Sheet 5: FIG. 4B flow diagram — Start; Ethernet card receives packet (420); Copy packet to memory; Perform load balancing (OPTIONAL) (435); Re-write MAC destination address (440); Route packet to server (445); End. FIG. 4C flow diagram — Ethernet card receives packet (450); Re-write MAC destination address (OPTIONAL); Route packet to network router (465); End.
Sheets 6-11: FIGS. 5A-5F — network flow switch implementations using general-purpose and special-purpose circuit boards and a crossbar switch.]
CROSS-PLATFORM SERVER CLUSTERING USING A NETWORK FLOW SWITCH

CROSS REFERENCE TO APPENDIX

Appendix A, which is part of the present application, is a set of architectural specifications for a network flow switch, according to one embodiment of the invention.
`
`BACKGROUND OF THE INVENTION
`
1. Field of the Invention
The present invention relates generally to computer networks and, more specifically, to high-bandwidth network switches.
`2. Description of the Related Art
The increasing traffic over computer networks such as the Internet, as well as corporate intranets, WANs and LANs, often requires the use of multiple servers to accommodate the needs of a single service provider or MIS department. For example, a company that provides a search engine for the Internet may handle over 80 million hits (i.e., accesses to the company's Web page) every day. A single server cannot handle such a large volume of service requests within an acceptable response time. Therefore, it is desirable for high-volume service providers to be able to use multiple servers to satisfy service requests.
For example, the Internet Protocol (IP), which is used to identify computers connected to the Internet and other global, wide or local area networks, assigns a unique IP address to each computer connected to the network. Thus, when multiple servers are used, each server must be accessed using the server's own IP address.

On the other hand, it is desirable for users to be able to access all servers of a service provider using a unique IP address. Otherwise, the users would have to keep track of the servers maintained by the service provider and their relative workloads in order to obtain faster response times. By using a single "virtual" IP address (i.e., an IP address that does not correspond to any one of the IP servers, but rather designates the entire group of IP servers), service providers are able to divide service requests among the servers. By using this scheme, IP servers may even be added or removed from the group of IP servers corresponding to the virtual IP address to compensate for varying traffic volumes. Multiple servers used in this fashion are sometimes referred to as a "cluster."
FIG. 1 illustrates a prior art cluster of IP servers. A server load balancer 100 routes packets among IP servers 110, 120, 130, 140 and 150 and network routers 160, 170 and 180. Each of IP servers 110, 120, 130, 140 and 150 and network routers 160, 170 and 180 has a distinct IP address; however, any of IP servers 110, 120, 130, 140 and 150 can be accessed via a virtual IP address (not shown) from networks connected to network routers 160, 170 and 180. When a packet addressed to the virtual IP address is received by server load balancer 100, the virtual IP address is translated into the individual IP address of one of the IP servers and the packet is routed to that IP server. The translation, however, involves generating a new checksum for the packet and re-writing the source/destination IP address and the checksum fields of the IP header, as well as of the TCP and UDP header fields. Both the IP header checksum, which is the ISO Layer 3 or Network Layer header checksum, and the TCP or UDP header checksums, which are the ISO Layer 4 or Transport Layer header checksums, need to be recalculated for each packet. Typically, these operations require intervention by a processor of the server load balancer.
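To make the translation burden concrete, the following sketch recomputes an IPv4 header checksum after a destination-address rewrite, roughly the per-packet work (one's-complement arithmetic per RFC 1071) a translating load balancer performs; a real device must additionally patch the TCP or UDP checksum inside the payload. The function names are illustrative, not taken from the patent.

    import struct

    def ip_header_checksum(header: bytes) -> int:
        # One's-complement sum of 16-bit words (RFC 1071); the header length
        # is assumed even, as any valid IPv4 header is.
        total = 0
        for (word,) in struct.iter_unpack("!H", header):
            total += word
            total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
        return ~total & 0xFFFF

    def nat_rewrite(header: bytearray, new_dst_ip: bytes) -> None:
        # Rewrite the destination IP (bytes 16-19), zero the checksum field
        # (bytes 10-11), then recompute it: the per-packet processor work
        # that the translation described above requires.
        header[16:20] = new_dst_ip
        header[10:12] = b"\x00\x00"
        header[10:12] = struct.pack("!H", ip_header_checksum(bytes(header)))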
When a high volume of requests is processed, the overhead imposed by the translation has a significant impact on the response time of the IP servers. In addition, if a large number of IP servers are used, the time required to perform the translation creates a bottleneck in the performance of the server load balancer, since the IP address of each packet transmitted to and from the IP servers must be translated by the switch. Therefore, there is a need for a faster method for sharing a single IP address among multiple IP servers.
In other cases, when multiple IP addresses are used, a client typically tries to access a primary IP server. If the primary IP server does not respond within a fixed time period, the client tries to access backup IP servers until a response is received. Thus, when the primary IP server is unavailable, the client experiences poor response time. Current server replication systems such as those used in DNS and RADIUS servers are affected by this problem. There is thus a need for a method of accessing multiple IP servers which does not experience poor response time when the primary IP server is unavailable.
Another potential drawback of the prior art is that each replicated server requires a unique IP address physically configured on the server. Since all IP networks are subject to subnet masking rules (which are often determined by an external administrator), the scalability of the replication is severely limited. For example, if the subnet prefix is 28 bits of a 32-bit IP address, the maximum number of replicated servers is 16 (2^(32-28)). There is a need for a method of replicating servers that allows replication of IP servers independent of subnet masking rules.
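The 16-server ceiling follows directly from the prefix arithmetic, as the minimal check below illustrates; note that it counts raw addresses, and reserving the network and broadcast addresses reduces the usable number further.

    def addresses_in_subnet(prefix_bits: int) -> int:
        # Raw IPv4 addresses available under a given subnet prefix length.
        return 2 ** (32 - prefix_bits)

    assert addresses_in_subnet(28) == 16  # the example given in the text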
IP version 4 addresses are currently scarce on the Internet, so any method of IP server replication that requires a proportional consumption of these scarce IP addresses is inherently wasteful. An example of such prior art is Domain Name Service (DNS) based load balancing. DNS servers are used for resolving a server name (e.g., www.companyname.com) to a globally unique IP address (e.g., 192.45.54.23). In DNS based server load balancing, many unique IP addresses per server name are kept and doled out to allow load balancing. However, this reduces the number of available IP version 4 addresses. There is thus a need for a method of clustering IP servers that minimizes consumption of the scarce IP address space.
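As an illustration of the consumption described above, a DNS-load-balanced name resolves to several A records, each occupying a globally unique IPv4 address. The snippet below merely observes that behavior; the hostname is the placeholder used in the text.

    import socket

    # One A record (one globally unique IPv4 address) per replicated server.
    records = socket.getaddrinfo("www.companyname.com", 80,
                                 family=socket.AF_INET,
                                 type=socket.SOCK_STREAM)
    print({info[4][0] for info in records})  # the set of published addresses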
Furthermore, when the IP payload of a packet is encrypted to provide secure transmissions over the Internet, IP address translation cannot be performed without first decrypting the IP payload (which contains the TCP or UDP header checksums). In the current framework for IP Security, referred to as IPSEC, the transport layer is part of the network layer payload, which will be completely encrypted in a network application that implements IPSEC. IPSEC is described in RFCs 1825-1827 published by the Internet Engineering Taskforce. Encryption is performed by the client, and decryption is performed by the server, using secret crypto-keys which are unique to each client-server link. Therefore, when such encryption is performed in client-server communications, as in IPSEC, prior art server load balancers will not be able to perform load balancing operations without violating IPSEC rules. This is because server load balancers cannot access the transport layer information (encrypted as part of the IP payload) without first decrypting the IP payload. Since the crypto-keys set up between client and server are by definition not public, the IP payload cannot be decrypted by the server load balancer in compliance with IPSEC (indeed, for all practical purposes, the server load balancer will not work at all for encrypted packets).

There is thus a need for a system that not only allows for transmissions of encrypted data packets according to the IPSEC model, but also allows network administrators to perform both server load balancing and IPSEC in their networks. Furthermore, current server load balancers typically operate on TCP packets only. By contrast, IP headers have an 8-bit protocol field, theoretically supporting up to 256 transport protocols at ISO layer 4. There is thus a need for a server load balancing system that supports transport protocols at ISO layer 4 other than TCP (e.g., UDP, IP-in-IP, etc.).
Prior art systems allow for load balancing and, sometimes, fault tolerance of network traffic only in the inbound direction (i.e., client-router-server). Load balancing and fault tolerance in the reverse (outbound) direction (i.e., server-router-client) is not supported. Specifically, if multiple router links are provided for the server to return information to clients, no attempt is made to load balance traffic flow through the router links. Also, when a specific IP server is configured to use a specific default router IP address in the outbound transmissions, no fault tolerance or transparent re-routing of packets is performed when the router fails. There is thus a need for a system that allows for traffic flow clustering services in both the inbound and the outbound directions.
The prior art solutions are hardware devices configured to appear as IP routers to the cluster of servers being load balanced. As a result, one or more classes of IP router devices are added to the router administrator's domain of managed IP routers. This constrains future evolution of the router network, both in terms of adding new vendors' routers in the future and adding new and more sophisticated routing features. Debugging and troubleshooting of routing problems also becomes more difficult. It would thus be preferable to employ a completely transparent piece of hardware, such as a LAN switch or hub, as a load balancing device. In the related art, the servers and any external routers are connected to the load balancing device using shared-media Ethernet (i.e., a broadcast media network). There is a need for a better solution that allows use of switched circuits (e.g., switched Ethernet, SONET), as switched circuits inherently provide (a) dedicated bandwidth and (b) full-duplex operation (i.e., simultaneous transmit and receive) to all connected devices.
`
SUMMARY OF THE INVENTION

The present invention provides a network flow switch (and a method of operation thereof) for connecting a pool of IP routers to a cluster of IP servers sharing a single IP address, without requiring translation of the IP address, and providing bi-directional clustering. The network flow switch, by operating transparently at the ISO layers 2 and 3, enables cross-platform clustering of servers and routers, these routers being the so-called "first-hop" routers used by the servers to communicate with the outside world. This means the servers within any single cluster can come from any manufacturer of computer hardware and run any operating system (e.g., Microsoft WINDOWS NT, Unix, MACOS). WINDOWS NT is a registered trademark of Microsoft Corp. of Redmond, Wash.; MACOS is a registered trademark of Apple Computer, Inc. of Cupertino, Calif. It also means the routers can come from any vendor of routing equipment. The network flow switch, therefore, allows customers freedom of choice in server operating systems as well as router systems in designing their server clustering schemes. The only requirement on these servers and routers is that they all implement standard TCP/IP communications protocols, or some other protocol stack in conformance with the ISO/OSI 7-layer model for computer communications. The network flow switch routes packets to individual servers by writing the Data Link Layer address of the destination IP server in the destination Data Link Layer address field of the packet. Packets transmitted from the IP servers to the IP routers, on the other hand, do not require modification of the Data Link Layer address field.
Since in a typical client-server environment the majority of the packets flowing through the network flow switch are transferred from the server to the client, eliminating processor intervention in routing outbound packets allows for significant performance enhancements. As a result, the likelihood of the network flow switch becoming a bottleneck is greatly reduced.
Multiple clusters (one or more IP servers sharing a single IP address) are supported in a single network flow switch. On any single link attached to each of the IP servers, multiple clusters can be supported if the IP server's operating system supports multiple IP addresses on a physical link.
In some embodiments, the network flow switch, in addition to routing of the packets, performs load balancing and fault tolerance functions. In these embodiments, a processor of the network flow switch periodically executes a load balancing routine to determine the relative workload of each of the IP servers. When the network flow switch receives a packet destined to the cluster of IP servers, the packet is routed to the IP server with an optimal workload, so as to ensure that the workload is evenly distributed among the IP servers. In addition, if a failure of a network router is detected, a packet addressed to that network router is re-routed to a different network router by re-writing the Data Link Layer destination address of the packet. Since the network flow switch continuously monitors the status of the IP servers, no lengthy time delay is introduced in client-server communications when an IP server is disabled.
Since the IP header is not modified, the network flow switch of the present invention operates on packets encoded according to any ISO layer 4 protocol and, unlike prior art server load balancers, is not limited to TCP encoded packets. In addition, the network flow switch can also handle re-routing, load balancing and fault tolerance of encrypted packets transparently to both server and client.

In some embodiments, load balancing is also performed for outbound packets so as to route packets to the router with an optimal workload.

Thus, a method and apparatus are provided to allow bi-directional clustering for load balancing and fault tolerance in the inbound direction (i.e., client-router-server), as well as in the outbound direction (i.e., server-router-client).
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
FIG. 1 illustrates a prior art cluster of IP servers, each having a distinct IP address, and a prior art network flow switch for translating a virtual IP address shared by all IP servers in the cluster into the individual IP addresses of the IP servers.

FIG. 2 illustrates a cluster of IP servers and a network flow switch, according to an embodiment of the invention. Each IP server has the same IP address. A Data Link Layer address is used to identify each IP server within the cluster.

FIG. 3A illustrates the format of a packet routed to/from the cluster of IP servers by the network flow switch 205 of FIG. 2.

FIG. 3B shows the format of link field 320 of FIG. 3A.

FIG. 4A illustrates the structure of the network flow switch 205 of FIG. 2.
FIG. 4B is a flow diagram of the process of routing packets from one of the network clients to one of the IP servers of FIG. 2 via the network flow switch 205 of FIG. 4A, according to an embodiment of the invention.

FIG. 4C is a flow diagram of the process of routing packets from one of the IP servers to one of the network clients of FIG. 2 via the network flow switch 205 of FIG. 4A, according to an embodiment of the invention.

FIG. 5A is a block diagram of a network flow switch implemented using multiple general-purpose circuit boards, according to an embodiment of the invention.

FIG. 5B is a block diagram of a network flow switch implemented using a general-purpose CPU board and a special-purpose network board, according to an embodiment of the invention.

FIG. 5C is a block diagram of a network flow switch implemented using two special-purpose circuit boards, according to an embodiment of the invention.

FIG. 5D is a block diagram of a network flow switch implemented using a single special-purpose circuit board, according to an embodiment of the invention.

FIG. 5E is a block diagram of a network flow switch implemented using a combination of special-purpose and general-purpose circuit boards, according to an embodiment of the invention.

FIG. 5F is a block diagram of a network flow switch implemented using a crossbar switch, according to an embodiment of the invention.
`
DETAILED DESCRIPTION OF THE INVENTION
`
The method and apparatus of the present invention allow multiple IP servers to share a same IP address and use a network flow switch to route packets among the IP servers based on the Data Link Layer address of the IP servers (e.g., the destination address of the packets is translated into the Data Link Layer address of one of the IP servers). Since IP networks ignore the source Data Link Layer address field of packets transmitted over the network, Data Link Layer address translation is performed only for packets flowing from an IP client to an IP server. In the reverse flow direction, that is, from an IP server to an IP client, no Data Link Layer address translation is required, thus allowing for very fast throughput through the network flow switch.
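A minimal sketch of this asymmetry, assuming standard Ethernet framing (destination MAC in the first six bytes of the frame) and a round-robin stand-in for the switch's actual server-selection logic; all names are illustrative rather than taken from the patent or Appendix A.

    from itertools import cycle

    def make_flow_switch(cluster_macs: list[bytes]):
        chooser = cycle(cluster_macs)  # stand-in for the real selection logic

        def route_inbound(frame: bytearray) -> bytearray:
            # Client -> server: rewrite only the destination Data Link Layer
            # address (bytes 0-5); IP header and checksums stay untouched.
            frame[0:6] = next(chooser)
            return frame

        def route_outbound(frame: bytearray) -> bytearray:
            # Server -> client: no translation at all, the fast path carrying
            # the majority of client-server traffic.
            return frame

        return route_inbound, route_outbound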
A cluster of IP servers 200 and a network flow switch 205, according to an embodiment of the invention, are shown in FIG. 2. Network flow switch 205 routes packets among IP servers 210, 220, 230, 240 and 250 and network routers 260, 270 and 280. IP servers 210, 220, 230, 240 and 250 are configured identically and have a virtual IP address 290. In addition, each of IP servers 210, 220, 230, 240 and 250 has a distinct Data Link Layer address, and a distinct link name. The link name is used to identify the unique server within the cluster of servers sharing a same IP address. As explained below, the Data Link Layer address is used to translate a virtual Data Link Layer address to a physical Data Link Layer address, after an IP server is selected by network flow switch 205 to receive the packet. IP address 290 is visible to devices communicating with the cluster 200, while the individual Data Link Layer addresses of each of the IP servers are not. Network flow switch 205, in fact, performs a proxy Address Resolution Protocol (ARP) function that returns a "virtual" Data Link Layer address (not shown) to a network connected device in response to a standard ARP query. As a result, network connected devices see the cluster 200 as having a single IP address 290 and a single Data Link Layer address (not shown).
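The proxy-ARP function can be pictured as building an ordinary ARP reply whose sender fields advertise the cluster's virtual identity. The sketch below assumes the standard RFC 826 packet layout and is only an illustration of the behavior described above, not the switch's actual implementation.

    import struct

    def proxy_arp_reply(virtual_mac: bytes, virtual_ip: bytes,
                        asker_mac: bytes, asker_ip: bytes) -> bytes:
        # ARP reply payload: Ethernet hardware type (1), IPv4 protocol type
        # (0x0800), 6-byte MAC / 4-byte IP lengths, opcode 2 (reply). The
        # "sender" is the cluster's virtual MAC and IP; the target is the
        # querying device.
        return struct.pack("!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, 2,
                           virtual_mac, virtual_ip, asker_mac, asker_ip)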
Network routers 260, 270 and 280, on the other hand, each have a distinct IP address and a distinct Data Link Layer address. The routers are used to connect cluster 200 to external networks (not shown) via network flow switch 205. Thus, in order to transmit packets of information to cluster 200, a device connected to one of the external networks (e.g., a router) issues a standard ARP query to network flow switch 205 to obtain the virtual Data Link Layer address of cluster 200; network flow switch 205 returns a Data Link Layer address of the selected receiving device (e.g., one of the IP servers) to the requesting device (e.g., the router). The network connected device then transmits a series of packets to network flow switch 205 (e.g., through one of network routers 260, 270 or 280 connected to the external network). The packets are then re-routed by network flow switch 205 to exactly one of IP servers 210, 220, 230, 240 and 250.
Since all embodiments of the network flow switch ensure that no two servers in the same cluster are on the same flow switch port, broadcast isolation of the replicated servers is enabled. Therefore, IP address conflicts are avoided by the active intervention of the flow switch in the event of ARP query packets being received by the network flow switch, as described above.
The format of a packet 300 transmitted over the external network is illustrated in FIG. 3A. Packet 300 has a header field 310, a link field 320, an IP header 330, a TCP header 340, a data payload 350, a CRC field 360 and a trailer 370. Header 310 and trailer 370 are 8-bit wide private tag-fields: these are not transmitted over the external network but used only inside the network flow switch. IP header 330 and TCP header 340 are standard IP and TCP headers. IP header 330 includes, among other information, a destination IP address and a source IP address for packet 300. CRC field 360 contains a checksum correction code used to verify that packet 300 has been transmitted without error. If IP header 330 were modified, as required by prior art methods for sharing a single IP address among multiple IP servers, the checksum for CRC field 360 would have to be recalculated, an operation requiring processor intervention. In addition, if encrypted information is transmitted according to the IPSEC security framework, decryption of the IP payload is required. Thus, by eliminating the need to recompute the checksum for each packet, the network flow switch of the present invention achieves better throughput than prior art devices. Network owners can further deploy IPSEC security mechanisms transparently and without fear of communications being broken.
FIG. 3B illustrates the format of link field 320. Link field 320 has a Data Link Layer source address field 380, a Data Link Layer destination address field 390 and a type field 395. Since link field 320 is not part of the IP protocol, there is no need to recalculate the checksum for CRC field 360 when link field 320 is modified. Accordingly, re-routing of packets according to the present invention is accomplished by re-writing the Data Link Layer destination address in Data Link Layer destination address field 390 of packet 300. Neither IP header 330 nor CRC field 360 is modified, reducing the processing time required to route packets to and from the cluster of IP servers.
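Under standard Ethernet framing, link field 320 occupies the first 14 bytes of the frame, so the rewrite touches nothing that any IP-level checksum covers. A minimal sketch under that assumption (field numbering follows FIG. 3B; the function name is hypothetical):

    import struct

    # Assumed Ethernet layout: destination address field 390 in bytes 0-5,
    # source address field 380 in bytes 6-11, type field 395 in bytes 12-13.
    ETH = struct.Struct("!6s6sH")

    def rewrite_link_field(frame: bytearray, server_mac: bytes) -> None:
        dst_mac, src_mac, eth_type = ETH.unpack_from(frame)
        frame[0:6] = server_mac  # rewrite field 390 in place; bytes 14 onward
                                 # (IP header 330, CRC field 360) are untouched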
An embodiment of network flow switch 205 (FIG. 2) is illustrated by the block diagram of FIG. 4A. Network flow switch 205 has a CPU board 400 and four Ethernet cards 415, 416, 417 and 418 connected by a PCI bus 410. CPU board 400, in turn, has a CPU 402, a memory 404, and a memory controller 406 for controlling access to the memory 404. Each of Ethernet cards 415, 416, 417 and 418 has an Ethernet controller and two input/output ports 411 and 413.
A network flow switch according to one embodiment of the invention can be constructed entirely from off-the-shelf ASICs (Application Specific Integrated Circuits), controlled by a general-purpose CPU executing a software program. Since many commercially available Ethernet switches provide general-purpose CPUs for switch management (e.g., for executing SNMP and IEEE 802.1D Spanning Tree Protocols), a network switch according to an embodiment of the invention can be easily implemented on such hardware platforms. The only requirement is that the ASIC be able to support some form of "CPU intervention," triggered when a packet with a particular destination Data Link Layer address is routed through the network flow switch. ASICs that support this form of CPU intervention are available from, among others, Galileo Technology Ltd. of Karmiel, Israel, MMC Networks, Inc. of Sunnyvale, Calif. and I-Cube, Inc. of Campbell, Calif.
The process of routing a packet 300 (FIG. 3A) received by one of network routers 260, 270 or 280 to one of IP servers 210, 220, 230, 240 or 250 of FIG. 2 is illustrated by the flow diagram of FIG. 4B. Initially, a packet is received on a port of one of Ethernet cards 415, 416, 417 or 418, in stage 420. In stage 425, Ethernet controller 412 then checks a CPU intervention bit to determine whether the packet needs to be sent to the CPU board 400 for further processing. In such a case, the packet is transferred to CPU board 400 over PCI bus 410 and stored in memory 404 by memory controller 406, in stage 430. If the CPU intervention bit is not set, however, the processing proceeds to stage 445. Stage 435 performs an optional load balancing operation to determine which of IP servers 210, 220, 230, 240 or 250 packet 300 is to be routed to. The load balancing operation of stage 435 attempts to divide packets to be processed among the IP servers according to the capacity and the current utilization of each server. A load balancing scheme suitable for use in the present invention is described in a related application titled "DYNAMIC LOAD BALANCER FOR MULTIPLE NETWORK SERVERS" by Sajit Bhaskaran and Abraham Matthews, having Ser. No. 08/992,038, which is herein incorporated by reference in its entirety. Stage 440 then rewrites the Data Link Layer destination address field of packet 300 to indicate which of IP servers 210, 220, 230, 240 or 250 packet 300 is to be routed to. Finally, the packet is transferred to the one of Ethernet cards 415, 416, 417 or 418 to which the IP server specified by the Data Link Layer destination address field of packet 300 is connected, in stage 445.
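The actual balancing scheme is the one incorporated by reference above; purely as a hedged illustration of dividing packets "according to the capacity and the current utilization of each server," stage 435 might reduce to a selection like the following, with all names hypothetical.

    def pick_server(servers: list[dict]) -> dict:
        # Choose the server with the lowest utilization relative to capacity.
        return min(servers, key=lambda s: s["active_flows"] / s["capacity"])

    servers = [
        {"mac": b"\x02\x00\x00\x00\x00\x01", "capacity": 100, "active_flows": 40},
        {"mac": b"\x02\x00\x00\x00\x00\x02", "capacity": 200, "active_flows": 50},
    ]
    print(pick_server(servers)["mac"])  # the second, relatively less loaded server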
The process of routing a packet 300 (FIG. 3A) from one of IP servers 210, 220, 230, 240 or 250 to one of network routers 260, 270 or 280 (FIG. 2) is illustrated by the flow diagram of FIG. 4C. Initially, a packet is received