`Adelman et al.
`
`US006078957A
`[11] Patent Number:
`[45] Date of Patent:
`
`6,078,957
`Jun. 20, 2000
`
`5,832,222 11/1998 DZiadosZ et al. ..................... .. 709/216
`5,848,233 12/1998 Radia et al. .......................... .. 713/201
`$151991 :1 ‘111- -
`8/1999 Chung '
`
`oura e a .
`
`.
`
`,
`
`,
`
`579407597
`
`[54] METHOD AND APPARATUS FOR A TCP/IP
`LOAD BALANCING AND FAILOVER
`PROCESS IN AN INTERNET PROTOCOL (IP)
`NETWORK CLUSTERING SYSTEM
`
`[75] Inventors: Kenneth Allen Adelman, Corralitos;
`David Lyon Kashtan, La Selva Beach;
`William L_ Palter; Derrel] I)_ Piper,
`II, both of Santa Cruz, all of Calif
`
`[73] Assignee: Network Alchemy, Inc., Santa Cruz,
`Calif
`
`[21] APPL No. 09/197 018
`’
`NOV- 20, 1998
`[22] Filed:
`[51] Int. c1.7 .................................................... .. G06F 13/00
`5
`U S C]
`09 224
`I:
`.-
`-
`- ............................................................ .. 7 /
`[58] Fleld 0f Search ................................... .. 709/200, 201,
`709/209> 221’ 223’ 224’ 226’ 238
`C, d
`R f
`e erences lte
`
`56
`I
`I
`
`OTHER PUBLICATIONS
`
`Andresen, D., et al., “Toward a Scalable Distributed WWW
`Server on Workstation Clusters,” Journal of Parallel and
`Distributed Computing, Article No. PC971305, vol. 42,
`1997 at 91—100
`Cardellini, V., et al., “Efficient State Estimators for Load
`Control Policies in Scalable Web Server Clusters,” 22nd
`Annual International Computer Software & Applications
`Conference (Compsac ’98), Aug. 19—21, 1998 at 449—455.
`CardoZa, W., et al., “Overview of Digital UNIX cluster
`System Archltecmre’ Proceedmgs of COMPCON 96’ Feb'
`25—28, 1996 at 254—259.
`Chun,
`et 81', “Virtual Network Transport Protocols for
`Myrinet,” IEEE Micro, Jan/Feb' 1998 at 53_63~
`Cisco Systems Inc., “LocalDirector Hot—Standby Faiilover:
`Product Overview,” www.cisco.com/univercd/cc/td/doc/
`product/iaabu/localdir/ld20rns/ldicgd/ld3ich5.htm, 1998.
`
`Us‘ PATENT DOCUMENTS
`
`(List continued on neXt page.)
`
`571387615
`5,301,337
`
`ggrzman et al. .......................
`8/1992 Lampgggtzlimj
`4/1994 Wells et al. .... ..
`
`370/400
`709/104
`
`Primary Examiner I loustafa M‘ Meky
`Attorney, Agent, or Firm—l.\/Iorrison & Foerster LLP; Erwin
`J - Basinski
`
`5,448,723
`
`9/1995 Rowett . . . . . . . . .
`
`. . . . . . .. 714/4
`
`5,553,239
`9/1996 Heath et al. .......................... .. 713/201
`5,586,121 12/1996 Moura et al. ......................... .. 370/404
`
`[57]
`ABSTRACT
`The present invention is a method and apparatus for moni_
`
`5,612,865
`
`3/1997 Dasgupta . . . . .
`
`. . . . .. 700/79
`
`709/217
`5 666 486 9/1997 Al?eri et al
`709024
`576947547 12/1997 Subramania'n
`5j699:500 12/1997 Dasgupta _______________________ __ 714/1
`5,708,659
`1/1998 Rostoker et a1,
`370/392
`5,712,981
`1/1998 McKee et al.
`709/241
`370/227
`5,737,311
`4/1998 Wyld ------------------ -
`5,778,185
`7/1998 Gfegerson et ‘11-
`~
`709/226
`5’793’968
`8/1998 Gfegerson et a1‘
`709/209
`giiscgcil'et' ' ' ' ' ' ' '
`370/236
`5js2sjs55 10/1998 Moura et al.
`5,828,876 10/1998 Fish et al. ................................. .. 707/1
`
`' ' '
`
`.
`
`.
`
`.
`
`.
`
`toring packet loss activity in an Internet Protocol (IP)
`network clustering system which can provide a useful,
`discrete and tangible mechanism for controlled failover of
`the TCP/IP network cluster system. An adaptive interval
`value is determined as a function of the average packet loss
`in the system, and this adaptive interval value used to
`determine when a cluster member must send a neXt kee
`palive message to all other cluster members, and wherein the
`keepalive message is used to determine network packet loss.
`
`25 Claims, 12 Drawing Sheets
`
`MASTER GOT
`CLIENT KEEPALIVE
`
`0
`83
`
`YES
`
`THIS
`CLIENT IN MY
`CLUSTER
`'?
`
`K 833
`.1
`SEND EX'I
`CLUSTER
`
`]
`EXIT
`
`834
`
`{-835
`CALCULATE AND STORE
`PACKET LOSS AVERAGE
`(USING SEQUENCE NUMBER
`OF KEEPALIVE AND ADAPTIVE
`KEEPALIVE INTERVAL)
`
`837
`[
`RESET WATCHDOG
`FOR THIS CLIENT
`
`y
`
`834
`
`EXIT
`
`IBM / Softlayer v. ZitoVault
`Ex. 1004 / Page 1 of 22
`
`
`
`6,078,957
`Page 2
`
`OTHER PUBLICATIONS
`
`Damani, O., et al., “ONE—IP: techniques for hosting a
`serviceon a cluster of machines,” Computer Networks and
`ISDN Systems, vol. 29, 1997 at 1019—1027.
`Fox, A., et al., “Cluster—Based Scalable Network Services,”
`16th ACM Symposium on Operating systems Principles,
`Oct. 5—8, 1997 at 78—91.
`Ghormley, D.P., et al. “GLUnix: A Global Layer Unix for a
`Network of Workstations,” Software—Practice and Experi
`ence, vol. 28, No. 9, Jul. 25, 1998 at 929—961.
`Huang, Y. et al., “Software Implemented Fault Tolerance:
`Technologies and Experience,” 23rd International Sympo
`sium on Fault—Tolerant Computing, Jun. 22—24, 1993 at
`2—9.
`Hunt, G.D.H., et al., “Network Dispatcher: a connection
`router for scalable Internet services,” Computer Networks
`and ISDN Systems, vol. 30, 1998 at 347—357.
`Hwang, K., et al., Scalable Parallel Computing, WCB
`McGraw—Hill, 1998 at 366—377, 453—564.
`KurcewicZ, M., et al., “A Distributed WWW Cache,” Com
`puter Networks and ISDN Systems, vol. 30, 1998 at
`2261—2267.
`
`Mendiratta, V.B., “Reliability Analysis of Clustered Com
`puting Systems,” 9th International Symposium on Software
`Reliability Engineering, Nov. 4—7, 1998 at 268—272.
`Parker, T., “QualixHA+,” Unix Review, Mar. 1998 at 59—61.
`Short, R., et al., “Windows NT Clusters for Availability and
`Scalabilty,” Proceedings, IEEE COMPCON ’97, Feb.
`23—26, 1997 at 8—13.
`Taschek, J., “ZD Internet Lab; A Well—Balanced Web,
`”www.Zdnet.com/icom/Zdlabs/load.balance/, Feb. 23, 1998.
`Thaler, D.G., et al., “Using Name—Based Mappings to
`Increase Hit Rates,” IEEE/ACM Transactions on Network
`ing, vol.6, No.1, Feb. 1998 at 1—14.
`Valence Research, Inc., “Convoy Cluster Software; Product
`Overview,” www.valence.com/convoy/main.html.
`Venkataraman, S., et al., “Memory Management for Scal
`able Web Data Servers,” 13th International Conference on
`Data Engineering, Apr. 7—11, 1997 at 510—519.
`Vogels, W., et al., “The Design and Architecture of the
`Microsoft Cluster Service—A Practical Approach to
`High—Availability and Scalability,” 28th annual Interna
`tional Symposium on Fault—Tolerant Computing, Jun.
`23—25, 1998 at 422—431.
`
`Ex. 1004 / Page 2 of 22
`
`
`
`U.S. Patent
`
`Jun. 20,2000
`
`Sheet 1 0f 12
`
`6,078,957
`
`TYPICAL
`100
`NTERNET NETWORK
`CONFIGURATION
`
`BRANCH
`OFFICE
`
`I105
`
`a
`
`GATEWAY/
`TUNNEL
`SERVER
`
`CLIENTS
`
`'
`
`107
`
`GATEWAY /
`TU N N EL
`SERVER
`/111
`113
`F
`
`112
`F
`
`114
`F
`
`SERVER
`1
`
`SERVER
`2
`
`SERVER
`3
`
`>/115
`
`r116
`
`r117
`
`r118
`LOCAL CLIENTS
`
`Ex. 1004 / Page 3 of 22
`
`
`
`U.S. Patent
`
`Jun. 20,2000
`
`Sheet 2 0f 12
`
`6,078,957
`
`TYPICAL GENERAL 200
`PURPOSE COMPUTER/ \
`CLUSTER-MEMBER
`CONFIGURATION
`
`21s
`F
`DISPLAY
`
`M
`
`K201
`K203
`K205
`1/0
`5
`CPU I207
`I209
`
`MEMORY
`
`\
`
`[211
`MFELGSEY
`CARD
`
`K215
`(EJTHER
`
`NITS
`
`V CDROM I217
`DRIvE
`
`219
`
`221
`
`PROGRAM
`
`223
`
`DISK
`STORAGE
`
`7
`
`KEYBOARD
`K225
`
`OTHER
`UNITS
`
`FIG--2
`
`f303
`
`FLASH MEMORY - 300
`CONTENTS \
`CRYPTOGRAPHICALLY f301
`SIGNED KERNEL
`CONFIGURATION
`FILES
`WHERE TO LOG
`ERROR MESSAGES
`AUTHENTICATION
`CERTIFICATE
`
`f305
`
`f307
`
`Ex. 1004 / Page 4 of 22
`
`
`
`U.S. Patent
`
`Jun. 20,2000
`
`Sheet 3 0f 12
`
`6,078,957
`
`400 TYPICALIP
`/ NETWORK
`CLUSTER
`
`107
`
`INTERNET
`
`[-401
`
`4
`
`r03
`
`4
`
`r05
`
`4 7
`
`f0 r“Huang
`
`i
`
`40
`
`UNIT1
`
`UN|T2
`
`
`
`UNITB 411
`
`UNITN
`
`4
`
`._
`
`OTHER
`NETWORK
`
`UNITS
`
`Ex. 1004 / Page 5 of 22
`
`
`
`U.S. Patent
`
`Jun. 20, 2000
`
`Sheet 4 0f 12
`
`6,078,957
`
`GENERAL MEMORY MAP
`TYPICAL IP NETWORK
`CLUSTER MEMBER
`
`500 \
`OPERATING
`SYSTEM KERNEL
`
`f501
`
`503
`f
`
`TCP / IP
`STACK
`CLUSTER MANAGEMENT [505
`ROuTINES
`APPLICATION
`#1
`APPLICATION
`#2
`
`f507
`
`f509
`
`f 511
`
`f513
`
`APPLICATION
`#3
`APPLICATION
`#4
`WORK ASSIGNMENT f515
`TABLE (MASTER)
`THIS uNIT APPLICATION f5 17
`STATE TABLE
`OTHER UNITS
`APPLICATION
`STATE TABLE
`INCOMING
`MSG STORE
`
`f519
`
`f521
`
`DATA HANDLERS - I523
`OTHER UNITS
`
`FIG._5
`
`Ex. 1004 / Page 6 of 22
`
`
`
`U.S. Patent
`
`Jun. 20,2000
`
`Sheet 5 0f 12
`
`6,078,957
`
`601
`
`CLUSTER
`ESTABLISHMENT
`
`START
`
`7
`609
`[60 GOT “EXIT
`REQUEST” / :
`CL'ENT A SUCCESS)
`
`,
`JOINING
`SEND JOIN REQUESTS =
`
`(603
`
`I
`
`611j
`
`MASTER KEEPALIVE
`
`WATCHDOG
`615\
`
`1
`
`MAST
`
`617/‘
`613/"
`OFFER MASTER:
`SEND "OFFER MASTER” PACKETS
`
`621\
`
`619 SUCCESS
`\
`"
`ARPS:
`SENDARPS
`
`T "
`
`642/
`
`623\
`7
`
`"
`MASTER
`
`GOT “EXIT”
`RE UEST”
`O
`
`641
`f V
`
`625 GOT “MASTER”
`KEEPALIVE"
`
`WIN
`TIE BREAKER
`'?
`YES
`
`635
`
`637 \
`
`SEND ”OTHER MASTER EXISTS”
`
`A
`
`FIG__ 6
`
`SEND ARPS
`
`Ex. 1004 / Page 7 of 22
`
`
`
`U.S. Patent
`
`Jun. 20,2000
`
`Sheet 6 0f 12
`
`6,078,957
`
`FIG... 7
`
`FIG._ 7A
`
`j 700
`
`TCP FAILOVER STATE
`
`/ 702
`Initial State (Only need to send once)
`
`Source IP Address + Port
`Destination IP Address + Port
`Maximum Segment Size
`MSS + Options Size
`
`/ 701
`Essential State (Send on each state change)
`
`Flags: No Delay, No Options, Request Window Scaling,
`Receive Window Scaling, Request Timestamp,
`Receive Timestamp, Permit Selective ACK
`Send “Next” Sequence Number
`Window Update Segment Sequence Number
`Window Update Segment Acknowledgement Number
`Send Window
`Receive “Next" Sequence Number
`Receive “Advertized“ Window
`Send Window Scaling
`Receive Window Scaling
`Recent Timestamp Echo Data
`
`Ex. 1004 / Page 8 of 22
`
`
`
`U.S. Patent
`
`Jun. 20, 2000
`
`Sheet 7 0f 12
`
`6,078,957
`
`/ 703
`Calculable State (Don’t Send)
`
`State = ESTABLISHED
`Retransmit Time = None
`Probe Time = Now
`TCP Keepalive Time : Now
`ZMSL‘Fme = None
`Retransmit Time Shift = 0
`Current Retransmit = lnitial Value
`Consecutive Duplicate Acks Received = 0
`Force Output: 0;
`Send “Unacknowledged" Sequence Number: Send “Next” Sequence
`Number
`Send “Urgent Pointer" : Send “Unacknowledged‘ Sequence Number
`Highest Sequence Number Sent : Send “Next” Sequence Number
`Send Initial Segment Sequence Number: 0
`Receive Window : Amount of space left in socket receive buffer
`Receive “Urgent Pointer” : Receive “Next” Sequence Number
`Receive Initial Segment Sequence Number = O
`Congestion Control Window = Initial Value
`Congestion Control Window Linear/Exponential Threshold : Initial Value
`Inactivity Time = 0
`Estimated Round Trip Time = 0
`Sequence Number Being Timed = O
`Smoothed Round Trip Time : Initial Value
`Variance In Round Trip Time = Initial Value
`Minimum Round Trip Time Allowed = initial Value
`Largest Window Offered by Peer = 0
`Out Of Band Data = None
`Send Pending Window Scaling = Send Window Scaling
`Receive Pending Window Scaling : Receive Window Scaling
`Timestamp Echo Data Update Time = 0
`Last Ack Sent Sequence Number: Receive “Next” Sequence Number
`Send Connection Count: 0
`Receive Connection Count = 0;
`Connection Duration = 0;
`Number of Round Trip Time Samples : 0;
`Number of TCP Keepalive Probes = Initial Value
`Interval Between TCP Keepalive Probes = Initial Value
`'Fme Before First TCP Keepalive Probe = Initial Value
`Maximum Idle Time : Initial Value
`
`FIG._ 7B
`
`Ex. 1004 / Page 9 of 22
`
`
`
`U.S. Patent
`
`Jun. 20,2000
`
`Sheet 8 0f 12
`
`6,078,957
`
`Rpm
`
`l/mim.
`
`Sm
`A,
`
`
`
`EmEIwDJO .EXm
`
`
`
`"NEIL-0., 02mm,
`
`new
`
`mom
`
`Ex. 1004 / Page 10 of 22
`
`
`
`U.S. Patent
`
`Jun. 20, 2000
`
`Sheet 9 0f 12
`
`6,078,957
`
`833
`/
`SEND “EXIT
`CLUSTER”
`
`I
`EXIT
`
`834
`
`MASTER GOT
`CLIENT KEEPALIVE
`
`830
`
`THIS
`CLIENT IN MY
`CLUSTER
`'2
`
`FIG._8B
`
`835
`K
`CALCULATE AND STORE
`PACKET LOSS AVERAGE
`(USING SEQUENCE NUMBER
`OF KEEPALIVE AND ADAPTIVE
`KEEPALIVE INTERVAL)
`
`837
`" r
`RESET WATCHDOG
`FOR THIS CLIENT
`
`‘
`
`834
`
`EXIT
`
`850
`
`MASTER
`EVENTS
`
`PERIODIC TIMER (ADAPTIVE
`TO NETWORK PACKET LOSS)
`
`"
`
`K 851
`
`BROADCAST MASTER
`KEEPALIVE CONTAINING
`CLUSTER MEMBER LIST AND
`ADAPTIVE KEEPALIVE
`INTERVAL
`
`FIG._8C
`
`Ex. 1004 / Page 11 of 22
`
`
`
`U.S. Patent
`
`Jun. 20, 2000
`
`Sheet 10 0f 12
`
`6,078,957
`
`FIG 8 D
`. _
`
`PERIODIC TIMER
`(2 SECONDS)
`
`855
`
`K 857
`‘
`CALCULATE LOAD
`DIFFERENCE BETWEEN
`MOST LOADED AND LEAST
`LOADED MEMBER
`
`859
`
`WOULD
`MOVING ONE
`WORK UNIT LEAVE
`THE LEAST LOADED
`MEMBER WITH A LOAD s
`THE MOST LOADED
`MEMBER
`7
`
`865
`‘L
`
`863
`D
`SEND “WORK DE-ASSIGN"
`REQUEST TO MOST LOADED
`MEMBER WITH THE LEAST
`LOADED MEMBER AS TARGET
`
`/ 861
`
`IGNORE
`
`|
`
`v
`EXIT
`
`I
`
`WATCHDOG TIMER FOR )[870
`FIG-_8E C ACLIENT EXPIRES
`
`871
`f
`I
`DELETE CLIENT FROM
`CLUSTER DATA STRUCTURE
`
`FIG 8 H PERIODIC TIMER (ADAPTIVE
`._
`TO NETWORK PACKET LOSS)
`
`910
`
`SEND CLIENT KEEP ALIVE TO
`MASTER CONTAINING
`MONOTONICALLY
`INCREASING SEQUENCE #
`(FOR MEASURING NETWORK
`PACKET LOSS)
`
`‘
`
`Ex. 1004 / Page 12 of 22
`
`
`
`U.S. Patent
`
`Jun. 20, 2000
`
`Sheet 11 0f 12
`
`6,078,957
`
`875
`
`MASTER GETS CLIENT
`JOIN REQUEST
`877 K
`RESPOND WITH
`NAK REASON
`“OPERATION IN
`PROGRESS”
`
`NOTIFY
`APPLICATIONS
`OF JOIN
`REQUEST
`
`881
`
`APPLICATIONS FINISH
`WITH JOIN REQUEST
`
`JOIN
`ALLOWED
`?
`YES
`K887
`
`SEND
`
`ACKNOWLEDGMENT
`
`885 r
`SEND
`NAK
`AND
`REASON
`
`(
`
`V
`
`EXIT
`
`)
`
`WATCHING TIMER FOR 9'20
`MASTER EXPIRES
`
`K922
`‘
`EXIT CLUSTER AND
`START JOINING
`
`EXIT
`
`FIG._8I
`
`CLIENT
`EVENTS
`890
`
`CLIENT GOT
`MASTER KEEPALIVE
`
`UPDATE
`ADAPTIVE
`KEEPALIVE
`NTERvAL
`(CALCULATED
`BY MASTER)
`
`893
`
`HAVE
`WE LOST ANY
`MEM'SERS
`YES
`895 K
`NOTIFY
`APPLICATIONS
`
`897
`
`HAVE
`WE ADDED NEW
`MEMBERS
`?
`
`NOTIFY
`
`RESET
`WATCHDOG
`FOR MASTER
`
`/899
`
`II
`
`EXIT
`
`I
`
`I
`
`FIG._8G
`
`Ex. 1004 / Page 13 of 22
`
`
`
`U.S. Patent
`
`Jun. 20,2000
`
`Sheet 12 0f 12
`
`6,078,957
`
`RECEIVE AN
`IP PACKET
`
`901
`
`IP PACKET
`INPUT HANDLING
`
`CLUSTER
`IP ADDRESS
`
`907
`
`SHOULD
`PACKET BE
`FORWQRDED
`
`'
`
`YES f 913
`
`APPLY FILTER
`TO CLASSIFY
`WORK
`
`APPLY FORWARDING
`FILTER TO CLASSIFY
`WORK
`
`0
`
`915
`
`DO WE
`OWN TH|S
`WORK SET
`7
`
`YES
`917
`
`919 f
`PROCESS
`LOCALLY
`
`/939
`
`N
`
`O 921
`
`ARE
`WE MASTER
`
`NO
`923
`
`YES
`
`926
`
`927
`
`925 f
`DROP
`PACKET
`
`K- 909
`
`PROCEss
`LOCALLY
`
`‘V
`
`945
`
`EXIT
`
`EXIT
`
`I
`
`936
`
`931
`
`YES
`
`935
`
`IS THIS
`WORK sET
`ASSIGNED
`'2
`
`YES
`929
`
`ACCEPTED THIS YES
`WORK SET
`933
`v
`
`928
`
`NO
`937
`F
`ASSIGN WORK SET
`_ TO LEAST LOADED
`MEMBER
`
`940 NO
`941
`F
`SAVE PACKET UNTIL
`WORK SETACCEPTED
`
`" r 943
`
`FORWARD PACKET
`TO MEMBER
`
`945
`
`‘
`EXIT
`
`COULD
`MEMBER SEE
`PAQKET
`-
`
`N0
`
`FIG._ 9
`
`Ex. 1004 / Page 14 of 22
`
`
`
`1
`METHOD AND APPARATUS FOR A TCP/IP
`LOAD BALANCING AND FAILOVER
`PROCESS IN AN INTERNET PROTOCOL (IP)
`NETWORK CLUSTERING SYSTEM
`
`CROSS-REFERENCE TO RELATED
`APPLICATIONS
`
`This application is related to application Ser. No. 09/196,
`941 entitled “Method and Apparatus for an Internet Protocol
`(IP) NetWork Clustering System,” ?led Nov. 20, 1998, still
`pending.
`
`TECHNICAL FIELD
`
`This invention relates to the ?eld of Computer Systems in
`the general NetWork Communications sector. More
`speci?cally, the invention is a method and apparatus for a
`TCP/IP Load Balancing & Failover system in an Internet
`Protocol (IP) NetWork clustering system.
`
`10
`
`15
`
`BACKGROUND ART
`
`As more and more businesses develop electronic com
`merce applications using the Internet in order to market and
`to manage the ordering and delivery of their products, these
`businesses are searching for cost-effective Internet links that
`provide both security and high availability. Such mission
`critical applications need to run all day, every day With the
`netWork components being highly reliable and easily scal
`able as the message traf?c groWs. National carriers and local
`Internet Service Providers (ISPs) are noW offering Virtual
`Private NetWorks (VPN)—enhanced Internet-based back
`bones tying together corporate Workgroups on far-?ung
`Local Area NetWorks (LANs)—as the solution to these
`requirements. Similarly other companies are offering clus
`tered systems as a means to provide netWork scalability, and
`adaptive load balancing innetWork systems. HoWever, TCP/
`IP based systems present problems of message traf?c loss
`and netWork session reinitiation if ef?cient failover pro
`cesses are not implemented.
`A number of companies have recently announced current
`or proposed VPN products and/or systems Which variously
`support IPSec, IKE (ISAKMP/Oakley) encryption-key
`management, as Well as draft protocols for Point-to-Point
`Tunneling protocol (PPTP), and Layer 2 Tunneling protocol
`(L2TP) in order to provide secure traffic to users. Some of
`these products include IBM’s NWays Multiprotocol Routing
`ServicesTM 2.2, Bay Networks OptivityTM and CentillionTM
`products, Ascend Communication’s MultiVPNTM package,
`Digital Equipment’s ADI VPN product family, and Indus
`River’s RiverWorksTM VPN planned products. HoWever,
`none of these products are knoWn to offer capabilities Which
`minimiZes delay and session by a controlled fail-over pro
`cess.
`These VPNs place enormous demands on the enterprise
`netWork infrastructure. Single points of failure components
`such as gateWays, ?reWalls, tunnel servers and other choke
`points that need to be made highly reliable and scaleable are
`being addressed With redundant equipment such as “hot
`standbys” and various types of clustering systems.
`For eXample, CISCOTM Inc. noW offers a neW product
`called LocalDirectorTM Which functions as a front-end to a
`group of servers, dynamically load balances TCP traf?c
`betWeen servers to ensure timely access and response to
`requests. The LocalDirector provides the appearance, to end
`users, of a “virtual” server. For purposes of providing
`continuous access if the LocalDirector fails, users are
`
`45
`
`55
`
`65
`
`6,078,957
`
`2
`required to purchase a redundant LocalDirector system
`Which is directly attached to the primary unit, the redundant
`unit acting as a “hot” standby. The standby unit does no
`processing Work itself until the master unit fails. The
`standby unit uses the failover IP address and the secondary
`Media Access Control (MAC) address (Which are the same
`as the primary unit), thus no Address Resolution Protocol
`(ARP) is required to sWitch to the standby unit. HoWever,
`because the standby unit does not keep state information on
`each connection, all active connections are dropped and
`must be re-established by the clients. Moreover, because the
`“hot standby” does no concurrent processing it offers no
`processing load relief nor scaling ability.
`ValenceTM Research Inc. (recently purchased by
`Microsoft® Corporation) offers a softWare product called
`Convoy ClusterTM (Convoy). Convoy installs as a standard
`WindoWs NT netWorking driver and runs on an eXisting
`LAN. It operates in a transparent manner to both server
`applications and TCP/IP clients. These clients can access the
`cluster as if it is a single computer by using one IP address.
`Convoy automatically balances the netWorking traf?c
`betWeen the clustered computers and can rebalance the load
`Whenever a cluster member comes on-line or goes off-line.
`HoWever this system appears to use a compute intensive and
`memory Wasteful method for determining Which message
`type is to be processed by Which cluster member in that the
`message source port address and destination port address
`combination is used as an indeX key Which must be stored
`and compared against the similar combination of each
`incoming message to determine Which member is to process
`the message. Moreover, this system does not do failover.
`There is a need in the art for an IP netWork cluster system
`Which can easily scale to handle the exploding bandWidth
`requirements of users. There is a further need to maXimiZe
`netWork availability, reliability and performance in terms of
`throughput, delay and packet loss by making the cluster
`overhead as ef?cient as possible, because more and more
`people are getting on the Internet and staying on it longer. A
`still further need eXists to provide a reliable failover system
`for TCP based systems by ef?ciently saving the state infor
`mation on all connections so as to minimiZe packet loss and
`the need for reconnections.
`Computer cluster systems including “single-system
`image” clusters are knoWn in the art. See for eXample,
`“Scalable Parallel Computing” by Kai HWang & ZhiWei Xu,
`McGraW-Hill, 1998, ISBN 0-07-031798-4, Chapters 9 & 10,
`Pages 453—564, Which are hereby incorporated fully herein
`by reference. Various Commercial Cluster System products
`are described therein, including DEC’s TruClustersTM
`system, IBM’s SPTM system, Microsoft’s WolfpackTM sys
`tem and The Berkeley NOW Project. None of these systems
`are knoWn to provide ef?cient IP NetWork cluster capability
`along With combined scalability, load-balancing and con
`trolled TCP fail-over.
`
`SUMMARY OF THE INVENTION
`
`The present invention overcomes the disadvantages of the
`above described systems by providing an economical, high
`performance, adaptable system and method for netWork
`traffic monitoring and controlled failover in an IP NetWork
`cluster.
`The present invention is a method and apparatus for
`monitoring packet loss activity in an Internet Protocol (IP)
`netWork clustering system Which can provide a useful,
`discrete and tangible mechanism for controlled failover of
`the TCP/IP netWork cluster system. An adaptive interval
`
`Ex. 1004 / Page 15 of 22
`
`
`
`6,078,957
`
`3
`value is determined as a function of the average packet loss
`in the system, and this adaptive interval value used to
`determine When a cluster member must send a neXt kee
`palive message to all other cluster members, and Wherein the
`keepalive message is used to determine network packet loss.
`The present invention includes a cluster apparatus com
`prising a plurality of cluster members, With all cluster
`members having the same internet machine name and IP
`address, and each member having a general purpose
`processor, a memory unit, a program in the memory unit, a
`display and an input/output unit; and the apparatus having a
`?rst mechanism Which determines an adaptive interval value
`representing a time interval as a function of average packet
`loss in the system and a second mechanism for using the
`adaptive interval value to determine When a cluster member
`is to send a neXt keepalive message Wherein the keepalive
`message is used to determine netWork packet loss.
`The present invention further includes a method for
`monitoring netWork activity and for controlled failover in a
`TCP/IP Network cluster, the cluster having a plurality of
`cluster members, each cluster member having a processor, a
`memory, an input/output section, a display screen, and
`programs in the memory for operating the cluster member,
`the method comprising steps of determining an adaptive
`interval value representing a time interval as a function of
`average packet loss in the cluster, and using the adaptive
`interval value to determine When a cluster member is to send
`a neXt keepalive message, the keepalive message being used
`to determine a netWork packet loss estimate.
`Other embodiments of the present invention Will become
`readily apparent to those skilled in these arts from the
`folloWing detailed description, Wherein is shoWn and
`described only the embodiments of the invention by Way of
`illustration of the best mode knoWn at this time for carrying
`out the invention. The invention is capable of other and
`different embodiments some of Which may be described for
`illustrative purposes, and several of the details are capable of
`modi?cation in various obvious respects, all Without depart
`ing from the spirit and scope of the present invention.
`
`15
`
`25
`
`4
`tion for purposes of explanation, speci?c data and con?gu
`rations are set forth in order to provide a thorough under
`standing of the present invention. In the presently preferred
`embodiment the IP NetWork cluster is described in terms of
`a VPN tunnel-server cluster. HoWever, it Will be apparent to
`one skilled in these arts that the present invention may be
`practiced Without the speci?c details, in various applications
`such as a ?reWall cluster, a gateWay or router cluster, etc. In
`other instances, Well-knoWn systems and protocols are
`shoWn and described in diagrammatical or block diagram
`form in order not to obscure the present invention unnec
`essarily.
`
`OPERATING ENVIRONMENT
`The environment in Which the present invention is used
`encompasses the general distributed computing scene Which
`includes generally local area netWorks With hubs, routers,
`gateWays, tunnel-servers, applications servers, etc. con
`nected to other clients and other netWorks via the Internet,
`Wherein programs and data are made available by various
`members of the system for eXecution and access by other
`members of the system. Some of the elements of a typical
`internet netWork con?guration are shoWn in FIG. 1, Wherein
`a number of client machines 105 possibly in a branch office
`of an enterprise, are shoWn connected to a GateWay/hub/
`tunnel-server/etc. 106 Which is itself connected to the inter
`net 107 via some internet service provider (ISP) connection
`108. Also shoWn are other possible clients 101, 103 similarly
`connected to the internet 107 via an ISP connection 104,
`With these units communicating to possibly a home of?ce via
`an ISP connection 109 to a gateWay/tunnel-server 110 Which
`is connected 111 to various enterprise application servers
`112, 113, 114 Which could be connected through another
`hub/router 115 to various local clients 116, 117, 118.
`The present IP NetWork cluster is made up of a number of
`general purpose computer units each of Which includes
`generally the elements shoWn in FIG. 2, Wherein the general
`purpose system 201 includes a motherboard 203 having
`thereon an input/output (“I/O”) section 205, one or more
`central processing units (“CPU”) 207, and a memory section
`209 Which may have a ?ash memory card 211 related to it.
`The U0 section 205 is connected to a keyboard 226, other
`similar general purpose computer units 225, 215, a disk
`storage unit 223 and a CD-ROM drive unit 217. The
`CD-ROM drive unit 217 can read a CD-ROM medium 219
`Which typically contains programs 221 and other data. Logic
`circuits or other components of these programmed comput
`ers Will perform series of speci?cally identi?ed operations
`dictated by computer programs as described more fully
`beloW.
`Flash memory units typically contain additional data used
`for various purposes in such computer systems. In the
`preferred embodiment of the present invention, the ?ash
`memory card is used to contain certain unit “personality”
`information Which is shoWn in FIG. 3. Generally the ?ash
`card used in the current embodiment contains the folloWing
`type of information:
`Cryptographically signed kernel—(301)
`Con?guration ?les (such as cluster name, speci?c unit IP
`address, cluster address, routing information con?guration,
`etc.)—(303)
`Pointer to error message logs—(305)
`Authentication certi?cate—(307).
`Security policies (for eXample, encryption needed or not,
`etc.)—(309)
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`The features and advantages of the system and method of
`the present invention Will be apparent from the folloWing
`description in Which:
`FIG. 1 illustrates a typical Internet netWork con?guration.
`FIG. 2 illustrates a representative general purpose
`computer/cluster-member con?guration.
`FIG. 3 illustrates a representative memory map of data
`contained on a related Flash Memory card.
`FIG. 4 illustrates a typical IP NetWork cluster
`FIG. 5 illustrates a general memory map of the preferred
`embodiment of a cluster member acting as a tunnel-server.
`FIG. 6 illustrates a ?oW-chart of the general operation of
`the cluster indicating the cluster establishment process.
`FIG. 7 illustrates an exemplary TCP state data structure.
`FIGS. 8A—8I illustrate ?oW-charts depicting the events
`Which the master processes and the events Which the non
`master cluster members (clients) must process.
`FIGS. 9 illustrates a ?oW-chart depicting the normal
`packet handling process after establishing the cluster.
`
`45
`
`55
`
`BEST MODE FOR CARRYING OUT THE
`INVENTION
`Amethod and apparatus for operating an Internet Protocol
`(IP) NetWork cluster is disclosed. In the folloWing descrip
`
`65
`
`THE INVENTION
`The present invention is an Internet Protocol (IP) clus
`tering system Which can provide a highly scalable system
`
`Ex. 1004 / Page 16 of 22
`
`
`
`6,078,957
`
`5
`Which optimiZes-throughput by adaptively load balancing its
`components, and Which minimizes delay and session loss by
`a controlled fail-over process. A typical IP cluster system of
`the preferred embodiment is shoWn in FIG. 4 Wherein the
`internet 107 is shoWn connected to a typical IP cluster 401
`Which contains programmed general purpose computer units
`403, 405, 407, 409 Which act as protocol stack processors for
`message packets received. The IP cluster 401 is typically
`connected to application servers or other similar type units
`411 in the netWork. In this ?gure it is shoWn that there may
`be any number of units in a cluster (e.g. member “n” 409)
`hoWever for purposes of further illustration the cluster Will
`be depicted as having three units, understanding that the
`cluster of the present invention is not limited to only three
`units. Also for purposes of illustration the preferred embodi
`ment Will be described as a cluster Whose applications may
`be VPN tunnel protocols hoWever it should be understood
`that this cluster invention may be used as a cluster Whose
`application is to act as a FireWall, or to act as a gateWay, or
`to act as a security device, etc.
`In the preferred embodiment of the present invention,
`each of the cluster members is a computer system having an
`Intel motherboard, tWo Intel PentiumTM processors, a 64
`megabyte memory and tWo Intel Ethernet controllers, and
`tWo HiFn cryptographic processors. The functions per
`formed by each processor are generally shoWn by reference
`to the general memory map of each processor as depicted in
`FIG. 5. Each cluster member has an Operating System
`kernel 501, TCP/IP stack routines 503 and various cluster
`management routines (described in more detail beloW) 505,
`program code for processing application #1 507, Which in
`the preferred embodiment is code for processing the IPSec
`protocol, program code for processing application #2 509,
`Which in the preferred embodiment is code for processing
`the PPTP protocol, program code for processing application
`#3 511, Which in the preferred embodiment is code for
`processing the L2TP protocol, and program code for pro
`cessing application #4 513, Which in the preferred embodi
`ment is code space for processing an additional protocol
`such as perhaps a “Mobile IP” protocol. Detailed informa
`tion on these protocols can be found through the home page
`of the IETF at “http://WWW.ietf.org”. The folloWing speci?c
`protocol descriptions are hereby incorporated fully herein by
`reference:
`“Point-to-Point Tunneling Protocol—PPTP”, Glen Zorn,
`G. Pall, K. HamZeh, W. Verthein, J. Taarud, W. Little, Jul.
`28, 1998;
`“Layer TWo Tunneling Protocol”, Allan Rubens, William
`Palter, T. Kolar, G. Pall, M. LittleWood, A. Valencia, K.
`HamZeh, W. Verthein, J. Taarud, W. Mark ToWnsley, May
`22, 1998;
`Kent, S., Atkinson, R., “IP Authentication Header,” draft
`ietf-ipsec-auth-header-07.tXt.
`Kent, S., Atkinson, R., “Security Architecture for the
`Internet Protocol,” draft-ietf-ipsec-arch-sec-07.tXt.
`Kent, S., Atkinson, R., “IP Encapsulating Security Pay
`load (ESP),” draft-ietf-ipsec-esp-v2-06.tXt.
`Pereira, R., Adams, R., “The ESP CBC-Mode Cipher
`Algorithms,” draft-ietf-ipsec-ciph-cbc-04.tXt.
`Glenn, R., Kent, S., “The NULL Encryption Algorithm
`and Its Use With IPsec,” draft-ietf-ipsec-ciph-null-O1.tXt.
`Madson, C., DorasWamy, N., “The ESP DES-CBC Cipher
`Algorithm With EXplicit IV,” draft-ietf-ipsec-ciph-des
`eXpiv-02.tXt.
`Madson, C., Glenn, R., “The Use of HMAC-MD5 Within
`ESP and AH,” draft-ietf-ipsec-auth-hmac-md5-96-03.tXt.
`
`1O
`
`15
`
`25
`
`35
`
`45
`
`55
`
`65
`
`6
`Madson, C., Glenn, R., “The Use of HMAC-SHA-1-96
`Within ESP and AH,” draft-ietf-ipsec-auth-hmac-sha196
`03.tXt.
`Harkins, D., Carrel, D., “The Internet Key EXchange
`(IKE),” draft-ietf-ipsec-isakmp-oakley-08.tXt.
`Maughan, D., Schertler, M., Schneider, M., and Turner, J .,
`“Internet Security Association and Key Management Pro
`tocol (ISAKMP),” draft-ietf-ipsec-isakmp-10.{ps,tXt}.
`H. K. Orman, “The OAKLEY Key Determination
`Protocol,” draft-ietf-ipsec-oakley-02.tXt.
`Piper, D. “The Internet IP Security Domain of Interpre
`tation for ISAKMP,” draft-ietf-ipsec-ipsec-doi-10.tXt.
`Tunneling protocols such as the Point-to-Point Tunneling
`Protocol (PPTP) and Layer 2 Tunneling Protocol (L2TP)
`although currently only “drft” standards, are eXpected to be
`con?rmed as of?cial standards by the Internet Engineering
`Task Force (IETF) in the very near future, and these proto
`cols together With the Internet Security Protocol (IPSec),
`provide the basis for