throbber
United States Patent [191
`Attanasio et a1.
`
`USOO5371852A
`[11] Patent Number:
`[45] Date of Patent:
`
`5,371,852
`Dec. 6, 1994
`
`[54] METHOD AND APPARATUS FOR MAKING
`A CLUSTER OF COMPUTERS APPEAR AS A
`SINGLE HOST ON A NETWORK
`[75] Inventors: Clement R. Attanasio, Peekskill;
`Stephen E. Smith, Mahopac, both of
`NY.
`[73] Assignee: International Business Machines
`Corporation, Armonk, NY.
`[21] Appl. No.: 960,742
`[22] Filed:
`Oct.14,1992
`[51] Int. 01.5 ....................... .. G06F 13/00
`[52] US. Cl. ........................ .. 395/200;370/85.13
`[58] Field of Search ................ .. 395/200, 500; 370/60,
`370/92, 93, 94.1, 54, 85.13, 85.1, 85.6, 85.8,
`60.1, 110.1, 85.11, 95.1; 364/2844, 242.94,
`284.3, 284
`
`[56]
`
`References Cited
`U.S. PATENT DOCUMENTS
`
`4,276,643 6/ 1981 Laprie et a1. ......................... .. 371/8
`4,667,287 5/1987 Allen et a1.
`. 364/200
`4,719,621 1/1988 May ......................... .. 370/85
`4,958,273 9/1990 Anderson et a1. .
`364/200
`5,023,780 6/1991 Brearley
`364/200
`5,079,765 1/1992 Nakamura ..
`370/85.13
`
`5,088,032 2/1992 Bosack . . . . . . . . . . .
`
`5,093,920 3/1992 Agrawal etal.
`5,109,515 4/1992 Laggis et a1.
`
`5,125,081 6/1992 Chiba . . . . . . . . .
`
`5,166,931 11/1992 Riddle
`
`. . . .. 395/200
`
`395/800
`395/725
`
`. . . .. 395/325
`
`._ 370/941
`
`5,185,860 2/1993 Wu . . . . . . . . . . . . .
`
`. . . .. 395/200
`
`5,224,205 6/1993 Dinkin etal. ..................... .. 395/200
`
`FOREIGN PATENT DOCUMENTS
`
`1002342 1/ 1991 Belgium .
`59-117842 7/1984 Japan .
`63-193739 8/ 1988 Japan .
`
`OTHER PUBLICATIONS
`D. E. Comer, “Internetworking with TCP/IP, Princi
`ples, Protocols and Architecture”, Prentice Hall, US,
`Chapter 7, pp. 91-97; Chapter 11, pp. 159-169; and
`Chapter 12, pp. 171-203.
`v
`“Network Services”, Network Programming, Sun Mi
`
`crosystems, Inc., 2550 Garcia Ave., Mountain View,
`Ca1if., Chapter 1, pp. 3-30, May 1988.
`OSF DCE Release 1.0 S3, DCE Release Notes, Open
`Software Foundation, 11 Cambridge Center, Cam
`bridge, Mass., Mar. 25, 1991, pp. i-v, 1-1 thru B-2.
`J. K. Ousterhout et al., “The Sprite Network Operating
`System”, IEEE Computer, Feb. 1988, US, pp. 23-36.
`D. R. Cheriton, “The V Distributed System”, Commu
`nication of the ACM, Mar. 1988, vol. 31, No. 3, pp.
`314-333.
`A. Bhide et al., “A Highly Available Network File
`Server”, USENIX Conference, Winter 1991, Dalas,
`Tex., p. 199.
`S. J. Mullender et al., “Amoeba A distributed Operating
`System for the 1990s”, IEEE Computer, May 1990, US,
`pp. 44-53.
`L. Peterson et al., “The X-kernel: A Platform for Ac
`cessing Internet Resources”, IEEE Computer, May
`1990, US, pp. 23-33.
`A. Litman, “The DUNIX Distributed Operating Sys
`tem”, Operating Systems Review, ACM Press, NY,
`vol. 22, No. 1, Jan. 1988, pp. 42-51.
`Primary Examiner—Dale M. Shaw
`Assistant Examiner—Moustafa M. Meky
`Attorney, Agent, or Firm-Louis J. Percello
`[57]
`ABSTRACT
`The present invention provides a method and apparatus
`for enabling a cluster of computers to appear as a single
`computer to host computers outside the cluster. A host
`computer communicates only with a gateway to access
`destination nodes and processes within the cluster. The
`gateway has at least one message switch which pro
`cesses incoming and outgoing port type messages cross
`ing the cluster boundary. This processing comprises
`examining certain information on the message headers
`and then changing some of this header information
`either to route an incoming message to the proper com
`puter node, port and process or to make an outgoing
`message appear as if originated at the gateway node.
`The message switch uses a table to match incoming
`messages to a particular routing function which can be
`run to perform the changes necessary to correctly route
`different kinds of messages.
`
`35 Claims, 13 Drawing Sheets
`
`NETWORK 120
`
`200
`\ ENUPS JED CLUSFER
`
`I05
`
`NODE
`
`110
`INTERCONNECT
`
`\07
`
`NODE
`
`Z10
`
`WEE
`
`125
`
`5g EXTERNAL INI'LRHEI
`
`
`GATEWAY 109
`I85 i
`
`48
`
`E
`
`Google Ex. 1123, pg. 1
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 1 of 13
`
`5,371,852 ’
`
`NODE 2
`106
`
`1100a 3
`107
`
`NODE 2
`106
`
`NODE 3
`107
`
`10p
`
`I
`
`110
`
`NODE 1
`105
`
`NODE N
`108
`
`G
`109
`
`FIG. 1 A
`PRIOR ART
`
`125
`i
`I
`:127 120
`g
`l
`I
`
`NEfWORK
`H
`
`150
`
`H 130
`I
`I
`
`H
`
`130
`
`100
`
`l
`
`110
`
`NODE 1
`105 __,_1_2Z_|
`
`NEI'WORK 1
`H
`
`120
`
`130
`
`‘
`
`G
`109
`
`;127 12ONE|W£LRK 2
`' j
`130
`NODE N ""155
`l
`108
`127
`H 130
`
`120
`
`NEIWORK q
`
`Fl(3.1 B
`PRIOR ART
`
`Google Ex. 1123, pg. 2
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 2 of 13
`
`5,371,852
`
`H
`
`130
`120
`
`120
`?r-
`
`127
`
`G1
`
`109
`
`NODE 2
`106
`
`MODE 3
`107
`
`‘
`
`1
`
`110
`
`120
`1‘ r
`
`N005 1
`105
`
`GP
`
`127
`
`127
`109
`_L_s_.
`127
`12° 4- 120
`120
`130
`
`H
`
`F|G.1 C
`PRIOR ART
`
`‘27
`
`H 130
`
`127
`
`G2
`E09
`127
`
`NODE N
`108
`
`120
`
`120
`q
`
`120
`
`Google Ex. 1123, pg. 3
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 3 0f 13
`
`5,371,852
`
`Li
`Z
`g
`NEIWORK 120
`\ E Q
`g
`|._
`E5
`
`125
`\_
`
`2&0
`\ ENCAPS -TED CLUSTER
`105
`/
`NOD:
`
`GATEWAY 109
`
`REMOTE
`HOSTS
`
`'
`
`106
`
`/
`NODE
`
`/107
`
`MODE
`
`INTERCONNECT
`
`NODE
`
`MESSAGE
`SWITCH
`240
`
`ROUTING
`FUNCTION
`25o
`
`FIG. 2
`
`130
`
`PORTS if
`230 '1
`“210
`\~127
`
`23o
`
`130
`
`\
`22o
`
`230
`
`1
`21°
`
`130
`
`Google Ex. 1123, pg. 4
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 4 of 13
`
`5,371,852
`
`CONCEPTUAL LAYERING
`APPLICATION
`
`308
`/
`
`PORT T0 PORT (PP)
`MACHINE To MACH'NE (MM)
`NEIWORK INTERFACE
`
`/306
`I
`G . 3A
`/so4
`PRIOR ART-PROTOCOL LAYERS
`'
`
`301 /
`
`FIG-35
`PRIOR ART -PROTOCOL HEADERS
`
`s37 /
`330
`\ APPLICATION DATA
`\
`
`336 \. PP HEADER PP DATAGRAM DATA AREA
`324 \
`/—325
`MM HEADER
`MM DATA AREA
`
`’
`
`312
`
`FRAME HEADER
`
`344
`
`341
`w
`24
`1s 19
`4 \ 8
`\ o
`VERS HLEN SERVICEIYPE
`TOTAL LENGTH
`IDENTIFICATION
`FLAGS FRAGMENT OFFSET
`‘ME TO LIVE PROTOCOL
`HEADER CHECKSUM
`/l‘souRcE IP ADDRESS
`341/ DESTINATION IP ADDRESS
`IP OPTIONS (IF ANY)
`
`/
`/ FRAME DATA AREA
`320’
`310/
`
`342
`31-)
`
`34a
`/
`\
`349
`
`pADmNG
`
`346
`
`DATA
`
`F|G.3C PRIOR ART-INTERNET PROTOCOL (IP) HEADER
`
`Google Ex. 1123, pg. 5
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 5 of 13
`
`5,371,852
`
`353
`0
`356 ‘WP SOURCE PORT
`UDP MESSAGE LENGTH
`
`1s
`
`Egg
`
`3E2
`31/
`UDP DESTINATION PORT _\
`UDP CHECKSUM
`354
`
`357
`
`DATA
`
`FIG.3D PRIOR ART—UDP DATAGRAM
`
`3_s_o
`/ > 362
`363
`31/
`24
`10
`p \ 4
`DESTINATION PORT 'J\
`\souRcE PORT
`SEQUENCE NUMBER
`364
`
`1s
`
`3664
`
`ACKNOWLEDGEMENT NUMBER
`HLEN RESERVED CODE Brrs
`wmnow
`CHECKSUM
`URGENT POINTER
`
`_
`’
`367*
`
`OPTIONS(|F ANY)
`DATA
`
`PADDING
`
`FIG. 3E PRIOR ART-TCP DATAGRAM
`
`Google Ex. 1123, pg. 6
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 6 of 13
`
`5,371,852
`
`T
`
`1
`
`5\
`
`2 H
`
`m T. N f_
`
`1
`
`W O 8 5
`
`E E _ Y m ADU O 0 M 0 2
`
`?/ m m/
`W 0 /m f_ 2_ 0 _ f_
`m w , N
`\ / a
`
`4 NE
`/ n~ _
`
`1 N
`
`4. _
`
`H q C s
`
`N\
`
`E 0 0 T8 3 3
`
`WW 2
`OT 1 1 0
`
`RH * * \1
`
`w W _| .
`
`_..l._ S E 3 mm
`M 4 R
`L E R N G C M 8 m H H mm P N
`$ 2M \ 5a W E 1|ll\ 1h‘ 0
`
`w P N ‘I
`
`110
`INTERCONNECT
`
`NODE 1
`
`PORT
`
`105
`
`NODE 2
`
`PORT
`
`106
`
`Google Ex. 1123, pg. 7
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 7 of 13
`
`5,371,852
`
`TOP
`
`WAIT FOR MM MESSAGE
`505
`
`READ DESTINATTON ADDRESS (DADDR)
`IN MM HEADER
`510
`
`SEE FIGURE 6
`
`IS DADDR =
`CLUSTER ADDRESS
`515
`
`READ PROTOCOL TYPE (PROTO)
`520
`
`NO
`
`IS PRO-TO A
`PORT TYPE PROTOCOL
`525
`
`YES
`
`PROCESS MESSAGE
`IN GATEWAY IN
`“NORMAL” MANNER
`
`LOCATE AND READ THE DESTINATION PORT
`(DPORT) IN PP HEADER
`530
`
`(GO TO TOP )
`
`SEARCH MESSAGE SWTTCH TABLE FOR
`ENTRY MATCHING DPORT,PROTO PAIR535
`
`Google Ex. 1123, pg. 8
`
`

`

`Patent
`
`Dec. 6, 1994
`
`FIG.5B
`
`
`
`Sheet 8 of 13 FOUND MATCH ?
`
`540
`
`-
`
`NO
`
`NON-SPECIFIC
`PORT ?
`545
`
`NODLADDR=GATEWAY
`NODE___PORT=DPORT
`555
`
`ROUTING FUNCTION ?
`
`YES
`
`COMPUTE NODE ADDR
`FROM DPORT
`NODE__PORT=DPORT
`
`INVOKE ROUTING
`FUNCTION WHICH
`COMPUTES NODE__ADDR
`AND NODE__PORT 565
`
`NODE_ADDR=" NODE FIELD'
`NODE___PORT=DPORT
`560
`
`MODIFY DATAGRAM DESTINATION
`PORT FIELD TO NODE_PORT
`570
`
`I
`MODIFY DATAGRAM DESTINATION
`ADDRESS FIELD TO NODE ADDR
`575
`I
`SEND DATAGRAM TO NODE
`USING INTERCONNECT
`
`FIG.
`5A
`
`FIG.
`58
`
`FIG.5
`
`Google Ex. 1123, pg. 9
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 9 0f 13
`
`5,371,852
`
`TOP:
`
`WAIT FOR MM MESSAGE
`605
`
`FIG'G
`
`READ DESTINATION ADDRESS (DADDR)
`IN MM HEADER
`
`610
`
`READ SOURCE ADDRESS(SADDR) N0
`IN MM HEADER
`
`620
`
`IS DADDR=
`CLUSTER ADDRESS
`61 5
`
`NO
`
`IS DADDR
`OUTSIDE CLUSTER
`AND SADDR
`INSIDE CLUSTER ?
`625
`
`YES (OUTBOUND)
`
`CHANGE SOURCE ADDRESS
`IN HEADER 0F OUTBOUND
`MESSAGE TO CLUSTER ADDRESS
`630
`
`FOWARD DATAGRAM TO DESTINATION
`640
`
`I GO TO TOP I
`
`Google Ex. 1123, pg. 10
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 10 of 13
`
`5,371,852
`
`700
`\ ——~ 337
`
`IP
`
`UDP
`
`/PROJECTS
`
`- -
`
`s24 / 33s /
`
`\715
`
`CLUSTER EXPORT TABLE
`
`720
`
`FILESYSTEM NAME
`
`NODE
`
`/CONTRACTS
`
`/PROJECTS
`722/ /PROPOSAL
`
`725
`
`2
`
`3
`
`'
`
`73o
`
`MOUNT PORT TABLE
`
`NODE
`
`PORT N0.
`
`1
`
`2
`
`3
`
`4
`
`722
`
`820
`
`640 ‘\
`710
`735
`
`FIG 7 -
`
`Google Ex. 1123, pg. 11
`
`

`

`US. Patent ‘
`
`1111.6, 1994
`
`Sheet 11 01 13
`
`5,371,852
`
`LOCATE AND READ THE FILESYSTEM NAME,
`FSNJN THE MOUNT REQUEST MESSAGE
`805
`
`SEARCH CLUSTER EXPORT TABLE FOR ENTRY
`WHICH MATCHES FSN OR CONTAINS FSN
`810
`
`GT URN RETCODE=NOT OK
`820
`
`FOUND MATCHING
`ENTRY ?
`815
`
`YES
`
`READ NODE.N.FROM EXPORT TABLE
`825
`
`SEARCH MOUNT PORT TABLE FOR ENTRY MATCHING
`NODE AND READ MOUNT PORT NUMBER.P
`
`830
`
`FIG.8
`
`SET FUNCTION RETURN VARIABLES
`NODE_ADDR=N
`NODLPORT=P
`840
`
`850
`(RETURN REICODE=O© \
`
`Google Ex. 1123, pg. 12
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 12 of 13
`
`5,371,852
`
`LOCATE AND READ NFS FILEHANDLE,FH,
`IN THE NFS REQUEST MESSAGE
`920
`
`LOCATE AND READ NODE_|D,N,|N NFS FILEHANDLE
`935
`
`SET FUNCTION RETURN VARIABLES
`NODE_ADDR=N
`NODE_PORT=2049
`
`940
`
`(RETURN RETCODE=OKD
`
`_’337
`
`9D0—-—H IP
`
`3247
`
`UDP
`
`\
`
`336
`
`NFS FILEHANDLE 0 o o
`
`\
`
`915
`
`NFS FILEHANDLE DATA
`
`NODE__|D N
`
`\925
`FIG.9 B
`
`\—92o
`
`Google Ex. 1123, pg. 13
`
`

`

`US. Patent
`
`Dec. 6, 1994
`
`Sheet 13 of 13
`
`5,371,852
`
`00
`2-
`
`CLUSTER GATEWAY
`
`109
`
`REMOTE
`125
`HOSTS
`\
`9.2.43.5
`1010
`: - 1019
`513
`:
`130
`1
`1
`1
`412
`9.24308
`\MEssAsE swrrcH TABLE 414 ‘"6 )3 :
`‘PORT PR0T0"'N0DE£FUNCT!0N-._418
`} "' ‘012%.,
`1004-—\
`1/006 )00
`:
`s13
`TOP
`1_ inc'onn
`:
`14002
`i
`2049
`UDP
`{
`
`MESSAGE SWITCH
`
`400
`
`0
`
`0
`
`LNFS
`
`E____
`REMoTE 1.00m ROUTING FUNC'ITON
`1020
`CLUSTER CONNECTION TABLE / —
`s__uddr
`orl node_£_ 1026
`1024
`
`1022 9.2.43.8 1022
`9.2.43.5 1019
`
`2
`1
`
`MODE 1
`
`r_login 513
`105
`
`NODE 2
`
`110
`C -
`513 1;_ login
`106
`
`\ 1040/’
`1030
`
`‘20
`
`FIG.1O
`
`Google Ex. 1123, pg. 14
`
`

`

`1
`
`METHOD AND APPARATUS FOR MAKING A
`CLUSTER OF COMPUTERS APPEAR AS A
`SINGLE HOST ON A NETWORK
`
`5,371,852
`2
`boundary 127, and which ?nally enters the cluster 100
`destined for one node (called a destination node) within
`the cluster 100, is called an incoming message. Like
`wise, a message which originates from a node (called a
`5 source node) within the cluster 100 and crosses the
`boundary 125 destined for a host 130 on the network
`outside the cluster is called an outgoing message. A
`message from a source node within the cluster 100 to a
`destination also within the cluster 100 is called an inter
`nal message.
`The prior art includes clusters 100 which connect to
`a network 120 through one of the computer nodes in the
`cluster. This computer, which connects the cluster to
`the network at the boundary 125, is called a gatewaY
`109. In loosely-coupled systems, gateways 109 process
`the incoming and outgoing messages. A gateway 109
`directs or routes messages to (or from) the correct node
`in the cluster. Internal messages do not interact with the
`gateway as such.
`FIG. 1B shows a prior art cluster 100, as shown in
`FIG. 1A, with the gateway 109 connected to a plurality
`(of number q) of networks 120. In this con?guration,
`each network 120 has a connection 127 to the gateway
`109. A cluster boundary 125 is therefore created where
`the gateway 109 connects to each network 120.
`FIG. 1C goes on to show another embodiment of the
`prior art. In this embodiment, the cluster 100 has more
`than one computer node (105 through 109) performing
`the function of a gateway 109. The plurality of gate
`ways 109, designated as G1 through Gp each connect to
`one or more networks 120. In FIG. 1C, gateway G1
`connects to a number r of networks 120, gateway G2
`connects to a number q of networks 120, and gateway
`Gp connects to a number s of networks 120. Using this
`con?guration, the prior art nodes within the cluster 100
`are able to communicate with a large number of hosts
`130 on a large number of different networks 120.
`All the prior art known to the inventors uses gate
`ways 109 to enable external hosts to individually com
`municate with each node (105 through 109) in the clus
`ter 100. In other words, the hosts 130 external to the
`cluster 100 on the network 120 have to provide informa
`tion about any node (105 through 109) within the clus
`ter 100 before communication can begin with that node.
`The hosts 120 external to the cluster also have to pro
`vide information about the function running on the
`node which will be accessed or used during the commu
`nication. Since communication with each node (105
`through 109) must be done individually between any
`external host 130 and any node within the cluster 100,
`the cluster 100 appears as multiple, individual computer
`nodes to hosts outside the cluster. These prior art clus
`ters do not have an image of a single computer when
`accessed by outside hosts. Examples of prior art which
`lacks this single computer image follow.
`DUNIX is a restructured UNIX kernel which makes
`the several computer nodes within a cluster appear as a
`single machine to other nodes within the cluster. Sys
`tem calls entered by nodes inside the cluster enter an
`“upper kerne ” which runs on each node. At this level
`there is an explicit call to the “switch” component,
`functionally a conventional Remote Procedure Call
`(RPC), which routes the message (on the basis of the
`referred to object) to the proper node. The RPC calls a
`program which is compiled and run. The RPC is used to
`set up the communication links necessary to communi
`cate with a second node in the cluster. A “lower kernel”
`
`BACKGROUND OF THE INVENTION
`1. Field of the Invention
`This invention relates to the ?eld of clustering com
`puters. More speci?cally, the invention relates to a
`computer cluster which appears to be a single host
`computer when viewed from outside the cluster, e.g.
`from a network of computers.
`2. Description of the Prior Art
`The prior art discloses many ways of increasing com
`puting power. Two ways are improving hardware per
`formance and building tightly coupled multiprocessor
`systems. Hardware technology improvements have
`provided an approximately 100% increase in computing
`power every two years. Tightly coupled systems, i.e.,
`systems with multiple processors that all use a single
`real main storage and input/output con?guration, in
`crease computing power by making several processors
`available for computation.
`However, there are limits to these two approaches.
`Future increases in hardware performance may not be
`as dramatic as in the past. Tightly-coupled multiproces
`sor versions of modern, pipelined and cached proces
`sors are difficult to design and implement, particularly
`as the number of processors in the system increases.
`Sometimes a new operating system has to be provided
`to make the tightly-coupled systems operate. In addi
`tion, overhead costs of multi-processor systems often
`reduce the performance of these systems as compared
`to that of a uniprocessor system.
`An alternative way of increasing computer power
`35
`uses loosely-coupled uniprocessor systems. Loosely
`coupled systems typically are independent and com
`plete systems which communicate with one another in
`some way. Often the loosely-coupled systems are linked
`together on a network, within a cluster, and/ or within a
`cluster which is on a network. In loosely coupled sys
`tems in a cluster, at least one of the systems is connected
`to the network and performs communication functions
`between the cluster and the network.
`In the prior art and also shown in FIG. 1A, clusters
`45
`100 comprise two or more computers (also called nodes
`or computer nodes 105 through 109) connected to
`gether by a communication means 110 in order to ex
`change information. Nodes (105 through 109) may
`share common resources and cooperate in doing work.
`The communication means 110 connecting the comput
`ers in the cluster together can be any type of high speed
`communication link known in the art, including: 1. a
`network link like a token ring, ethernet, or ?ber optic
`connection or 2. a computer bus like a memory or sys
`55
`tem bus. A cluster, for our purposes, also includes two
`or more computers connected together on a network
`120.
`Often, clusters of computers 100 can be connected by
`various known communications links 120, i.e., net
`60
`works, to other computers or clusters. The point at
`which the cluster is connected to the outside network is
`called a boundary or cluster boundary 125. The connec
`tion 127 at the boundary is bi-directional, i.e., there are
`incoming and outgoing messages at the boundary. In
`formation which originates from a computer (also
`called a host or host computer) 130 that is on the net
`work 120 outside the cluster, which then crosses the
`
`30
`
`40
`
`65
`
`Google Ex. 1123, pg. 15
`
`

`

`15
`
`40
`
`5,371,852
`3
`4
`running on the second node then processes the message.
`maintained, or has failed, the communication will fail. If
`DUNIX is essentially a method for making computers
`a new node(s) is added to the cluster, i.e., the cluster is
`within the cluster compatible; there is no facility for
`horizontally expanded, the new node will be unavail
`making the cluster appear as a single computer image
`able to communicate with other host computers outside
`from outside the cluster.
`the cluster without adding the proper access codes,
`Amoeba is another system which provides single
`protocols, and other required information to the outside
`computer imaging of the multiple nodes within the
`hosts.
`cluster only if viewed from within the cluster. To ac
`Accordingly, there has been a long felt need for a
`complish this, Amoeba runs an entirely new base oper
`cluster of computers which presents a single computer
`ating system which has to identify and establish commu
`image, i.e., looks like a single computer, to computers
`nication links with every node within the cluster.
`external to the cluster (gateway) boundary. A single
`Amoeba cannot provide a single computer image of the
`computer image cluster would have the capability of
`cluster to a host computer outside the cluster. Amoeba
`adding or deleting computers within the cluster; chang
`also has to provide an emulator to communicate with
`ing and/or moving processes, operating systems, and
`nodes running UNIX operating systems.
`data among computers within the cluster; changing the
`Sprite is a system which works in an explicitly distrib
`con?guration of cluster resources; redistributing tasks
`uted environment, i.e., the operating system is aware of
`among the computer within the cluster; and redirecting
`every node in the cluster. Sprite provides mechanisms
`for process migration, i.e., moving a partially completed
`communications from a failed cluster node to an operat
`ing node, without having to modify or notify any com
`program from one node to another. To do this, Sprite
`puter outside the cluster. Further, computers outside
`has to execute RPCs each time a new node is accessed.
`There is no single computer image of the cluster pres
`the cluster, would be able to access information or run
`ented to the network hosts outside these systems.
`processes within the cluster without changing the envi
`V is a distributed operating system which is able to
`ronment where they are operating.
`communicate only with nodes (and other clusters)
`Systems like DUNIX, Amoeba, Sprite, and V pro
`which are also running V. UNIX does not run on V.
`vide some degree of a single system image from within
`Other techniques for managing distributed system
`the cluster (i.e., within the gateway boundaries 125) by
`clusters, include LOCUS, TCF, and DCE. These sys
`writing new kernels (in the case of Amoeba, a totally
`terns require that the operating system know of and
`new operating system.) This requires extensive system
`establish communication with each individual node in a
`design effort. In addition, all the nodes of the cluster
`cluster before ?les or processes can be accessed. How
`must run the system’s modi?ed kernel and communicate
`ever, once the nodes in the cluster are communicating,
`with servers inside the system using new software and
`processes or ?les can be accessed from any connected
`protocols.
`node in a transparent way. Thus, the ?le or process is
`LOCUS, TCF and DCE provide single system im
`accessed as if there were only one computer. These
`ages only for computers which are part of their clusters
`systems provide a single system image only for the ?le
`and only with respect to ?le name spaces and process
`name space and process name space in these systems. In
`name spaces. In other aspects, the identities of the indi
`these systems, ?les and processes can not be accessed by
`vidual nodes are visible.
`host computers outside the cluster unless the host has
`established communication with a speci?c node within
`the cluster which contains the ?les and/or processes.
`3. Statement of Problems with the Prior Art
`Prior art computer clusters fail to appear as one entity
`to any system on the network communicating with
`them, i.e., the prior art does not offer the network out
`45
`side its boundary a single computer image. Because of
`this, i.e., because computers outside the boundary of the
`cluster (meaning outside the boundary 125 of any gate
`way 109 of the cluster 100) have to communicate indi
`vidually with each computer within the cluster, com
`munications with the cluster can be complicated. For
`example, computers outside the boundary of the cluster
`(hosts) have to know the location of and processes run
`ning on each computer within the cluster with which
`they are communicating. The host computers need to
`55
`have the proper communication protocols and access
`authorization for each node within the cluster in order
`to establish communication. If a node within the cluster
`changes its location, adds or deletes a program, changes
`communication protocol, or changes access authoriza
`tion, every host computer external to the cluster for
`which the change is relevant has to be informed and
`modi?ed in order reestablish communication with the
`altered node within the cluster.
`The prior art lack of a single computer image to
`outside host computers also limits cluster modi?cation
`and reliability. If hosts try to communicate with a node
`within the cluster which has been removed, is being
`
`OBJECTIVES
`An objective of this invention is an improved method
`and apparatus for routing messages across the boundary
`of a cluster of computers to make the cluster of comput
`ers on a network appear as a single computer image to
`host computers on the network outside the cluster.
`Also an objective of this invention is an improved
`method and apparatus for routing messages across the
`boundary of a cluster of computers to enable outside
`host computers on a network to use the same software
`and network protocols to access functions and informa
`tion within the computer cluster as they would use to
`access those functions and information on a single re
`mote host.
`Also an objective of this invention is an improved
`method and apparatus for routing messages across the
`boundary of a cluster of computers so that computer
`nodes within the cluster can communicate with outside
`hosts on networks such that, from the viewpoint of the
`outside host, the communication is with a single remote
`host, i.e., the cluster, rather than with the individual
`cluster nodes.
`A further objective of this invention is an improved
`method and apparatus for routing messages across the
`boundary of a cluster of computers so that work re
`quests from outside the cluster can be evenly distributed
`among the computer nodes in the cluster.
`
`60
`
`65
`
`Google Ex. 1123, pg. 16
`
`

`

`5,371,852
`6
`FIG. 5 is a flow chart showing the steps performed
`by the present invention to route an incoming message.
`FIG. 6 is a flow chart showing the steps performed
`by the present invention to route an outgoing message.
`FIG. 7 shows data structures used by a function in the
`message switch which processes a MOUNT request.
`FIG. 8 is a ?ow chart of the computer program per
`formed by the function in the message switch which
`processes a MOUNT request.
`FIGS. 9a—9b show data structures and a ?ow chart
`used by a function in the message switch which pro
`cesses NFS requests.
`FIG. 10 shows data structures used by functions in
`the message switch which process TCP connection
`service requests, in particular rlogin.
`
`5
`SUMMARY OF THE INVENTION
`This invention, called an encapsulated cluster, is a
`method and apparatus for routing information that
`crosses the boundary of a computer cluster. The infor
`mation is in the form of port type messages. Both in
`coming and outgoing messages are routed so that the
`cluster appears as a single computer image to the exter
`nal host. The encapsulated cluster appears as a single
`host to hosts on the network which are outside the
`cluster.
`The apparatus comprises two or more computer
`nodes connected together by a communication link,
`called an interconnect, to form a cluster. (Note that in
`one embodiment of the invention, the interconnect can
`be a network.) One of the computers in the cluster,
`serving as a gateway, is connected to one or more exter
`nal computers and/or clusters (hosts) through another
`communication link called a network. A gateway can
`be connected to more than one network and more than
`one node in the cluster can be a gateway. Each gateway
`connection to a network, i.e., boundary, has an address
`on the network. Each gateway has a message switch
`which routes incoming and outgoing messages by
`changing information on the message header based on
`running a speci?c routing function that is selected using
`port and protocol information in port type messages.
`Since all incoming messages are addressed to the
`gateway, the cluster appears as a single computer to
`hosts outside the cluster that are sending incoming mes
`sages to nodes within the cluster. When processing
`incoming messages, the gateway ?rst reads a protocol
`?eld in the message header and analyzes the message to
`determine if it is a port type message originating from a
`location outside the cluster. If the message is of port
`type, the location of the port number on the message is
`found. This port number and protocol type is used to
`search for a match to a port speci?c routing function in
`a table residing in memory within the message switch. If
`a table entry is matched, a routing function associated
`with the entry is selected and run. The routing function
`routes the message to the proper computer node within
`the cluster by altering information on the incoming
`message so that the message is addressed to the proper
`45
`node within the cluster.
`For outgoing messages, originating from a source
`node within the cluster, the message switch ?rst recog
`nizes that the message is a port type message that will
`cross the cluster boundary. The message switch then
`alters the message so that the source address is the gate
`way address rather than the address of the source node.
`In this way, computers external to the cluster perceive
`the message as coming from the gateway computer on
`the network rather than the sending node within the
`cluster.
`
`55
`
`DETAILED DESCRIPTION OF THE
`INVENTION
`FIG. 2 shows one embodiment of an encapsulated
`cluster 200, the present invention. The cluster com
`prises a plurality of computer nodes (105 through 109)
`one of which is a gateway 109. The nodes are connected
`together by a high speed interconnect 110, e.g., a net
`work or any other link that is commonly used in the art.
`The gateway is connected with a bidirectional commu
`nication link 127 to a network 120. A boundary 125 is
`de?ned at the connection point between the network
`120 and the gateway 109. Computers, called hosts 130,
`connect to the network 120 and can communicate with
`nodes within the cluster by passing messages through
`the gateway 109. An incoming message 210 is shown as
`being sent from a host 130, passing through the cluster
`boundary 125, a gateway port 230, a gateway message
`switch 240, a gateway routing function 250, the inter
`connect 110, and ultimately to the destination, the desti
`nation node 107 in the cluster 200. In a similar manner,
`an outgoing message 220, is shown originating at a
`source node 105 within the cluster 200; passing through
`the interconnect 110, gateway message switch 240,
`gateway port 230, cluster boundary 125, and ultimately
`to the destination host 130.
`Although FIG. 2 represents a single cluster 200 with
`a single gateway 109, it is readily appreciated that one
`skilled in the art given this disclosure could produce
`multiple embodiments using this invention. For exam
`ple, the cluster 200 might have multiple gateways 109
`each connected to one or more networks or single host
`computers. A single gateway 109 may also have a plu
`rality of network connections each of which being ca
`pable of communicating with one or more external
`hosts or one or more external networks. All these em
`bodiments are within the contemplation of the inven
`tion.
`The encapsulated cluster 200 connects 127 to a high
`speed communication link 120, here called a network
`120. Host computers 130, also connected to the network
`120, communicate with the encapsulated cluster 200,
`and the nodes (105 through 109) within the cluster, over
`the network 120. The host computers 130 used in the
`invention include any general purpose computer or
`processor that can be connected together by the net
`work 120 in any of the many ways known in the art.
`The preferred network 120 is token-ring or ethernet.
`However, this high speed communication link 120
`might also include other connections known in the art
`like computer system buses. A host computer 130 could
`also be an encapsulated cluster of computers 200, i.e.,
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`FIGS. 1A-1C show three embodiments of prior art
`computer clusters that are attached to external commu
`nication links like networks.
`FIG. 2 shows an embodiment of the present inven
`tion.
`FIGS. 3a-3e shows the general structure of an incom
`ing and outgoing message and a more speci?c message
`structure using the internet communication protocol.
`FIG. 4 shows a preferred embodiment of a message
`switch.
`
`60
`
`65
`
`Google Ex. 1123, pg. 17
`
`

`

`5,371,852
`7
`8
`the present invention, which gives the image of a single
`to control the communication links, route messages, and
`computer to the network 120.
`access appropriate host computers on the link.
`Nodes (105 through 109) in the encapsulated cluster
`As shown in FIG. 3A, these protocols can be concep
`tually viewed as being layered, with each protocol layer
`200 can also comprise any general purpose computer or
`making use of the services provided by the layer be
`processor. An IBM RISC SYSTEM/ 6000 was the
`neath it. The lowest layer, the Network Interface (302),
`hardware used in the preferred embodiment and is de
`deals at the hardware level, with the transmission of
`scribed in the book SA23-26l9, “IBM RISC SYS
`data between hosts on a single network of a particular
`TEM/6000 Technology.” (RISC SYSTEM/6000 is a
`type. Examples of network types are token-ring and
`trademark of the IBM corporation.) These nodes may
`ethernet. The Network Interface layer presents an inter
`be independent uniprocessors with their own memory
`face to the layers above it which supports the transmis
`and input/output devices. Nodes can also be conven
`sion of data between two hosts on one physical net
`tional multiprocessors whose processors share memory
`work, without having to deal with the requirements of
`and input/output resources.
`the speci?c network hardware being used.
`Nodes (105 through 109) within the cluster are con
`The next higher layer, The Machine-to-Machine
`nected together by a high speed communications link
`(MM) layer 304, provides the capability to communi
`called an interconnect 110. This interconnect includes
`cat

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket