`Ohran et al.
`
`
US005978565A
[11] Patent Number: 5,978,565
[45] Date of Patent: Nov. 2, 1999
`
`[54] METHOD FOR RAPID RECOVERY FROM A
`NETWORK FILE SERVER FAILURE
`INCLUDING METHOD FOR OPERATING
`CO-STANDBY SERVERS
`
[75] Inventors: Michael R. Ohran, Orem; Richard S.
Ohran, Provo; David Green, Pleasant
Grove; John M. Winger, Alpine, all of
Utah
`
5,633,999    5/1997   Clowes et al.       395/182.04
5,666,479    9/1997   Kashimoto et al.    395/180
`
`OTHER PUBLICATIONS
`
`Steinberg, "Diverting Date From Disaster", Digital Review,
`V8, N35, Nov. 1991.
`
`Primary Examiner—Meng-Al T. An
`Assistant Examiner—Walter D. Davis, Jr.
`Attorney, Agent, or Firm—Workman Nydegger & Seeley
`
`[73] Assignee: Vinca Corporation, Orem, Utah
`
[57] ABSTRACT
`
`[21] Appl. No.: 08/848,139
`
[22] Filed: Apr. 28, 1997
`
`Related U.S. Application Data
`
`[63] Continuation of application No. 08/441,157, May 15, 1995,
`abandoned, which is a continuation-in-part of application
`No. 08/094,755, Jul. 20, 1993, abandoned.
[51] Int. Cl. G06F 11/20
[52] U.S. Cl. 395/182.11; 395/182.04; 395/182.08
[58] Field of Search 395/181, 182.02,
395/182.04, 182.05, 182.08, 182.09, 182.11, 500
`
[56] References Cited

U.S. PATENT DOCUMENTS

5,157,663   10/1992   Major et al.        395/182.08
5,307,481    4/1994   Shimazaki et al.    395/182.09
5,408,649    4/1995   Beshears et al.     395/182.08
5,455,932   10/1995   Major et al.        711/162
5,488,716    1/1996   Scheider et al.     395/182.08
5,533,191    7/1996   Nakano              395/182.09
`
A method for providing rapid recovery from a network file
server failure through the use of a backup computer system.
The backup computer system runs a special mass storage
access program that communicates with a mass storage
emulator program on the network file server, making the
disks (or other mass storage devices) on the backup computer
system appear as if they were disks on the file server
computer. By writing data both to the mass storage of the
file server and, through the mass storage emulator and mass
storage access program, to the disks on the backup computer,
a mirrored copy of the data on the file server computer is
made. Optionally, selected portions of the data
read through the mass storage emulator program can be
altered before being returned as the result of the read
operation on the file server. In the event of failure of the file
server computer, the backup computer can replace the file
server, using the copy of the file server's data stored on its
disks. A single backup computer can support a plurality of
file server computers. Unlike other redundant file server
configurations, this method does not require the backup
computer system to be running the file server operating
system.
`
`25 Claims, 4 Drawing Sheets
`
`
` IPR2017-00006 Ex. 1005
`Broadsign International, LLC Petitioner
` 1
`
`
`
U.S. Patent    Nov. 2, 1999    Sheet 1 of 4    5,978,565

FIG. 1
[Drawing: computer systems 110 and 120, each with a NETWORK
INTERFACE, COMPUTER, MASS STORAGE CONTROLLER, MASS STORAGE
DEVICE, and COMMUNICATIONS MEANS ATTACHMENT. Legible reference
numerals: 101, 102, 110, 113, 115, 120, 121, 122, 123, 125.]
`
`
`
U.S. Patent    Nov. 2, 1999    Sheet 2 of 4    5,978,565
`
`RUN MASS STORAGE ACCESS PROGRAM
`ON SECOND COMPUTER
`
`INSTALL MASS STORAGE EMULATOR ON
`FILE SERVER COMPUTER
`
`INITIATE MIRRORING OF DATA TO
`SECOND COMPUTER USING MASS
`STORAGE EMULATOR
`
`WAIT FOR FAILURE OF FILE SERVER
`COMPUTER
`
`WAIT UNTIL SECOND COMPUTER IS
`CONNECTED TO NETWORK
`
`EXECUTE FILE SERVER OPERATING
`SYSTEM ON SECOND SERVER
`
`FIG. 2
`
`
`
`
U.S. Patent    Nov. 2, 1999    Sheet 3 of 4    5,978,565

FIG. 3
[Drawing: computer systems 310 and 320, each with a NETWORK
INTERFACE, COMPUTER, MASS STORAGE CONTROLLER, two MASS STORAGE
DEVICE blocks, and COMMUNICATIONS MEANS ATTACHMENT. Legible
reference numerals: 301, 302, 310, 311, 312, 313, 316, 320, 327,
328, 329.]
`
`
`
U.S. Patent    Nov. 2, 1999    Sheet 4 of 4    5,978,565
`
FIG. 4
[Drawing: four computer blocks, each with NETWORK INTERFACE(S),
COMPUTER, MASS STORAGE CONTROLLER, MASS STORAGE DEVICE(S), and
COMMUNICATIONS MEANS ATTACHMENT(S). Reference numerals are not
legible in this copy.]
`
`
`
5,978,565
`METHOD FOR RAPID RECOVERY FROM A
`NETWORK FILE SERVER FAILURE
`INCLUDING METHOD FOR OPERATING
`CO-STANDBY SERVERS
`
`CROSS-REFERENCES TO RELATED
`APPLICATIONS
`
This is a continuation of application Ser. No. 08/441,157,
filed May 15, 1995, in the names of Richard S. Ohran,
Michael R. Ohran, John M. Winger, and David Green for
METHOD FOR RAPID RECOVERY FROM A NETWORK
FILE SERVER FAILURE INCLUDING METHOD
FOR OPERATING CO-STANDBY SERVERS, now
abandoned, which is a continuation-in-part of application
Ser. No. 08/094,755, filed Jul. 20, 1993, in
the names of Richard Ohran and Terry Dickson for
METHOD FOR RAPID RECOVERY FROM A NETWORK
FILE SERVER FAILURE, now abandoned.
`
`BACKGROUND OF THE INVENTION
`
`
1. Field of the Invention
This invention relates to network file server computer
systems, and in particular to the methods used to recover
from a computer failure in a system with a plurality of
computer systems, each with its own mass storage devices.
2. Description of Related Art
It is often desirable to provide continuous operation of
computer systems, particularly file servers which support a
number of user workstations or personal computers. To
achieve this continuous operation, it is necessary for the
computer system to be tolerant of software and hardware
problems or faults. This is generally done by having redundant
computers and mass storage devices, such that a backup
computer or disk drive is immediately available to take over
in the event of a fault.
A number of techniques for implementing a fault-tolerant
computer system are described in Major et al., U.S. Pat. No.
5,157,663, and its cited references. In particular, the invention
of Major provides a redundant network file server
capable of recovering from the failure of either the computer
or the mass storage device of one of the file servers. The file
server operating system is run on each computer system in
the network file server, with each computer system cooperating
to produce the redundant network file server. This
technique has been used by Novell to implement its SFT-III
fault-tolerant file server product.
There are a number of reasons why the use of a redundant
network file server such as described in Major may be
undesirable. As can be seen from the description in Major,
the software needed to provide such a redundant network file
server is considerably more complex than the software of the
present invention. This can result in lower reliability due
to the increased presence of programming errors ("bugs") in
the complex software. Also, the processing time required to
handle a client request may be increased by the complexity
of the redundant network file server software, when compared
to a single-processor network file server. Finally,
license restrictions or other limitations may make it infeasible
or uneconomical to run a redundant network file server
instead of a normal network file server.
`
`SUMMARY OF THE INVENTION
It is an object of this invention to provide rapid
recovery from a network file server failure without the
complex software of a redundant network file server. This is
achieved by having a second, backup computer system with
its own mass storage device (generally a magnetic disk).
This backup computer is connected by an appropriate means
for communications to the file server computer, allowing the
transmission of information (such as commands and data)
between the two computers. A mass storage emulator, running
like a device driver on the file server computer, sends
information to a mass storage access program on the backup
computer. The mass storage access program performs the
requested operation (read, write, etc.) on the mass storage
system connected to the backup computer, and returns the
result to the mass storage emulator on the file server computer.
This makes the mass storage device on the backup computer
look like another mass storage device on the file server
computer. The data mirroring option of the file server
operating system can be activated (or, if the operating
system does not support data mirroring, a special device
driver that provides data mirroring can be used), so that a
copy of all data written to the mass storage device directly
connected to the file server will also be written to the mass
storage device on the backup computer, through the mass
storage emulator and mass storage access programs.
When a failure is detected in the file server computer
system, the backup computer becomes the file server. The
`mass storage device of the backup computer will contain a
`copy of the information on the mass storage device of the
`failed file server, so the new file server can start with
`approximately the same data as when the previous file server
`failed.
It is a further object of this invention to allow a single
backup computer to support a plurality of file server computers.
This is achieved by having each file server computer
run a mass storage emulator. The backup computer can run
a single mass storage access program capable of
communicating with a plurality of mass storage emulators.
Alternatively, if the operating system on the backup computer
permits the running of multiple processes, the backup
computer can run a separate mass storage access program
for each mass storage emulator.
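The single-program alternative can be sketched as follows. This is only an illustrative sketch in Python, not the program of the invention: the class name MultiServerAccessProgram, the server_id key, and the in-memory dictionaries standing in for the backup computer's mass storage are all assumptions made for the example.

```python
class MultiServerAccessProgram:
    """One access program serving several emulators (hypothetical sketch)."""

    def __init__(self):
        # One independent block store per file server, keyed by an assumed ID.
        self.stores = {}

    def handle(self, server_id, op, block, data=None):
        """Perform a read or write on the store belonging to server_id."""
        store = self.stores.setdefault(server_id, {})
        if op == "write":
            store[block] = data
            return True
        if op == "read":
            return store.get(block)  # None if the block was never written
        raise ValueError(f"unsupported operation: {op}")
```

Keeping a separate store per file server means that writes mirrored from one server can never overwrite the copy held for another.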
`It is a further object of this invention to improve the
`reliability of a redundant network file server computer
`system by reducing the complexity of the software when
`compared to the software of a redundant network file server.
`The programs for the mass storage emulator on the file
server computer and the mass storage access program on the backup
`computer can be considerably less complex than a full
`redundant file server operating system.
Furthermore, while it is possible for the backup computer
to be running the file server operating system (and acting as
another file server), it is also possible to run the mass storage
access program under a simple operating system or as a
stand-alone program, reducing the complexity and increasing
the performance of the backup computer system.
These and other features of the invention will be more
readily understood upon consideration of the attached drawings
and of the following detailed description of those
drawings and the presently preferred embodiments of the
invention.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a computer configuration on which the
`method of the invention runs.
`FIG. 2 is a flow diagram showing the steps of one
`preferred embodiment of the method of the invention.
`FIG. 3 illustrates a computer configuration with two file
`server computers.
`
`
`FIG. 4 illustrates a single backup computer supporting a
`plurality of file server computers.
`DETAILED DESCRIPTION OF THE
`INVENTION
Referring to FIG. 1, which illustrates a representative
computer configuration on which the method of the invention
runs, it can be seen that there are two computer systems
110 and 120. The first computer system 110 is running a file
server operating system (such as Novell NetWare®). Computer
system 110 includes computer 112 connected to network
101 through interface 111 (and its associated
software), and mass storage device 114 connected through
controller 113 (and its associated software). These represent
the standard components of a network file server. In the case
of NetWare, computer 112 may be a PC-compatible computer
based on an Intel 486 or Pentium processor, network
101 can be an ethernet (so that interface 111 is an ethernet
interface), and mass storage device 114 can be a SCSI or
IDE magnetic disk connected through an appropriate controller
113. Computer 122 would also be a PC-compatible
computer, so that it could also run the same NetWare file
server operating system as computer 112. Network 101
could also be implemented as a token ring, Arcnet, or any
other network technology.
`The mass storage devices of the invention should not be
`viewed as limited to magnetic disk drives, but can also be
`implemented using optical discs, magnetic tape drives, or
`any other medium capable of handling the read and write
`requests of the particular computer system.
`Added to the standard network file server to support the
`method of this invention are a backup computer system 120
`and a means 102 for communicating between computer
`system 110 and computer system 120.
Computer system 120 has components similar to computer
system 110. Computer system 120 can be connected to
network 101 through interface 121, although it is not necessary
for computer system 120 to actually be connected to
network 101 during normal operation. Computer 122 is
connected to interface 121 and to mass storage device 124
through controller 123.
While it is not necessary for computer system 120 to have
identical components to computer system 110, that will
often be the case. In other cases, computer system
`120 may be an older, slower system previously used as a file
`server but replaced with computer system 110. All that is
`required of computer system 120 is that it be capable of
`running the file server operating system in case of the failure
`of computer system 110, and that its mass storage device 124
`be of sufficient capacity to hold the data mirrored from mass
`storage device 114.
Communications means 102 provides a link between
computer systems 110 and 120. Computer 112 is connected
to communications means 102 through attachment 115, and
computer 122 is connected to communications means 102
through attachment 125. Communications means 102 can be
implemented using a variety of techniques, well-known to
those skilled in the art. In the preferred embodiments, a
high-speed serial point-to-point link is used. An alternative
would be to use the serial communications ports of computers
112 and 122, programmed to run at a high data rate,
or the parallel interfaces of computers 112 and 122. Another
alternative is for communications means 102 to be a virtual
circuit or channel carried on network 101. In this latter case,
communications means 102 would really be network 101,
attachment 115 would really be interface 111, and attachment
125 would really be interface 121.
`
`
It is important that communications means 102 provide
`data transfer at rates comparable to the data rate of mass
`storage device 124 so that it does not limit the performance
`of the system. The method of this invention is not dependent
`on the particular implementation of communications means
`102, although a communications means 102 dedicated only
`to the method of this invention will generally result in more
`efficient operation and simpler programs.
FIG. 2 is a flow diagram showing the steps of the method
of the invention. In step 201, a special program, the mass
storage access program, is run on computer system 120. The
mass storage access program receives commands from computer
system 110 over communications means 102. Based
on those commands, the mass storage access program
accesses mass storage device 124 to perform the operation
specified in the command received from computer system
110. The results of the accessing of mass storage device 124
are returned to computer system 110 over communications
means 102.
`The mass storage access program can be enhanced to
`provide a cache of data on mass storage device 124. The
`implementation of such a cache function is well-known in
`the art, consisting of keeping a copy of the most recently
`accessed information of mass storage device 124 in the
`memory of computer 122. When a read command is
`received, it is not necessary to access mass storage device
`124 if a copy of the data is in the cache. Since computer 122
has a large memory (it must be large enough to run the file
server operating system) and the mass storage access program
is quite small, there is a large amount of memory
`available for the cache, particularly if computer 122 is only
`running the mass storage access program. This means that
`many entries will be in the cache, and the chance of finding
`a block being read in the cache is higher than would be
`normal for a similar cache in a file server operating system.
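A cache of this kind might look like the following minimal sketch. It assumes a least-recently-used eviction policy and a write-through design, neither of which is specified in the text. The class name CachedDevice, the dict standing in for mass storage device 124, and the device_reads counter are hypothetical names introduced only for illustration.

```python
from collections import OrderedDict


class CachedDevice:
    """Read cache in front of a block device (illustrative sketch only)."""

    def __init__(self, device, capacity):
        self.device = device          # dict-like: block number -> bytes
        self.capacity = capacity      # maximum number of cached blocks
        self.cache = OrderedDict()    # least recently used entry comes first
        self.device_reads = 0         # counts reads that reached the device

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # mark as most recently used
            return self.cache[block]
        self.device_reads += 1
        data = self.device[block]
        self._remember(block, data)
        return data

    def write(self, block, data):
        self.device[block] = data           # write-through to the device
        self._remember(block, data)

    def _remember(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
```

A larger capacity keeps more blocks in memory, which is the effect the paragraph above attributes to the memory left free on computer 122.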
`In step 202, coincidentally with the running of the mass
`storage access program on computer system 120, another
`program, the mass storage emulator, is installed on computer
`system 110. The mass storage emulator takes mass storage
`requests from the file server operating system running on
`computer system 110 and sends them as commands over
communications means 102 to computer system 120, where
`they are processed by the mass storage access program, as
`discussed above.
`When results from a command are received from the mass
`storage access program over communications means 102 by
`the mass storage emulator, they are returned to the file server
`operating system, much as the result of a normal mass
`storage request would be returned. In this way, the mass
`storage access program and the mass storage emulator
`cooperate to make it appear to the file server operating
`system that mass storage device 124 is directly connected to
`computer 112 on computer system 110.
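The cooperation between the two programs can be illustrated with a short sketch. It is not the source code of the microfiche appendix. The Command record, the 512-byte BLOCK_SIZE, and the use of a plain Python callable in place of communications means 102 are assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed block size for the example


@dataclass
class Command:
    op: str            # "read" or "write"
    block: int
    data: bytes = b""


class MassStorageAccessProgram:
    """Runs on the backup computer and operates on its local device."""

    def __init__(self, num_blocks):
        self.device = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def handle(self, cmd):
        if cmd.op == "read":
            return self.device[cmd.block]
        if cmd.op == "write":
            # Pad or truncate so every stored block has the same size.
            self.device[cmd.block] = cmd.data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE]
            return b"OK"
        raise ValueError(f"unsupported operation: {cmd.op}")


class MassStorageEmulator:
    """Runs on the file server and looks like a local disk driver."""

    def __init__(self, link):
        self.link = link  # callable standing in for communications means 102

    def read(self, block):
        return self.link(Command("read", block))

    def write(self, block, data):
        return self.link(Command("write", block, data))
```

In this sketch the file server side only ever calls read and write on the emulator, so it has no way to tell that the blocks live on another computer.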
In most cases, the results returned from a read operation
will be the data stored at the specified mass storage location.
However, in some embodiments of the invention it will be
desirable to return an alternative value for special mass
storage locations. For example, the first block on many mass
storage systems contains information such as volume names.
It may be necessary to avoid duplicate volume names, so
alternative data for the first block, containing a non-duplicate
volume name, will be returned by the mass storage
access program for a read of the first block.
The alternative data could be stored as part of the mass
storage access program, stored in a special location on the
mass storage device accessed by the mass storage access
program, or stored on another mass storage device. It can
also be generated by the mass storage access program from
the data stored in the special location, such as modifying a
particular field. In any case, when one of the special locations
is read, the mass storage access program transfers the
alternative data to the mass storage emulator.
`In one embodiment of this invention, the mass storage
`access program is a conventional program running under the
`disk operating system of personal computer 122. The disk
storage emulator is a NetWare Loadable Module (NLM),
much like the device driver for a disk drive. Copies of the
`source code for the mass storage access program and the
`mass storage emulator are given in the microfiche appendix.
`In another embodiment of this invention, both computer
`systems 110 and 120 are running copies of the file server
`operating system. Computer system 120 can function as a
`file server while acting as a backup for computer system 110.
`The mass storage access program running on computer
`system 120 can be either a regular user program or a
`NetWare Loadable Module.
In yet another embodiment of this invention, illustrated in
FIG. 3, both computer systems 310 and 320 are running
copies of the file server operating system, and each is acting
as a backup for the other. Computer system 310 is running
a mass storage emulator allowing it to access mass storage
device 324 on computer system 320 by communicating with
the mass storage access program running on computer
system 320. Likewise, computer system 320, including computer
328 and network interface 327, is running a mass
storage emulator 329 allowing it to access mass storage
device 314 on computer system 310 by communicating with
the mass storage access program running on computer
system 310. Each file server is acting as a backup for the
other using the present invention. Thus, if either file server
goes down, the other can continue to serve the needs of the
computer network without down time. And when neither file
server is down, the users enjoy the benefits of fully utilizing
the resources of their redundant file server capability. This is
advantageous in comparison to utilizing a single dedicated
backup file server which provides no services for users until
the primary file server becomes unavailable.
If both computer systems 310 and 320 are running the file
server operating system, there may be difficulties if the file
server operating system uses special names in the labels of
the disks. As illustrated in FIG. 3, file server 310 has mass
storage devices 314 and 315, and file server 320 has mass
storage devices 324 and 325. Mass storage devices 314 and
324 are the normal system disks on computer systems 310
and 320, respectively, and mass storage devices 315 and 325
are used to back up the other file server.
Often, an operating system such as NetWare will use a
special disk label such as SYS for its main system disk. In
the system of FIG. 3, mass storage devices 314 and 324, the
normal system disks, will have the label SYS. However,
because mass storage device 325 is a mirror of mass storage
device 314, mass storage device 325 would normally also
have the label SYS. Similarly, mass storage device 315, the
mirror of mass storage device 324, would also have the label
SYS. With many operating systems, such duplicate labels
would cause difficulties.
This problem can be overcome by altering the mass
storage access programs running on computer systems 310
and 320 to return alternative data when a read operation is
performed on certain mass storage locations. To handle the
duplicate label problem, each mass storage access program
is configured to return an alternative label whenever the
mass storage location containing the label is read.
`For example, mass storage device 315 might have a real
`label of SYS.LEE (indicating that it is a mirror copy of the
`SYS disk of file server LEE) but the mass storage access
`program on computer system 310 would be programmed to
`return a label of SYS to the mass storage emulator running
`on computer system 320 whenever the label location is read.
`This would mean that computer system 310 would see disks
`with different labels (SYS for mass storage device 314 and
`SYS.LEE for mass storage device 315). However, computer
`system 320 would see the label SYS on both mass storage
`device 324 and on mass storage device 315, the mirror for
`mass storage device 324. Similarly, the real label for mass
`storage device 325 might be SYS.DAN (mirror copy of disk
`SYS on server DAN) but a label of SYS would be seen by
`computer system 310.
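The label substitution in this example can be sketched as below. The sketch assumes that the label occupies block 0 and that a block is just a byte string. The class name AccessProgramWithOverrides and the method names read_local and read_remote are hypothetical and chosen for illustration only.

```python
class AccessProgramWithOverrides:
    """Returns alternative data for selected locations (sketch only)."""

    def __init__(self, device, overrides):
        self.device = device              # block number -> bytes on the real disk
        self.overrides = dict(overrides)  # block number -> alternative data

    def read_local(self, block):
        """What the computer that hosts the disk sees: the real contents."""
        return self.device[block]

    def read_remote(self, block):
        """What is returned to the remote mass storage emulator."""
        return self.overrides.get(block, self.device[block])


# Mass storage device 315 really carries the label SYS.LEE in block 0 (assumed).
disk_315 = {0: b"SYS.LEE", 1: b"payload"}
program_on_310 = AccessProgramWithOverrides(disk_315, {0: b"SYS"})
```

With this arrangement computer system 310 reads SYS.LEE from the disk, while the emulator on computer system 320 receives SYS for the same location and the unaltered data for every other block.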
Returning to FIG. 2, in step 203, mirroring of data is
initiated. When data is being mirrored on two or more mass
storage devices, whenever data is to be written it is written
to all mass storage devices taking part in the mirroring, at the
same location on each mass storage device. (The location
may be relative to the start of the mass storage device, or to
the start of a partition or contiguous portion of the mass
storage device, as appropriate to the way the mass storage
device has been formatted and is being used.) Data can be
read from any mass storage device taking part in the
mirroring, since each mass storage device contains identical
data.
`Mirroring may be an integral function of the file server
`operating system, so that no special program is necessary for
`implementing disk mirroring as part of the method of this
`invention. Step 203 only requires the activation or starting of
`mirroring on the part of the file server operating system. This
`is the case in the preferred embodiments of the invention,
`operating with NetWare and using the mirroring facilities of
`that file server operating system.
If the file server operating system does not provide
mirroring, a separate mirroring module will have to be
implemented. Such a mirroring module, whose implementation
should be obvious to one skilled in the art, will take
each write request and pass it to the driver for each mass
storage device taking part in the mirroring. For mass storage
device 124 on computer system 120, the driver will be the
mass storage emulator, discussed above. When successful
completion of the write request has been received from all
mass storage devices taking part in the mirroring, the
mirroring module will indicate successful completion to the
file server operating system.
`For read requests, the mirroring module can direct the
`read request to any of the mass storage devices, since all
`contain identical data. Generally, the read request will be
`directed to the mass storage device which is first available to
`handle the request.
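A mirroring module of the kind described might be sketched as follows. The MemoryDriver class is a hypothetical stand-in for a real device driver (or for the mass storage emulator), and the available flag is an assumed way of modeling which device is free to handle a read.

```python
class MemoryDriver:
    """Stand-in for a device driver or the mass storage emulator."""

    def __init__(self):
        self.blocks = {}
        self.available = True  # assumed flag: device is free to serve a read

    def write(self, block, data):
        self.blocks[block] = data
        return True            # report successful completion

    def read(self, block):
        return self.blocks[block]


class MirroringModule:
    """Passes each write to every driver; reads from the first free one."""

    def __init__(self, drivers):
        self.drivers = list(drivers)

    def write(self, block, data):
        # Success is reported only when every mirrored device has succeeded.
        results = [driver.write(block, data) for driver in self.drivers]
        return all(results)

    def read(self, block):
        for driver in self.drivers:
            if driver.available:
                return driver.read(block)
        raise IOError("no mirrored device is available")
```

Collecting the results in a list before calling all ensures that the write is attempted on every device even if an earlier one reports failure.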
As part of the initiating of mirroring, it is necessary to
assure that each mass storage device taking part in mirroring
has the same contents at the start of mirroring. This can be
done by designating one of the mass storage devices as the
master, and making a copy of the master mass storage
device's data to all other mass storage devices taking part in
the mirroring. An alternative approach is to have a timestamp
indicating when the last change was made to the data
on a mass storage device. If the timestamp on a mass storage
device is the same as the timestamp on the master mass
storage device, it will not be necessary to make a new copy
of the data.
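The timestamp comparison can be sketched as below. Each device is modeled as a plain dict with name, timestamp, and blocks keys. These field names and the function name start_mirroring are assumptions made for the example, and a real implementation would copy data block by block rather than in one step.

```python
def start_mirroring(master, others):
    """Copy the master's data to every device whose timestamp differs.

    Returns the names of the devices that needed a fresh copy.
    """
    copied = []
    for device in others:
        if device["timestamp"] == master["timestamp"]:
            continue  # already identical to the master, no copy needed
        device["blocks"] = dict(master["blocks"])
        device["timestamp"] = master["timestamp"]
        copied.append(device["name"])
    return copied
```

Skipping devices with a matching timestamp avoids a full copy in the common case where the mirror was shut down cleanly together with the master.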
At step 204, the method of this invention waits until a
failure of file server computer system 110 is detected. Such
a failure could come from the failure of either hardware
(such as computer 112 or mass storage device 114) or
software (such as the file server operating system). Although
means for automatically detecting such a failure may be
used, such failure can also be detected by a system operator
or workstation user noticing that file server requests are no
longer being handled by computer system 110. It is not
difficult for a user to determine there is a problem with file
server computer system 110; in most cases, a user workstation
will stop working and "hang" while it waits for a file
server request that will never be honored.
In step 205, when a failure of computer system 110 has
been detected, if computer system 120 is not currently
connected to network 101 through interface 121, it is
connected to network 101. This can be done either by activating
interface 121 or physically connecting interface 121 to
network 101, as appropriate.
In step 206, when computer system 120 has been connected
to network 101, the file server operating system is
loaded into computer 122 and executed if computer 122 is
not already running the file server operating system, so that
computer system 120 is a file server computer system. New
file server computer system 120 now responds to requests
received from network 101 as failed file server computer
system 110 did before its failure. The file server operating
system executing on computer 122 accesses mass storage
device 124 to respond to the requests.
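Steps 205 and 206 amount to two conditional actions followed by normal service, as the following sketch illustrates. The dict keys connected_to_network and running_file_server_os, and the function name take_over, are hypothetical. The sketch records the actions as strings instead of performing them.

```python
def take_over(backup):
    """Apply steps 205 and 206 to a dict describing computer system 120."""
    actions = []
    if not backup["connected_to_network"]:      # step 205
        backup["connected_to_network"] = True
        actions.append("connect interface 121 to network 101")
    if not backup["running_file_server_os"]:    # step 206
        backup["running_file_server_os"] = True
        actions.append("load and execute file server operating system")
    actions.append("serve requests from mass storage device 124")
    return actions
```

When the backup is already on the network and already running the file server operating system, as in the co-standby arrangement, only the final action remains.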
Note that because mass storage device 124 received data
through the mass storage emulator and mass storage access
program while file server computer system 110 was
operating, mass storage device 124 contains a copy of the
data stored on mass storage device 114 prior to the failure of
computer system 110. (Because of timing, the last few write
operations may not have occurred on all mass storage
devices taking part in mirroring, but the file server operating
system is capable of handling these small differences.)
Because a copy of the mass storage data of failed file server
computer system 110 is immediately available to new file
server computer system 120, the time necessary to recover
from a file server failure is minimal.
When the fault that caused the failure of computer system
110 has been corrected, fault-tolerant operation can be
restored. Depending on the relative capabilities of computer
systems 110 and 120, one of two techniques can be
employed. Both involve the same method steps as were
discussed above.
If the two computer systems have components of similar
speed and capacity, there is no reason not to continue using
computer system 120 as the file server computer. In this
case, computer system 110 can now be treated as the backup
computer system. The mass storage access program is run on
computer system 110, the mass storage emulator is installed
on computer system 120, and mirroring is initiated on the
file server operating system running on computer system
120. As part of the initiating of mirroring, any data written
to mass storage device 124 during the time computer system
110 was not available is now copied to mass storage device
114 through the mass storage emulator, communications
means 102, and the mass storage access program.
Alternatively, if computer system 120 is less capable than
computer system 110, it will be desirable to make computer
system 110 the file server computer system when the failure
has been corrected. To accomplish this, two approaches are
possible. In the first approach, computer system 110 is
brought up as the backup computer system, running the mass
storage access program, as discussed above. When mass
storage device 114 contains a copy of the data on mass
storage device 124, computer system 110 can be restarted as
the file server (running the file server operating system) and
computer system 120 can be restarted as the backup computer
in accordance with the method discussed above.
In the second approach, when the failure of computer
system 110 has been corrected, computer system 120 is
restarted as the backup computer system, running the mass
storage access program, and computer system 110 is
restarted as the file server computer, running the file server
operating system and the mass storage emulator. When
mirroring is initiated, it will be determined by the timestamps
stored on each of mass storage devices 114 and 124
that the data on mass storage device 114 is out of date. The
file server operating system will read the data on mass
storage device 124 (through the mass storage emulator,
communications means 102, and the mass storage access
program). It will also copy the data from mass storage
device 124 to mass storage device 114 until they contain
identical data.
`It is possible for a single computer system to act as the
`backup for a plurality of file server computers, not just a
`single file server as was discussed above. FIG. 4 illustrates
one possible configuration. It shows three file server computer
systems 410, 420, and 430 serving networks 401, 402,
`and 403, respectively. They can communicate with backup
`computer system 440 through communications means
`attachments 415, 425, 435, and 445. Communications
`means attachments 445 can be a single device, or three
`identical devices interfaced to computer 442. Computer 442
`can also be attached to networks 401, 402, or 403 through
`network interfaces 441. Network interfaces 441 could be a
`single device switchable to networks 401, 402, or 403, as
`required, a single device capable of connecting to three
`networks, or three separate devices.
`Each file server computer 410, 420, and 430 runs a mass
storage emulator as previously described. Backup computer
440 can run a single mass storage access program
`capable of communicating with a plurality of mass storage
`emulators. Alternatively, if the operating system on the
`backup computer permits the running of multiple processes,
`the backup computer can run a separate mass storage access
`