`Deitz et al.
`
`(10) Patent No.: US 6,578,158 B1
`(45) Date of Patent: Jun. 10, 2003
`
`US006578158B1
`
`(54) METHOD AND APPARATUS FOR
`PROVIDING A RAID CONTROLLER
`HAVING TRANSPARENT FAILOVER AND
`FAILBACK
`
`(75)
`
`Inventors:
`
`William G. Deitz, Niwot, CO (US);
`Keith Short, l,aFayette, CO (US)
`
`(73) Assignee:
`
`International Business Machines
`Corporation, Armonk, NY (US)
`
(*) Notice:     Subject to any disclaimer, the term of this
                patent is extended or adjusted under 35
                U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/429,523

(22) Filed:     Oct. 28, 1999
`
(51) Int. Cl.7 ................................................. G06F 11/00
(52) U.S. Cl. ............................................... 714/11; 714/5
(58) Field of Search .............................. 714/6, 7, 8, 11,
                                               714/710, 5; 711/114
`
`(56)
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
5,237,658 A     8/1993  Walker et al. ............... 395/200
5,274,645 A  * 12/1993  Idleman et al. ............. 371/10.1
5,367,669 A    11/1994  Holland et al. .............. 395/575
5,553,230 A  *  9/1996  Petersen et al. ............. 395/180
5,757,642 A     5/1998  Jones ......................... 364/134
5,790,775 A  *  8/1998  Marks et al. ........... 395/182.07
5,812,754 A  *  9/1998  Lui et al. ................ 395/182.04
5,922,077 A     7/1999  Espy et al. ..................... 714/7
6,129,027 A1 *  2/2001  El-Batal ..................... 370/222
6,219,753 B1 *  4/2001  Richardson ................. 711/114
6,330,687 B1 * 12/2001  Griffith .......................... 714/6
`
`* cited by examiner
`
Primary Examiner-Robert Beausoliel
Assistant Examiner-Marc Duncan
(74) Attorney, Agent, or Firm-Dorsey & Whitney LLP
`
`(57)
`
`ABSTRACT
`
`A method and apparatus for controlling a memory system
`100 comprising a plurality of controllers 105 connected by
`a fibre channel arbitrated loop 145 to provide transparent
`failover and failback mechanisms for failed controllers. The
`controllers 105 are adapted to transfer data between a data
`storage system 120 and at least one host computer 110 in
`response to instructions therefrom. In the method, a unique
`identifier is provided to each controller 105. The operation
of the controllers 105 is then monitored and, when a failed
controller is detected, a failover procedure is performed on
a surviving controller. The failover procedure includes dis-
abling the failed controller and instructing the surviving
controller to assume the identity of the failed controller.
`Thus, the surviving controller is capable of responding to
`instructions addressed to it and instructions addressed to the
`failed controller, and the failure of the failed controller is
`transparent to the host computer 110. A computer program
`and a computer program product for implementing the
`method are also provided.
`
`25 Claims, 4 Drawing Sheets
`
[Representative drawing: host computers 110a and 110b of memory system 100 coupled to dual-active controllers 105a and 105b, each with an active port (195a, 195b) and a failover port (200a, 200b).]
`IBM-Oracle 1008
`Page 1 of 12
`
`
`
[Sheet 1 of 4, FIG. 1: block diagram of memory system 100: host computer 110 with HBAs 155a, 155b; hub; controllers 105a, 105b with ROM, CPU, and active ports; communication path 205; device-side loops 140b, 140c; and data storage system 120 (RAID 130).]
`
`
`
`
[Sheet 2 of 4, FIG. 2: block diagram of the clustered embodiment: host computers 110a, 110b with HBAs 155a, 155b; hubs 150a, 150b on host-side loops 115a, 115b; controllers 105a, 105b, each with an active port (195a, 195b) and an inactive failover port; and data storage system 120 (RAID 130) on device-side loops 140b, 140c.]
`
`
`
`
[Sheet 3 of 4, FIG. 3: flowchart of the method: provide unique identifier (210); communicate unique identifier (215); begin dual-active operation (225); exchange pings (230); disable failed controller (235); assume identity of failed controller (240); respond to I/O instructions for surviving and failed controllers (245); poll failed controller (250); replacement controller assumes identity of failed controller; resume dual-active operation (270).]
`
`
`
`
[Sheet 4 of 4, FIG. 4: hierarchical structure of the computer program: controller initialization unit; failure detection unit; failover unit 295 with disabling unit and loop initialization unit 310; replacement detection unit; failback unit; and loop reinitialization unit.]
`
`
`
`
`METHOD AND APPARATUS FOR
`PROVIDING A RAID CONTROLLER
`HAVING TRANSPARENT FAILOVER AND
`FAILBACK
`
`FIELD OF THE INVENTION
`
`This invention pertains generally to the field of computer
`memory systems, and more particularly to a method and
`apparatus for controlling redundant arrays of independent
`disks.
`
`BACKGROUND OF THE INVENTION
`
`Modern computers frequently require large, fault-tolerant
`memory systems. One approach to meeting this need is to
provide a Redundant Array of Independent Disk drives
`(RAID) usually including a plurality of hard disk drives
`operated by a disk array controller that is coupled to a host
`computer. The controller provides the brains of the memory
`system, servicing all host requests, storing data to or retriev-
`ing it from the RAID, caching data to provide faster access,
`and handling drive failures without interrupting host
requests. Given the importance of the controller, numerous
solutions have been suggested to minimize the potential for
`interrupted service due to controller malfunction. One such
`solution calls for providing dual-active controllers having
`failover and failback capabilities. Dual-active controllers are
`a pair of controllers that are connected to each other and to
`all the disk drives in a RAID. In normal operation, input/
`output (I/O) requests from the host computer are divided
`between the dual-active controllers to increase the rate at
`which information can be transferred to or from the RAID,
`commonly referred to as the bandwidth of the memory
`system. However, in the event that one of the controllers
`fails, the surviving controller takes over the functions of the
`failed controller and begins servicing host requests
`addressed to the failed controller in addition to those
`addressed to it. The mechanism that allows this is commonly
`known as a failover mechanism. If the surviving controller
`is able to assume the functions of the failed controller
`without any actions on the part of the host computer, for
`example redirecting I/O requests to the surviving controller,
`the failover mechanism is said to be transparent. If the failed
`controller can be subsequently replaced and normal opera-
`tion resumed without de-energizing or reinitializing the
`controllers the memory system is said to have a failback
`mechanism.
One example of the use of such dual-active controllers is
described in U.S. Pat. No. 5,790,775, to Marks et al., which
uses dual-active controllers connected to the host
computer by a Small Computer System Interface (SCSI)
`bus. Typically, the controllers are also connected to a RAID
`comprising multiple disk drives through a number of addi-
`tional SCSI buses. Each SCSI device on a bus, such as a
`controller or a disk drive, is assigned one bit as an identifier
`(SCSI ID) to permit the host computer to select a particular
`controller, and the controller to select a particular disk drive.
`Thus, the method permits a maximum of eight devices to be
`identified on a standard 8-bit SCSI bus. In addition, the
`controllers are connected to one another by a separate
`communications link, and each has access to a cache
`memory in the other. Although both controllers are con-
`nected to every disk drive in the RAID, to permit dual-active
`operation each disk drive is typically under primary control
`of one of the controllers. This is accomplished by dividing
`the RAID into groups of disk drives that appear to the host
`
`10
`
`2
`computer as a logical drive or unit identified by a logical unit
`number (LUN) and, during initialization, associating each
`LUN with the SCSI ID of a particular controller. In normal
operation, a controller responds only to I/O requests which
are addressed to it and which refer to LUNs over which it has
`primary control. However, if a controller fails the remaining
`controller of the pair obtains configuration information,
`including the SCSI ID and the LUNs of the failed controller,
`over the communications link and begins servicing requests
`addressed by the host to the failed controller as well as those
addressed to itself.
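The prior-art takeover sequence just described can be sketched as follows. This is an illustrative model only, under assumed names (`Controller`, `take_over`); it is not code from the patent or from the Marks et al. reference.

```python
# Illustrative model of the prior-art SCSI dual-active takeover described
# above; all names are hypothetical. On failure, the surviving controller
# obtains the failed controller's SCSI ID and LUNs over the communications
# link and begins servicing requests addressed to either controller.

class Controller:
    def __init__(self, scsi_id, luns):
        self.scsi_ids = {scsi_id}   # SCSI IDs this controller answers to
        self.luns = set(luns)       # LUNs under its primary control

    def handles(self, scsi_id, lun):
        # A request is serviced only if addressed to one of this
        # controller's SCSI IDs and referring to one of its LUNs.
        return scsi_id in self.scsi_ids and lun in self.luns

    def take_over(self, failed):
        # Configuration information obtained over the communications link.
        self.scsi_ids |= failed.scsi_ids
        self.luns |= failed.luns


a = Controller(scsi_id=0, luns=[0, 1])
b = Controller(scsi_id=1, luns=[2, 3])

assert not b.handles(0, 1)   # before failover, b ignores a's requests
b.take_over(a)
assert b.handles(0, 1) and b.handles(1, 2)   # b now services both
```

The point of the sketch is that the takeover is purely a configuration transfer: no host-side redirection is involved.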
While the above approach has been effective in reducing
interruptions in service for memory systems having dual-
active controllers, it is limited by the architecture of the
SCSI bus. Traditionally, SCSI buses have from eight to
sixteen signal lines, which allows a maximum of from eight
to sixteen SCSI devices to be interconnected by the SCSI
bus at any one time. Thus, systems which use a 16-bit wide
SCSI bus on the host side and 8-bit wide SCSI buses on the
device side typically provide for at most six device-side
SCSI buses having six disk drives each. Moreover, the above
approach, which relies on SCSI IDs, has not been imple-
mented using fibre interface type controllers.
Fibre interface type controllers are coupled to a host
computer through one or more fibre channels. Fibre channel
is the general name of a technology using an integrated set
of standards developed by the American National Standards
Institute (ANSI) for high speed, serial communication
between computer devices. (See for example the ANSI
standard X3T11, "Fibre Channel Physical and Signaling
Interface (FC-PH)," Rev 4.3 (1994), hereby incorporated by
reference.) Manufacturers of RAID systems have been mov-
ing to fibre channel technology because it allows transmit-
ting of data between computer devices at rates of over 1
Gbps (one billion bits per second), and at distances exceed-
ing several hundred meters and more. Also, fibre channel
arbitrated loop (FC-AL) allows for 127 unique loop
identifiers, one of which unique identities is reserved for a
fabric loop port.
The widely accepted approach to providing failover/
failback capability in RAID systems comprising fibre inter-
face controllers has been to use dual-active controllers
coupled by a redirecting driver. In the event of a controller
failure the redirecting driver shifts host requests from the
failed controller to a surviving controller. The failed con-
troller can then be replaced and the memory system reini-
tialized to return to normal, dual-active controller operation.
The redirecting driver can be implemented using a software
or hardware protocol. One exemplary redirecting driver is
disclosed in U.S. Pat. No. 5,237,658, to Walker et al., hereby
incorporated by reference. However, one problem associated
with this type of solution is that it is achieved at the expense
of added memory system complexity that increases cost and
decreases bandwidth. In addition, when, as is common, the
redirecting driver is implemented using software in the host
computer, this approach is not independent of the host
computer, and typically requires a special driver for each
host computer system on which it is to be utilized. This
further adds to the cost and complexity, and increases the
difficulty of installing and maintaining the memory system.
Accordingly, there is a need for a memory system com-
prising a number of fibre interface controllers and having a
failover mechanism that is transparent to a host computer.
There is a further need for such a memory system having a
failback mechanism that is also transparent to the host
computer. The present invention provides a solution to these
and other problems, and offers additional advantages over
the prior art.
`
`
`
`
`SUMMARY OF THE INVENTION
`
`The present invention provides a memory system and
`method of operating a memory system. In one embodiment,
`the memory system includes a number of controllers con-
`nected by a fibre channel arbitrated loop to provide trans-
`parent failover and failback for failed controllers. The con-
`trollers are adapted to transfer data between a data storage
`system and at least one host computer in response to
`instructions therefrom. In the inventive method, a unique
`identifier is provided to each controller to permit the host
computer to address instructions to a specific controller.
Then, operation of the controllers is monitored and when a
failed controller is detected, a failover procedure is per-
formed on a surviving controller. In one embodiment, the
failover procedure disables the failed controller and assumes
the identity of the failed controller. Thus, the surviving
controller becomes capable of responding to instructions
`addressed to it and instructions addressed to the failed
`controller, and the failure of the failed controller is trans-
`parent to the host computer. In one particular embodiment,
`the step of providing a unique identifier to each controller
`preferably includes the step of providing a world wide name
`to each controller, and more preferably the step further
`includes providing a loop identifier to each controller.
`In another aspect the invention provides a memory system
for transferring data between a data storage system and at
`least one host computer in response to instructions there-
`from. The memory system includes a pair of dual-active
`controllers connected by a fibre channel arbitrated loop.
`Each controller has a unique identifier and is adapted to
`assume the identity of a failed controller and to respond to
`instructions addressed to it, thereby rendering failure of the
`failed controller transparent to the host computer. In one
`embodiment, the memory system further includes a com-
munication path coupling the controllers, the communica-
`tion path being adapted to enable each controller to detect
`failure of the other controller. The present invention is
`particularly useful for data storage systems comprising
`multiple disk drives coupled to the controllers by disk
channels, in which at least one disk channel also serves as
`the communication path.
`In yet another aspect the invention provides a computer
`program and a computer program product for operating a
memory system comprising a plurality of controllers, each
`controller having a unique identifier, and the controllers
`adapted to transfer data between a data storage system and
`at least one host computer in response to instructions there-
`from. The computer program product includes a computer
readable medium with a computer program stored therein.
The computer program has a failure detection unit adapted
`to detect a failed controller. A failover unit is adapted to
`enable a surviving controller to respond to instructions
`addressed to it and to instructions addressed to the failed
`controller. The failover unit includes a disabling unit adapted
`to disable the failed controller. The failover unit also
`includes a loop initialization unit, which is adapted to
`instruct a surviving controller to assume the identity of the
`failed controller and to instruct the surviving controller to
`respond to instructions addressed to it and to the failed
`controller as well as instructions addressed to the surviving
`controller. Thus, failure of the failed controller is transparent
`to the host computer. In one embodiment, each controller
`has an active port and a failover port, and the failover unit
`is adapted to activate the failover port of the surviving
`controller. In another embodiment, the computer program
product further includes a replacement detection unit
adapted to instruct a replacement controller to assume the
identity of the failed controller and respond to instructions to
the failed controller, thereby rendering replacement of the
failed controller transparent to the host computer.
`In still another aspect the invention provides a memory
`system for transferring data between a data storage system
`and at least one host computer in response to instructions
therefrom. The memory system comprises a pair of dual-
active controllers connected by a fibre channel arbitrated
loop, each controller having a unique identifier, and a means
for providing a failover mode from a failed controller to a
surviving controller that is substantially transparent to the
host computer. In one embodiment, the means for providing
a failover mode is a computer program product having a
computer program including a loop initialization unit
adapted to instruct the surviving controller to assume the
identity of the failed controller and to instruct the surviving
controller to respond to instructions addressed to it and to the
failed controller.
`
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
Additional objects and features of the invention will be
more readily apparent from the following detailed descrip-
tion and appended claims when taken in conjunction with
the drawings, in which:
FIG. 1 is a block diagram of an embodiment of a memory
system comprising a pair of controllers having a transparent
failover and failback mechanism according to the present
invention;
FIG. 2 is a block diagram of another embodiment of a
memory system according to the present invention in an
environment comprising a pair of host computer systems;
FIG. 3 is a flowchart showing an embodiment of a method
of operating the memory system shown in FIG. 1 or FIG. 2
to provide a transparent failover and failback mechanism
according to the present invention; and
FIG. 4 is a block diagram illustrating the hierarchical
structure of an embodiment of a computer program accord-
ing to an embodiment of the present invention.
`
`DETAILED DESCRIPTION
`
The present invention is directed to a memory system
having a number of controllers adapted to transfer data
between at least one host computer and a data storage
system, such as one or more Redundant Array of Indepen-
dent Disks (RAID) storage systems. The controllers are
coupled to the host computer and one another through a
host-side loop to provide a failover and a failback mecha-
nism for a failed controller that is transparent to the host
computer. Advantageously, the controllers are connected by
a fibre channel arbitrated loop (FC-AL). While the invention
is described using examples of data storage systems com-
prising a RAID having multiple magnetic disk drives, the
present invention can be used with other data storage
systems, as apparent to those skilled in the art, including
arrays and individual disk drives in which the disk drives are
optical, magnetic, or magneto-optical disk drives.
FIG. 1 shows a block diagram of an exemplary embodi-
ment of a memory system 100 according to the present
invention having a pair of controllers 105 (singularly 105a
and 105b) coupled to a host computer 110 through a pair of
host-side loops 115 (singularly 115a and 115b). It is to be
understood that by host-side loop 115 it is meant a commu-
nication path which connects the controllers 105 to the host
computer 110, and that the host-side loop can also connect
`
`
`
`
`
other devices or systems (not shown) to the host computer.
The controllers 105 are in turn coupled to a data storage
system 120, shown here as a RAID 130 comprising multiple
disk drives 135, via several device-side loops 140 (singularly
140a to 140c), also known as disk channels. Alternatively,
the controllers 105 could also be coupled to the data storage
system 120 via SCSI buses (not shown). Although FIG. 1
shows a single pair of controllers 105 coupled by three
device-side loops 140 to a RAID 130 comprising only
twelve disk drives 135, the illustrated architecture is extend-
able to memory systems having any number of controllers,
disk drives, and device-side loops. For example, the memory
system 100 can include a number, n, of n-way controllers
using operational primitives in a message passing multi-
controller non-uniform workload environment, as described
in commonly assigned co-pending U.S. patent application
Ser. No. 09/326,497, which is hereby incorporated by
reference.
`The host-side loops 115 are made up of several fibre
`channels 145 and a hub 150a, 150b. The term fibre channel
`as used here refers to any physical medium that can be used
`to transmit data at high speed, for example to serially
`transmit data at high speed in accordance with standards
`developed by the American National Standards Institute
`(ANSI), such as for example optical fibre, co-axial cable, or
`twisted pair telephone line. Each of the host-side loops 115
`connect to three nodes or ports, including a single server port
`known as a host bus adapter HBA 155a, 155b, on the host
`computer 110 and to t~vo controller ports 160a, 160b, on
`each of the controllers 105. The host-side loops 115 are
`adapted to enable data and input/output (I/O) requests from
`the host computer 110 to be transferred between any port on
`the loop 115.
`The controllers 105 can be any suitable fibre channel
`compatible controller that can be modified to operate
`according to the present invention, such as for example the
DAC960SF, commercially available from Mylex, Inc.,
`Boulder, Colo. Such controllers 105 include, or can be
`modified to include, an active port 165a, 165b, and a failover
`port 166a, 166b, on each controller, and a register (not
`shown) adapted to support the failover and a failback
mechanism of the present invention. A pair of the controllers
105 can be configured to operate as dual-active controllers
`as described above, or as dual-redundant controllers wherein
`one controller serves as an installed spare for the other,
`which in normal operation handles all I/O requests from the
`host computer 110. Preferably, the controllers 105 operate as
`dual-active controllers to increase the bandwidth of the
memory system 100. Generally, each of the controllers 105
`have a computer readable medium, such as a read only
`memory (ROM) 170, in which is embedded a computer or
`machine readable code, commonly known as firmware, with
`instructions for configuring and operating the controller, a
`cache 180a, 180b, for temporarily storing I/O requests and
`data from the host computer 110, and a local processor 185a,
`185b, for executing the instructions and requests. The firm-
`ware of each controller is modified to support the failover
`and a failback mechanism of the present invention.
To enable the controllers 105 to be operated in dual-active
mode, the controllers on host-side loops 115a, 115b, are
identified by a unique identifier to permit the host computer
110 to address an I/O request to a specific controller. In one
embodiment, the unique identifier includes a non-volatile,
64-bit World Wide Name (WWN). A WWN is an identifying
code that is hardwired, embedded in the firmware, or oth-
erwise encoded in a fibre channel compatible device, such as
the HBA 155a, 155b, or the controllers 105, at the time of
manufacture. Additionally, the unique identifier includes a
loop identifier (LOOP ID) which is assigned to each port in
a host-side loop 115a, 115b, during a system initialization of
the memory system 100. This LOOP ID can be acquired
during a Loop Initialization Hard Address (LIHA) phase of
the system initialization, or during a Loop Initialization
Software Address (LISA) phase. Because not all host com-
puters have operating systems that support addressing
schemes using WWNs, for example some legacy host com-
puter systems, in a preferred embodiment, the unique iden-
tifier includes both a WWN and a LOOP ID to enable the
memory system 100 of the present invention to be used with
any host computer 110 independent of the operating system.
During system initialization, each of the controllers 105
registers the unique identifier of the other controller. This
enables a surviving controller, for example controller 105a,
to accept and process I/O requests addressed to a failed
controller, for example controller 105b, by assuming the
identity of the failed controller.
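The initialization exchange just described can be sketched as below. The names (`UniqueId`, `initialize`) and the WWN values are illustrative assumptions, not the patent's firmware.

```python
# Hypothetical sketch of the initialization exchange described above: each
# controller records its peer's unique identifier (WWN plus LOOP ID) so it
# can later assume that identity on failover. All values are illustrative.

from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class UniqueId:
    wwn: int       # 64-bit World Wide Name, encoded at manufacture
    loop_id: int   # loop identifier acquired in the LIHA or LISA phase

@dataclass
class Controller:
    own_id: UniqueId
    peer_id: Optional[UniqueId] = None   # registered at initialization

def initialize(a: Controller, b: Controller) -> None:
    # Each controller registers the unique identifier of the other
    # over the communication path.
    a.peer_id, b.peer_id = b.own_id, a.own_id

a = Controller(UniqueId(wwn=0x5000000000000001, loop_id=1))
b = Controller(UniqueId(wwn=0x5000000000000002, loop_id=2))
initialize(a, b)
assert a.peer_id == b.own_id and b.peer_id == a.own_id
```

Carrying both a WWN and a LOOP ID in one record mirrors the preferred embodiment, which keeps the scheme usable with hosts that cannot address by WWN.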
The RAID is comprised of multiple virtual or logical
volumes. Although the controllers 105 share the same RAID
130, that is both controllers are connected to every disk drive
135 in the RAID, preferably each logical volume is under
the primary control of one of the controllers so that coher-
ency need not be maintained between the caches 180a, 180b,
of the controllers when they are operated in dual-active
`mode. By primary control it is meant that during normal
`operation each logical volume 135 in the RAID 130 is
`controlled solely by one of the controllers 105. Each logical
`volume is represented by a logical unit number (LUN) to the
host computer 110. Each LUN in turn is associated with the
`unique identifier of one of the controllers 105 so that when
`data needs to be stored in or retrieved from a particular LUN,
`the I/O request is automatically directed to the correct
`controller.
In a preferred embodiment, shown in FIG. 2, reliability is
further enhanced by providing a clustered environment in
which two host computers 110 (singularly 110a and 110b)
each have direct access to both controllers 105 through a
number of HBAs 155a-d. Thus, the failure of a single host
computer 110a, 110b, will not result in the failure of an
entire network of client computers (not shown). In addition,
as shown in FIG. 2, each of the controllers 105 have at least
one active port 195a, 195b and one inactive port 200a, 200b.
The active ports 195a, 195b receive and process I/O requests
sent by the host computers 110 on the host-side loops 115.
The inactive ports 200a, 200b, also known as failover
ports, can process I/O requests only when the active port
195a, 195b on the same host-side loop 115a, 115b, has
failed. For example, in case of failure of controller 105a,
inactive port 200b on surviving controller 105b assumes the
identity of the active port 195a on failed controller 105a and
begins accepting and processing I/O requests directed to the
failed controller 105a.
In accordance with the present invention, the memory
system further includes a communication path 205 adapted
to transmit a signal from one controller 105 to another in the
event of a controller failure. The communication path 205
can be a Small Computer System Interface (SCSI) bus or a
fibre channel as described above. It can take the form of a
dedicated high speed path extending directly between the
controllers 105, as shown in FIG. 1, or one of the device-side
channels 140a-c (disk channels) which can also serve as the
communication path 205, as shown in FIG. 2. The signal
passed between the controllers 105 to indicate controller
failure can be a passive signal, such as for example the lack
of a proper response to a polling or pinging scheme in which
each controller interrogates the other at regular, frequent
`
`
`
`
`intervals to ensure the other controller is operating correctly.
Alternatively, the signal can be a dynamic signal transmitted
directly from a failed or failing controller 105a, 105b, to the
surviving controller 105b, 105a, instructing it to initiate a
failover process or mechanism. Optionally, the communica-
tion path 205 is also adapted to enable the controllers 105 to
achieve cache coherency in case of controller failure.
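The passive polling scheme described above can be sketched as follows. The threshold, the `monitor` function, and the modeled peer are illustrative assumptions; the patent does not specify these values.

```python
# Illustrative sketch of the passive signalling scheme described above:
# each controller interrogates the other at regular intervals, and a run
# of missed responses is taken as a controller failure. The threshold and
# all names here are assumptions for illustration only.

MAX_MISSED = 3   # consecutive missed pings before declaring failure

def monitor(peer_responds, max_missed=MAX_MISSED):
    """Poll the peer until it is considered failed, then report it.
    peer_responds() models one ping/response exchange over the
    communication path; a real controller would also sleep for the
    polling interval between interrogations."""
    missed = 0
    while missed < max_missed:
        missed = 0 if peer_responds() else missed + 1
    return "peer failed"

# A peer that answers twice and then goes silent:
replies = iter([True, True])
assert monitor(lambda: next(replies, False)) == "peer failed"
```

Resetting the miss counter on every good response keeps a single dropped ping from triggering a spurious failover.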
An exemplary method of operating the memory system
100 shown in FIG. 2 to provide a failover process that is
substantially transparent to the host computers 110a, 110b,
`will now be described with reference to FIG. 3. The fol-
`lowing initial actions or steps are required to make the
`failover operation transparent to the host computer. First, in
`a system initialization step 210 each of the controllers 105
`is provided with a unique identifier which is communicated
`to the host computers 110. This step 210 generally merely
`involves querying the controllers 105 to obtain their WWN,
`but it may also include assigning a LOOP ID to each
`controller in a LIHA phase or a LISA phase, as described
`above. The unique identifiers are then registered by the host
`computers 110 and one or more of the LUNs are associated
`with each unique identifier. Next, in a communication step
`215, the unique identifiers and their associated LUNs are
`communicated between the controllers 105 via the commu-
nication path 205. Each of the controllers 105 assigns the
`unique identifier and the associated LUNs of the other
`controller, to its failover port 200a, 200b. This enables a
`surviving controller 105a, 105b to assume the identity of a
`failed controller 105b, 105a, and to accept and process I/O
`requests addressed to it by activating the normally inactive
`or failover port 200a, 200b.
`The memory system 100 is then ready to begin regular
`operations in a dual-active operation step 225 in which the
controllers 105 both simultaneously receive and process I/O
requests from the host computers 110. During normal opera-
tions a fault detection step 230 is executed in which the
controllers 105 exchange a series of "pings," also referred to
`as a heart beat signal, the response to which, as described
`above, signals to each controller that the other has not failed.
`This step 230 may also involve a scheme in which a failed
`or failing controller 105a, 105b dynamically signals a sur-
`viving controller 105b, 105a, that a failure has occurred or
`is about to occur.
`On detection of a controller failure, a failover procedure
`is performed on the surviving controller 105a, 105b, the
`failover procedure involves the steps of disabling the failed
`controller (step 235) and assuming the identity of the failed
`controller (step 240). In the disabling step 235, the surviving
`controller 105a, 105b asserts a reset signal, which disables
the failed controller 105b, 105a by resetting its local pro-
cessor 185a, 185b, and the active port 195a, 195b, fibre
`protocol chip (not shown). Resetting the fibre protocol chip
`causes the hub 150a, 150b to automatically bypass the
`primary port 195a, 195b, on the failed controller 105a, 105b.
`In the assuming identity step 240, the failover port 200a,
`200b of the surviving controller 105a, 105b, begins accept-
`ing and processing I/O requests addressed by the host
`computers ll0a, ll0b, to the failed controller 105b, 105a.
`Preferably, to speed up the failover process the unique
`identifier for the failed controller 105a, 105b, was previ-
`ously assigned to the failover port 200a, 200b, during the
`communication step 215, and the surviving controller 105
`merely activates the failover port 200a, 200b, to enable it to
begin accepting and processing I/O requests.
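The two failover steps just described (disable, then assume identity) can be sketched as below. The class and method names are hypothetical, and the reset signal is modeled as a simple flag rather than a hardware reset of the fibre protocol chip.

```python
# A minimal sketch of the failover procedure described above, under
# assumed names: the surviving controller resets the failed one
# (step 235), then activates its own failover port, to which the failed
# controller's identity was assigned at initialization (step 240).

class Port:
    def __init__(self, identity, active):
        self.identity = identity   # unique identifier served by this port
        self.active = active

class Controller:
    def __init__(self, own_id, peer_id):
        self.active_port = Port(own_id, active=True)
        self.failover_port = Port(peer_id, active=False)  # preassigned
        self.reset = False

    def failover_from(self, failed):
        failed.reset = True                # disable the failed controller
        self.failover_port.active = True   # assume its identity

    def accepts(self, identity):
        return any(p.active and p.identity == identity
                   for p in (self.active_port, self.failover_port))

a = Controller("id-a", "id-b")
b = Controller("id-b", "id-a")
b.failover_from(a)
assert a.reset
assert b.accepts("id-a") and b.accepts("id-b")   # b answers for both
```

Because the failover port already holds the peer's identity, the takeover reduces to flipping the port active, which is what makes the procedure fast and host-transparent.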
After the failover process is completed, the surviving
`controller 105a, 105b, in a resume operation (step 245)
`resumes operations by responding to I/O requests addressed
`
`20
`
to itself and to the failed controller. The surviving controller
105a, 105b, responds to requests to store or retrieve data
addressed to the failed controller, without any additional
support from the host computers 110 or the HBAs 155.
Because there is no need to alter the registered unique
identifiers or the associated LUNs, the failover process is
transparent to the host computers 110. To the host computers
110, the delay, if any, caused by the time it takes to detect the
failed controller 105a, 105b and to perform the loop initial-
ization procedure appears to be no more than a momentary
loss of power to the memory system 100, which requires the
host computers to re-transmit the last several commands sent
to the failed controller.
Optionally, when the controllers 105 include caches 180a,
180b, the failover process can also include a cache flush step
(not shown) and a conservative cache mode enable step (not
`shown). The cache flush step prevents the loss of