`
(12) United States Patent
Deitz et al.
(10) Patent No.: US 6,578,158 B1
(45) Date of Patent: Jun. 10, 2003
`
(54) METHOD AND APPARATUS FOR PROVIDING A RAID CONTROLLER HAVING TRANSPARENT FAILOVER AND FAILBACK
`
(75) Inventors: William G. Deitz, [illegible], CO (US); Keith Short, Lafayette, CO (US)
(73) Assignee: [illegible]
`
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days.
(21) Appl. No.: 09/429,523
(22) Filed: Oct. 28, 1999
`
(51) Int. Cl. ........ G06F 11/[illegible]
(52) U.S. Cl. ........ 714/11; 714/5
(58) Field of Search ........ 714/[illegible]; 711/114
`
(56) References Cited
U.S. PATENT DOCUMENTS
5,237,658 A      8/1993   Walker et al.
[illegible]      11/1994  Holland et al.
[illegible]               Jones
5,790,775 A      8/1998   Marks et al.
[illegible]      9/1998   [illegible] et al.
[illegible]      1999     Espy et al.
5,922,071 A      [illegible]
6,129,027 A      [illegible]
[illegible]      4/2001   Richardson
[illegible]      12/2001  Griffith ........ 714/[illegible]
[additional entries illegible]
Primary Examiner: Robert Beausoliel
Assistant Examiner: Marc Duncan
(74) Attorney, Agent, or Firm: Dorsey & Whitney LLP
(57) ABSTRACT

A method and apparatus for controlling a memory system 100 comprising a plurality of controllers 105 connected by a fibre channel arbitrated loop 145 to provide transparent failover and failback mechanisms for failed controllers. The controllers 105 are adapted to transfer data between a data storage system 120 and at least one host computer 110 in response to instructions therefrom. In the method, a unique identifier is provided to each controller 105. The operation of the controllers 105 is then monitored and, when a failed controller is detected, a failover procedure is performed on a surviving controller. The failover procedure includes disabling the failed controller and instructing the surviving controller to assume the identity of the failed controller. Thus, the surviving controller is capable of responding to instructions addressed to it and instructions addressed to the failed controller, and the failure of the failed controller is transparent to the host computer 110. A computer program and a computer program product for implementing the method are also provided.
`
25 Claims, 4 Drawing Sheets
`
`
`VMWARE-1008
`
`Page 1 of 12
`
U.S. Patent    Jun. 10, 2003    Sheet 1 of 4    US 6,578,158 B1
`
[FIG. 1: drawing not reproduced. Legible labels: HOST COMPUTER; 155b.]
`
`
[FIG. 2: drawing not reproduced. Legible labels: HOST COMPUTER 110a; HOST COMPUTER 110b; INACTIVE; FAILOVER PORT; 200a; 200b.]
`
`
[FIG. 3 (flowchart, drawing not reproduced): provide unique identifier (210); communicate unique identifier (215); begin dual-active operation (225); exchange pings (230); failed controller detected? If yes: disable failed controller (235); assume identity of failed controller (240); respond to I/O instructions for surviving and failed controllers (245); poll failed controller; failed controller replaced? If yes: communicate unique identifier; replacement controller assumes identity of failed controller; resume dual-active operation. Further decision: memory system rebooted?]
`
`
[FIG. 4 (block diagram of the computer program hierarchy, drawing not reproduced): controller initialization unit; failure detection unit 290; failover unit 295; disabling unit 300; loop initialization unit 310; replacement detection unit 315; failback unit 320; loop reinitialization unit 325.]
`
METHOD AND APPARATUS FOR PROVIDING A RAID CONTROLLER HAVING TRANSPARENT FAILOVER AND FAILBACK
`
FIELD OF THE INVENTION
`
`This invention pertains generally to the field of computer
`memory systems, and more particularly to a method and
`apparatus for controlling redundant arrays of independent
`disks.
`
`BACKGROUND OF THE INVENTION
`
Modern computers frequently require large, fault-tolerant memory systems. One approach to meeting this need is to provide a Redundant Array of Independent Disk drives (RAID), usually including a plurality of hard disk drives operated by a disk array controller that is coupled to a host computer. The controller provides the brains of the memory system, servicing all host requests, storing data to or retrieving it from the RAID, caching data to provide faster access, and handling drive failures without interrupting host requests. Given the importance of the controller, numerous solutions have been suggested to minimize the potential for interrupted service due to controller malfunction. One such solution calls for providing dual-active controllers having failover and failback capabilities. Dual-active controllers are a pair of controllers that are connected to each other and to all the disk drives in a RAID. In normal operation, input/output (I/O) requests from the host computer are divided between the dual-active controllers to increase the rate at which information can be transferred to or from the RAID, commonly referred to as the bandwidth of the memory system. However, in the event that one of the controllers fails, the surviving controller takes over the functions of the failed controller and begins servicing host requests addressed to the failed controller in addition to those addressed to it. The mechanism that allows this is commonly known as a failover mechanism. If the surviving controller is able to assume the functions of the failed controller without any action on the part of the host computer, for example redirecting I/O requests to the surviving controller, the failover mechanism is said to be transparent. If the failed controller can subsequently be replaced and normal operation resumed without de-energizing or reinitializing the controllers, the memory system is said to have a failback mechanism.
`
One example of the use of such dual-active controllers is described in U.S. Pat. No. 5,790,775, to Marks et al., which uses dual-active controllers connected to the host computer by a Small Computer System Interface (SCSI) bus. Typically, the controllers are also connected to a RAID comprising multiple disk drives through a number of additional SCSI buses. Each SCSI device on a bus, such as a controller or a disk drive, is assigned one bit as an identifier (SCSI ID) to permit the host computer to select a particular controller, and the controller to select a particular disk drive. Thus, the method permits a maximum of eight devices to be identified on a standard 8-bit SCSI bus. In addition, the controllers are connected to one another by a separate communications link, and each has access to a cache memory in the other. Although both controllers are connected to every disk drive in the RAID, to permit dual-active operation each disk drive is typically under primary control of one of the controllers. This is accomplished by dividing the RAID into groups of disk drives that appear to the host computer as a logical drive or unit identified by a logical unit number (LUN) and, during initialization, associating each LUN with the SCSI ID of a particular controller. In normal operation, a controller responds only to I/O requests which are addressed to it and which refer to LUNs over which it has primary control. However, if a controller fails, the remaining controller of the pair obtains configuration information, including the SCSI ID and the LUNs of the failed controller, over the communications link and begins servicing requests addressed by the host to the failed controller as well as those addressed to itself.
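The one-bit-per-device addressing described above can be illustrated with a short sketch. It assumes the usual narrow-bus convention that SCSI ID n is carried on data line n, so an ID is the single bit `1 << n`; the function names are hypothetical and only serve to show why an 8-bit bus tops out at eight identifiable devices.

```python
# Illustrative sketch: on a narrow SCSI bus each device owns one data line,
# so SCSI ID n is represented on the bus by the single bit 1 << n.
NARROW_BUS_WIDTH = 8  # standard 8-bit SCSI bus


def id_to_bit(scsi_id: int, bus_width: int = NARROW_BUS_WIDTH) -> int:
    """Return the one-hot data-bus bit that identifies scsi_id."""
    if not 0 <= scsi_id < bus_width:
        raise ValueError(f"SCSI ID {scsi_id} does not fit on a {bus_width}-bit bus")
    return 1 << scsi_id


def selection_pattern(initiator_id: int, target_id: int,
                      bus_width: int = NARROW_BUS_WIDTH) -> int:
    """Bus pattern during selection: initiator's and target's ID bits both set."""
    return id_to_bit(initiator_id, bus_width) | id_to_bit(target_id, bus_width)


def max_devices(bus_width: int) -> int:
    """With one bit per device, the bus width is also the device limit."""
    return bus_width
```

For example, `selection_pattern(7, 2)` gives `0x84`, while `id_to_bit(8)` raises `ValueError` on an 8-bit bus because there is no ninth data line to assign.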
While the above approach has been effective in reducing interruptions in service for memory systems having dual-active controllers, it is limited by the architecture of the SCSI bus. Traditionally, SCSI buses have from eight to sixteen signal lines, which allows a maximum of from eight to sixteen SCSI devices to be interconnected by the SCSI bus at any one time. Thus, systems which use a 16-bit wide SCSI bus on the host side and 8-bit wide SCSI buses on the device side typically provide for at most six device-side SCSI buses having six disk drives each. Moreover, the above approach, which relies on SCSI IDs, has not been implemented using fibre interface type controllers.
Fibre interface type controllers are coupled to a host computer through one or more fibre channels. Fibre channel is the general name of a technology using an integrated set of standards developed by the American National Standards Institute (ANSI) for high-speed, serial communication between computer devices. (See, for example, the ANSI standard X3T11, "Fibre Channel Physical and Signaling Interface (FC-PH)," Rev. 4.3 (1994), hereby incorporated by reference.) Manufacturers of RAID systems have been moving to fibre channel technology because it allows data to be transmitted between computer devices at rates of over 1 Gbps (one billion bits per second), and at distances of several hundred meters and more. Also, fibre channel arbitrated loop (FC-AL) allows for 127 unique loop identifiers, one of which is reserved for a fabric loop port.
The widely accepted approach to providing failover/failback capability in RAID systems comprising fibre interface controllers has been to use dual-active controllers coupled by a redirecting driver. In the event of a controller failure, the redirecting driver shifts host requests from the failed controller to a surviving controller. The failed controller can then be replaced and the memory system reinitialized to return to normal, dual-active controller operation. The redirecting driver can be implemented using a software or hardware protocol. One exemplary redirecting driver is disclosed in U.S. Pat. No. 5,237,658, to Walker et al., hereby incorporated by reference. However, one problem associated with this type of solution is that it is achieved at the expense of added memory system complexity that increases cost and decreases bandwidth. In addition, when, as is common, the redirecting driver is implemented using software in the host computer, this approach is not independent of the host computer, and typically requires a special driver for each host computer system on which it is to be utilized. This further adds to the cost and complexity, and increases the difficulty of installing and maintaining the memory system.
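For contrast with the invention, the host-resident redirecting driver described above can be sketched as follows. This is a minimal, hypothetical model (the `RedirectingDriver` class and `PathFailed` exception are illustrative names, not the Walker et al. design): each controller path is a callable, and when one fails the driver retries the request on the other controller.

```python
class PathFailed(Exception):
    """Raised by a path callable when its controller does not respond."""


class RedirectingDriver:
    """Hypothetical host-resident redirecting driver for a controller pair."""

    def __init__(self, paths):
        # paths: {controller name: callable(request) -> result}
        self.paths = dict(paths)
        self.failed = set()

    def submit(self, preferred, request):
        # Try the controller the request is addressed to, then any other path.
        order = [preferred] + [name for name in self.paths if name != preferred]
        for name in order:
            if name in self.failed:
                continue
            try:
                return self.paths[name](request)
            except PathFailed:
                self.failed.add(name)  # remember the dead path and redirect
        raise PathFailed("no surviving controller path")
```

Because this logic runs on the host, a version of it has to be written and installed for every host operating system, which is exactly the cost and complexity the passage identifies.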
Accordingly, there is a need for a memory system comprising a number of fibre interface controllers and having a failover mechanism that is transparent to a host computer. There is a further need for such a memory system having a failback mechanism that is also transparent to the host computer. The present invention provides a solution to these and other problems, and offers additional advantages over the prior art.
`
SUMMARY OF THE INVENTION
`
`
The present invention provides a memory system and a method of operating a memory system. In one embodiment, the memory system includes a number of controllers connected by a fibre channel arbitrated loop to provide transparent failover and failback for failed controllers. The controllers are adapted to transfer data between a data storage system and at least one host computer in response to instructions therefrom. In the inventive method, a unique identifier is provided to each controller to permit the host computer to address instructions to a specific controller. Then, operation of the controllers is monitored and, when a failed controller is detected, a failover procedure is performed on a surviving controller. In one embodiment, the failover procedure disables the failed controller and assumes the identity of the failed controller. Thus, the surviving controller becomes capable of responding to instructions addressed to it and instructions addressed to the failed controller, and the failure of the failed controller is transparent to the host computer. In one particular embodiment, the step of providing a unique identifier to each controller preferably includes the step of providing a world wide name to each controller, and more preferably the step further includes providing a loop identifier to each controller.
In another aspect, the invention provides a memory system for transferring data between a data storage system and at least one host computer in response to instructions therefrom. The memory system includes a pair of dual-active controllers connected by a fibre channel arbitrated loop. Each controller has a unique identifier and is adapted to assume the identity of a failed controller and to respond to instructions addressed to the failed controller, thereby rendering failure of the failed controller transparent to the host computer. In one embodiment, the memory system further includes a communication path coupling the controllers, the communication path being adapted to enable each controller to detect failure of the other controller. The present invention is particularly useful for data storage systems comprising multiple disk drives coupled to the controllers by disk channels, in which at least one disk channel also serves as the communication path.
In yet another aspect, the invention provides a computer program and a computer program product for operating a memory system comprising a plurality of controllers, each controller having a unique identifier, and the controllers adapted to transfer data between a data storage system and at least one host computer in response to instructions therefrom. The computer program product includes a computer readable medium with a computer program stored therein. The computer program has a failure detection unit adapted to detect a failed controller. A failover unit is adapted to enable a surviving controller to respond to instructions addressed to it and to instructions addressed to the failed controller. The failover unit includes a disabling unit adapted to disable the failed controller. The failover unit also includes a loop initialization unit, which is adapted to instruct the surviving controller to assume the identity of the failed controller and to respond to instructions addressed to the failed controller as well as instructions addressed to the surviving controller itself. Thus, failure of the failed controller is transparent to the host computer. In one embodiment, each controller has an active port and a failover port, and the failover unit is adapted to activate the failover port of the surviving controller. In another embodiment, the computer program product further includes a replacement detection unit adapted to instruct a replacement controller to assume the identity of the failed controller and respond to instructions addressed to the failed controller, thereby rendering replacement of the failed controller transparent to the host computer.
In still another aspect, the invention provides a memory system for transferring data between a data storage system and at least one host computer in response to instructions therefrom. The memory system comprises a pair of dual-active controllers connected by a fibre channel arbitrated loop, each controller having a unique identifier, and a means for providing a failover mode from a failed controller to a surviving controller that is substantially transparent to the host computer. In one embodiment, the means for providing a failover mode is a computer program product having a computer program including a loop initialization unit adapted to instruct the surviving controller to assume the identity of the failed controller and to instruct the surviving controller to respond to instructions addressed to it and to the failed controller.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
Additional objects and features of the invention will be more readily apparent from the following detailed description and appended claims when taken in conjunction with the drawings, in which:
FIG. 1 is a block diagram of an embodiment of a memory system comprising a pair of controllers having a transparent failover and failback mechanism according to the present invention;
FIG. 2 is a block diagram of another embodiment of a memory system according to the present invention in an environment comprising a pair of host computer systems;
FIG. 3 is a flowchart showing an embodiment of a method of operating the memory system shown in FIG. 1 or FIG. 2 to provide a transparent failover and failback mechanism according to the present invention; and
FIG. 4 is a block diagram illustrating the hierarchical structure of an embodiment of a computer program according to an embodiment of the present invention.
`
`DETAILED DESCRIPTION
`
`
The present invention is directed to a memory system having a number of controllers adapted to transfer data between at least one host computer and a data storage system, such as one or more Redundant Array of Independent Disks (RAID) storage systems. The controllers are coupled to the host computer and one another through a host-side loop to provide a failover and a failback mechanism for a failed controller that is transparent to the host computer. Advantageously, the controllers are connected by a fibre channel arbitrated loop (FC-AL). While the invention is described using examples of data storage systems comprising a RAID having multiple magnetic disk drives, the present invention can be used with other data storage systems, as will be apparent to those skilled in the art, including arrays and individual disk drives in which the disk drives are optical, magnetic, or magneto-optical disk drives.
FIG. 1 shows a block diagram of an exemplary embodiment of a memory system 100 according to the present invention having a pair of controllers 105 (singularly 105a and 105b) coupled to a host computer 110 through a pair of host-side loops 115 (singularly 115a and 115b). It is to be understood that by host-side loop 115 it is meant a communication path which connects the controllers 105 to the host computer 110, and that the host-side loop can also connect other devices or systems (not shown) to the host computer. The controllers 105 are in turn coupled to a data storage system 120, shown here as a RAID 130 comprising multiple disk drives 135, via several device-side loops 140 (singularly 140a to 140c), also known as disk channels. Alternatively, the controllers 105 could be coupled to the data storage system 120 via SCSI buses (not shown). Although FIG. 1 shows a single pair of controllers 105 coupled by three device-side loops 140 to a RAID 130 comprising only twelve disk drives 135, the illustrated architecture is extendable to memory systems having any number of controllers, disk drives, and device-side loops. For example, the memory system 100 can include a number, n, of n-way controllers, using operational primitives in a message passing multi-controller non-uniform workload environment, as described in commonly assigned co-pending U.S. patent application Ser. No. 09/826,397, which is hereby incorporated by reference.
The host-side loops 115 are made up of several fibre channels 145 and a hub 150a, 150b. The term fibre channel as used here refers to any physical medium that can be used to transmit data at high speed, for example to serially transmit data at high speed in accordance with standards developed by the American National Standards Institute (ANSI), such as, for example, optical fibre, coaxial cable, or twisted pair telephone line. Each of the host-side loops 115 connects three nodes or ports: a single server port, known as a host bus adapter (HBA) 155a, 155b, on the host computer 110, and two controller ports 160a, 160b, one on each of the controllers 105. The host-side loops 115 are adapted to enable data and input/output (I/O) requests from the host computer 110 to be transferred between any ports on the loop 115.
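The loop topology described above, with a hub that can route around a port, can be modeled in a few lines. The sketch assumes that frames travel in one direction around the ring and are repeated by each port until they reach the addressed one; the `HubLoop` and `Port` names and the address values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    name: str
    address: int            # loop address; illustrative values only
    bypassed: bool = False  # True when the hub routes around this port


@dataclass
class HubLoop:
    ports: List[Port] = field(default_factory=list)

    def active_ring(self) -> List[Port]:
        """Ports still in the ring; the hub skips bypassed ports."""
        return [p for p in self.ports if not p.bypassed]

    def deliver(self, src_name: str, dst_address: int) -> Optional[List[str]]:
        """Pass a frame around the ring from src until a port claims dst_address.

        Returns the names of the ports the frame visited (source excluded),
        or None if no active port answers to that address."""
        ring = self.active_ring()
        start = [p.name for p in ring].index(src_name)
        hops = []
        for i in range(1, len(ring)):
            port = ring[(start + i) % len(ring)]
            hops.append(port.name)
            if port.address == dst_address:
                return hops
        return None
```

Bypassing a port shortens the ring without breaking it, which is the hub behavior the description later relies on when a failed controller's active port is reset.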
The controllers 105 can be any suitable fibre channel compatible controller that can be modified to operate according to the present invention, such as, for example, the DAC960SF, commercially available from Mylex, Inc., Boulder, Colo. Such controllers 105 include, or can be modified to include, an active port 165a, 165b, and a failover port 166a, 166b, on each controller, and a register (not shown) adapted to support the failover and failback mechanisms of the present invention. A pair of the controllers 105 can be configured to operate as dual-active controllers as described above, or as dual-redundant controllers wherein one controller serves as an installed spare for the other, which in normal operation handles all I/O requests from the host computer 110. Preferably, the controllers 105 operate as dual-active controllers to increase the bandwidth of the memory system 100. Generally, each of the controllers 105 has a computer readable medium, such as a read only memory (ROM) 170, in which is embedded a computer or machine readable code, commonly known as firmware, with instructions for configuring and operating the controller; a cache 180a, 180b, for temporarily storing I/O requests and data from the host computer 110; and a local processor 185a, 185b, for executing the instructions and requests. The firmware of each controller is modified to support the failover and failback mechanisms of the present invention.
To enable the controllers 105 to be operated in dual-active mode, the controllers on host-side loops 115a, 115b, are identified by a unique identifier to permit the host computer 110 to address an I/O request to a specific controller. In one embodiment, the unique identifier includes a non-volatile, 64-bit World Wide Name (WWN). A WWN is an identifying code that is hardwired, embedded in the firmware, or otherwise encoded in a fibre channel compatible device, such as the HBAs 155a, 155b, or the controllers 105, at the time of manufacture. Additionally, the unique identifier includes a loop identifier (LOOP ID), which is assigned to each port in a host-side loop 115a, 115b, during a system initialization of the memory system 100. This LOOP ID can be acquired during a Loop Initialization Hard Assigned (LIHA) phase of the system initialization, or during a Loop Initialization Soft Assigned (LISA) phase. Because not all host computers have operating systems that support addressing schemes using WWNs (for example, some legacy host computer systems), in a preferred embodiment the unique identifier includes both a WWN and a LOOP ID, enabling the memory system 100 of the present invention to be used with any host computer 110 independent of its operating system. During system initialization, each of the controllers 105 registers the unique identifier of the other controller. This enables a surviving controller, for example controller 105a, to accept and process I/O requests addressed to a failed controller, for example controller 105b, by assuming the identity of the failed controller.
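The two ways of acquiring a LOOP ID mentioned above can be sketched as follows. This is a simplified model, not the FC-AL algorithm itself: it assumes a shared claim map of 127 slots, ignores the other loop initialization phases and the real AL_PA encoding, and uses slot 0 as a stand-in for the address reserved for a fabric loop port. A port with a hard-assigned address keeps it if no earlier port has claimed it; every remaining port takes the lowest free slot.

```python
LOOP_SLOTS = 127  # FC-AL provides 127 loop addresses
FABRIC_SLOT = 0   # sketch convention: slot reserved for a fabric loop port


def assign_loop_ids(ports):
    """ports: list of (name, hard_id or None) in loop order -> {name: loop_id}.

    LIHA-like pass: a port with a hard-assigned address keeps it if it is free.
    LISA-like pass: every remaining port takes the lowest free slot."""
    claimed = [False] * LOOP_SLOTS
    claimed[FABRIC_SLOT] = True
    ids = {}
    for name, hard_id in ports:
        if hard_id is not None and 0 <= hard_id < LOOP_SLOTS and not claimed[hard_id]:
            claimed[hard_id] = True
            ids[name] = hard_id
    for name, _ in ports:
        if name in ids:
            continue
        if False not in claimed:
            raise RuntimeError("no free loop identifier")
        slot = claimed.index(False)
        claimed[slot] = True
        ids[name] = slot
    return ids
```

In a call such as `assign_loop_ids([("hba", None), ("105a", 5), ("105b", 5)])`, controller 105a keeps its hard address 5, while 105b (whose hard address collides) and the HBA fall back to soft-assigned slots. Pairing each resulting LOOP ID with the port's WWN gives the two-part identifier of the preferred embodiment.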
The RAID comprises multiple virtual or logical volumes. Although the controllers 105 share the same RAID 130, that is, both controllers are connected to every disk drive 135 in the RAID, preferably each logical volume is under the primary control of one of the controllers so that coherency need not be maintained between the caches 180a, 180b, of the controllers when they are operated in dual-active mode. By primary control it is meant that during normal operation each logical volume in the RAID 130 is controlled solely by one of the controllers 105. Each logical volume is represented by a logical unit number (LUN) to the host computer 110. Each LUN in turn is associated with the unique identifier of one of the controllers 105 so that when data needs to be stored in or retrieved from a particular LUN, the I/O request is automatically directed to the correct controller.
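The LUN-to-controller association can be pictured as a small table held by the host. In the sketch below each identifier pairs a WWN with a LOOP ID, as in the preferred embodiment; the specific values, the `LUN_OWNER` table, and the `route` helper are hypothetical.

```python
from typing import NamedTuple


class UniqueId(NamedTuple):
    wwn: int      # World Wide Name (illustrative value)
    loop_id: int  # loop identifier acquired during loop initialization


CTL_A = UniqueId(wwn=0x2000_0000_0000_00A1, loop_id=1)
CTL_B = UniqueId(wwn=0x2000_0000_0000_00B1, loop_id=2)

# Host-side registration: each LUN is bound to its owning controller's identifier,
# so each logical volume is under the primary control of exactly one controller.
LUN_OWNER = {0: CTL_A, 1: CTL_A, 2: CTL_B, 3: CTL_B}


def route(lun: int) -> UniqueId:
    """Return the identifier the host addresses an I/O request for `lun` to."""
    return LUN_OWNER[lun]
```

Because the host only ever consults this table, a surviving controller that answers to `CTL_B`'s identifier after a failure receives the same requests without the table being changed.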
`
In a preferred embodiment, shown in FIG. 2, reliability is further enhanced by providing a clustered environment in which two host computers 110 (singularly 110a and 110b) each have direct access to both controllers 105 through a number of HBAs 155a-d. Thus, the failure of a single host computer 110a, 110b, will not result in the failure of an entire network of client computers (not shown). In addition, as shown in FIG. 2, each of the controllers 105 has at least one active port 195a, 195b and one inactive port 200a, 200b. The active ports 195a, 195b receive and process I/O requests sent by the host computers 110 on the host-side loops 115. The inactive ports 200a, 200b, also known as failover ports, can process I/O requests only when the active port 195a, 195b on the same host-side loop 115a, 115b has failed. For example, in case of failure of controller 105a, inactive port 200b on surviving controller 105b assumes the identity of the active port 195a on failed controller 105a and begins accepting and processing I/O requests directed to the failed controller 105a.
`
In accordance with the present invention, the memory system further includes a communication path 205 adapted to transmit a signal from one controller 105 to another in the event of a controller failure. The communication path 205 can be a Small Computer System Interface (SCSI) bus or a fibre channel as described above. It can take the form of a dedicated high-speed path extending directly between the controllers 105, as shown in FIG. 1, or one of the device-side channels 140a-c (disk channels) can also serve as the communication path 205, as shown in FIG. 2. The signal passed between the controllers 105 to indicate controller failure can be a passive signal, such as, for example, the lack of a proper response to a polling or pinging scheme in which each controller interrogates the other at regular, frequent intervals to ensure the other controller is operating correctly. Alternatively, the signal can be a dynamic signal transmitted directly from a failed or failing controller 105a, 105b, to the surviving controller 105b, 105a, instructing it to initiate a failover process or mechanism. Optionally, the communication path 205 is also adapted to enable the controllers 105 to achieve cache coherency in case of controller failure.
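Both kinds of failure signal described above, the passive missed-ping case and the dynamic signal from a failing partner, can be captured in a small monitor. The `PartnerMonitor` class is hypothetical, and the threshold of three consecutive missed pings is an assumption for illustration; the source does not specify one.

```python
class PartnerMonitor:
    """Hypothetical failure detector run by each controller over path 205."""

    def __init__(self, max_missed: int = 3):
        self.max_missed = max_missed  # assumed threshold, not from the source
        self.missed = 0               # consecutive unanswered pings
        self.failed = False

    def on_ping_result(self, answered: bool) -> bool:
        """Record one ping interval; return True once the partner is deemed failed."""
        if self.failed:
            return True
        # A proper response resets the count; silence increments it (passive signal).
        self.missed = 0 if answered else self.missed + 1
        if self.missed >= self.max_missed:
            self.failed = True
        return self.failed

    def on_failure_signal(self) -> bool:
        """Dynamic signal: a failing partner asks this controller to take over."""
        self.failed = True
        return True
```

Requiring several consecutive misses, rather than one, trades a slightly slower takeover for protection against declaring a healthy partner dead because of a single delayed response.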
An exemplary method of operating the memory system 100 shown in FIG. 2 to provide a failover process that is substantially transparent to the host computers 110a, 110b, will now be described with reference to FIG. 3. The following initial actions or steps are required to make the failover operation transparent to the host computers. First, in a system initialization step 210, each of the controllers 105 is provided with a unique identifier, which is communicated to the host computers 110. This step 210 generally involves merely querying the controllers 105 to obtain their WWNs, but it may also include assigning a LOOP ID to each controller in a LIHA phase or a LISA phase, as described above. The unique identifiers are then registered by the host computers 110, and one or more of the LUNs are associated with each unique identifier. Next, in a communication step 215, the unique identifiers and their associated LUNs are communicated between the controllers 105 via the communication path 205. Each of the controllers 105 assigns the unique identifier and the associated LUNs of the other controller to its failover port 200a, 200b. This enables a surviving controller 105a, 105b to assume the identity of a failed controller 105b, 105a, and to accept and process I/O requests addressed to it, by activating the normally inactive failover port 200a, 200b.
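Steps 210 and 215 amount to each controller storing its partner's identity in advance. The sketch below models that with hypothetical `Controller` and `FailoverPort` types and an `exchange_identities` helper; the string identities stand in for WWN/LOOP ID pairs.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Optional


@dataclass
class FailoverPort:
    identity: Optional[str] = None      # partner's unique identifier
    luns: FrozenSet[int] = frozenset()  # partner's LUNs
    active: bool = False                # stays inactive until a failover


@dataclass
class Controller:
    name: str
    identity: str                       # this controller's own identifier
    luns: FrozenSet[int]
    failover_port: FailoverPort = field(default_factory=FailoverPort)


def exchange_identities(a: Controller, b: Controller) -> None:
    """Step 215: each controller parks its partner's identity on its failover port."""
    a.failover_port = FailoverPort(identity=b.identity, luns=b.luns)
    b.failover_port = FailoverPort(identity=a.identity, luns=a.luns)
```

Doing this ahead of time means that, at failure time, no identifier has to be negotiated or re-registered with the host; only the port's `active` flag needs to change.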
The memory system 100 is then ready to begin regular operations in a dual-active operation step 225, in which the controllers 105 both simultaneously receive and process I/O requests from the host computers 110. During normal operations, a fault detection step 230 is executed in which the controllers 105 exchange a series of "pings," also referred to as a heartbeat signal, the response to which, as described above, signals to each controller that the other has not failed. This step 230 may also involve a scheme in which a failed or failing controller 105a, 105b dynamically signals a surviving controller 105b, 105a that a failure has occurred or is about to occur.
On detection of a controller failure, a failover procedure is performed on the surviving controller 105a, 105b. The failover procedure involves the steps of disabling the failed controller (step 235) and assuming the identity of the failed controller (step 240). In the disabling step 235, the surviving controller 105a, 105b asserts a reset signal, which disables the failed controller 105b, 105a by resetting its local processor 185a, 185b and the fibre protocol chip (not shown) of its active port 195a, 195b. Resetting the fibre protocol chip causes the hub 150a, 150b to automatically bypass the primary port 195a, 195b on the failed controller 105a, 105b. In the assuming identity step 240, the failover port 200a, 200b of the surviving controller 105a, 105b begins accepting and processing I/O requests addressed by the host computers 110a, 110b to the failed controller 105b, 105a. Preferably, to speed up the failover process, the unique identifier for the failed controller 105a, 105b was previously assigned to the failover port 200a, 200b during the communication step 215, and the surviving controller 105 merely activates the failover port 200a, 200b to enable it to begin accepting and processing I/O requests.
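The disabling and assuming-identity steps can be sketched as state changes on the two controllers. The types, the `fail_over` function, and the `who_answers` helper are hypothetical; the sketch assumes the partner's identity was already placed on the failover port during step 215, so the takeover reduces to resetting the failed side and activating one port.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Port:
    identity: Optional[str]
    active: bool
    bypassed: bool = False         # set when the hub routes around this port


@dataclass
class Controller:
    name: str
    active_port: Port
    failover_port: Port            # holds the partner's identity (step 215)
    processor_running: bool = True


def fail_over(survivor: Controller, failed: Controller) -> None:
    # Step 235: reset the failed controller's local processor and the fibre
    # protocol chip of its active port; the hub then bypasses that port.
    failed.processor_running = False
    failed.active_port.active = False
    failed.active_port.bypassed = True
    # Step 240: the identity is already on the failover port; just activate it.
    survivor.failover_port.active = True


def who_answers(controllers, identity: str) -> Optional[str]:
    """Name of the controller whose live, non-bypassed port carries `identity`."""
    for c in controllers:
        for port in (c.active_port, c.failover_port):
            if port.active and not port.bypassed and port.identity == identity:
                return c.name
    return None
```

The host-visible effect is that a request addressed to the failed controller's identity is still answered, only now by the survivor.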
After the failover process is completed, the surviving controller 105a, 105b, in a resume operation step 245, resumes operations by responding to I/O requests addressed to itself and to the failed controller. The surviving controller 105a, 105b responds to requests to store or retrieve data addressed to the failed controller without any additional support from the host computers 110 or the HBAs 155. Because there is no need to alter the registered unique identifiers or the associated LUNs, the failover process is transparent to the host computers 110. To the host computers 110, the delay, if any, caused by the time it takes to detect the failed controller 105a, 105b and to perform the loop initialization procedure appears to be no more than a momentary loss of power to the memory system 100, which requires the host computers to retransmit the last several commands sent to the failed controller.
`
Optionally, when the controllers 105 include caches 180a, 180b, the failover process can also include a cache flush step (not shown) and a conservative cache mode enable step (not shown). The cache flush step prevents the loss of data that was reported to the host computers 110 with good status because it had been written to both caches 180a, 180b, but that had not yet been written to the data storage system 120 before the controller failure. The cache flush step commits this data to the data storage system 120. Enabling conservative cache mode minimizes the chance of data being lost while operating with a single controller 105a, 105b in failover mode, by ensuring that all data is written to the data storage system 120 before a good status signal is sent.
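The optional cache flush and conservative cache mode steps can be illustrated with a single-cache model. This sketch assumes the survivor's cache holds the mirrored copy of every acknowledged write; the `SurvivorCache` class and its methods are hypothetical, and conservative mode is modeled as write-through (commit to storage before reporting good status).

```python
class SurvivorCache:
    """Hypothetical model of the surviving controller's cache during failover."""

    def __init__(self):
        self.dirty = {}            # block -> data acknowledged but not yet on disk
        self.disk = {}             # stands in for data storage system 120
        self.conservative = False

    def write(self, block, data) -> str:
        if self.conservative:
            self.disk[block] = data   # commit to storage first ...
            return "GOOD"             # ... then report good status
        self.dirty[block] = data      # write-back: good status before the disk write
        return "GOOD"

    def flush(self) -> int:
        """Cache flush step: commit every acknowledged-but-unwritten block."""
        count = len(self.dirty)
        self.disk.update(self.dirty)
        self.dirty.clear()
        return count

    def enter_failover_mode(self) -> int:
        """Flush, then switch to conservative (write-through) mode."""
        flushed = self.flush()
        self.conservative = True
        return flushed
```

The trade-off is throughput: with only one controller left there is no second cache to mirror into, so acknowledging only after the disk write gives up write-back speed in exchange for not exposing acknowledged data to a second failure.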
In another aspect, the present invention is directed to a memory system 100 having a failover mechanism, such as the one described above, that further includes a failback process or mechanism that is substantially transparent to the host computers 110a, 110b. To be transparent to the host computers 110a, 110b, the failback mechanism should support a hot swap of a failed controller 105a, 105b. By hot swap it is meant that the failed controller 105a, 105b is removed and a replacement controller (not shown) put in service without de-energizing or re-booting the memory system 100
and/or the host co