`
(12) United States Patent
Deitz et al.

(10) Patent No.: US 6,578,158 B1
(45) Date of Patent: Jun. 10, 2003

(54) METHOD AND APPARATUS FOR
PROVIDING A RAID CONTROLLER
HAVING TRANSPARENT FAILOVER AND
FAILBACK
`
(75) Inventors: William G. Deitz, Niwot, CO (US);
Keith Short, Lafayette, CO (US)
(73) Assignee: International Business Machines
Corporation, Armonk, NY (US)
`
(*) Notice: Subject to any disclaimer, the term of this
patent is extended or adjusted under 35
U.S.C. 154(b) by 0 days.
`
(21) Appl. No.: 09/429,523
(22) Filed: Oct. 28, 1999
`
`.
`(SD Int‘ ('1'?
`(52
`U.S. Ci.
`(53) Fleid ‘if Sean‘-31'!
`
`,
`‘
`Gael‘ um“
`714,311; 71435
`71435. 7. 3. 1].
`714:’ 710. 52 7] 1."'11‘l
`
(56) References Cited

U.S. PATENT DOCUMENTS
`
`5.-33?_.-‘:58 A
`8_«’l‘-W3 Walk?! EH11.
`39530’-'
`5.274,-‘:45 A “ 12/1903 irtlir.-111-arr el al. 37'-"'13-1
`‘ ‘
`f
`3.
`.
`.
`‘
`5-ll’?-"’6° "
`””W4 H‘’''‘‘"‘' “‘ "1'
`3'95'’575
`..
`3,353,231; A *
`frltilitu Ictcrsen er .1I.
`3‘J_:1,11SI_:
`)_.T57_,fi-'13 A
`_'\.ai‘J93 Jones
`3-b4;"l34
`5,T‘Jl'l,775 A ‘
`3,«"i(J93 Marks el :l|.
`......... . 3‘J5.I'l32.i.l7
`:'».812.?54 A *
`FHI998 hui cl al.
`.............. . 35'5.»'l32.f_|4
`
`
`
`5,922,077 A
`fu,l2*J,{J2'1‘ Al
`l'I,2l'~J,?S3 Bl
`5.3.-'W,l()87 Bl
`
`*
`:
`
`914:7
`751999 Espy 1-1111.
`37115222
`2;2m1 f«.'I_-Iautai
`......
`'1-"lI1'll+l
`4,300] Rtehardson ..
`l.2r..rJ|-ll Gflfllih .......................... 714.“?
`
`
`
* cited by examiner
Primary Examiner: Robert Beausoliel
Assistant Examiner: Marc Duncan
(74) Attorney, Agent, or Firm: Dorsey & Whitney LLP
(57) ABSTRACT

A method and apparatus for controlling a memory system
100 comprising a plurality of controllers 105 connected by
a fibre channel arbitrated loop 145 to provide transparent
failover and failback mechanisms for failed controllers. The
controllers 105 are adapted to transfer data between a data
storage system 120 and at least one host computer 110 in
response to instructions therefrom. In the method, a unique
identifier is provided to each controller 105. The operation
of the controllers 105 is then monitored and, when a failed
controller is detected, a failover procedure is performed on
a surviving controller. The failover procedure includes
disabling the failed controller and instructing the surviving
controller to assume the identity of the failed controller.
Thus, the surviving controller is capable of responding to
instructions addressed to it and instructions addressed to the
failed controller, and the failure of the failed controller is
transparent to the host computer 110. A computer program
and a computer program product for implementing the
method are also provided.

25 Claims, 4 Drawing Sheets
`
[Representative drawing: block diagram showing two host computers.]
`
`VMWARE, INC. 1008
`
`
`
`
U.S. Patent    Jun. 10, 2003    Sheet 2 of 4    US 6,578,158 B1

[FIG. 2: block diagram. Recoverable labels: host computers 110a and
110b; controllers 105a and 105b; failover port (inactive).]
`
`
`
[FIG. 3 (Sheet 3 of 4): flowchart. Recoverable box labels, in
approximate flow order: START; PROVIDE UNIQUE IDENTIFIER (210);
COMMUNICATE UNIQUE IDENTIFIER (215); BEGIN DUAL-ACTIVE OPERATION
(225); EXCHANGE PINGS (230); FAILED CONTROLLER DETECTED? (YES);
DISABLE FAILED CONTROLLER (235); ASSUME IDENTITY OF FAILED
CONTROLLER (240); RESPOND TO I/O INSTRUCTIONS FOR SURVIVING AND
FAILED CONTROLLERS; POLL FAILED CONTROLLER; FAILED CONTROLLER
REPLACED? (YES); COMMUNICATE UNIQUE IDENTIFIER; REPLACEMENT
CONTROLLER ASSUMES IDENTITY OF FAILED CONTROLLER; RESUME
DUAL-ACTIVE OPERATION. Reference numerals 250, 255, and 270 also
appear in the figure.]
`
`
`
[FIG. 4 (Sheet 4 of 4): block diagram of the computer program.
Recoverable labels, each with the reference numeral that precedes it
in the scan: controller initialization unit (285); replacement
detection unit (315); failure detection unit (290); failback unit
(320); failover unit (295); loop reinitialization (325); disabling
(300); loop initialization (310).]
`
`
`
METHOD AND APPARATUS FOR
PROVIDING A RAID CONTROLLER
HAVING TRANSPARENT FAILOVER AND
FAILBACK
`
FIELD OF THE INVENTION

This invention pertains generally to the field of computer
memory systems, and more particularly to a method and
apparatus for controlling redundant arrays of independent
disks.
`
`BACKGROUND OF THE INVENTION
`
Modern computers frequently require large, fault-tolerant
memory systems. One approach to meeting this need is to
provide a Redundant Array of Independent Disk drives
(RAID), usually including a plurality of hard disk drives
operated by a disk array controller that is coupled to a host
computer. The controller provides the brains of the memory
system, servicing all host requests, storing data to or
retrieving it from the RAID, caching data to provide faster
access, and handling drive failures without interrupting host
requests. Given the importance of the controller, numerous
solutions have been suggested to minimize the potential for
interrupted service due to controller malfunction. One such
solution calls for providing dual-active controllers having
failover and failback capabilities. Dual-active controllers are
a pair of controllers that are connected to each other and to
all the disk drives in a RAID. In normal operation,
input/output (I/O) requests from the host computer are divided
between the dual-active controllers to increase the rate at
which information can be transferred to or from the RAID,
commonly referred to as the bandwidth of the memory
system. However, in the event that one of the controllers
fails, the surviving controller takes over the functions of the
failed controller and begins servicing host requests
addressed to the failed controller in addition to those
addressed to it. The mechanism that allows this is commonly
known as a failover mechanism. If the surviving controller
is able to assume the functions of the failed controller
without any action on the part of the host computer, for
example redirecting I/O requests to the surviving controller,
the failover mechanism is said to be transparent. If the failed
controller can subsequently be replaced and normal operation
resumed without de-energizing or re-initializing the
controllers, the memory system is said to have a failback
mechanism.
`
One example of the use of such dual-active controllers is
described in U.S. Pat. No. 5,790,775, to Marks et al., which
uses dual-active controllers connected to the host
computer by a Small Computer System Interface (SCSI)
bus. Typically, the controllers are also connected to a RAID
comprising multiple disk drives through a number of
additional SCSI buses. Each SCSI device on a bus, such as a
controller or a disk drive, is assigned one bit as an identifier
(SCSI ID) to permit the host computer to select a particular
controller, and the controller to select a particular disk drive.
Thus, the method permits a maximum of eight devices to be
identified on a standard 8-bit SCSI bus. In addition, the
controllers are connected to one another by a separate
communications link, and each has access to a cache
memory in the other. Although both controllers are connected
to every disk drive in the RAID, to permit dual-active
operation each disk drive is typically under primary control
of one of the controllers. This is accomplished by dividing
the RAID into groups of disk drives that appear to the host
computer as a logical drive or unit identified by a logical unit
number (LUN) and, during initialization, associating each
LUN with the SCSI ID of a particular controller. In normal
operation, a controller responds only to I/O requests which
are addressed to it and which refer to LUNs over which it has
primary control. However, if a controller fails, the remaining
controller of the pair obtains configuration information,
including the SCSI ID and the LUNs of the failed controller,
over the communications link and begins servicing requests
addressed by the host to the failed controller as well as those
addressed to itself.
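The addressing limit described above can be illustrated with a minimal sketch. It assumes, as the text states, that each device is identified by a single bit on the bus, so the number of distinct identifiers equals the bus width. The function name `scsi_id_mask` and its parameters are hypothetical and serve only this illustration.

```python
def scsi_id_mask(scsi_id: int, bus_width: int = 8) -> int:
    """Return the one-hot bit pattern for a device ID on a bus of the given width.

    Illustrative assumption: one bus bit per device, so valid IDs are
    0 .. bus_width - 1 and at most bus_width devices can be identified.
    """
    if not 0 <= scsi_id < bus_width:
        raise ValueError(
            f"SCSI ID {scsi_id} cannot be represented on a {bus_width}-bit bus"
        )
    return 1 << scsi_id


# Every ID on an 8-bit bus maps to a distinct single-bit pattern.
all_masks = [scsi_id_mask(i) for i in range(8)]
```

A ninth device on an 8-bit bus has no bit left to claim, which is the limit the passage attributes to the SCSI-based approach.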
While the above approach has been effective in reducing
interruptions in service for memory systems having
dual-active controllers, it is limited by the architecture of the
SCSI bus. Traditionally, SCSI buses have from eight to
sixteen signal lines, which allows a maximum of from eight
to sixteen SCSI devices to be interconnected by the SCSI
bus at any one time. Thus, systems which use a 16-bit wide
SCSI bus on the host side and 8-bit wide SCSI buses on the
device side typically provide for at most six device-side
SCSI buses having six disk drives each. Moreover, the above
approach, which relies on SCSI IDs, has not been
implemented using fibre interface type controllers.
Fibre interface type controllers are coupled to a host
computer through one or more fibre channels. Fibre channel
is the general name of a technology using an integrated set
of standards developed by the American National Standards
Institute (ANSI) for high speed, serial communication
between computer devices. (See for example the ANSI
standard X3T11, "Fibre Channel Physical and Signaling
Interface (FC-PH)," Rev. 4.3 (1994), hereby incorporated by
reference.) Manufacturers of RAID systems have been moving
to fibre channel technology because it allows transmitting
of data between computer devices at rates of over 1
Gbps (one billion bits per second), and at distances of
several hundred meters and more. Also, fibre channel
arbitrated loop (FC-AL) allows for 127 unique loop
identifiers, one of which is reserved for a
fabric loop port.
The widely accepted approach to providing failover/failback
capability in RAID systems comprising fibre interface
controllers has been to use dual-active controllers
coupled by a redirecting driver. In the event of a controller
failure, the redirecting driver shifts host requests from the
failed controller to a surviving controller. The failed
controller can then be replaced and the memory system
reinitialized to return to normal, dual-active controller operation.
The redirecting driver can be implemented using a software
or hardware protocol. One exemplary redirecting driver is
disclosed in U.S. Pat. No. 5,237,658, to Walker et al., hereby
incorporated by reference. However, one problem associated
with this type of solution is that it is achieved at the expense
of added memory system complexity that increases cost and
decreases bandwidth. In addition, when, as is common, the
redirecting driver is implemented using software in the host
computer, this approach is not independent of the host
computer, and typically requires a special driver for each
host computer system on which it is to be utilized. This
further adds to the cost and complexity, and increases the
difficulty of installing and maintaining the memory system.
Accordingly, there is a need for a memory system
comprising a number of fibre interface controllers and having a
failover mechanism that is transparent to a host computer.
There is a further need for such a memory system having a
failback mechanism that is also transparent to the host
computer. The present invention provides a solution to these
and other problems, and offers additional advantages over
the prior art.
`
SUMMARY OF THE INVENTION
`
The present invention provides a memory system and
method of operating a memory system. In one embodiment,
the memory system includes a number of controllers
connected by a fibre channel arbitrated loop to provide
transparent failover and failback for failed controllers. The
controllers are adapted to transfer data between a data storage
system and at least one host computer in response to
instructions therefrom. In the inventive method, a unique
identifier is provided to each controller to permit the host
computer to address instructions to a specific controller.
Then, operation of the controllers is monitored and, when a
failed controller is detected, a failover procedure is
performed on a surviving controller. In one embodiment, the
failover procedure disables the failed controller and assumes
the identity of the failed controller. Thus, the surviving
controller becomes capable of responding to instructions
addressed to it and instructions addressed to the failed
controller, and the failure of the failed controller is
transparent to the host computer. In one particular embodiment,
the step of providing a unique identifier to each controller
preferably includes the step of providing a world wide name
to each controller, and more preferably the step further
includes providing a loop identifier to each controller.
In another aspect the invention provides a memory system
for transferring data between a data storage system and at
least one host computer in response to instructions
therefrom. The memory system includes a pair of dual-active
controllers connected by a fibre channel arbitrated loop.
Each controller has a unique identifier and is adapted to
assume the identity of a failed controller and to respond to
instructions addressed to it, thereby rendering failure of the
failed controller transparent to the host computer. In one
embodiment, the memory system further includes a
communication path coupling the controllers, the communication
path being adapted to enable each controller to detect
failure of the other controller. The present invention is
particularly useful for data storage systems comprising
multiple disk drives coupled to the controllers by disk
channels, in which at least one disk channel also serves as
the communication path.
In yet another aspect the invention provides a computer
program and a computer program product for operating a
memory system comprising a plurality of controllers, each
controller having a unique identifier, and the controllers
adapted to transfer data between a data storage system and
at least one host computer in response to instructions
therefrom. The computer program product includes a computer
readable medium with a computer program stored therein.
The computer program has a failure detection unit adapted
to detect a failed controller. A failover unit is adapted to
enable a surviving controller to respond to instructions
addressed to it and to instructions addressed to the failed
controller. The failover unit includes a disabling unit adapted
to disable the failed controller. The failover unit also
includes a loop initialization unit, which is adapted to
instruct a surviving controller to assume the identity of the
failed controller and to instruct the surviving controller to
respond to instructions addressed to the failed
controller as well as instructions addressed to the surviving
controller. Thus, failure of the failed controller is transparent
to the host computer. In one embodiment, each controller
has an active port and a failover port, and the failover unit
is adapted to activate the failover port of the surviving
controller. In another embodiment, the computer program
product further includes a replacement detection unit
adapted to instruct a replacement controller to assume the
identity of the failed controller and respond to instructions
addressed to the failed controller, thereby rendering replacement
of the failed controller transparent to the host computer.
In still another aspect the invention provides a memory
system for transferring data between a data storage system
and at least one host computer in response to instructions
therefrom. The memory system comprises a pair of
dual-active controllers connected by a fibre channel arbitrated
loop, each controller having a unique identifier, and a means
for providing a failover mode from a failed controller to a
surviving controller that is substantially transparent to the
host computer. In one embodiment, the means for providing
a failover mode is a computer program product having a
computer program including a loop initialization unit
adapted to instruct the surviving controller to assume the
identity of the failed controller and to instruct the surviving
controller to respond to instructions addressed to it and to the
failed controller.
`
BRIEF DESCRIPTION OF THE DRAWINGS

Additional objects and features of the invention will be
more readily apparent from the following detailed
description and appended claims when taken in conjunction with
the drawings, in which:
FIG. 1 is a block diagram of an embodiment of a memory
system comprising a pair of controllers having a transparent
failover and failback mechanism according to the present
invention;
FIG. 2 is a block diagram of another embodiment of a
memory system according to the present invention in an
environment comprising a pair of host computer systems;
FIG. 3 is a flowchart showing an embodiment of a method
of operating the memory system shown in FIG. 1 or FIG. 2
to provide a transparent failover and failback mechanism
according to the present invention; and
FIG. 4 is a block diagram illustrating the hierarchical
structure of an embodiment of a computer program
according to an embodiment of the present invention.
`
DETAILED DESCRIPTION

The present invention is directed to a memory system
having a number of controllers adapted to transfer data
between at least one host computer and a data storage
system, such as one or more Redundant Array of Independent
Disks (RAID) storage systems. The controllers are
coupled to the host computer and one another through a
host-side loop to provide a failover and a failback mechanism
for a failed controller that is transparent to the host
computer. Advantageously, the controllers are connected by
a fibre channel arbitrated loop (FC-AL). While the invention
is described using examples of a data storage system
comprising a RAID having multiple magnetic disk drives, the
present invention can be used with other data storage
systems, as apparent to those skilled in the art, including
arrays and individual disk drives in which the disk drives are
optical, magnetic, or magneto-optical disk drives.
FIG. 1 shows a block diagram of an exemplary embodiment
of a memory system 100 according to the present
invention having a pair of controllers 105 (singularly 105a
and 105b) coupled to a host computer 110 through a pair of
host-side loops 115 (singularly 115a and 115b). It is to be
understood that by host-side loop 115 it is meant a
communication path which connects the controllers 105 to the host
computer 110, and that the host-side loop can also connect
other devices or systems (not shown) to the host computer.
The controllers 105 are in turn coupled to a data storage system
120, shown here as a RAID 130 comprising multiple disk
drives 135, via several device-side loops 140 (singularly
140a to 140c), also known as disk channels. Alternatively,
the controllers 105 could also be coupled to the data storage
system 120 via SCSI buses (not shown). Although FIG. 1
shows a single pair of controllers 105 coupled by three
device-side loops 140 to a RAID 130 comprising only
twelve disk drives 135, the illustrated architecture is
extendable to memory systems having any number of controllers,
disk drives, and device-side loops. For example, the memory
system 100 can include a number, n, of n-way controllers using
operational primitives in a message passing multi-controller
non-uniform workload environment, as described in
commonly assigned co-pending U.S. patent application Ser. No.
09/326,497, which is hereby incorporated by reference.
The host-side loops 115 are made up of several fibre
channels 145 and a hub 150a, 150b. The term fibre channel
as used here refers to any physical medium that can be used
to transmit data at high speed, for example to serially
transmit data at high speed in accordance with standards
developed by the American National Standards Institute
(ANSI), such as for example optical fibre, coaxial cable, or
twisted pair telephone line. Each of the host-side loops 115
connects to three nodes or ports, including a single server port
known as a host bus adapter (HBA) 155a, 155b, on the host
computer 110 and two controller ports 160a, 160b, on
each of the controllers 105. The host-side loops 115 are
adapted to enable data and input/output (I/O) requests from
the host computer 110 to be transferred between any ports on
the loop 115.
The controllers 105 can be any suitable fibre channel
compatible controller that can be modified to operate
according to the present invention, such as for example the
DAC960SF, commercially available from Mylex, Inc.,
Boulder, Colo. Such controllers 105 include, or can be
modified to include, an active port 165a, 165b, and a failover
port 166a, 166b, on each controller, and a register (not
shown) adapted to support the failover and failback
mechanism of the present invention. A pair of the controllers
105 can be configured to operate as dual-active controllers
as described above, or as dual-redundant controllers wherein
one controller serves as an installed spare for the other,
which in normal operation handles all I/O requests from the
host computer 110. Preferably, the controllers 105 operate as
dual-active controllers to increase the bandwidth of the
memory system 100. Generally, each of the controllers 105
has a computer readable medium, such as a read only
memory (ROM) 170, in which is embedded computer or
machine readable code, commonly known as firmware, with
instructions for configuring and operating the controller, a
cache 180a, 180b, for temporarily storing I/O requests and
data from the host computer 110, and a local processor 185a,
185b, for executing the instructions and requests. The
firmware of each controller is modified to support the failover
and failback mechanism of the present invention.
To enable the controllers 105 to be operated in dual-active
mode, the controllers on host-side loops 115a, 115b are
identified by a unique identifier to permit the host computer
110 to address an I/O request to a specific controller. In one
embodiment, the unique identifier includes a non-volatile,
64-bit World Wide Name (WWN). A WWN is an identifying
code that is hardwired, embedded in the firmware, or
otherwise encoded in a fibre channel compatible device, such as
the HBA 155a, 155b, or the controllers 105, at the time of
manufacture. Additionally, the unique identifier includes a
loop identifier (LOOP ID) which is assigned to each port in
a host-side loop 115a, 115b during a system initialization of
the memory system 100. This LOOP ID can be acquired
during a Loop Initialization Hard Address (LIHA) phase of
the system initialization, or during a Loop Initialization
Software Address (LISA) phase. Because not all host
computers have operating systems that support addressing
schemes using WWNs, for example some legacy host
computer systems, in a preferred embodiment the unique
identifier includes both a WWN and a LOOP ID to enable the
memory system 100 of the present invention to be used with
any host computer 110 independent of the operating system.
During system initialization, each of the controllers 105
registers the unique identifier of the other controller. This
enables a surviving controller, for example controller 105a,
to accept and process I/O requests addressed to a failed
controller, for example controller 105b, by assuming the
identity of the failed controller.
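The identifier scheme and the registration of the peer's identity can be sketched as follows. This is an illustrative model only: the class names `UniqueIdentifier` and `Controller`, the method names, and the sample WWN and LOOP ID values are assumptions introduced for this example, not part of the disclosed firmware.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass(frozen=True)
class UniqueIdentifier:
    """A WWN plus a LOOP ID, as in the preferred embodiment."""
    wwn: int      # 64-bit World Wide Name, fixed at manufacture
    loop_id: int  # assigned during loop initialization

    def __post_init__(self) -> None:
        if not 0 <= self.wwn < 2 ** 64:
            raise ValueError("a WWN must fit in 64 bits")


@dataclass
class Controller:
    identity: UniqueIdentifier
    peer_identity: Optional[UniqueIdentifier] = None
    assumed: Set[UniqueIdentifier] = field(default_factory=set)

    def register_peer(self, peer: "Controller") -> None:
        """During system initialization, record the other controller's identifier."""
        self.peer_identity = peer.identity

    def assume_peer_identity(self) -> None:
        """On failover, start answering to the registered peer identifier."""
        if self.peer_identity is None:
            raise RuntimeError("no peer identifier was registered")
        self.assumed.add(self.peer_identity)

    def accepts(self, target: UniqueIdentifier) -> bool:
        """True if a request addressed to `target` would be handled here."""
        return target == self.identity or target in self.assumed


# Hypothetical identifier values, for illustration only.
ctrl_a = Controller(UniqueIdentifier(wwn=0x2000_0000_0000_0001, loop_id=1))
ctrl_b = Controller(UniqueIdentifier(wwn=0x2000_0000_0000_0002, loop_id=2))
ctrl_a.register_peer(ctrl_b)
ctrl_b.register_peer(ctrl_a)
```

Because the host keeps addressing the same identifier before and after the takeover, nothing on the host side has to change, which is the sense in which the text calls the failover transparent.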
The RAID is comprised of multiple virtual or logical
volumes. Although the controllers 105 share the same RAID
130, that is, both controllers are connected to every disk drive
135 in the RAID, preferably each logical volume is under
the primary control of one of the controllers so that
coherency need not be maintained between the caches 180a, 180b
of the controllers when they are operated in dual-active
mode. By primary control it is meant that during normal
operation each logical volume 135 in the RAID 130 is
controlled solely by one of the controllers 105. Each logical
volume is represented by a logical unit number (LUN) to the
host computer 110. Each LUN in turn is associated with the
unique identifier of one of the controllers 105 so that when
data needs to be stored in or retrieved from a particular LUN,
the I/O request is automatically directed to the correct
controller.
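The association between LUNs and controller identifiers amounts to a lookup table in which each LUN has exactly one owner. A minimal sketch follows, assuming string labels such as "105a" stand in for the unique identifiers; `build_lun_table` is a hypothetical helper named only for this example.

```python
from typing import Dict, List


def build_lun_table(assignments: Dict[str, List[int]]) -> Dict[int, str]:
    """Invert {controller_id: [luns]} into {lun: controller_id}.

    Raises ValueError if a LUN is claimed by two controllers, since
    primary control means each logical volume has a single owner.
    """
    table: Dict[int, str] = {}
    for controller_id, luns in assignments.items():
        for lun in luns:
            if lun in table:
                raise ValueError(
                    f"LUN {lun} is already owned by {table[lun]}"
                )
            table[lun] = controller_id
    return table


# Hypothetical layout: four logical volumes split between two controllers.
lun_table = build_lun_table({"105a": [0, 1], "105b": [2, 3]})
owner_of_lun_2 = lun_table[2]
```

Keeping ownership exclusive is what lets the two caches operate without a coherency protocol during normal dual-active operation.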
`
In a preferred embodiment, shown in FIG. 2, reliability is
further enhanced by providing a clustered environment in
which two host computers 110 (singularly 110a and 110b)
each have direct access to both controllers 105 through a
number of HBAs 155a-d. Thus, the failure of a single host
computer 110a, 110b will not result in the failure of an
entire network of client computers (not shown). In addition,
as shown in FIG. 2, each of the controllers 105 has at least
one active port 195a, 195b and one inactive port 200a, 200b.
The active ports 195a, 195b receive and process I/O requests
sent by the host computers 110 on the host-side loops 115.
The inactive ports 200a, 200b, also known as failover
ports, can process I/O requests only when the active port
195a, 195b on the same host-side loop 115a, 115b has
failed. For example, in case of failure of controller 105a,
inactive port 200b on surviving controller 105b assumes the
identity of the active port 195a on failed controller 105a and
begins accepting and processing I/O requests directed to the
failed controller 105a.
`
In accordance with the present invention, the memory
system further includes a communication path 205 adapted
to transmit a signal from one controller 105 to another in the
event of a controller failure. The communication path 205
can be a Small Computer System Interface (SCSI) bus or a
fibre channel as described above. It can take the form of a
dedicated high speed path extending directly between the
controllers 105, as shown in FIG. 1, or one of the device-side
channels 140a-c (disk channels) can also serve as the
communication path 205, as shown in FIG. 2. The signal
passed between the controllers 105 to indicate controller
failure can be a passive signal, such as for example the lack
of a proper response to a polling or pinging scheme in which
each controller interrogates the other at regular, frequent
intervals to ensure the other controller is operating correctly.
Alternatively, the signal can be a dynamic signal transmitted
directly from a failed or failing controller 105a, 105b to the
surviving controller 105b, 105a, instructing it to initiate a
failover process or mechanism. Optionally, the communication
path 205 is also adapted to enable the controllers 105 to
achieve cache coherency in case of controller failure.
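The passive, ping-based form of failure detection can be sketched as a timeout on the peer's most recent reply. The class `HeartbeatMonitor`, the one-second timeout, and the explicit `now` arguments are assumptions made for this illustration; the text does not specify an interval or an interface.

```python
class HeartbeatMonitor:
    """Tracks the peer controller's replies to periodic pings.

    Times are passed in explicitly (in seconds) so the logic can be
    exercised without a real clock or communication path.
    """

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        self.timeout_s = timeout_s
        self.last_reply = now

    def record_reply(self, now: float) -> None:
        """Call whenever the peer answers a ping."""
        self.last_reply = now

    def peer_failed(self, now: float) -> bool:
        """The lack of a proper response within the timeout signals failure."""
        return now - self.last_reply > self.timeout_s


# Hypothetical timeline: the peer answers at t=0.9 s and then goes silent.
monitor = HeartbeatMonitor(timeout_s=1.0, now=0.0)
monitor.record_reply(now=0.9)
still_alive = not monitor.peer_failed(now=1.8)
declared_failed = monitor.peer_failed(now=2.0)
```

The dynamic alternative described in the text, where a failing controller announces its own failure, would bypass the timeout and trigger failover directly.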
An exemplary method of operating the memory system
100 shown in FIG. 2 to provide a failover process that is
substantially transparent to the host computers 110a, 110b
will now be described with reference to FIG. 3. The
following initial actions or steps are required to make the
failover operation transparent to the host computer. First, in
a system initialization step 210, each of the controllers 105
is provided with a unique identifier which is communicated
to the host computers 110. This step 210 generally merely
involves querying the controllers 105 to obtain their WWN,
but it may also include assigning a LOOP ID to each
controller in a LIHA phase or a LISA phase, as described
above. The unique identifiers are then registered by the host
computers 110 and one or more of the LUNs are associated
with each unique identifier. Next, in a communication step
215, the unique identifiers and their associated LUNs are
communicated between the controllers 105 via the
communication path 205. Each of the controllers 105 assigns the
unique identifier and the associated LUNs of the other
controller to its failover port 200a, 200b. This enables a
surviving controller 105a, 105b to assume the identity of a
failed controller 105b, 105a, and to accept and process I/O
requests addressed to it by activating the normally inactive
or failover port 200a, 200b.
The memory system 100 is then ready to begin regular
operations in a dual-active operation step 225 in which the
controllers 105 both simultaneously receive and process I/O
requests from the host computers 110. During normal
operations a fault detection step 230 is executed in which the
controllers 105 exchange a series of "pings," also referred to
as a heartbeat signal, the response to which, as described
above, signals to each controller that the other has not failed.
This step 230 may also involve a scheme in which a failed
or failing controller 105a, 105b dynamically signals a
surviving controller 105b, 105a that a failure has occurred or
is about to occur.
On detection of a controller failure, a failover procedure
is performed on the surviving controller 105a, 105b. The
failover procedure involves the steps of disabling the failed
controller (step 235) and assuming the identity of the failed
controller (step 240). In the disabling step 235, the surviving
controller 105a, 105b asserts a reset signal, which disables
the failed controller 105b, 105a by resetting its local
processor 185a, 185b and the active port 195a, 195b fibre
protocol chip (not shown). Resetting the fibre protocol chip
causes the hub 150a, 150b to automatically bypass the
primary port 195a, 195b on the failed controller 105a, 105b.
In the assuming identity step 240, the failover port 200a,
200b of the surviving controller 105a, 105b begins accepting
and processing I/O requests addressed by the host
computers 110a, 110b to the failed controller 105b, 105a.
Preferably, to speed up the failover process, the unique
identifier for the failed controller 105a, 105b was previously
assigned to the failover port 200a, 200b during the
communication step 215, and the surviving controller 105
merely activates the failover port 200a, 200b to enable it to
begin accepting and processing I/O requests.
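The sequence of the communication step 215, the disabling step 235, and the assuming identity step 240 can be sketched with a simple two-port model. This is a hedged illustration, not the controller firmware: the classes `Port` and `DualPortController`, their fields, and the identifier strings are hypothetical, and the reset signal is reduced to clearing two flags.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    identifier: Optional[str] = None  # unique identifier this port answers to
    luns: List[int] = field(default_factory=list)
    active: bool = False


@dataclass
class DualPortController:
    name: str
    active_port: Port
    failover_port: Port = field(default_factory=Port)
    running: bool = True

    def preload_failover_port(self, peer: "DualPortController") -> None:
        """Step 215: copy the peer's identifier and LUNs to the inactive port."""
        self.failover_port.identifier = peer.active_port.identifier
        self.failover_port.luns = list(peer.active_port.luns)

    def fail_over(self, failed: "DualPortController") -> None:
        """Steps 235 and 240, performed on the surviving controller."""
        # Step 235: the reset disables the failed controller and its active port.
        failed.running = False
        failed.active_port.active = False
        # Step 240: the identity was preloaded, so only activation remains.
        self.failover_port.active = True

    def accepts(self, identifier: str, lun: int) -> bool:
        """True if this controller would service the addressed request."""
        if not self.running:
            return False
        return any(
            p.active and p.identifier == identifier and lun in p.luns
            for p in (self.active_port, self.failover_port)
        )


# Hypothetical identifiers and LUN layout, for illustration only.
c_a = DualPortController("105a", Port("wwn-a", [0, 1], active=True))
c_b = DualPortController("105b", Port("wwn-b", [2, 3], active=True))
c_a.preload_failover_port(c_b)
c_b.preload_failover_port(c_a)
before = c_a.accepts("wwn-b", 2)
c_a.fail_over(c_b)
after = c_a.accepts("wwn-b", 2)
```

Preloading the failover port is the design choice the text highlights: at failure time the surviving controller only flips the port from inactive to active, which keeps the takeover short.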
After the failover process is completed, the surviving
controller 105a, 105b, in a resume operation (step 245),
resumes operations by responding to I/O requests addressed
to itself and to the failed controller. The surviving controller
105a, 105b responds to requests to store or retrieve data
addressed to the failed controller without any additional
support from the host computers 110 or the HBAs 155.
Because there is no need to alter the registered unique
identifiers or the associated LUNs, the failover process is
transparent to the host computers 110. To the host computers
110, the delay, if any, caused by the time it takes to detect the
failed controller 105a, 105b and to perform the loop
initialization procedure appears to be no more than a momentary
loss of power to the memory system 100, which requires the
host computers to re-transmit the last several commands sent
to the failed controller.
`
Optionally, when the controllers 105 include caches 180a,
180b, the failover process can also include a cache flush step
(not shown) and a conservative cache mode enable step (not
shown). The cache flush step prevents the loss of data that
was presented with good status to the host computers 110
because the data has been written to both caches 180a, 180b,
but has not actually been written to the data storage system
120 before the controller failure. The cache flush step
commits this data to the data storage system 120. Enabling
conservative cache mode minimizes the chance of data
being lost while operating with a single controller 105a,
105b in failover mode, by ensuring that all data is written
to the data storage system 120 prior to a good status signal
being sent.
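The two optional steps can be sketched as a cache that switches from write-back to write-through behavior when failover mode is entered. This is a simplified illustration under stated assumptions: the class `ControllerCache`, the dictionary standing in for the data storage system 120, and the "GOOD" status string are all hypothetical, and mirroring between the two caches is omitted.

```python
from typing import Dict


class ControllerCache:
    """Write-back cache with an optional conservative (write-through) mode."""

    def __init__(self, storage: Dict[int, bytes]) -> None:
        self.storage = storage            # stands in for data storage system 120
        self.dirty: Dict[int, bytes] = {}  # acknowledged but not yet committed
        self.conservative = False

    def write(self, block: int, data: bytes) -> str:
        if self.conservative:
            # Conservative mode: commit to storage before reporting status.
            self.storage[block] = data
        else:
            # Normal mode: acknowledge once the data is cached.
            self.dirty[block] = data
        return "GOOD"

    def flush(self) -> None:
        """Cache flush step: commit all acknowledged data to storage."""
        self.storage.update(self.dirty)
        self.dirty.clear()

    def enter_failover_mode(self) -> None:
        """Flush first, then stop acknowledging uncommitted writes."""
        self.flush()
        self.conservative = True


disk: Dict[int, bytes] = {}
cache = ControllerCache(disk)
status = cache.write(1, b"x")   # cached only; disk is still empty
cache.enter_failover_mode()     # block 1 is now committed to disk
cache.write(2, b"y")            # written straight through to disk
```

The trade-off is the usual one for write-through caching: acknowledgements are slower while only one controller is running, in exchange for no acknowledged data existing solely in a single cache.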
In another aspect, the present invention is directed to a
memory system 100 having a failover mechanism, such as
the one described above, that further includes a failback
process or mechanism that is substantially transparent to the
host computers 110a, 110b. To be transparent to the host
computers 110a, 110b, the failback mechanism should
support a hot swap of a failed controller 105a, 105b. By hot
swap it is meant the failed controller 105a, 105b is removed
and a replacement controller (not shown) put in serv