(10) Patent No.: US 6,401,170 B1
Griffith et al.
(45) Date of Patent: Jun. 4, 2002

(54) RAID SYSTEMS DURING NON-FAULT AND FAULTY CONDITIONS
     ON A FIBER CHANNEL ARBITRATED LOOP, SCSI BUS OR
     SWITCH FABRIC CONFIGURATION

(75) Inventors: Geoffrey J. Griffith, Laurel; Tomlinson G. Rauscher,
     Ellicott City, both of MD (US)

(73) Assignee: Digi-Data Corporation, Jessup, MD

( * ) Notice: Subject to any disclaimer, the term of this patent is
     extended or adjusted under 35 U.S.C. 154(b) by 0 days.

(21) Appl. No.: 09/376,324
(22) Filed: Aug. 18, 1999

(51) Int. Cl.7: G06F 12/00
(52) U.S. Cl.: 711/114; 711/150; 711/156; 711/162
(58) Field of Search: 711/114, 150, 156, 162, 163; 714/5, 6;
     370/222, 258; 710/37

(56) References Cited

U.S. PATENT DOCUMENTS

6,006,342 A  *  12/1999  Beardsley et al.   714/5
6,055,228 A  *   4/2000  DeKoning et al.    370/258
6,073,218 A  *   6/2000  DeKoning et al.    711/150
6,131,148 A  *  10/2000  West et al.        711/162
6,192,027 B1 *   2/2001  El-Batal           370/222
6,192,484 B1 *   2/2001  Asano              714/6

* cited by examiner

Primary Examiner: David Hudspeth
Assistant Examiner: Fred F. Tzeng
(74) Attorney, Agent, or Firm: William S. Ramsey

(57) ABSTRACT

The RAID system disclosed here uses arbitrated fiber channels
or switch fabric to connect multiple host computers and
storage array controllers (SAC). Each SAC is designated a
primary SAC for an array of storage units, which it normally
serves as controller, and as a secondary SAC for another
array of storage units. A primary SAC, secondary SAC, and
array of storage units are termed a storage unit set. When the
primary SAC or associated host computer fails, the failure is
detected by an interface chip, which causes the secondary
SAC to assume the identity of the primary controller. Using
system configuration information from the DASDs, the
secondary SAC then controls the storage units of the storage
unit set along with the storage units of which it is primary
SAC. With this configuration, there is no need for switch
apparatus between the storage arrays and there is no
interference because dual ported storage units are used.

11 Claims, 6 Drawing Sheets
`
`
`
[Representative drawing: flow chart of the failover process]

700  Failure of Primary SAC halts heartbeat
710  Secondary SAC notes cessation of heartbeat
720  Using interface chip, Secondary SAC assumes identity of Primary SAC
730  Secondary SAC identifies devices of storage array set
740  Secondary SAC controls devices of storage array set
`
`VMWARE, INC. 1007
`
`
`
`
U.S. Patent    Jun. 4, 2002    Sheet 1 of 6    US 6,401,170 B1

[Drawing sheet 1 of 6, apparently FIG. 1 (single prior art RAID subsystem). The drawing is not recoverable from the extracted text.]
U.S. Patent    Jun. 4, 2002    Sheet 2 of 6    US 6,401,170 B1

[Drawing sheet 2 of 6, apparently FIG. 2 (prior art active-active RAID system with two host computers and two controllers). The drawing is not recoverable from the extracted text.]
U.S. Patent    Jun. 4, 2002    Sheet 3 of 6    US 6,401,170 B1

[Drawing sheet 3 of 6, apparently FIG. 3 (redundant RAID system which uses switches to connect arrays of storage devices). The drawing is not recoverable from the extracted text.]
U.S. Patent    Jun. 4, 2002    Sheet 4 of 6    US 6,401,170 B1

[Drawing sheet 4 of 6: Figure 4. Only scattered reference numerals survive in the extracted text; the drawing is not reproduced.]
U.S. Patent    Jun. 4, 2002    Sheet 5 of 6    US 6,401,170 B1

[Drawing sheet 5 of 6: Figure 5. Only scattered reference numerals survive in the extracted text; the drawing is not reproduced.]
U.S. Patent    Jun. 4, 2002    Sheet 6 of 6    US 6,401,170 B1

[Drawing sheet 6 of 6, apparently FIG. 6 (flow chart of the process of operation). The drawing is not recoverable from the extracted text.]
RAID SYSTEMS DURING NON-FAULT AND
FAULTY CONDITIONS ON A FIBER
CHANNEL ARBITRATED LOOP, SCSI BUS
OR SWITCH FABRIC CONFIGURATION

SEQUENCE LISTING

Not Applicable.

CROSS-REFERENCE TO RELATED
APPLICATIONS

Not Applicable.

STATEMENT REGARDING FEDERALLY
SPONSORED RESEARCH OR DEVELOPMENT

Not Applicable.

MICROFICHE APPENDIX

Not Applicable.

BACKGROUND OF THE INVENTION
`
`(1) Field of the Invention
`This invention relates to systems in which multiple con-
`trollers are used to control an array of storage devices.
`(2) Description of Related Art Including Information
`Disclosed Under 37 CFR 1.97 and 37 CFR 1.98
`
The acronym RAID refers to systems which combine disk
drives for the storage of large amounts of data. In RAID
systems the data is recorded by dividing each disk into
stripes, while the data are interleaved so the combined
storage space consists of stripes from each disk. RAID
systems fall under 5 different architectures, plus one
additional type, RAID-0, which is simply an array of disks and
does not offer any fault tolerance. RAID 1-5 systems use
various combinations of redundancy, spare disks, and parity
analysis to preserve the reading and writing of data
in the face of one and, in some cases, multiple intermittent
or permanent disk failures. Ridge, P. M. The Book of SCSI:
A Guide For Adventurers. Daly City, Calif.: No Starch Press,
1995, pp. 323-329. In this application, a RAID system
consisting of one host computer, one controller, and an array of
multiple channels, each channel consisting of several direct
access storage devices in serial electrical connection, will be
termed a “single RAID subsystem”.
`Conventional RAID systems guard against failure of a
`controller by the active-active system. This system consists
`of two single RAID subsystems, each with a host computer,
`a controller, and an array of direct access storage units. The
`direct access storage units, in the most common case, disks,
`are arranged in channels in which the disks are connected in
`a series. A common arrangement is for one controller to
`control six channels of five disks in each channel. In the
`
`active-active system, each channel of one system is con-
`nected electrically to another channel in another system.
`This means that, in the event of the failure of one controller,
`the other controller can serve all 10 disks in each “double”
`channel. Unfortunately, during normal operation when both
`controllers are operating there is interference associated with
`the fact that two controllers are simultaneously accessing a
`double channel of ten disks. This interference reduces the
`
`speed of a normally acting active-active system to about
`130% of the speed of a single RAID subsystem rather than
`the 200% of a single RAID subsystem expected from the
`operation of two single RAID subsystems.
U.S. Pat. No. 5,768,623 discloses a system for storing
data for several host computers and several storage arrays
which are linked so that each storage array can be accessed
by any host computer. The system uses single-ported disks
and Serial Storage Architecture (SSA) in a SSA disk array
loop. Messages and data can travel either clockwise or
counter-clockwise when traversing the loop. The bandwidth
of such a loop is necessarily lower than that of a fibre
channel configuration.
U.S. Pat. No. 5,812,754 discloses a RAID system which
uses a fibre channel arbitrated loop to connect host computers
and controllers as well as a separate fibre channel
arbitrated loop to connect controllers and storage disks. In
addition, a port bypass circuit is connected to each component
in order to allow bypassing of any failed component so
the operation of the loop is not affected by the failed
component. Finally, in one embodiment, a star coupled
RAID system with orthogonal data striping is described. In
this embodiment defective components can be removed
physically from the system. This system is considerably
more expensive and slower in operation than the system of
the present invention.
`The RAID systems of the prior art do not provide the
`advantages of the present invention, that of inexpensively
increasing the overall speed of N same-speed single RAID
`subsystems to N times the speed of a single RAID system
`under normal conditions while providing for the sharing of
`multiple storage devices during conditions in which a host
`computer or storage array controller fails. The present
`system maintains the high overall speed under normal
`conditions and provides host computer and controller redun-
`dancy without the expense of a switching system connecting
`the channels of storage devices and while taking advantage
`of the high speed associated with fibre channel loops and
`switch fabric configurations.
`The system of the present invention is unlike the conven-
`tional active-active system because it uses a high bandwidth
`fibre channel arbitrated loop or switch fabric to connect the
`host computers and controllers. This provides redundancy in
`the case of any single computer or controller failure. In
`addition, since the present invention includes dual-ported
`storage devices, the failure of a storage device does not have
`a disruptive effect on the system. Each storage array con-
`troller (SAC) is designated a primary SAC for an array of
`storage units and as a secondary SAC for a different array of
`storage units. Each array of storage units is assigned to a
`primary SAC, which normally controls the array, and to a
`secondary SAC, which assumes the identity of the primary
`SAC upon failure of the primary SAC. Under normal
`conditions, each SAC controls only the array of storage units
`that it serves as primary SAC. Both the primary SAC and the
`secondary SAC are connected by separate loops to separate
`ports on the dual-ported storage devices. The combination of
`one primary SAC, its storage device array, and one second-
`ary SAC which is potentially able to control the storage
`device array is termed a “storage array set”.
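The primary and secondary designations described above can be pictured as a small data model. The following is a minimal sketch, not the patent's implementation: the names `StorageArraySet` and `build_sets` and the ring-style assignment of secondaries are assumptions made for illustration. The text only requires that each SAC be secondary for an array other than the one it serves as primary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageArraySet:
    """One primary SAC, its array of storage units, and one secondary SAC."""
    array: str
    primary_sac: str
    secondary_sac: str


def build_sets(sacs, arrays):
    """Pair each array with a primary SAC and a different secondary SAC.

    Assumption: secondaries are assigned in a ring, so SAC i+1 is the
    secondary for the array whose primary is SAC i (wrapping around).
    """
    if len(sacs) != len(arrays) or len(sacs) < 2:
        raise ValueError("need at least two SACs and one array per SAC")
    n = len(sacs)
    return [
        StorageArraySet(arrays[i], sacs[i], sacs[(i + 1) % n])
        for i in range(n)
    ]


# Hypothetical labels; the patent identifies components by reference numeral.
sets = build_sets(["SAC-A", "SAC-B", "SAC-C"],
                  ["array-A", "array-B", "array-C"])
```

Under this assignment every SAC appears exactly once as a primary and once as a secondary, which matches the statement that each SAC is primary for one array and secondary for a different one.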
If three same speed single RAID subsystems are included,
for example, the system functions at 300% the speed of a
single RAID subsystem during the vast preponderance of the
time when all of the host computers and SACs are functioning
properly. In the case of a storage array controller or
associated host computer failure, however, an intact host
computer and SAC (the secondary SAC of the defective
storage array set) takes over the operation of the failed
system’s array of storage devices. The intact secondary SAC
assumes the identity or address of the failed controller and
retains its own identity and duties to serve its own storage
device array as the primary SAC. In this way, the intact
system can address its own storage devices as well as those
of the failed host computer or controller. In this configuration
the system has the speed expected of a conventional
active-active system after a host computer or SAC failure, that is,
about 100% of the speed of an individual RAID subsystem
for the two affected single RAID subsystems. Any remaining
unaffected single RAID subsystems continue to operate at
the unhindered maximum speed.
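The arithmetic in this passage can be checked with a short calculation. This is a sketch under stated assumptions: all subsystems have the same speed, each failure ties up exactly two subsystems (the failed one and the one whose SAC acts as secondary), and that pair delivers about one subsystem's worth of speed, as the text states. The function name `relative_speed` is hypothetical.

```python
def relative_speed(n_subsystems, n_failures=0, unit=100):
    """Relative speed in percent of one single RAID subsystem.

    Assumes equal-speed subsystems. Each failure affects two subsystems,
    which together deliver about one unit; the rest run unhindered.
    """
    affected = 2 * n_failures
    if n_subsystems < 2 or affected > n_subsystems:
        raise ValueError("each failure needs a distinct intact partner")
    unaffected = n_subsystems - affected
    return unaffected * unit + n_failures * unit


normal = relative_speed(3)        # all three subsystems working
degraded = relative_speed(3, 1)   # one host computer or SAC has failed
```

With three subsystems this gives 300 in normal operation and 200 after one failure: one unaffected subsystem at full speed plus the affected pair at about 100%.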
`The fibre channel loop and switch fabric configuration are
`becoming the industry standards for loop or serial interfaces,
`and SCSI has long been the industry standard for bus or
`parallel interfaces. The present invention is applicable for
`either the fibre channel disk array loop or SCSI interfaces for
the host computers and SACs. In addition, the present
invention is applicable to a switch fabric configuration.
BRIEF SUMMARY OF THE INVENTION

The redundant RAID system of this invention extends the
protection of the operation of a RAID system from providing
for disk failure to providing for host computer or SAC
failure. The invention comprises two or more (N) single
RAID subsystems which are linked by a very wide bandwidth
fibre channel loop or switch fabric configuration. Each
SAC is designated a primary SAC for an array of storage
devices to which it is linked by a loop connection to one port
on each device. A second port on each device is used to link
in a loop to a secondary SAC. The primary SAC normally
controls the array of storage devices. In the event of failure
of the primary SAC or associated host computer, the failure
is detected by the secondary SAC, which then assumes the
identity of the primary SAC, learns the identity and location
of the affected array of storage devices, and serves this array
as though it were the primary SAC.
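The takeover sequence in this paragraph (assume the identity, learn the array, serve it) can be sketched as follows. This is an illustration only: the class `SAC`, the function `fail_over`, and the `read_config` callable are hypothetical names, and the way configuration information is actually obtained is not specified here beyond the abstract's statement that it comes from the DASDs.

```python
from dataclasses import dataclass, field


@dataclass
class SAC:
    """A storage array controller with the identities it answers to."""
    name: str
    arrays: list = field(default_factory=list)
    identities: set = field(default_factory=set)

    def __post_init__(self):
        # A SAC always answers to its own identity.
        self.identities.add(self.name)


def fail_over(primary, secondary, read_config):
    """Let the secondary SAC take over for a failed primary SAC.

    read_config is a hypothetical callable that returns the arrays
    belonging to the named primary SAC.
    """
    secondary.identities.add(primary.name)    # assume identity of primary
    inherited = read_config(primary.name)     # learn the affected array
    secondary.arrays.extend(inherited)        # serve it alongside its own
    return secondary


a = SAC("SAC-A", arrays=["array-A"])
b = SAC("SAC-B", arrays=["array-B"])
fail_over(a, b, lambda name: ["array-A"])
```

After the call, `b` keeps its own identity and array and additionally answers as `SAC-A` and serves `array-A`, mirroring the statement that the secondary serves the array as though it were the primary.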
Thus the system normally functions as (N) independent
single RAID subsystems and functions at the speed of one
single RAID subsystem multiplied by N if the single RAID
subsystems all have the same speed. If the speeds of the
single RAID subsystems vary, the system normally functions
at a speed which is the sum of the speeds of the single RAID
subsystems. In the event of a host computer or primary SAC
failure, the secondary SAC controls a double set of storage
array devices. This causes interference in transmission of
data to the storage devices and slows the speed of the
system. The functioning controller thus takes over the
function of the disabled controller and provides continuing
service, albeit at a reduced speed. The unaffected single
RAID subsystems of the redundant RAID system of this
invention continue to function unhindered.
`
In the normal operating mode the present invention
enables each SAC to communicate with a set of disks
independently of any other SAC, thus operating the redundant
RAID system at the speed of N single RAID subsystems.
In the event of failure of the host computer or SAC
of a component single RAID subsystem, the system
automatically assumes the configuration of a conventional
active-active system with respect to the affected single
RAID subsystem and an unaffected single RAID subsystem.
The redundant RAID system continues to operate with
access by the host computer and SAC of the functioning RAID
subsystem to all of the disks of both the failed and the
functioning SAC, although at a reduced speed.
A host computer and SAC redundant RAID system with
a normal speed much higher than the conventional
active-active host computer and SAC redundant systems is
provided by this invention. In the event of failure of a host
computer or SAC the speed of the system is no lower than
that of a conventional host computer and storage array
controller redundant system. If greater than two single
RAID subsystems are included in the redundant RAID
system, the speed of the system under nearly all conditions
is greater than the conventional redundant system.
`The objective of this invention is to provide a host
`computer and SAC redundant RAID system which contin-
`ues to operate despite the failure of a single host computer
`or SAC.
`
Another objective of this invention is to provide an N host
computer and SAC redundant RAID system which, in the
absence of failures, operates at the speed of N single RAID
subsystems if all have the same speed, yet provides
protection against host computer or SAC failure.
Another objective of this invention is to provide an N host
computer and N SAC redundant RAID system which continues
to operate during a host computer or SAC failure at a reduced
speed, that of N-1 single RAID subsystems if all subsystems
have the same speed.
Another objective of this invention is to provide an N host
computer and SAC redundant RAID system which continues
to operate as long as no more than N/2 of the
single RAID subsystems suffer a failure of the host computer
or SAC and each single RAID subsystem with a failed host
computer or SAC is linked to an intact secondary SAC.
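The survival condition in the last of these objectives can be written as a small predicate. This is a sketch, assuming each subsystem is named after its primary SAC and has exactly one secondary SAC; the names `system_survives` and `secondary_of` are hypothetical.

```python
def system_survives(secondary_of, failed):
    """Check the stated condition for continued operation.

    secondary_of maps each primary SAC to its secondary SAC.
    failed is the set of subsystems whose host computer or SAC failed.
    The system continues if no more than N/2 subsystems have failed and
    every failed subsystem is linked to an intact secondary SAC.
    """
    n = len(secondary_of)
    if len(failed) > n / 2:
        return False
    return all(secondary_of[sac] not in failed for sac in failed)


# Assumed ring assignment of secondaries among four subsystems.
ring = {"A": "B", "B": "C", "C": "D", "D": "A"}
```

With this ring, losing subsystems A and C is survivable because B and D remain intact as their secondaries, whereas losing A and B is not, since the secondary for A is itself failed.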
`Another objective is to provide a redundant RAID system
`with two-ported storage devices each of which is connected
`to both a primary SAC and to a secondary SAC.
`Another objective is to provide a redundant RAID system
`in which fibre channel or switch fabric technology is used to
`maximize the speed of the system.
`A final objective of this invention is to provide a host
`computer and SAC redundant RAID subsystem which is
`inexpensive, resistant to failure, easy to maintain, and is
`without harmful effects on the environment.
`
`BRIEF DESCRIPTION OF THE SEVERAL
`VIEWS OF THE DRAWINGS
`
`FIG. 1 is a diagrammatic representation of a single prior
`art RAID subsystem.
FIG. 2 is a diagrammatic representation of a conventional
prior art active-active RAID system with two controllers and
two host computers.
`FIG. 3 is a diagrammatic representation of a FULL-
`SPEED ACTIVE-ACTIVE redundant RAID system which
`uses switches to connect arrays of storage devices.
`FIG. 4 is a diagrammatic representation of the FULL-
`SPEED ACTIVE-ACTIVE redundant RAID system of the
`present invention.
FIG. 5 is a diagrammatic representation of the embodiment
of the FULL-SPEED ACTIVE-ACTIVE redundant
RAID system of the present invention which incorporates a
switch fabric configuration.
`FIG. 6 is a flow chart of the process of operation of the
`redundant RAID system of the present invention.
DETAILED DESCRIPTION OF THE
INVENTION

FIG. 1 is a schematic of the external view of a RAID
system referred to in this application as a “single RAID
subsystem”. The single RAID subsystem comprises a single
host computer 10, a SAC 30, and an array of direct access
storage devices (DASD). The host computer 10 is electrically
connected to the disk array controller 30 by connector
means 20. The connector means may be a wire or cable
connector or a SCSI bus.
`
`In all of the FIGS. the convention is followed of depicting
`connectors which are not electrically connected as lines
`which cross perpendicularly. An electrical connection is
`indicated by a line which terminates perpendicularly at
`another line or at a symbol for a component. Thus in FIG.
`1 the host computer 10 is electrically connected to disk array
`controller 30 by connector 20. Connector 401 is electrically
`connected to disk array controller 30 and to DASD 1A 40
`and to DASD 1B 41 but is not electrically connected to
`connectors 402 to 406.
`
DASD may be disks, tapes, CDs, or any other suitable
storage device. A preferred DASD is a disk.
All the DASD and connectors in a system taken as a
whole are referred to as an “array” of DASD. The DASD are
arranged in channels which consist of a number of DASD
which are electrically connected to each other and to the disk
array controller by connector means. The channels are
designated in FIG. 1 as 1 to 6. The number of channels may
vary. A preferred number of channels is 6.
A channel, for example channel 1, consists of connector
401, DASD 1A 40, and DASD 1B 41. Although only two
DASD are depicted in channel 1 of FIG. 1, there may be as
many as 126 DASD in a channel. A preferred number of
DASD in a channel is five.
`
`A group of DASDs served by separate channels across
`which data is striped is referred to as a “tier” of DASDs. A
`DASD may be uniquely identified by a channel number and
`a tier letter, for example DASD 1A is the first disk connected
`to channel 1 of the controller.
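The channel-and-tier naming just described lends itself to a short illustration. The sketch below assumes tiers are lettered A, B, C and so on in order along each channel, as the "1A" and "1B" labels of FIG. 1 suggest; the function names are hypothetical.

```python
import string


def dasd_label(channel, tier_index):
    """Label such as '1A': channel number followed by the tier letter."""
    return f"{channel}{string.ascii_uppercase[tier_index]}"


def tier(tier_index, channels=6):
    """All DASD of one tier, one per channel, across which data is striped."""
    return [dasd_label(ch, tier_index) for ch in range(1, channels + 1)]


first_tier = tier(0)    # the first DASD on each of the six channels
second_tier = tier(1)   # the second DASD on each channel
```

For the preferred six channels, the first tier is DASD 1A through 6A and the second tier is DASD 1B through 6B.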
`
A preferred SAC 30 is the Z-9100 Ultra-Wide SCSI RAID
controller manufactured by Digi-Data Corporation, Jessup,
Md.
`
`Additional tiers of DASDs may be used.
`Any suitable host computer 10 may be used. A preferred
`host computer 10 is a Pentium-based personal computer
`available from multiple vendors such as IBM, Research
`Triangle Park, North Carolina; Compaq Computer Corp,
`Houston, Tex., or Dell Computer, Austin, Tex.
`FIG. 2 shows the prior art active-active redundant host
`computer and SAC RAID system. This system comprises
`two single RAID subsystems of FIG. 1, system 11 and
`system 111 in FIG. 2 which are electrically connected
`through the disk array controllers and through the arrays of
`DASD.
`
FIG. 2 shows system 11 which comprises host computer
10, connected by connector 20 to disk array controller 30,
and the system 11 array, whose channels 1 to 6 consist
of connectors 401 to 406, respectively, and associated
DASD 40-60, respectively. Only one DASD of each channel
is depicted in FIG. 2.
FIG. 2 also shows system 111 which comprises host
computer 110, connected by connector 120 to disk array
controller 130, and the system 111 array, whose channels
1 to 6 consist of connectors 401 to 406, respectively, and
associated DASD 141-161, respectively. Only one DASD of
each channel in system 11 is depicted in FIG. 2. Note that
in both system 11 and system 111 the arrays are electrically
connected bidirectionally to each system. For example, array
1 of system 11 is connected by connector 401 to array 1 of
system 111.
The disk array controller 30 of system 11 is connected to
the disk array controller 130 of system 111 by a bidirectional
connector which is depicted in FIG. 2 as connectors 300 and
310. Disk array controller 30 contains internal software
which generates a binary signal termed a “normal operating
signal” or a “heartbeat” at an interval of a few milliseconds
when the disk array controller 30 and host computer 10 of
subsystem 11 are operational. When the host computer or
disk array controller is in a defective condition, the emission
of the normal operating signal ceases.
The normal operating signal is emitted from disk array
controller 30 over connector 300 to the disk array controller
130 of subsystem 111. Similarly, when the host computer
110 and disk array controller 130 of subsystem 111 are
operating normally, a normal operating signal is emitted
from disk array controller 130 over connector 310 to disk
array controller 30 of subsystem 11.
When one disk array controller no longer receives the
normal operating signal because the host computer or disk
array controller of the other system is defective, the
operational disk array controller begins to assume the tasks of
the system containing the defective component. For example,
if disk array controller 30 of subsystem 11 ceases to receive
a normal operating signal from
disk array controller 130 of subsystem 111, disk array
controller 30 will assume the control and service of not only
its own DASD, 40-60 in FIG. 2, but also of the DASD of
subsystem 111, 141-161. Connector 20 also connects host
computer 10 with disk array controller 130. Similarly
connector 120 connects host computer 110 with disk array
controller 30. Connectors 20 or 120 are used to transfer
information from the host computer of a single RAID
subsystem which has a faulty host computer or disk array
controller to the disk array controller of the functional single
RAID subsystem. This protects each component of the
active-active RAID system from failure of any one host
computer or disk array controller and allows each DASD to
be read from or written to.
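The heartbeat mechanism described above can be sketched in a few lines. This is a simplified model, not the controller firmware: time is passed in explicitly rather than read from a clock, the timeout value is an assumption (the text gives only the emission interval of a few milliseconds), and the names `HeartbeatMonitor` and `controlled_dasd` are hypothetical.

```python
class HeartbeatMonitor:
    """Tracks the last normal operating signal received from a partner."""

    def __init__(self, timeout_ms):
        self.timeout_ms = timeout_ms   # assumed threshold, not from the text
        self.last_seen_ms = None

    def beat(self, now_ms):
        """Record a heartbeat received at time now_ms."""
        self.last_seen_ms = now_ms

    def partner_failed(self, now_ms):
        """True once the heartbeat has been silent longer than the timeout."""
        if self.last_seen_ms is None:
            return False   # assumption: no verdict before the first beat
        return now_ms - self.last_seen_ms > self.timeout_ms


def controlled_dasd(own, partner, monitor, now_ms):
    """DASD groups a controller serves at time now_ms."""
    if monitor.partner_failed(now_ms):
        return own + partner   # take over the partner's DASD as well
    return own


monitor = HeartbeatMonitor(timeout_ms=10)
monitor.beat(0)
monitor.beat(3)
```

While beats keep arriving the controller serves only its own DASD; once the silence exceeds the assumed timeout, `controlled_dasd` returns both groups, which corresponds to controller 30 assuming service of the DASD of subsystem 111.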
`
Unfortunately, the protection against failure in the system
of FIG. 2 is achieved at a cost in speed of operation. An
interference condition is created in any channel 401-406 of
FIG. 2 because two disk array controllers are using a single
connector to address the DASD of two single RAID
subsystems. Each disk array controller must wait until the
connector is free before addressing its DASD. The net effect
is a considerable reduction of speed in normal operation. If
the speed of a single RAID subsystem is 100% (relative
speed), then the relative speed of the active-active system of
FIG. 2 under normal operating conditions is about 130%,
rather than the 200% expected of two single RAID
subsystems (which, however, do not enjoy the fault-tolerance
associated with the redundant host computers and disk array
controllers).
Another redundant RAID system, termed the FULL-SPEED
ACTIVE-ACTIVE redundant RAID system, is
depicted in FIG. 3. This system is disclosed in U.S. patent
application Ser. No. 09/192,016, filed Nov. 13, 1998,
incorporated herein by reference.
The system in FIG. 3 is identical to that in FIG. 2 with the
exception of the addition of a normally open switch means
between the channels which are connected in FIG. 2, and the
means to control the switch means. In FIG. 3 the electrical
connector 401 between channel 1 of subsystem 11 and
channel 1 of subsystem 111 is intercepted by controllable
repeater or core 70. The core 70 consists of connections to
channel 1 of subsystems 11 and 111 with normally open
switch means, in this case a normally open repeater 90
electrically connected to and interposed between the
segments of connector 401, which has been segmented into
connectors 401 and 411. When repeater 90 of core 70 is in the
open position, there is no electrical connection between
channel 1 of subsystem 11 and channel 1 of subsystem 111.
Similarly, switch means or repeaters 91-95 are interposed in
the connections between channels 2, 3, 4, 5, and 6,
respectively, and while the switch means or repeaters 91-95,
respectively, are in the open position, there are no electrical
connections between channels 2, 3, 4, 5, and 6 of subsystem
11 and channels 2, 3, 4, 5, and 6 of subsystem 111,
respectively. The core 70 is a container which contains and
supports the switch means and the connection means for
attaching switch means to a channel.
When the switch means of the core are closed the electrical
connections between the channels of subsystem 11 and
subsystem 111 are formed. Under the conditions of closed
switch means the system of FIG. 3 is electrically equivalent
to that of the active-active system of FIG. 2.
In operation, the switch means 90-95 in core 70 are
normally open while each host computer and disk array
controller is functioning normally. Under these normal
conditions the channels of subsystems 11 and 111 are
electrically isolated from each other. The relative speed achieved
by the system is 200% of the speed of a single RAID
subsystem.
In the rare event of failure of one host computer or disk
array controller the normal operating signal or heartbeat
emitted from a disk array controller is stopped. When the
other disk array controller does not receive a normal
operating signal it emits a closure signal to the core. The
normally open switch means are now closed and the
electrical connections between the channels of the functional
and non-functional systems are made, allowing the
functional system to control the DASD of both subsystems.
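The effect of the core's switch means on which channels a controller can address may be sketched as below. This is an illustrative model of the FIG. 3 arrangement only: the function name `reachable_channels` and the representation of channels as (subsystem, channel number) pairs are assumptions.

```python
def reachable_channels(core_closed, own_subsystem, channels=range(1, 7)):
    """Channels a disk array controller can address in the FIG. 3 system.

    Channels are modelled as (subsystem, channel number) pairs. With the
    switch means open, only the controller's own six channels are
    reachable; after a closure signal the partner's channels are joined.
    """
    other = 111 if own_subsystem == 11 else 11
    own = [(own_subsystem, ch) for ch in channels]
    if not core_closed:
        return own
    return own + [(other, ch) for ch in channels]


normal = reachable_channels(core_closed=False, own_subsystem=11)
after_failure = reachable_channels(core_closed=True, own_subsystem=11)
```

In normal operation controller 30 reaches six channels; after the closure signal it reaches twelve, which is the electrical equivalent of the active-active system of FIG. 2.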
`The present invention is designed to overcome the lack of
`performance associated with the active-active RAID system
`under normal conditions while retaining the fault-tolerance
`under conditions of failure of a host computer or disk array
`controller.
`
FIG. 4 is a diagrammatical representation of the present
invention. In FIG. 4 the redundant RAID system is shown
with parts of 4 single RAID subsystems. FIG. 4 shows two
RAID subsystems with host computer, storage array
controller (SAC) and a portion of two arrays of direct access
storage devices (DASD). This may be extended to N
subsystems, where N is a number greater than two, by the
addition of single RAID subsystems. In FIG. 4, the host
computers 210, 310, 510, and 610 and the SAC 230, 330,
430, and 530, and the interface chips 232, 332, 432, and 532
are connected to loop connecting means 22 by connectors
212, 312, 512, and 612; 236, 336, 436, and 536; and 234,
334, 434, and 534, respectively.
Only a portion of the DASD are shown in FIG. 4. Two
channels of DASD are shown on SAC 230; and two channels
of DASD are shown on SAC 330. SAC 230 is connected
by connector 233 to DASD 240, 242, 244, 246, 248,
and 250. SAC 230 is connected by connector 231 to DASD
260, 262, 264, 266, 268, and 270. SAC 330 is connected by
connector 331 to DASD 340, 342, 344, 346, 348, and 350.
SAC 330 is connected by connector 333 to DASD 360, 362,
364, 366, 368, and 370. SAC 330 is also connected by
connector 331 to DASD 240, 242, 244, 246, 248, and 250.
Note that connector 331 is connected to these DASD at a
connection site or port which is different from the port to
which connector 233 is connected. Also, connector 331
forms a loop. SAC 330 is connected by connector 333 to
DASD 260, 262, 264, 266, 268, and 270. Again a different
port from that to which connector 231 is connected was used
with these DASD, and connector 333 forms a loop.
Connectors 231 and 233, which are connected to SAC 230,
connect with the DASD of channels in an array which is not
shown in FIG. 4. Connector 431 connects with a SAC which
is not shown in FIG. 4 and also connects with DASD 340,
342, 344, 346, 348, and 350 using a port on each DASD
which was not used in connecting with connector 331.
Connector 433 connects with a SAC which is not shown in
FIG. 4 and also connects with DASD 360, 362, 364, 366,
368, and 370.
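The wiring just described can be tabulated and checked for the dual-port property, namely that each group of DASD is reached by two different SACs. The sketch below transcribes the connections stated in the text; the group labels, the placeholder name "SAC not shown", and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# (connector, SAC, DASD group) as read from the description of FIG. 4.
LINKS = [
    (233, "SAC 230", "240-250"),
    (231, "SAC 230", "260-270"),
    (331, "SAC 330", "340-350"),
    (333, "SAC 330", "360-370"),
    (331, "SAC 330", "240-250"),
    (333, "SAC 330", "260-270"),
    (431, "SAC not shown", "340-350"),
    (433, "SAC not shown", "360-370"),
]


def controllers_per_group(links):
    """Map each DASD group to the SACs connected to it, in listed order."""
    groups = defaultdict(list)
    for _connector, sac, group in links:
        groups[group].append(sac)
    return dict(groups)


def dual_ported_ok(links):
    """True if every DASD group is reached by exactly two distinct SACs."""
    return all(
        len(sacs) == 2 and len(set(sacs)) == 2
        for sacs in controllers_per_group(links).values()
    )


wiring = controllers_per_group(LINKS)
```

In this table DASD 240-250 and 260-270 are reached by SAC 230 and SAC 330, and DASD 340-350 and 360-370 by SAC 330 and a SAC not shown, so each group has one connection per port.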
`Each SAC, 230, 330, 430 and 530, has an associated
`interface chip 232, 332, 432, and 532, respectively.
SAC 230 and SAC 330 are connected by connector means
824 and 823, used to deliver the heartbeat. SAC 230 is
connected for this purpose to another SAC (not shown) by
connector means 833 and 834. SAC 330 is connected for this
purpose to another SAC (not shown) by connector means
843 and 844.
`
`FIG. 5 is identical to FIG. 4 except that the loop con-
`necting means 22 does not appear in FIG. 5. Rather, the
`components in FIG. 5 are connected by connecting means to
`a switch fabric device 24.
`
`Loop connecting means may be a SCSI bus, fibre channel
`arbitrated loop, or a switch fabric device.
Fibre Channel is a high-speed low-latency communications
technology with gigabit-per-second transmission rates
in storage/server environments. A preferred fibre channel
switch is a GigWorks MKII-16 Fibre Channel Switch,
available from Ancor Communications, Inc., Minnetonka,
Minn.
A switch fabric device is a distributed switch with the
topology of a toroidal derivative. The system scales in a
linear fashion to over a terabit per second in bandwidth. A
preferred switch fabric device is CST 2000, available from
ServerSwitch Corporation, Dallas, Tex.
`A preferred dual-port disk is the 3.5-Inch Ultrastar2 XP,
`available from IBM, San Jose, Calif.
A preferred SAC is the Z-9100 Ultra-Wide SCSI RAID
controller manufactured by Digi-Data Corporation, Jessup,
Md.

Any suitable host computer may be used. A preferred host
computer is a Pentium-based personal computer available
from multiple vendors such as IBM, Research Triangle Park,
North Carolina; Compaq Computer Corp, Houston, Tex., or
Dell Computer, Austin, Tex.
`Connectors may be fiber optics or copper wires.
The systems of both FIGS. 4 and 5 function in an identical
manner, which will be described with reference to FIG. 4.
Each host computer may be associated with a specific SAC.
For example, host computer 510 may provide data to and
retrieve data from SAC 230, which in the write mode stores
the data on its array of DASD, in FIG. 4 DASD 240-250 and
260-270. In the read mode, SAC 230 reads the data from
the same DASD and reports the data to the host computer
510. A typical SAC has 6 channels of DASD in its array, but
only two channels are shown in FIG. 4. The SAC stripes the
data across the channels of DASD. When data are being
read, the SAC 230 reads the data across the channels and
transmits the data to the associated host computer 510.
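Striping data across channels on write and reading it back across the channels can be illustrated with a toy round-robin scheme. This is a sketch under assumptions: the stripe unit size, the round-robin order, and the absence of parity are choices made here for illustration and are not taken from the description; the function names are hypothetical.

```python
def stripe(data, channels, unit):
    """Deal fixed-size units of data round-robin onto the channels."""
    per_channel = [[] for _ in range(channels)]
    for offset in range(0, len(data), unit):
        index = (offset // unit) % channels
        per_channel[index].append(data[offset:offset + unit])
    return per_channel


def unstripe(per_channel):
    """Reassemble the data by reading one unit from each channel in turn."""
    units = []
    depth = max(len(ch) for ch in per_channel)
    for row in range(depth):
        for ch in per_channel:
            if row < len(ch):
                units.append(ch[row])
    return b"".join(units)


written = stripe(b"ABCDEFGHIJ", channels=3, unit=2)
read_back = unstripe(written)
```

Here ten bytes in two-byte units land on three channels as AB and GH, CD and IJ, and EF; reading the channels in turn restores the original byte order.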
`This redundant RAID system has provisions for the
`failure of a SAC or associated host computer, as will be
`described with reference to FIG. 4. The components of the
`RAID system are organized into groups called “storage
`array sets”. A storage array set consists of a primary SAC
`and its attached array of DASD and another SAC which is
`designated the secondary SAC. Since each SAC has an array
`of DASD, each SAC is designated a primary SAC for its
`own attached array and as a secondary SAC for another
`array of DASD. In the event of a failure of a primary SAC
`or its associated host computer, the secondary SAC, as a
`member of the storage array set, assumes the identity of the
`primary SAC, identifies the array of DASD in the storage
`array set, and controls both the array of the storage array set
`of which