United States Patent  [19]                         [11]  Patent Number:     6,073,218
DeKoning et al.                                    [45]  Date of Patent:    Jun. 6, 2000

                                   US006073218A

[54]  METHODS AND APPARATUS FOR COORDINATING SHARED MULTIPLE RAID
      CONTROLLER ACCESS TO COMMON STORAGE DEVICES

[75]  Inventors: Rodney A. DeKoning; Gerald J. Fredin, both of Wichita, Kans.

[73]  Assignee: LSI Logic Corp., Milpitas, Calif.

[21]  Appl. No.: 08/772,614

[22]  Filed: Dec. 23, 1996

[51]  Int. Cl.7 .............................................. G06F 13/16

[52]  U.S. Cl. .......... 711/150; 711/114; 711/148; 711/149; 711/151; 711/152;
      711/153; 711/162; 711/168; 710/20; 710/21; 710/38; 710/241; 714/6; 714/11

[58]  Field of Search .......... 711/114, 152, 145, 148, 149, 150, 151, 153, 168,
      162; 714/6, 11; 710/20, 21, 38, 241
[56]                     References Cited

                  U.S. PATENT DOCUMENTS

3,702,006  10/1972  Page ................................ 444/1
5,101,492   3/1992  Schultz et al. ...................... 395/575
5,148,432   9/1992  Gordon et al. ....................... 371/10.1
5,210,860   5/1993  Pfeffer et al. ...................... 395/575
5,249,279   9/1993  Schmenk et al. ...................... 395/425
5,317,731   5/1994  Dias et al. ......................... 395/600
5,331,476   7/1994  Fry et al. .......................... 360/53
5,367,669  11/1994  Holland et al. ...................... 395/575
5,379,417   1/1995  Lui et al.
               1995  Fry et al. .......................... 360/53
5,388,108   2/1995  DeMoss et al. ....................... 371/51.1
5,446,855   8/1995  Dang et al. ......................... 395/401
5,455,934  10/1995  Holland et al. ...................... 395/404
5,459,864  10/1995  Brent et al. ........................ 395/650
5,495,601   2/1996  Narang et al. ....................... 395/600
5,535,365   7/1996  Barriso et al. ...................... 711/152
5,546,535   8/1996  Stallmo et al. ...................... 395/182.07
                FOREIGN PATENT DOCUMENTS

0551718   7/1993  European Pat. Off. ........... G06F 11/20
0707269   4/1996  European Pat. Off. ........... G06F 12/08
0645702   3/1995  Germany .
9513583   5/1995  WIPO ......................... G06F 12/00

                  OTHER PUBLICATIONS

A Case for Redundant Arrays of Inexpensive Disks (RAID); David
Patterson, Garth Gibson & Randy Katz; Dec. 1987; pp. 1-24.
Primary Examiner—Hiep T. Nguyen

[57]                      ABSTRACT
Methods and associated apparatus for performing concurrent I/O operations on a common shared subset of disk drives (LUNs) by a plurality of RAID controllers. The methods of the present invention are operable in all of a plurality of RAID controllers to coordinate concurrent access to a shared set of disk drives. In addition to providing redundancy features, the plurality of RAID controllers operable in accordance with the methods of the present invention enhance the performance of a RAID subsystem by better utilizing available processing power among the plurality of RAID controllers. Under the methods of the present invention, each of a plurality of RAID controllers may actively process different I/O requests on a common shared subset of disk drives. One of the plurality of controllers is designated as primary with respect to a particular shared subset of disk drives. The plurality of RAID controllers then exchange messages over a communication medium to coordinate concurrent access to the shared subset of disk drives through the primary controller. The messages exchanged include semaphore lock and release requests to coordinate exclusive access during critical operations as well as cache and meta-cache data to maintain cache coherency between the plurality of the RAID controllers with respect to the common shared subset of disk drives. These messages are exchanged via any of several well known communication mediums including a shared memory common to the plurality of controllers and the communication bus connecting the shared subset of disk drives to each of the plurality of controllers.
`
`36 Claims, 11 Drawing Sheets
`(List continued on next page.)
`
`
`
(Cover figure: reproduction of FIG. 1, a block diagram of RAID storage subsystem 100 in which host computer 120 is coupled to redundant controllers, each containing a CPU, program memory, and cache memory, which are in turn attached to disk drives 110 of disk array 108.)
             U.S. PATENT DOCUMENTS (continued)

5,548,711   8/1996  Brant et al. ........................ 395/182.03
5,603,062   2/1997  Sato et al. ......................... 710/52
5,666,511   9/1997  Suganuma et al. ..................... 711/114
5,678,026  10/1997  Vertti et al. ....................... 711/152
5,682,537  10/1997  Davies et al. ....................... 395/726
5,694,571  12/1997  Fuller .............................. 395/440
5,715,447   2/1998  Hayashi et al. ...................... 395/608
5,761,705   6/1998  DeKoning et al. ..................... 711/113
5,764,922   6/1998  Peacock et al. ...................... 395/275
5,787,304   7/1998  Hodges et al. ....................... 395/821
5,845,292  12/1998  Bohannon et al. ..................... 707/202
(FIG. 1, drawing sheet 1 of 11: block diagram of a typical RAID storage subsystem 100, showing a host computer and redundant disk array controllers, each containing a CPU, program memory, and cache memory, attached to an array of disk drives.)
(FIG. 2, drawing sheet 2 of 11: first preferred embodiment of RAID controllers operable in accordance with the invention, in which the controllers communicate via a shared memory bus or via the common disk interface channel.)
(FIG. 3, drawing sheet 3 of 11: second preferred embodiment, in which the controllers communicate via one or more multipoint loop media connecting controllers and disk drives; the sheet labels a semaphore pool within controller memory 114.1, controller 118.1, and host bus 154.)
(FIG. 4, drawing sheet 4 of 11: exemplary semaphore pool data structure associated with a LUN, including LUN control information, counts of free and locked semaphores, and a list of semaphore entries.)
(FIG. 5, drawing sheet 5 of 11: flowchart of primary controller processing, elements 500 through 518. For each LUN for which this controller is primary, a semaphore pool is allocated (502) and initialized (504). The controller then awaits a shared access request (506). If the request concerns a LUN for which it is primary (508), it determines whether the request is an exclusive access request (510) and either releases exclusive access to the requested stripe(s) (514) or acquires exclusive access to the required stripe(s) (516) and transmits an exclusive access grant message to the requesting controller (518).)
(FIG. 6, drawing sheet 6 of 11: flowcharts detailing steps 516 and 514 of FIG. 5. To acquire exclusive access: if no semaphores are free (600), await free semaphores (602); if the requested stripe(s) are already locked (606), await release of the requested stripe(s); otherwise associate a free semaphore with the requested stripe(s) and lock the semaphore, then increment the count of locked semaphores and decrement the count of free semaphores (608, 610). To release exclusive access: release the exclusive access semaphore associated with the requested stripe(s) (612), then decrement the count of locked semaphores and increment the count of free semaphores (614).)
(FIG. 7, drawing sheet 7 of 11: flowchart of a controller requesting temporary exclusive access to one or more stripes of a LUN in order to perform a requested I/O operation; the request is transmitted to the primary controller for the LUN, the grant message is awaited, the requested I/O is performed, and exclusive access to the requested stripe(s) is then released.)
(FIG. 8, drawing sheet 8 of 11: additional detail for FIG. 7. If the I/O request requires a cache update, the requesting controller transmits a cache update permission request to the primary controller, awaits the permission grant message from the primary controller, and then updates its local cache.)
(FIG. 9, drawing sheet 9 of 11: additional detail for FIG. 7. If the I/O request requires a cache update (900), cache invalidate messages are transmitted to all secondary controllers active with respect to the associated LUNs (902) and the local cache is then updated (904).)
(FIG. 10, drawing sheet 10 of 11: background daemon processing for distributed cache coherency. Primary controller daemon: await a cache update permission request (1002), transmit cache invalidate messages to each secondary controller holding newly invalidated cache data, and transmit a permission grant message to the requesting secondary controller (1004). Secondary controller daemon: await a cache invalidate message from the primary controller (1010) and invalidate the identified portions of local cache memory (1012).)
(FIG. 11, drawing sheet 11 of 11: another preferred embodiment (subsystem 1100) illustrating N-way connectivity over multipoint loop media. Each controller is designated primary for a subset of the LUNs and secondary for others; for example, one controller is primary for LUNs A and B and secondary for C through G, another is primary for D and E and secondary for A through C and F through G, another is primary for H and J and secondary for A through G, and one controller is primary for no LUNs and secondary for A through J.)

`METHODS AND APPARATUS FOR
`COORDINATING SHARED MULTIPLE RAID
`CONTROLLER ACCESS TO COMMON
`STORAGE DEVICES
`
`RELATED PATENTS
`
`The present invention is related to commonly assigned
and co-pending U.S. patent application entitled “Methods
`And Apparatus For Balancing Loads On A Storage Sub-
`system Among A Plurality Of Controllers”,
`invented by
`Charles Binford, Rodney A. DeKoning, and Gerald Fredin,
`and having an internal docket number of 96-018 and a Ser.
`No. of 08/772,618, filed concurrently herewith on Dec. 23,
`1996, and co-pending U.S. patent application entitled
`“Methods And Apparatus For Locking Files Within A Clus-
`tered Storage Environment”,
`invented by Rodney A.
`DeKoning and Gerald Fredin, and having an internal docket
`number of 96-028 and a Ser. No. of 08/773,470,
`filed
`concurrently herewith on Dec. 23, 1996, both of which are
`hereby incorporated by reference.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
`
`The present invention relates to storage subsystems and in
`particular to methods and associated apparatus which pro-
`vide shared access to common storage devices within the
`storage subsystem by multiple storage controllers.
`2. Discussion of Related Art
`
`Modern mass storage subsystems are continuing to pro-
`vide increasing storage capacities to fulfill user demands
`from host computer system applications. Due to this critical
`reliance on large capacity mass storage, demands for
`enhanced reliability are also high. Various storage device
`configurations and geometries are commonly applied to
`meet the demands for higher storage capacity while main-
`taining or enhancing reliability of the mass storage sub-
`systems.
`One solution to these mass storage demands for increased
`capacity and reliability is the use of multiple smaller storage
`modules configured in geometries that permit redundancy of
`stored data to assure data integrity in case of various failures.
`In many such redundant subsystems, recovery from many
common failures can be automated within the storage sub-
`system itself due to the use of data redundancy, error
`correction codes, and so-called “hot spares” (extra storage
`modules which may be activated to replace a failed, previ-
`ously active storage module). These subsystems are typi-
`cally referred to as redundant arrays of inexpensive (or
`independent) disks (or more commonly by the acronym
`RAID). The 1987 publication by David A. Patterson, et al.,
from University of California at Berkeley entitled A Case for
Redundant Arrays of Inexpensive Disks (RAID), reviews the
`fundamental concepts of RAID technology.
`There are five “levels” of standard geometries defined in
`the Patterson publication. The simplest array, a RAID level
`1 system, comprises one or more disks for storing data and
`an equal number of additional “mirror” disks for storing
`copies of the information written to the data disks. The
remaining RAID levels, identified as RAID level 2, 3, 4 and
5 systems, segment the data into portions for storage across
several data disks. One or more additional disks are utilized
`to store error check or parity information.
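As an illustrative aside (this sketch is not part of the original patent text), the error check information used by the striped RAID levels can be pictured as a bytewise XOR parity across the data blocks of a stripe; a lost block is recovered by XOR-ing the surviving blocks with the parity. The function names below are hypothetical.

    # Illustrative sketch only: bytewise XOR parity as used by the striped RAID levels.
    # The names xor_parity and rebuild_missing are hypothetical, not from the patent.
    from functools import reduce

    def xor_parity(blocks):
        """Compute the parity block for equal-length data blocks of one stripe."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    def rebuild_missing(surviving_blocks, parity):
        """Recover a lost block: XOR the parity with every surviving block."""
        return xor_parity(list(surviving_blocks) + [parity])

    if __name__ == "__main__":
        d0, d1, d2 = b"\x01\x02", b"\x0f\x00", b"\xf0\xff"
        parity = xor_parity([d0, d1, d2])
        assert rebuild_missing([d0, d2], parity) == d1  # d1 reconstructed after a failure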
`RAID storage subsystems typically utilize a control mod-
`ule that shields the user or host system from the details of
`managing the redundant array. The controller makes the
`
`subsystem appear to the host computer as a single, highly
reliable, high capacity disk drive. In fact, the RAID con-
`troller may distribute the host computer system supplied
`data across a plurality of the small independent drives with
`redundancy and error checking information so as to improve
`subsystem reliability. Frequently RAID subsystems provide
`large cache memory structures to further improve the per-
`formance of the RAID subsystem. The cache memory is
`associated with the control module such that the storage
`blocks on the disk array are mapped to blocks in the cache.
`This mapping is also transparent to the host system. The host
system simply requests blocks of data to be read or written
`and the RAID controller manipulates the disk array and
`cache memory as required.
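Purely as a sketch of this idea (the class and method names below are assumptions, not anything defined by the patent), the transparent mapping of disk array blocks to cache blocks can be modeled as a write-back cache keyed by logical block address:

    # Hypothetical write-back block cache; names are illustrative only.
    class BlockCache:
        def __init__(self):
            self._blocks = {}     # logical block address -> cached data (bytes)
            self._dirty = set()   # addresses written by the host but not yet on disk

        def read(self, lba, read_from_array):
            if lba not in self._blocks:                  # miss: fetch the block from the array
                self._blocks[lba] = read_from_array(lba)
            return self._blocks[lba]

        def write(self, lba, data):
            self._blocks[lba] = data                     # host I/O completes against the cache
            self._dirty.add(lba)

        def flush(self, write_to_array):
            for lba in sorted(self._dirty):              # dirty blocks are later posted to the array
                write_to_array(lba, self._blocks[lba])
            self._dirty.clear()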
`To further improve reliability, it is known in the art to
`provide redundant control modules to reduce the failure rate
`of the subsystem due to control electronics failures. In some
redundant architectures, pairs of control modules are con-
`figured such that they control the same physical array of disk
`drives. A cache memory module is associated with each of
`the redundant pair of control modules. The redundant con-
`trol modules communicate with one another to assure that
`
`the cache modules are synchronized. When one of the
`redundant pair of control modules fails, the other stands
`ready to assume control to carry on operations on behalf of
`I/O requests. However, it is common in the art to require host
`intervention to coordinate failover operations among the
`controllers.
`
It is also known that such redundancy methods and
`structures may be extended to more than two control mod-
`ules. Theoretically, any number of control modules may
`participate in the redundant processing to further enhance
`the reliability of the subsystem.
`However, when all redundant control modules are
`operable, a significant portion of the processing power of the
`redundant control modules is wasted. One controller, often
`referred to as a master or the active controller, essentially
`processes all I/O requests for the RAID subsystem. The
`other redundant controllers, often referred to as slaves or
`passive controllers, are simply operable to maintain a con-
`sistent mirrored status by communicating with the active
`controller. As taught in the prior art, for any particular RAID
logical unit (LUN, a group of disk drives configured to be
`managed as a RAID array), there is a single active controller
`responsible for processing of all
`I/O requests directed
`thereto. The passive controllers do not concurrently manipu-
late data on the same LUN.
`
It is known in the prior art to permit each passive
`controller to be deemed the active controller with respect to
`other LUNs within the RAID subsystem. So long as there is
`but a single active controller with respect to any particular
`LUN, the prior art teaches that there may be a plurality of
`active controllers associated with a RAID subsystem. In
`other words, the prior art teaches that each active controller
`of a plurality of controllers is provided with coordinated
`shared access to a subset of the disk drives. The prior art
`therefore does not teach or suggest that multiple controllers
`may be concurrently active processing different I/O requests
`directed to the same LUN.
`In view of the above it is clear that a need exists for an
`
`improved RAID control module architecture that permits
`scaling of RAID subsystem performance through improved
`connectivity of multiple controllers to shared storage mod-
`ules. In addition, it is desirable to remove the host depen-
`dency for failover coordination. More generally, a need
`exists for an improved storage controller architecture for
`
`improved scalability by shared access to storage devices to
thereby enable parallel processing of multiple I/O requests.
`
`SUMMARY OF THE INVENTION
`
The present invention solves the above and other
`problems, and thereby advances the useful arts, by providing
`methods and associated apparatus which permit all of a
`plurality of storage controllers to share access to common
`storage devices of a storage subsystem. In particular,
`the
`present invention provides for concurrent processing by a
`plurality of RAID controllers simultaneously processing I/O
`requests. Methods and associated apparatus of the present
`invention serve to coordinate the shared access so as to
`
`prevent deadlock conditions and interference of one con-
`troller with the I/O operations of another controller. Notably,
`the present invention provides inter-controller communica-
`tions to obviate the need for host system intervention to
`coordinate failover operations among the controllers.
`Rather, a plurality of controllers share access to common
`storage modules and communicate among themselves to
`permit continued operations in case of failures.
`As presented herein the invention is discussed primarily
`in terms of RAID controllers sharing access to a logical unit
`(LUN) in the disk array of a RAID subsystem. One of
`ordinary skill will recognize that the methods and associated
apparatus of the present invention are equally applicable to
`a cluster of controllers commonly attached to shared storage
`devices. In other words, RAID control management tech-
niques are not required for application of the present inven-
`tion. Rather, RAID subsystems are a common environment
`in which the present
invention may be advantageously
applied. Therefore, as used herein, a LUN (a RAID logical
`unit) is to be interpreted as equivalent
`to a plurality of
storage devices or a portion of one or more storage devices.
Likewise, RAID controller or RAID control module is to be
`interpreted as equivalent to a storage controller or storage
`control module. For simplicity of this presentation, RAID
terminology will be primarily utilized to describe the inven-
tion but should not be construed to limit application of the
`present invention only to storage subsystems employing
`RAID techniques.
`More specifically, the methods of the present invention
`utilize communication between a plurality of RAID control-
`ling elements (controllers) all attached to a common region
`on a set of disk drives (a LUN) in the RAID subsystem. The
`methods of the present invention transfer messages among
the plurality of RAID controllers to coordinate concurrent,
`shared access to common subsets of disk drives in the RAID
`
`
`
`subsystem. The messages exchanged between the plurality
`of RAID controllers include access coordination messages
`such as stripe lock semaphore information to coordinate
`shared access to a particular stripe of a particular LUN of the
`RAID subsystem.
`In addition,
`the messages exchanged
`between the plurality of controllers include cache coherency
`messages such as cache data and cache meta-data to assure
`consistency (coherency) between the caches of each of the
`plurality of controllers.
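As a minimal sketch of the kinds of messages just described (the type names and fields below are assumptions chosen for illustration, not identifiers used by the patent), the coordination traffic might be modeled as follows:

    # Hypothetical message formats for the inter-controller coordination traffic.
    from dataclasses import dataclass
    from enum import Enum, auto

    class MsgType(Enum):
        LOCK_REQUEST = auto()          # request temporary exclusive access to stripes
        LOCK_GRANT = auto()            # primary controller grants the stripe lock
        LOCK_RELEASE = auto()          # requesting controller has finished its I/O
        CACHE_UPDATE_REQUEST = auto()  # secondary asks permission to update its cache
        CACHE_INVALIDATE = auto()      # primary tells peers to drop now-obsolete cache data
        CACHE_UPDATE_GRANT = auto()    # primary grants the requested cache update

    @dataclass
    class CoordinationMessage:
        msg_type: MsgType
        sender: int                    # identifier of the sending controller
        lun: int                       # shared LUN the message concerns
        stripes: tuple = ()            # affected stripe numbers, if any
        payload: bytes = b""           # cache data or cache meta-data, if any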
`In particular, one of the plurality of RAID controllers is
`designated as the primary controller with respect to each of
`the LUNs (disk drive subsets) of the RAID subsystem. The
`primary controller is responsible for fairly sharing access to
`the common disk drives of the LUN among all requesting
`RAID controllers. A controller desiring access to the shared
`disk drives of the LUN sends a message to the primary
`controller requesting an exclusive temporary lock of the
`relevant stripes of the LUN. The primary controller returns
`
`a grant of the requested lock in due course when such
`exclusivity is permissible. The requesting controller then
`performs any required I/O operations on the shared devices
`and transmits a lock release to the primary controller when
`the operations have completed. The primary controller man-
`ages the lock requests and releases using a pool of sema-
`phores for all controllers accessing the shared LUNs in the
`subsystem. One of ordinary skill
`in the art will readily
`recognize that the primary/secondary architecture described
`above may be equivalently implemented in a peer-to-peer or
`broadcast architecture.
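A single-process analogue of this lock management, offered only as a sketch under the assumption that stripes are identified by integer numbers, might look like the following; it is not the patent's actual implementation:

    # Sketch of the primary controller's stripe-lock bookkeeping (assumed names).
    import threading

    class PrimaryLockManager:
        def __init__(self):
            self._cond = threading.Condition()
            self._locked = set()                  # stripe numbers currently held exclusively

        def acquire(self, stripes):
            """Block until temporary exclusive access to all requested stripes is granted."""
            with self._cond:
                while any(s in self._locked for s in stripes):
                    self._cond.wait()             # await release of a conflicting lock
                self._locked.update(stripes)      # lock a semaphore for these stripes

        def release(self, stripes):
            """Release the lock so that other requesting controllers may proceed."""
            with self._cond:
                self._locked.difference_update(stripes)
                self._cond.notify_all()           # wake any controller awaiting these stripes

In the subsystem itself, acquire and release would be driven by lock request and lock release messages arriving from the requesting controllers over the shared communication medium rather than by local function calls.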
`
`As used herein, exclusive, or temporary exclusive access,
`refers to access by one controller which excludes incompat—
`ible access by other controllers. One of ordinary skill will
`recognize that the degree of exclusivity among controllers
`depends upon the type of access required. For example,
`exclusive read/write access by one controller may preclude
`all other controller activity, exclusive write access by one
`controller may permit read access by other controllers, and
`similarly, exclusive append access by one controller may
`permit read and write access to other controllers for unaf-
`fected portions of the shared storage area. It is therefore to
`be understood that the terms “exclusive” and “temporary
`exclusive access” refer to all such configurations. Such
`exclusivity is also referred to herein as “coordinated shared
`access.”
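To make these distinctions concrete, a compatibility table such as the following could be used; the entries simply restate the examples given above and are illustrative rather than normative:

    # Illustrative compatibility table for degrees of exclusive access.
    from enum import Enum

    class Exclusive(Enum):
        READ_WRITE = "read/write"   # precludes all other controller activity
        WRITE = "write"             # other controllers may still read
        APPEND = "append"           # others may read and write unaffected portions

    _PERMITTED_TO_PEERS = {
        Exclusive.READ_WRITE: set(),
        Exclusive.WRITE: {"read"},
        Exclusive.APPEND: {"read", "write"},
    }

    def peer_operation_allowed(held, operation):
        """True if a peer controller's operation is compatible with the held exclusive access."""
        return operation in _PERMITTED_TO_PEERS[held]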
`
`Since most RAID controllers rely heavily on cache
`memory subsystems to improve performance, cache data
`and cache meta-data is also exchanged among the plurality
`of controllers to assure coherency of the caches on the
`plurality of controllers which share access to the common
`LUN. Each controller which updates its cache memory in
response to processing an I/O request (or other management
`related I/O operation) exchanges cache coherency messages
`to that effect with a designated primary controller for the
`associated LUN. The primary controller, as noted above,
`carries the primary burden of coordinating activity relating
`to the associated LUN. In addition to the exclusive access
`
lock structures and methods noted above, the primary con-
troller also serves as the distributed cache manager (DCM) to
`coordinate the state of cache memories among all controllers
`which manipulate data on the associated LUN.
`In particular, a secondary controller (non-primary with
`respect to a particular LUN) wishing to update its cache data
`in response to an I/O request must first request permission of
`the primary controller (the DCM for the associated LUN) for
`the intended update. The primary controller then invalidates
`any other copies of the same cache data (now obsolete)
`within any other cache memory of the plurality of control-
`lers. Once all other copies of the cache data are invalidated,
`the primary controller grants permission to the secondary
`controller which requested the update. The secondary con-
`troller may then complete the associated I/O request and
`update the cache as required. The primary controller (the
`DCM) thereby maintains data structures which map the
`contents of all cache memories in the plurality of controllers
`which contain cache data relating to the associated LUN.
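A compact sketch of the distributed cache manager's bookkeeping follows; send_invalidate is a hypothetical stand-in for the inter-controller message path and is not an interface defined by the patent:

    # Sketch of DCM bookkeeping on the primary controller (assumed names).
    from collections import defaultdict

    class DistributedCacheManager:
        def __init__(self, send_invalidate):
            self._holders = defaultdict(set)      # cache block address -> controller ids
            self._send_invalidate = send_invalidate

        def request_update(self, requester, blocks):
            """Handle a secondary controller's cache update permission request."""
            for block in blocks:
                for peer in self._holders[block] - {requester}:
                    self._send_invalidate(peer, block)   # obsolete copies are invalidated first
                self._holders[block] = {requester}       # requester now holds the only valid copy
            return True                                  # permission granted to the requester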
`The semaphore lock request and release information and
`the cache data and meta-data are exchanged between the
`plurality of shared controllers through any of several com—
munication mediums. A dedicated communication bus inter-
`
`connecting all RAID controllers may be preferred for per-
`formance criteria, but may present cost and complexity
`problems. Another preferred approach is where the infor-
`mation is exchanged via the communication bus which
`connects the plurality of controllers to the common subset of
`disk drives in the common LUN. This communication bus
`
may be any of several industry standard connections,
`including, for example, SCSI, Fibre Channel, IPI, SSA, PCI,
`etc. Similarly the host connection bus which connects the
`plurality of RAID controllers to one or more host computer
`systems may be utilized as the shared communication
`medium. In addition, the communication medium may be a
shared memory architecture in which a plurality of
`controllers share access to a common, multiported memory
`subsystem (such as the cache memory subsystem of each
`controller).
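Because the coordination messages are independent of the medium that carries them, one way to picture the choice is a small transport abstraction with interchangeable implementations; the sketch below models only the shared memory case and every name in it is an assumption:

    # Minimal transport sketch; only the shared memory medium is modeled here.
    from queue import Queue

    class SharedMemoryTransport:
        """Models a common multiported memory region with one inbox per controller."""
        def __init__(self, controller_count):
            self._inboxes = [Queue() for _ in range(controller_count)]

        def send(self, dest, payload):
            self._inboxes[dest].put(payload)     # peer later reads the message from shared memory

        def receive(self, controller_id):
            return self._inboxes[controller_id].get()

A disk-side bus or host-bus transport would expose the same send and receive operations, so the coordination methods themselves need not change with the medium.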
`As used herein, controller (or RAID controller, or control
`module) includes any device which applies RAID tech-
`niques to an attached array of storage devices (disk drives).
`Examples of such controllers are RAID controllers embed—
`ded within a RAID storage subsystem, RAID controllers
`embedded within an attached host computer system, RAID
`control
`techniques constructed as software components
`within a computer system, etc. The methods of the present
`invention are similarly applicable to all such controller
`architectures.
`
`Another aspect of the present invention is the capability to
achieve N-way connectivity wherein any number of con-
`trollers may share access to any number of LUNs within a
`RAID storage subsystem. A RAID storage subsystem may
`include any number of control modules. When operated in
`accordance with the present invention to provide temporary
exclusive access to LUNs within commonly attached storage
devices, such a RAID subsystem provides redundant paths to
`all data stored within the subsystem. These redundant paths
`serve to enhance reliability of the subsystem while,
`in
`accordance with the present invention, enhancing perfor-
mance of the subsystem by performing multiple operations
`concurrently on common shared LUNs within the storage
`subsystem.
`The configuration flexibility enabled by the present inven—
`tion permits a storage subsystem to be configured for any
`control module to access any data within the subsystem,
`potentially in parallel with other access to the same data by
`another control module. Whereas the prior art generally
`utilized two controllers only for purposes of paired
`redundancy, the present invention permits the addition of
`controllers for added performance as well as added redun-
`dancy. Cache mirroring techniques of the present invention
`are easily extended to permit (but not require) any number
`of mirrored cached controllers. By allowing any number of
`interfaces (i.e., FC-AL loops) on each controller, various
`sharing geometries may be achieved in which certain storage
devices are shared by one subset of controllers but not
`another. Virtually any mixture of connections may be
`achieved in RAID architectures under the methods of the
`present invention which permit any number of controllers to
`share access to any number of common shared LUNs within
`the storage devices.
`Furthermore, each particular connection of a controller or
`group of controllers to a particular LUN or group of LUNs
`may be configured for a different
`level of access (i.e.,
read-only, read-write, append only, etc.) Any controller
`within a group of commonly connected controllers may
`configure the geometry of all controllers and LUNs in the
`storage subsystem and communicate the resultant configu—
`ration to all controllers of the subsystem. In a preferred
`embodiment of the present invention, a master controller is
`designated and is responsible for all configuration of the
`subsystem geometry.
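As a sketch of the kind of geometry description a designated master controller might distribute to its peers (the field names and the example assignments, which loosely mirror FIG. 11, are illustrative assumptions only):

    # Hypothetical subsystem geometry record distributed by a master controller.
    from dataclasses import dataclass

    @dataclass
    class ControllerConfig:
        controller_id: int
        primary_luns: frozenset = frozenset()
        secondary_luns: frozenset = frozenset()
        access_level: str = "read-write"     # e.g. "read-only", "read-write", "append-only"

    subsystem_geometry = [
        ControllerConfig(1, frozenset("AB"), frozenset("CDEFG")),
        ControllerConfig(2, frozenset("DE"), frozenset("ABCFG")),
        ControllerConfig(3, frozenset(),     frozenset("ABCDEFGHIJ")),
    ]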
`The present invention therefore improves the scalability
`of a RAID storage subsystem such that control modules can
`
`be easily added and configured for parallel access to com-
`mon shared LUNs. Likewise, additional storage devices can
`be added and utilized by any subset of the controllers
`attached thereto within the RAID storage subsystem. A
`RAID subsystem operable in accordance with the present
`invention therefore enhances the scalability of the subsystem
`to improve performance and/or redundancy through the
`N-way connectivity of controllers and storage devices.
`It is therefore an object of the present invention to provide
`methods and associated apparatus for concurrent processing
of I/O requests by RAID controllers on a shared LUN.
`It is a further object of the present invention to provide
`methods and associated apparatus for concurrent access by
`a plurality of RAID controllers to a common LUN.
`It is still a further object of the present invention to
`provide methods and associated apparatus for coordinating
`shared access by a plurality of RAID controllers to a
`common LUN.
`
`It is yet another object of the present invention to provide
`methods and associated apparatus for managing semaphores
`to coordinate shared access by a plurality of RAID control—
`lers to a common LUN.
`
`It is still another object of the present invention to provide
`methods and associated apparatus for managing cache data
`to coordinate shared access by a plurality of RAID control-
`lers to a common LUN.
`
`It is further an object of the present invention to provide
`methods and associated apparatus for managing cache meta-
`data to coordinate shared access by a plurality of RAID
`controllers to a common LUN.
`
It is still further an object of the present invention to
`provide methods and associated apparatus for exchanging
`messages via a communication medium between a plurality
`of RAID controllers to coordinate shared access by a plu-
`rality of RAID controllers to a common LUN.
`It is another object of the present invention to provide
methods and associated apparatus which enable N-way
`redundant connectivity within the RAID storage subsystem.
`It is still another object of the present invention to provide
`methods and associated apparatus which improve scalability
`of a RAID storage subsystem for performance.
`The above and other objects, aspects, features, and advan—
`tages of the present invention will become apparent from the
`following description and the attached drawing.
`
`BRIEF DESCRIPTION OF THE DRAWING
`
`FIG. 1 is a block diagram of a typical RAID storage
`subsystem in which the structures and methods of the
`present invention may be applied;
`FIG. 2 is a block diagram depicting a first preferred
`embodiment of RAID controllers operable in accordance
`with the methods of the present invention in which the
`controllers communicate via a shared memory bus or via the
`common disk interface channel;
`FIG. 3 is a block diagram depicting a second preferred
`embodiment of RAID controllers operable in accordance
`with the methods of the present invention in which the
`controllers communicate via one or more multipoint loop
`media connecting controllers and disk drives;
`FIG. 4 is a diagram of an exemplary semaphore pool data
`structure associated with a LUN of the RAID subsystem;
`FIG. 5 is a flowchart describing the operation of the
`primary controller in managing exclusive access to one or
`more LUNs;
`
`FIG. 6 contains two flowcharts providing additional
`details of the operation of steps of FIG. 5 which acquire and
`release exclusive access to particular stripes of particular
`LUNs;
`FIG. 7 is a flowchart describing the operation of a
`controller requesting temporary exclusive access to a stripe
`of a LUN for purposes of performing a requested I/O
`operation;
`FIG. 8 is a flowchart providing additional details for
`elements of FIG. 7;
`FIG. 9 is a flowchart providing additional details for
`elements of FIG. 7;
FIG. 10 contains flowcharts describing background daemon
`processing in both primary and secondary controllers for
`maintaining distributed cache coherency; and
`FIG. 11 is a block diagram depicting another preferred
`embodiment of RAID controllers operable in accordance
`with the methods of the present invention in which the
`plurality of controllers communicate via one or more multi-
`point loop media connecting controllers and disk drives.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`
`While the invention is susceptible to various modifica-
`tions and alternative forms, a specific embodiment thereof
`has been shown by way of example in the drawings and will
`herein be described in detail.
`It should be understood,
`however, that it is not intended to limit the invention to the
particular form disclosed, but on the contrary, the intention
`is to cover all modifications, equivalents, and alternatives
`falling within the spirit and scope of the invention as defined
`by the appended claims.
`
`RAID SUBSYSTEM OVERVIEW
`
`FIG. 1 is a block diagram of a typical RAID storage
`subsystem 100, having redundant disk array controllers
`118.1 and 118.2 (hereinafter synonymously referred to as
`RDACs, RAID control modules, or control modules), in
`which the methods and associated apparatus of the present
`invention may be applied. RAID storage subsystem 100
`includes at least two RDACs 118.1 and 118.2. Each RDAC
118.1 and 118.2 is in turn connected to disk array 108 and
`to one another via bus (or busses) 150 and to host computer
`120 via bus 154. Disk array 108 is comprised of a plurality
`of disk drives 110 (also referred to herein as storage
`devices). One of ordinary skill in the art will readily recog-
`nize that interface bus 150 between RDACs 118.1 and 118.2
`
`and disk array 108 (including disk drives 110) may be any
`of several industry standard interface busses including SCSI,
`IDE, EIDE, IPI, Fibre Channel, SSA, PCI, etc
