[19] United States Patent
DeKoning et al.

US006073218A

[11] Patent Number: 6,073,218
[45] Date of Patent: Jun. 6, 2000
`
`[54]
`
`METHODS AND APPARATUS FOR
`COORDINATING SHARED MULTIPLE RAID
`CONTROLLER ACCESS TO COMMON
`STORAGE DEVICES
`
`[75]
`
`Inventors: Rodney A. DeKoning; Gerald J.
`Fredin, both of Wichita, Kans.
`
`[73] Assignee: LSI Logic Corp., Milpitas, Calif.
`
`[21]
`
`Appl. No.: 08/772,614
`
`[22]
`[511
`[521
`
`[58]
`
`[56]
`
`Filed:
`
`Dec. 23, 1996
`
`Int. CI.7 ...................................................... G06F 13/16
`U.S. Cl ............................ 711/150; 711/114; 711/148;
`711/149; 711/151; 711/152; 711/153; 711/162;
`711/168; 710/’20; 710,/21; 710/38; 710/241;
`714/6; 714/11
`Field of Search ..................................... 711/114, 152,
`711/145, 148, 149, 150, 151, 153, 168,
`162; 714/6, 11; 710/20, 21, 38, 241
`
`References Cited
`
`U.S. PATENT DOCUMENTS
`
`3,702,006
`5,101,492
`5,148,432
`5,210,860
`5,249,279
`5,317,731
`5,331,476
`5,367,669
`5,379,417
`5,386,324
`5,388,108
`5,434,970
`5,440,743
`5,446,855
`5,455,934
`5,459,864
`5,495,601
`5,535,365
`5,546,535
`
`10/1972
`3/1992
`9/1992
`5/1993
`9/1993
`5/1994
`7/1994
`11/1994
`1/1995
`1/1995
`2/1995
`7/1995
`8/1995
`8/1995
`10/1995
`10/1995
`2/1996
`7/1996
`8/1996
`
`Page ............................................ 444/1
`Schultz et al ........................... 395/575
`Gordon et al .......................... 371/10.1
`Pfeffer et al ............................ 395/575
`Schmenk et al ........................ 395/425
`Dias et al ............................... 395/600
`Fry et al ................................... 360/53
`Holland et al .......................... 395/575
`Lai et al ................................. 706/916
`Fry et al ................................... 360/53
`DeMoss et al ........................ 371/51.1
`Schiflleger .............................. 395/200
`Yokota et al ........................... 395/650
`Dang et al .............................. 395/401
`Holland et al .......................... 395/404
`Brent et al .............................. 395/650
`Narang et al ........................... 395/600
`Barriso et al ........................... 711/152
`StallinG et al ..................... 395/182.07
`
FOREIGN PATENT DOCUMENTS

0493984    7/1992   European Pat. Off. ......... G06F 11/10
0551718    7/1993   European Pat. Off. ......... G06F 11/20
0707269    4/1996   European Pat. Off. ......... G06F 12/08
0645702    3/1995   Germany
9513583    5/1995   WIPO ....................... G06F 12/00
`
`OTHER PUBLICATIONS
`
A Case for Redundant Arrays of Inexpensive Disks (RAID); David Patterson, Garth Gibson & Randy Katz; Dec. 1987; pp. 1-24.
`
Primary Examiner: Hiep T. Nguyen
`
`[57]
`
`ABSTRACT
`
`Methods and associated apparatus for performing concur-
`rent I/O operations on a common shared subset of disk
`drives (LUNs) by a plurality of RAID controllers. The
`methods of the present invention are operable in all of a
`plurality of RAID controllers to coordinate concurrent
`access to a shared set of disk drives. In addition to providing
`redundancy features, the plurality of RAID controllers oper-
`able in accordance with the methods of the present invention
`enhance the performance of a RAID subsystem by better
`utilizing available processing power among the plurality of
`RAID controllers. Under the methods of the present
`invention, each of a plurality of RAID controllers may
actively process different I/O requests on a common shared
`subset of disk drives. One of the plurality of controllers is
`designated as primary with respect to a particular shared
`subset of disk drives. The plurality of RAID controllers then
`exchange messages over a communication medium to coor-
`dinate concurrent access to the shared subset of disk drives
`through the primary controller. The messages exchanged
`include semaphore lock and release requests to coordinate
`exclusive access during critical operations as well as cache
`and meta-cache data to maintain cache coherency between
`the plurality of the RAID controllers with respect to the
`common shared subset of disk drives. These messages are
exchanged via any of several well known communication
mediums including a shared memory common to the plu-
`rality of controllers and the communication bus connecting
`the shared subset of disk drives to each of the plurality of
`controllers.
`
(List continued on next page.)
`
`36 Claims, 11 Drawing Sheets
`
`IBM-Oracle 1010
`Page 1 of 28
`
`
`
`6,073,218
`Page 2
`
`U.S. PATENT DOCUMENTS
`
`8/1996 Brant et al ......................... 395/182.03
`5~548~711
`2/1997 Sato et al .................................. 710/52
`5~603~062
`9/1997 Suganuma et al ...................... 711/114
`5~666~511
`5~678~026 10/1997 Vertti et al .............................. 711/152
`5,682,537 10/1997 Davies et al ............................ 395/726
`
`5~694~571 12/1997 Fuller ...................................... 395/440
`5~715~447
`2/1998 Hayashi et al .......................... 395/608
`6/1998 DeKoning et al ...................... 711/113
`5,761,705
`6/1998 Peacock et al ......................... 395/275
`5~764~922
`7/1997 Hodges et al ........................... 395/821
`5~787~304
`5,845,292 12/1998 Bohannon et al ...................... 707/202
`
`
`
`
[U.S. Patent drawing, Sheet 1 of 11]
`
`
`
[U.S. Patent drawing, Sheet 2 of 11]
`
`
`
[U.S. Patent drawing, Sheet 3 of 11]
`
`
`
[U.S. Patent drawing, Sheet 4 of 11]
`
`
`
[U.S. Patent drawing, Sheet 5 of 11: FIG. 5, flowchart of the primary controller managing exclusive access: allocate semaphore pool for LUN (502, 504); await shared access request (508); acquire exclusive access to required stripe(s) (516); transmit exclusive access grant message to requesting controller; release exclusive access to requested stripe(s) (514)]
`
`
`
[U.S. Patent drawing, Sheet 6 of 11: FIG. 6, flowcharts detailing step 516: await free semaphores (600, 602); if the requested stripe(s) are already locked (604), await release for the requested stripe(s) (606); associate a free semaphore with the requested stripe(s) and lock the semaphore (608); increment # locked semaphores, decrement # free semaphores; release path: release the exclusive access semaphore associated with the requested stripe(s); decrement # locked semaphores, increment # free semaphores]
`
`
`
[U.S. Patent drawing, Sheet 7 of 11]
`
`
`
[U.S. Patent drawing, Sheet 8 of 11: FIG. 8, flowchart detailing element 758: on an I/O request (800), transmit cache update permission request to primary controller (802); await permission grant message from primary controller (804); update local cache (806)]
`
`
`
[U.S. Patent drawing, Sheet 9 of 11: FIG. 9, flowchart detailing element 764: on an I/O request requiring a cache update (900), transmit cache invalidate messages to all secondary controllers active with respect to associated LUNs (902); update local cache]
`
`
`
[U.S. Patent drawing, Sheet 10 of 11: FIG. 10, background daemon flowcharts: the primary controller awaits a cache update permission request, transmits cache invalidate messages to each secondary controller holding copies of the affected cache data (1002), then transmits a permission grant message to the requesting secondary controller (1004); each secondary controller awaits cache invalidate messages from the primary controller (1010) and invalidates the identified portions of local cache memory (1012)]
`
`
`
[U.S. Patent drawing, Sheet 11 of 11: FIG. 11, block diagram of six controllers (118.1-118.6) sharing disk arrays 108.1 (LUNs A-D) and 108.2 (LUNs E-G): 118.1 primary for A, B / secondary for C-G; 118.2 primary for C / secondary for A-B, D-G; 118.3 primary for H, J / secondary for A-G; 118.4 primary for D, E / secondary for A-C, F-G; 118.5 primary for F, G / secondary for A-B; 118.6 primary for none / secondary for A-J]
`
`
`
`6,073,218
`
`1
`METHODS AND APPARATUS FOR
`COORDINATING SHARED MULTIPLE RAID
`CONTROLLER ACCESS TO COMMON
`STORAGE DEVICES
`
`RELATED PATENTS
`
The present invention is related to commonly assigned and co-pending U.S. patent application entitled "Methods And Apparatus For Balancing Loads On A Storage Subsystem Among A Plurality Of Controllers", invented by Charles Binford, Rodney A. DeKoning, and Gerald Fredin, having an internal docket number of 96-018 and a Ser. No. of 08/772,618, filed concurrently herewith on Dec. 23, 1996, and to co-pending U.S. patent application entitled "Methods And Apparatus For Locking Files Within A Clustered Storage Environment", invented by Rodney A. DeKoning and Gerald Fredin, having an internal docket number of 96-028 and a Ser. No. of 08/773,470, filed concurrently herewith on Dec. 23, 1996, both of which are hereby incorporated by reference.
`
`10
`
`15
`
`20
`
`BACKGROUND OF THE INVENTION
`
`2
`subsystem appear to the host computer as a single, highly
`reliable, high capacity disk drive. In fact, the RAID con-
`trollcr may distributc thc host computcr systcm supplicd
`data across a plurality of the small independent drives with
`redundancy and error checking information so as to improve
`subsystem reliability. Frequently RAID subsystems provide
`large cache memory structures to further improve the per-
`formance of the RAID snbsystem. The cache memory is
`associated with the control module such that the storage
`blocks on the disk array are mapped to blocks in the cache.
`This mapping is also transparent to the host system. The host
`system simply requests blocks of data to be read or written
`and the RAID controller manipulates the disk array and
`cache memory as required.
`To further improve reliability, it is known in the art to
`provide redundant control modules to reduce the failure rate
`of the subsystem due to control electronics failures. In some
`redundant architectures, pairs of control modules are con-
`figured such that they control the same physical array of disk
`drives. A cache mcnrory nrodule is associated with each of
`the redundant pair of control modules. The redundant con-
`trol modules communicate with one another to assure that
`the cache modules are synchronized. When one of the
`redundant pair of control modules fails, the other stands
`ready to assume control to carry on operations on behalf of
`I/O requests. However, it is common in the art to require host
`intervention to coordinate failover operations among the
`controllers.
`It is also known that such redundancy methods and
`structures may be extended to more than two control mod-
`ules. Theoretically, any number of control modules may
`participate in the redundant processing to further enhance
`the reliability of the subsystem.
`
`However, when all redundant control modules are
`operable, a significant portion of the processing power of the
`redundant control modules is wasted. One controller, often
`referred to as a master or the active controller, essentially
`processes all I/0 requests for the RAID subsystem. The
`other redundant controllers, often referred to as slaves or
`passive controllers, are simply operable to maintain a con-
`sistent mirrored status by communicating with the active
`controller. As taught in the prior art, for any particular RAID
`logical unit (LUN--a group of disk drives configured to be
`managed as a RAID array), there is a single active controller
`responsible for processing of all I/0 requests directed
`thereto. The passive controllers do not concurrently manipu-
`late data on the same LUN.
`It is known in the prior art to permit each passive
`controller to be deemed the active controller with respect to
`other LUNs within the RAID subsystem. So long as there is
`but a single active controller with respect to any particular
`LUN, the prior art teaches that there may be a plurality of
`active controllers associated with a RAID subsystem. In
`other words, the prior art teaches that each active controller
`of a plurality of controllers is provided with coordinated
`shared access to a subset of the disk drives. The prior art
`therefore does not teach or suggest that multiple controllers
`may be concurrently active processing different I/0 requests
`directed to the same LUN.
`In view of the above it is clear that a need exists for an
`improved RAID control module architecture that permits
`scaling of RAID subsystem performance through improved
`connectivity of multiple controllers to shared storage mod-
`ules. In addition, it is desirable to remove the host depen-
`dency for failover coordination. More generally, a need
`exists for an improved storage controller architecture for
`
`IBM-Oracle 1010
`Page 14 of 28
`
`30
`
`35
`
`1. Field of the Invention
`The present invention relates to storage subsystems and in 25
`particular to methods and associated apparatus which pro-
`vide shared access to common storage devices within the
`storage subsystem by multiple storage controllers.
`2. Discussion of Related Art
`Modern mass storage subsystems are continuing to pro-
`vide increasing storage capacities to fulfill user demands
`from host computer system applications. Due to this critical
`rcliancc on largc capacity mass storagc, dcmands for
`enhanced reliability are also high. Various storage device
`configurations and geometries are commonly applied to
`meet the demands for higher storage capacity xvhile main-
`taining or enhancing reliability of the mass storage sub-
`systems.
`One solution to these mass storage demands for increased 40
`capacity and reliability is the use of multiple smaller storage
`modules configured in geometries that permit redundancy of
`stored data to assure data integrity in case of various failures.
`In many such redundant subsystems, recovery from many
`common failures can be automated within the storage sub- 45
`system itself due to the use of data redundancy, error
`correction codes, and so-called "hot spares" (extra storage
`modules which may be activated to replace a failed, previ-
`ously active storage module). These subsystems are typi-
`cally referred to as redundant arrays of inexpensive (or 5o
`indcpcndcn0 disks (or morc commonly by thc acronym
`RAID). The 1987 publication by David A. Patterson, et al.,
`from University of California at Berkeley entitled A Case for
`Redundant Arrays of Inexpensive Disks (RAID), reviews the
`fundamental concepts of RAID technology,
`There are five "levels" of standard geometries defined in
`the Patterson publication. The simplest array, a RAID level
`1 system, comprises one or more disks for storing data and
`an equal number of additional "mirror" disks for storing
`copies of the information written to the data disks. The 6o
`remaining RAID levels, identified as RAID level 2, 3, 4 and
`5 systcms, scgmcnt thc data into portions for storagc across
`several data disks. One of more additional disks are utilized
`to store error check or parity information.
`RAID storage subsystems typically utilize a control mod- 65
`ule that shields the user or host system from the details of
`managing the redundant array. The controller makes the
`
`ss
`
`
`
`6,073,218
`
`3
improved scalability by shared access to storage devices to thereby enable parallel processing of multiple I/O requests.

SUMMARY OF THE INVENTION

The present invention solves the above and other problems, and thereby advances the useful arts, by providing methods and associated apparatus which permit all of a plurality of storage controllers to share access to common storage devices of a storage subsystem. In particular, the present invention provides for concurrent processing by a plurality of RAID controllers simultaneously processing I/O requests. Methods and associated apparatus of the present invention serve to coordinate the shared access so as to prevent deadlock conditions and interference of one controller with the I/O operations of another controller. Notably, the present invention provides inter-controller communications to obviate the need for host system intervention to coordinate failover operations among the controllers. Rather, a plurality of controllers share access to common storage modules and communicate among themselves to permit continued operations in case of failures.

As presented herein the invention is discussed primarily in terms of RAID controllers sharing access to a logical unit (LUN) in the disk array of a RAID subsystem. One of ordinary skill will recognize that the methods and associated apparatus of the present invention are equally applicable to a cluster of controllers commonly attached to shared storage devices. In other words, RAID control management techniques are not required for application of the present invention. Rather, RAID subsystems are a common environment in which the present invention may be advantageously applied. Therefore, as used herein, a LUN (a RAID logical unit) is to be interpreted as equivalent to a plurality of storage devices or a portion of one or more storage devices. Likewise, RAID controller or RAID control module is to be interpreted as equivalent to a storage controller or storage control module. For simplicity of this presentation, RAID terminology will be primarily utilized to describe the invention but should not be construed to limit application of the present invention only to storage subsystems employing RAID techniques.

More specifically, the methods of the present invention utilize communication between a plurality of RAID controlling elements (controllers) all attached to a common region on a set of disk drives (a LUN) in the RAID subsystem. The methods of the present invention transfer messages among the plurality of RAID controllers to coordinate concurrent, shared access to common subsets of disk drives in the RAID subsystem. The messages exchanged between the plurality of RAID controllers include access coordination messages such as stripe lock semaphore information to coordinate shared access to a particular stripe of a particular LUN of the RAID subsystem. In addition, the messages exchanged between the plurality of controllers include cache coherency messages such as cache data and cache meta-data to assure consistency (coherency) between the caches of each of the plurality of controllers.

In particular, one of the plurality of RAID controllers is designated as the primary controller with respect to each of the LUNs (disk drive subsets) of the RAID subsystem. The primary controller is responsible for fairly sharing access to the common disk drives of the LUN among all requesting RAID controllers. A controller desiring access to the shared disk drives of the LUN sends a message to the primary controller requesting an exclusive temporary lock of the relevant stripes of the LUN. The primary controller returns
`
`4
`a grant of the requested lock in due course when such
`exclusivity is permissible. The requesting controller then
`performs any required I/O operations on the shared devices
`and transmits a lock release to the primary controller when
`s the operations have completed. The primary controller man-
`ages the lock requests and releases using a pool of sema-
`phores for all controllers accessing the shared LUNs in the
`subsystem. One of ordinary skill in the art will readily
`recognize that the primary/secondary architecture described
`above may be equivalently implemented in a peer-to-peer or
`broadcast architecture.
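The request/grant/release exchange described above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the class and method names (`StripeLockManager`, `acquire`, `release`) are hypothetical, and an in-process condition variable stands in for the grant messages a primary controller would transmit over the inter-controller medium.

```python
import threading

class StripeLockManager:
    """Sketch of the primary controller's stripe-lock bookkeeping.

    The primary holds a fixed pool of semaphores and associates a free
    semaphore with each stripe range a requesting controller locks.
    """

    def __init__(self, pool_size=8):
        self._free = pool_size          # free semaphores in the pool
        self._locked = {}               # stripe number -> lock holder
        self._cond = threading.Condition()

    def acquire(self, controller_id, stripes):
        """Handle a lock-request message: block until every requested
        stripe is unlocked and a semaphore is free, then grant."""
        with self._cond:
            self._cond.wait_for(
                lambda: self._free > 0
                and not any(s in self._locked for s in stripes))
            self._free -= 1
            for s in stripes:
                self._locked[s] = controller_id
            # in the real subsystem: transmit the grant message here

    def release(self, controller_id, stripes):
        """Handle a lock-release message after the I/O completes."""
        with self._cond:
            for s in stripes:
                if self._locked.get(s) == controller_id:
                    del self._locked[s]
            self._free += 1
            self._cond.notify_all()     # wake any waiting requests
```

A blocked `acquire` models a queued secondary awaiting its grant; the fixed pool size mirrors the bounded semaphore pool allocated per LUN.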
As used herein, exclusive, or temporary exclusive access, refers to access by one controller which excludes incompatible access by other controllers. One of ordinary skill will recognize that the degree of exclusivity among controllers depends upon the type of access required. For example, exclusive read/write access by one controller may preclude all other controller activity, exclusive write access by one controller may permit read access by other controllers, and similarly, exclusive append access by one controller may permit read and write access to other controllers for unaffected portions of the shared storage area. It is therefore to be understood that the terms "exclusive" and "temporary exclusive access" refer to all such configurations. Such exclusivity is also referred to herein as "coordinated shared access."
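The graduated exclusivity just described can be summarized as a small compatibility table. The mode names below are hypothetical labels for the three examples given in the text, not terminology from the patent:

```python
# Compatibility of a second controller's request against an access
# already held: exclusive read/write excludes everything, exclusive
# write still admits readers, and exclusive append admits readers and
# writers on unaffected portions of the shared area.
COMPATIBLE = {
    ("rw_exclusive", "read"): False,
    ("rw_exclusive", "write"): False,
    ("w_exclusive", "read"): True,
    ("w_exclusive", "write"): False,
    ("append", "read"): True,
    ("append", "write"): True,   # unaffected portions only
}

def is_compatible(held, requested):
    """True if a controller holding `held` access on a region can
    tolerate another controller's `requested` access."""
    return COMPATIBLE.get((held, requested), False)
```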
Since most RAID controllers rely heavily on cache memory subsystems to improve performance, cache data and cache meta-data is also exchanged among the plurality of controllers to assure coherency of the caches on the plurality of controllers which share access to the common LUN. Each controller which updates its cache memory in response to processing an I/O request (or other management related I/O operation) exchanges cache coherency messages to that effect with a designated primary controller for the associated LUN. The primary controller, as noted above, carries the primary burden of coordinating activity relating to the associated LUN. In addition to the exclusive access lock structures and methods noted above, the primary controller also serves as the distributed cache manager (DCM) to coordinate the state of cache memories among all controllers which manipulate data on the associated LUN.

In particular, a secondary controller (non-primary with respect to a particular LUN) wishing to update its cache data in response to an I/O request must first request permission of the primary controller (the DCM for the associated LUN) for the intended update. The primary controller then invalidates any other copies of the same cache data (now obsolete) within any other cache memory of the plurality of controllers. Once all other copies of the cache data are invalidated, the primary controller grants permission to the secondary controller which requested the update. The secondary controller may then complete the associated I/O request and update the cache as required. The primary controller (the DCM) thereby maintains data structures which map the contents of all cache memories in the plurality of controllers which contain cache data relating to the associated LUN.
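The DCM permission protocol described above can be sketched as follows. The class name, callback, and return value are hypothetical stand-ins for the cache-invalidate and permission-grant messages exchanged between controllers:

```python
class DistributedCacheManager:
    """Sketch of the primary controller acting as DCM for one LUN.

    Tracks which controllers hold copies of which cache blocks; before
    granting an update it invalidates every other controller's copy.
    """

    def __init__(self, send_invalidate):
        self._holders = {}          # block number -> set of controller ids
        self._send_invalidate = send_invalidate   # message callback

    def request_update(self, requester, blocks):
        """Handle a cache-update permission request from a secondary
        controller: invalidate obsolete copies, then grant."""
        for b in blocks:
            for other in self._holders.get(b, set()) - {requester}:
                self._send_invalidate(other, b)   # copy is now obsolete
            self._holders[b] = {requester}        # requester holds the block
        return "granted"            # stands in for the grant message
```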
The semaphore lock request and release information and the cache data and meta-data are exchanged between the plurality of shared controllers through any of several communication mediums. A dedicated communication bus interconnecting all RAID controllers may be preferred for performance criteria, but may present cost and complexity problems. Another preferred approach is where the information is exchanged via the communication bus which connects the plurality of controllers to the common subset of disk drives in the common LUN. This communication bus
`
`
`
`
`6,073,218
`
may be any of several industry standard connections, including, for example, SCSI, Fibre Channel, IPI, SSA, PCI, etc. Similarly the host connection bus which connects the plurality of RAID controllers to one or more host computer systems may be utilized as the shared communication medium. In addition, the communication medium may be a shared memory architecture in which a plurality of controllers share access to a common, multiported memory subsystem (such as the cache memory subsystem of each controller).
As used herein, controller (or RAID controller, or control module) includes any device which applies RAID techniques to an attached array of storage devices (disk drives). Examples of such controllers are RAID controllers embedded within a RAID storage subsystem, RAID controllers embedded within an attached host computer system, RAID control techniques constructed as software components within a computer system, etc. The methods of the present invention are similarly applicable to all such controller architectures.
Another aspect of the present invention is the capability to achieve N-way connectivity wherein any number of controllers may share access to any number of LUNs within a RAID storage subsystem. A RAID storage subsystem may include any number of control modules. When operated in accordance with the present invention to provide temporary exclusive access to LUNs within commonly attached storage devices, such a RAID subsystem provides redundant paths to all data stored within the subsystem. These redundant paths serve to enhance reliability of the subsystem while, in accordance with the present invention, enhancing performance of the subsystem by performing multiple operations concurrently on common shared LUNs within the storage subsystem.
The configuration flexibility enabled by the present invention permits a storage subsystem to be configured for any control module to access any data within the subsystem, potentially in parallel with other access to the same data by another control module. Whereas the prior art generally utilized two controllers only for purposes of paired redundancy, the present invention permits the addition of controllers for added performance as well as added redundancy. Cache mirroring techniques of the present invention are easily extended to permit (but not require) any number of mirrored cached controllers. By allowing any number of interfaces (i.e., FC-AL loops) on each controller, various sharing geometries may be achieved in which certain storage devices are shared by one subset of controllers but not another. Virtually any mixture of connections may be achieved in RAID architectures under the methods of the present invention which permit any number of controllers to share access to any number of common shared LUNs within the storage devices.
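A sharing geometry of this kind might be represented as a simple table mapping each controller to its primary and secondary LUNs, with at most one primary per LUN. The sketch below is illustrative only; the controller identifiers echo the reference numerals of FIG. 11, but the data structure itself is not taught by the patent:

```python
# Hypothetical geometry table: each controller is primary for some
# LUNs and secondary (shared access) for others.
geometry = {
    "118.1": {"primary": {"A", "B"}, "secondary": set("CDEFG")},
    "118.2": {"primary": {"C"}, "secondary": set("ABDEFG")},
    "118.6": {"primary": set(), "secondary": set("ABCDEFG")},
}

def primary_for(lun):
    """Locate the single controller designated primary for a LUN,
    or None if no controller claims it."""
    owners = [c for c, roles in geometry.items() if lun in roles["primary"]]
    assert len(owners) <= 1, "at most one primary per LUN"
    return owners[0] if owners else None
```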
`
`Furthermore, each particular connection of a controller or
`group of controllers to a particular LUN or group of LUNs
`may be configured for a different level of access (i.e.,
`read-only, read-write, append only, etc.). Any controller
`within a group of commonly connected controllers may
`configure the geometry of all controllers and LUNs in the
`storage subsystem and communicate the resultant configu-
`ration to all controllers of the subsystem. In a preferred
`embodiment of the present invention, a master controller is
`designated and is responsible for all configuration of the
`subsystem geometry.
`The present invention therefore improves the scalability
`of a RAID storage subsystem such that control modules can
`
`6
`be easily added and configured for parallel access to com-
`mon shared I,UNs. l,ike~vise, additional storage devices can
`bc addcd and utilizcd by any subsct of thc controllcrs
`attached thereto within the RAID storage subsystem. A
`s RAID subsystem operable in accordance with the present
`invention therefore enhances the scalability of the subsystem
`to improve performance and/or redundancy through the
`N-way connectivity of controllers and storage devices.
`
It is therefore an object of the present invention to provide methods and associated apparatus for concurrent processing of I/O requests by RAID controllers on a shared LUN.

It is a further object of the present invention to provide methods and associated apparatus for concurrent access by a plurality of RAID controllers to a common LUN.

It is still a further object of the present invention to provide methods and associated apparatus for coordinating shared access by a plurality of RAID controllers to a common LUN.

It is yet another object of the present invention to provide methods and associated apparatus for managing semaphores to coordinate shared access by a plurality of RAID controllers to a common LUN.

It is still another object of the present invention to provide methods and associated apparatus for managing cache data to coordinate shared access by a plurality of RAID controllers to a common LUN.

It is further an object of the present invention to provide methods and associated apparatus for managing cache meta-data to coordinate shared access by a plurality of RAID controllers to a common LUN.

It is still further an object of the present invention to provide methods and associated apparatus for exchanging messages via a communication medium between a plurality of RAID controllers to coordinate shared access by a plurality of RAID controllers to a common LUN.

It is another object of the present invention to provide methods and associated apparatus which enable N-way redundant connectivity within the RAID storage subsystem.

It is still another object of the present invention to provide methods and associated apparatus which improve scalability of a RAID storage subsystem for performance.

The above and other objects, aspects, features, and advantages of the present invention will become apparent from the following description and the attached drawing.
`
`BRIEF DESCRIPTION OF THE DRAWING
`
FIG. 1 is a block diagram of a typical RAID storage subsystem in which the structures and methods of the present invention may be applied;

FIG. 2 is a block diagram depicting a first preferred embodiment of RAID controllers operable in accordance with the methods of the present invention in which the controllers communicate via a shared memory bus or via the common disk interface channel;

FIG. 3 is a block diagram depicting a second preferred embodiment of RAID controllers operable in accordance with the methods of the present invention in which the controllers communicate via one or more multipoint loop media connecting controllers and disk drives;

FIG. 4 is a diagram of an exemplary semaphore pool data structure associated with a LUN of the RAID subsystem;

FIG. 5 is a flowchart describing the operation of the primary controller in managing exclusive access to one or more LUNs;
`
`
`
`
`6,073,218
`
`7
`FIG. 6 contains two flowcharts providing additional
`details of the operation of steps of FIG. 5 which acquire and
`rclcasc cxclusivc acccss to particular stripcs of particular
`LUNs;
`FIG. 7 is a flowchart describing the operation of a
`controller requesting temporary exclusive access to a stripe
`of a EUN for purposes of performing a requested 1/O
`operation;
`FIG. 8 is a flowchart providing additional details for
`elements of FIG. 7;
`FIG. 9 is a flowchart providing additional details for
`elements of FIG. 7;
`FIG. 10 is flowcharts describing background daemon
`processing in both primary and secondary controllers for
`maintaining distributed cache coherency; and
`FIG. 11 is a block diagram depicting another preferred
`embodiment of RAID controllers operable in accordance
`with the methods of the present invention in which the
`plurality of controllers communicate via one or more multi-
`point loop media connecting controllers and disk drives.
`
DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`
`8
`disk drives 110. For example, xvhen implementing RAID
`level 1 features, approximately half of the disk drives 110 of
`disk array 108 arc used to store and rctricvc data while the
`other half is operated by the RAID controller to mirror the
`s data storage contents of the first half. Further, when imple-
`menting RAID level 4 features, the RAID controller utilizes
`a portion of the disk drives 110 in disk array 108 for the
`storage of data and the remaining disk drives 110 are utilized
`for the storage of error checking/correcting information (e.g.
`s0 parity information). As discussed below, the methods and
`associated apparatus of the present invention may be applied
`to the RAID storage subsystem 100 in conjunction with any
`of the standard RAID levels.
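The parity-based mapping just described for RAID level 4 reduces to an XOR across the data blocks of a stripe. A minimal sketch of that computation follows; the function names are hypothetical and this is an illustration of the general technique, not the patent's implementation:

```python
from functools import reduce

def parity_block(data_blocks):
    """XOR parity across equal-length data blocks of one stripe,
    as stored on the dedicated parity drive in a RAID level 4 array."""
    return bytes(reduce(lambda a, b: a ^ b, column)
                 for column in zip(*data_blocks))

def rebuild(surviving_blocks, parity):
    """Reconstruct a single failed block from the surviving data
    blocks plus the parity block (XOR is its own inverse)."""
    return parity_block(surviving_blocks + [parity])
```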
RDAC 118.1 includes CPU 112.1, program memory 114.1 (e.g. ROM/RAM devices for storing program instructions and variables for the operation of CPU 112.1), and cache memory 116.1 for storing data and control information related to the data stored in disk array 108. CPU 112.1, program memory 114.1, and cache memory 116.1 are connected via memory bus 152.1 to enable CPU 112.1 to store and retrieve information in the memory devices. RDAC 118.2 is identical to RDAC 118.1 and is comprised of CPU 112.2, program memory 114.2 and cache memory 116.2, all interconnected via memory bus 152.2.
`
While the invention is susceptible to various modifica-