United States Patent [19]
Stallmo

US005519844A

[11] Patent Number: 5,519,844
[45] Date of Patent: May 21, 1996

[54] LOGICAL PARTITIONING OF A REDUNDANT ARRAY STORAGE SYSTEM

[75] Inventor: David C. Stallmo, Boulder, Colo.

[73] Assignee: EMC Corporation, Hopkinton, Mass.

[21] Appl. No.: 215,013

[22] Filed: Mar. 21, 1994

Related U.S. Application Data

[63] Continuation of Ser. No. 612,220, Nov. 9, 1990, abandoned.

[51] Int. Cl.6 .................... G06F 11/10; G06F 12/06
[52] U.S. Cl. ..................... 395/441; 395/404
[58] Field of Search ............. 371/10.2; 395/575, 395/275, 404, 441

[56] References Cited

U.S. PATENT DOCUMENTS

3,893,178   7/1975   Sordello ................ 360/73.02
4,092,732   5/1978   Ouchi ................... 395/575
4,467,421   8/1984   White ................... 395/425
4,562,576  12/1985   Ratcliffe ............... 371/11.1

FOREIGN PATENT DOCUMENTS

9113404   9/1991   WIPO

OTHER PUBLICATIONS

Patterson, D. A., Gibson, G., and Katz, R. H.; A Case for Redundant Arrays of Inexpensive Disks (RAID), ACM SIGMOD Conf., Jun. 1-3, 1988, pp. 109-116.
Lee, E. K.; Software and Performance Issues in the Implementation of a RAID Prototype (May 1990), Rept. No. UCB/CSD 90/573.
Chen, P., Gibson, G., Katz, R. H., Patterson, D. A., and Schulze, M.; Introduction to Redundant Arrays of Inexpensive Disks (RAID), COMPCON Spring '89.
Chen, P., Gibson, G., Katz, R. H., Patterson, D. A., Schulze, M., et al.; Evolution of the RAID 2 Architecture (Jun. 12, 1990).
Maximum Strategy, Inc., San Jose, CA; Strategy 2 Disk Array Controller Operation Manual (Nov. 2, 1988).
Maximum Strategy, Inc., San Jose, CA; Strategy 1 Disk Array Controller Operation Manual (date unknown).
Gibson, G. A.; Performance and Reliability in Redundant Arrays of Inexpensive Disks (date unknown).

Primary Examiner-Stephen M. Baker
Attorney, Agent, or Firm-John Land

[57] ABSTRACT
A redundant array storage system that can be configured as a RAID 1, 3, 4, or 5 system, or any combination of these configurations. The invention includes a configuration data structure for addressing a redundant array storage system, and a method for configuring a redundant array storage system during an initialization process. The redundant array storage system includes a set of physical storage units which are accessible in terms of block numbers. The physical storage units are each configured as one or more logical storage units. Each logical storage unit is addressed in terms of a channel number, storage unit number, starting block number, offset number, and number of blocks to be transferred. Once logical storage units are defined, logical volumes are defined as one or more logical storage units, each logical volume having a depth characteristic. After the logical volumes are defined, redundancy groups are defined as one or more logical volumes. A redundancy level is specified for each redundancy group. The redundancy level may be none, one, or two. Logical volumes are addressed by a host CPU by volume number, initial block number, and number of blocks to be transferred. The host CPU also specifies a READ or WRITE operation. The specified volume number, initial block number, and number of blocks to be transferred are then translated into a corresponding channel number, storage unit number, starting block number, offset number, and number of blocks to be transferred. With the present invention, it is possible for a logical volume to span across physical storage units ("vertical partitioning"), comprise only a portion of each such physical storage unit ("horizontal partitioning"), and have definable depth and redundancy characteristics.

16 Claims, 6 Drawing Sheets
U.S. PATENT DOCUMENTS (continued)

4,667,362   5/1987   Young et al. ............ 371/39.1
4,722,085   1/1988   Flora et al. ............ 371/40.1
4,754,397   6/1988   Varaiya et al. .......... 361/380
4,761,785   8/1988   Clark et al. ............ 371/2.2
4,768,193   8/1988   Takemae ................. 371/10.3
4,775,978  10/1988   Hartness ................ 371/40.1
4,817,035   3/1989   Timsit .................. 395/425
4,819,159   4/1989   Shipley et al. .......... 371/9.1 X
4,849,929   7/1989   Timsit .................. 395/575
4,870,643   9/1989   Bultman et al. .......... 371/11.1
4,899,342   2/1990   Potter et al. ........... 371/10.1
4,914,656   4/1990   Dunphy, Jr. et al. ...... 371/10.2
4,942,579   7/1990   Goodlander et al. ....... 371/51.1
4,989,206   1/1991   Dunphy, Jr. et al. ...... 371/10.1
4,993,030   2/1991   Krakauer et al. ......... 371/40.1
5,073,854  12/1991   Martin et al. ........... 364/425
5,088,081   2/1992   Farr .................... 369/54
5,124,987   6/1992   Milligan et al. ......... 371/10.1
5,129,088   7/1992   Auslander et al. ........ 395/700
5,130,992   7/1992   Frey, Jr. et al. ........ 371/40.1
5,134,619   7/1992   Hensen et al. ........... 371/40.1
5,148,432   9/1992   Gordon et al. ........... 371/10.1
5,193,184   3/1993   Belsan et al. ........... 395/600
5,249,279   9/1993   Schmenk et al. .......... 395/425
5,283,875   2/1994   Gibson et al. ........... 395/400

OTHER PUBLICATIONS (continued)

Chen, P.; An Evaluation of Redundant Arrays of Disks Using an Amdahl 5890 (May 1989), Rept. No. UCB/CSD 89/506.
Katz, R. H., Gibson, G. A., and Patterson, D. A.; Disk System Architectures for High Performance Computing (Mar. 1989), Rept. No. UCB/CSD 89/497.
Gray, J., Horst, B., and Walker, M.; Parity Striping of Disc Arrays: Low-Cost Reliable Storage with Acceptable Throughput (Jan. 1990), Tandem Tech. Rept. 90.2.
Schulze, M. E.; Considerations in the Design of a RAID Prototype (Aug. 1988), Rept. No. UCB/CSD 88/448.
Clark and Corrigan; "Application System/400 Performance Characteristics", IBM Systems Journal, vol. 28, No. 3, 1989, pp. 407-423.
Meador, W.; "Disk Array Systems", COMPCON Spring '89, Feb. 27-Mar. 3, 1989, pp. 143-146.
Schlecher, D., et al.; "System Overview of the Application System/400", IBM Systems Journal, vol. 28, No. 3, 1989, pp. 360-375.
[FIG. 1, Sheet 1 of 6: block diagram of a generalized RAID system; a CPU connected to a CONTROLLER, which in turn connects to physical storage units S1 through S6 (reference numerals 1-4).]
[FIGS. 2A and 2B, Sheet 2 of 6: grids of logical block locations (each marked "X") on physical storage units S0 through S11, arranged in rows L0 through L7; FIG. 2A shows a typical physical organization and FIG. 2B a logical organization of the same array. X = logical block location.]
[FIGS. 2C and 2D, Sheet 3 of 6: Logical Volume #0 (logical storage units S0-S5) and Logical Volume #1 (logical storage units S0'-S5'), each showing numbered blocks 0 through 47 laid out in rows L0 through L7; the two figures show different block arrangements across the logical storage units.]
[FIGS. 2E and 2F, Sheet 4 of 6: Logical Volume #0 on logical storage units S0-S5, showing numbered data blocks interleaved with P entries (FIG. 2E) and with P and Q entries (FIG. 2F), laid out in rows L0 through L7.]
[FIG. 3A, Sheet 5 of 6: Redundancy Group #0 containing Logical Volume #0 (Logical Disks #0-#5) and Logical Volume #1 (Logical Disks #6-#11); each logical disk entry records a Channel #, Storage Unit #, Starting Block #, and # of Blocks.]
[FIG. 3B, Sheet 6 of 6: Redundancy Group #0 (Logical Volume #0, Logical Disks #0-#5) and Redundancy Group #1 (Logical Volume #1, Logical Disks #6-#11); each logical disk entry records a Channel #, Storage Unit #, Starting Block #, and # of Blocks.]
LOGICAL PARTITIONING OF A REDUNDANT ARRAY STORAGE SYSTEM

This is a continuation of application Ser. No. 07/612,220, filed on Nov. 9, 1990, now abandoned.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to computer system data storage, and more particularly to a redundant array storage system that can be configured as a RAID 1, 3, 4, or 5 system, or any combination of these configurations.
2. Description of Related Art

A typical data processing system generally involves one or more storage units which are connected to a Central Processor Unit (CPU) either directly or through a control unit and a channel. The function of the storage units is to store data and programs which the CPU uses in performing particular data processing tasks.

Various types of storage units are used in current data processing systems. A typical system may include one or more large capacity tape units and/or disk drives (magnetic, optical, or semiconductor) connected to the system through respective control units for storing data.

However, a problem exists if one of the large capacity storage units fails such that information contained in that unit is no longer available to the system. Generally, such a failure will shut down the entire computer system.
The prior art has suggested several ways of solving the problem of providing reliable data storage. In systems where records are relatively small, it is possible to use error correcting codes which generate ECC syndrome bits that are appended to each data record within a storage unit. With such codes, it is possible to correct a small amount of data that may be read erroneously. However, such codes are generally not suitable for correcting or recreating long records which are in error, and provide no remedy at all if a complete storage unit fails. Therefore, a need exists for providing data reliability external to individual storage units.

Other approaches to such "external" reliability have been described in the art. A research group at the University of California, Berkeley, in a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)", Patterson, et al., Proc. ACM SIGMOD, June 1988, has catalogued a number of different approaches for providing such reliability when using disk drives as storage units. Arrays of disk drives are characterized in one of five architectures, under the acronym "RAID" (for Redundant Arrays of Inexpensive Disks).
A RAID 1 architecture involves providing a duplicate set of "mirror" storage units and keeping a duplicate copy of all data on each pair of storage units. While such a solution solves the reliability problem, it doubles the cost of storage. A number of implementations of RAID 1 architectures have been made, in particular by Tandem Corporation.
A RAID 2 architecture stores each bit of each word of data, plus Error Detection and Correction (EDC) bits for each word, on separate disk drives (this is also known as "bit striping"). For example, U.S. Pat. No. 4,722,085 to Flora et al. discloses a disk drive memory using a plurality of relatively small, independently operating disk subsystems to function as a large, high capacity disk drive having an unusually high fault tolerance and a very high data transfer bandwidth. A data organizer adds 7 EDC bits (determined using the well-known Hamming code) to each 32-bit data word to provide error detection and error correction capability. The resultant 39-bit word is written, one bit per disk drive, on to 39 disk drives. If one of the 39 disk drives fails, the remaining 38 bits of each stored 39-bit word can be used to reconstruct each 32-bit data word on a word-by-word basis as each data word is read from the disk drives, thereby obtaining fault tolerance.
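The 7-bits-per-32-bit-word figure is consistent with a standard single-error-correcting, double-error-detecting (SEC-DED) Hamming arrangement. As an illustration only (this derivation is not part of the patent), the check-bit count can be computed as follows:

    #include <stdio.h>

    /* The smallest r with 2^r >= m + r + 1 gives single-error correction
       (SEC) for m data bits; one additional overall parity bit adds
       double-error detection (SEC-DED). For m = 32 this yields 6 + 1 = 7
       check bits, i.e. the 39-bit word written across 39 drives above. */
    static int sec_ded_check_bits(int m)
    {
        int r = 0;
        while ((1 << r) < m + r + 1)
            r++;
        return r + 1;   /* +1 for the overall parity bit */
    }

    int main(void)
    {
        printf("check bits for a 32-bit word: %d\n", sec_ded_check_bits(32));
        return 0;
    }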
An obvious drawback of such a system is the large number of disk drives required for a minimum system (since most large computers use a 32-bit word), and the relatively high ratio of drives required to store the EDC bits (7 drives out of 39). A further limitation of a RAID 2 disk drive memory system is that the individual disk actuators are operated in unison to write each data block, the bits of which are spread across all of the disk drives. This arrangement has a high data transfer bandwidth, since each individual disk transfers part of a block of data, the net effect being that the entire block is available to the computer system much faster than if a single drive were accessing the block. This is advantageous for large data blocks. However, this arrangement also effectively provides only a single read/write head actuator for the entire storage unit. This adversely affects the random access performance of the drive array when data files are small, since only one data file at a time can be accessed by the "single" actuator. Thus, RAID 2 systems are generally not considered to be suitable for computer systems designed for On-Line Transaction Processing (OLTP), such as in banking, financial, and reservation systems, where a large number of random accesses to many small data files comprises the bulk of data storage and transfer operations.
A RAID 3 architecture is based on the concept that each disk drive storage unit has internal means for detecting a fault or data error. Therefore, it is not necessary to store extra information to detect the location of an error; a simpler form of parity-based error correction can thus be used. In this approach, the contents of all storage units subject to failure are "Exclusive OR'd" (XOR'd) to generate parity information. The resulting parity information is stored in a single redundant storage unit. If a storage unit fails, the data on that unit can be reconstructed on to a replacement storage unit by XOR'ing the data from the remaining storage units with the parity information. Such an arrangement has the advantage over the mirrored disk RAID 1 architecture in that only one additional storage unit is required for "N" storage units. A further aspect of the RAID 3 architecture is that the disk drives are operated in a coupled manner, similar to a RAID 2 system, and a single disk drive is designated as the parity unit.
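To illustrate the parity scheme in code (a minimal sketch, not drawn from the patent itself), the parity block is the bitwise XOR of the corresponding data blocks, and the same XOR recovers a lost block from the parity and the surviving blocks:

    #include <stddef.h>
    #include <stdint.h>

    /* Compute the parity block as the bitwise XOR of n data blocks,
       each block_len bytes long. */
    void raid3_make_parity(const uint8_t *const data[], size_t n,
                           size_t block_len, uint8_t *parity)
    {
        for (size_t i = 0; i < block_len; i++) {
            uint8_t p = 0;
            for (size_t d = 0; d < n; d++)
                p ^= data[d][i];
            parity[i] = p;
        }
    }

    /* Rebuild a failed block by XOR'ing the parity block with all
       surviving data blocks. */
    void raid3_rebuild(const uint8_t *const surviving[], size_t n,
                       const uint8_t *parity, size_t block_len,
                       uint8_t *rebuilt)
    {
        for (size_t i = 0; i < block_len; i++) {
            uint8_t p = parity[i];
            for (size_t d = 0; d < n; d++)
                p ^= surviving[d][i];
            rebuilt[i] = p;
        }
    }

Because XOR is its own inverse, generation and reconstruction are the same operation applied to different inputs, which is why a single redundant unit suffices for N data units.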
One implementation of a RAID 3 architecture is the Micropolis Corporation Parallel Drive Array, Model 1804 SCSI, that uses four parallel, synchronized disk drives and one redundant parity drive. The failure of one of the four data disk drives can be remedied by the use of the parity bits stored on the parity disk drive. Another example of a RAID 3 system is described in U.S. Pat. No. 4,092,732 to Ouchi.

A RAID 3 disk drive memory system has a much lower ratio of redundancy units to data units than a RAID 2 system. However, a RAID 3 system has the same performance limitation as a RAID 2 system, in that the individual disk actuators are coupled, operating in unison. This adversely affects the random access performance of the drive array when data files are small, since only one data file at a time can be accessed by the "single" actuator. Thus, RAID 3 systems are generally not considered to be suitable for computer systems designed for OLTP purposes.
A RAID 4 architecture uses the same parity error correction concept of the RAID 3 architecture, but improves on the performance of a RAID 3 system with respect to random reading of small files by "uncoupling" the operation of the individual disk drive actuators, and reading and writing a larger minimum amount of data (typically, a disk sector) to each disk (this is also known as block striping). A further aspect of the RAID 4 architecture is that a single storage unit is designated as the parity unit.

A limitation of a RAID 4 system is that Writing a data block on any of the independently operating data storage units also requires writing a new parity block on the parity unit. The parity information stored on the parity unit must be read and XOR'd with the old data (to "remove" the information content of the old data), and the resulting sum must then be XOR'd with the new data (to provide new parity information). Both the data and the parity records then must be rewritten to the disk drives. This process is commonly referred to as a "Read-Modify-Write" sequence.
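The parity arithmetic behind this sequence is plain XOR. The following minimal sketch (illustrative only, not taken from the patent) shows how a controller could compute the new parity block from the old data, old parity, and new data fetched by the two preliminary Reads:

    #include <stddef.h>
    #include <stdint.h>

    /* Read-Modify-Write parity update: XOR'ing with the old data removes
       its contribution from the parity, and XOR'ing with the new data adds
       the new contribution. old_data and old_parity are what the two
       preliminary Read operations fetched from the array. */
    void rmw_update_parity(const uint8_t *old_data, const uint8_t *new_data,
                           const uint8_t *old_parity, uint8_t *new_parity,
                           size_t block_len)
    {
        for (size_t i = 0; i < block_len; i++)
            new_parity[i] = old_parity[i] ^ old_data[i] ^ new_data[i];
    }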
Thus, a Read and a Write on the single parity unit occurs each time a record is changed on any of the data storage units covered by the parity record on the parity unit. The parity unit becomes a bottle-neck to data writing operations since the number of changes to records which can be made per unit of time is a function of the access rate of the parity unit, as opposed to the faster access rate provided by parallel operation of the multiple data storage units. Because of this limitation, a RAID 4 system is generally not considered to be suitable for computer systems designed for OLTP purposes. Indeed, it appears that a RAID 4 system has not been implemented for any commercial purpose.

A RAID 5 architecture uses the same parity error correction concept of the RAID 4 architecture and independent actuators, but improves on the writing performance of a RAID 4 system by distributing the data and parity information across all of the available disk drives. Typically, "N+1" storage units in a set (also known as a "redundancy group") are divided into a plurality of equally sized address areas referred to as blocks. Each storage unit generally contains the same number of blocks. Blocks from each storage unit in a redundancy group having the same unit address ranges are referred to as "stripes". Each stripe has N blocks of data, plus one parity block on one storage unit containing parity for the remainder of the stripe. Further stripes each have a parity block, the parity blocks being distributed on different storage units. Parity updating activity associated with every modification of data in a redundancy group is therefore distributed over the different storage units. No single unit is burdened with all of the parity update activity.

For example, in a RAID 5 system comprising 5 disk drives, the parity information for the first stripe of blocks may be written to the fifth drive; the parity information for the second stripe of blocks may be written to the fourth drive; the parity information for the third stripe of blocks may be written to the third drive; etc. The parity block for succeeding stripes typically "precesses" around the disk drives in a helical pattern (although other patterns may be used).
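One common way to realize such a rotating parity placement is a left-symmetric layout; the sketch below (an illustration of one possible pattern, not a layout required by the patent) maps a stripe number to the drive holding that stripe's parity block:

    /* Rotate the parity block one drive "backwards" on each successive
       stripe, matching the 5-drive example above (drives numbered 0..4):
       stripe 0 -> drive 4, stripe 1 -> drive 3, stripe 2 -> drive 2, ... */
    static int parity_drive_for_stripe(int stripe, int num_drives)
    {
        return (num_drives - 1) - (stripe % num_drives);
    }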
Thus, no single disk drive is used for storing the parity information, and the bottleneck of the RAID 4 architecture is eliminated. An example of a RAID 5 system is described in U.S. Pat. No. 4,761,785 to Clark et al.

As in a RAID 4 system, a limitation of a RAID 5 system is that a change in a data block requires a Read-Modify-Write sequence comprising two Read and two Write operations: the old parity block and old data block must be read and XOR'd, and the resulting sum must then be XOR'd with the new data. Both the data and the parity blocks then must be rewritten to the disk drives. While the two Read operations may be done in parallel, as can the two Write operations, modification of a block of data in a RAID 4 or a RAID 5 system still takes substantially longer than the same operation on a conventional disk. A conventional disk does not require the preliminary Read operation, and thus does not have to wait for the disk drives to rotate back to the previous position in order to perform the Write operation. The rotational latency time alone can amount to about 50% of the time required for a typical data modification operation. Further, two disk storage units are involved for the duration of each data modification operation, limiting the throughput of the system as a whole. Despite the Write performance penalty, RAID 5 type systems have become increasingly popular, since they provide high data reliability with a low overhead cost for redundancy, good Read performance, and fair Write performance.

Although different RAID systems have been designed, to date, such systems are rather inflexible, in that only one type of redundant configuration is implemented in each design. Thus, for example, redundant array storage systems have generally been designed to be only a RAID 3 or only a RAID 5 system. When the principal use of a redundant array storage system is known in advance, such rigidity of design may not pose a problem. However, uses of a storage system can vary over time. Indeed, a user may have need for different types of RAID systems at the same time, but not have the resources to acquire multiple storage systems to meet those needs. As importantly, different users have different needs; designing redundant array storage systems with different RAID configurations to meet such disparate needs is expensive.

It thus would be highly desirable to have a flexible RAID-architecture storage system in which the basic redundancy configuration could be altered for each user, or as a user's needs change. It would also be desirable to have a flexible RAID-architecture storage system in which different types of redundancy configuration can be simultaneously implemented.

The present invention provides such a system.

SUMMARY OF THE INVENTION
The RAID architecture of the present invention is extremely flexible, and permits a redundant array storage system to be configured as a RAID 1, 3, 4, or 5 system, or any combination of these configurations. The invention comprises a configuration data structure for addressing a redundant array storage system, and a method for configuring a redundant array storage system during an initialization process.

The redundant array storage system comprises a set of physical storage units which are accessible in terms of block numbers (a block comprises one or more sectors). As part of the initialization process, the physical storage units are each configured as one or more logical storage units. Each logical storage unit is addressed in terms of a channel number, storage unit number, starting block number, and offset number (the number of blocks to be transferred is also specified when doing transfers).

Once logical storage units are defined, logical volumes are defined as one or more logical storage units, each logical volume having a depth characteristic.

After the logical volumes are defined, redundancy groups are defined as one or more logical volumes. In the present invention, a redundancy level is specified for each redundancy group. The redundancy level may be none, one (e.g., XOR parity or an error-correction code, such as a Reed-Solomon code), or two (e.g., XOR parity plus a Reed-Solomon error-correction code).

Alternatively, redundancy groups are defined as one or more logical storage units, and logical volumes are defined as a member of a redundancy group.
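To make the logical disk / logical volume / redundancy group hierarchy concrete, the following declarations are one possible rendering of such a configuration data structure (field and type names are hypothetical; the patent defines the structure only in terms of the fields named above and shown in FIGS. 3A and 3B):

    #include <stdint.h>

    /* One logical storage unit ("logical disk") entry, as in FIGS. 3A and 3B. */
    struct logical_disk {
        uint16_t channel;         /* channel number        */
        uint16_t storage_unit;    /* storage unit number   */
        uint32_t starting_block;  /* starting block number */
        uint32_t num_blocks;      /* number of blocks      */
    };

    /* A logical volume: one or more logical storage units plus the depth
       characteristic described above. */
    struct logical_volume {
        struct logical_disk *disks;
        uint16_t num_disks;
        uint32_t depth;
    };

    /* A redundancy group: one or more logical volumes with a redundancy
       level of none (0), one (e.g. XOR parity), or two (e.g. XOR parity
       plus a Reed-Solomon code). */
    struct redundancy_group {
        struct logical_volume *volumes;
        uint16_t num_volumes;
        uint8_t  redundancy_level;
    };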
Logical volumes are addressed by a host CPU by volume number, initial block number, and number of blocks to be transferred. The host CPU also specifies a READ or WRITE operation. The specified volume number, initial block number, and number of blocks to be transferred are then translated into a corresponding channel number, storage unit number, starting block number, offset number, and number of blocks to be transferred.
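As a sketch of that translation step (illustrative only; the names are hypothetical, it reuses the structures sketched above, and the actual mapping depends on the configured depth, redundancy, and layout), a single host block address might be resolved as follows:

    struct physical_address {
        uint16_t channel;
        uint16_t storage_unit;
        uint32_t starting_block;  /* first block of the logical disk on its unit */
        uint32_t offset;          /* block offset within that logical disk       */
        uint32_t num_blocks;
    };

    /* Resolve one host block number within a logical volume, assuming blocks
       are laid out "depth" blocks at a time across the volume's logical disks.
       A request that crosses a depth boundary would be split by the controller
       into several such pieces. */
    struct physical_address translate(const struct logical_volume *vol,
                                      uint32_t block, uint32_t num_blocks)
    {
        uint32_t chunk      = block / vol->depth;      /* which depth-sized chunk  */
        uint32_t disk_index = chunk % vol->num_disks;  /* which logical disk       */
        uint32_t chunk_row  = chunk / vol->num_disks;  /* chunk index on that disk */
        const struct logical_disk *ld = &vol->disks[disk_index];

        struct physical_address pa;
        pa.channel        = ld->channel;
        pa.storage_unit   = ld->storage_unit;
        pa.starting_block = ld->starting_block;
        pa.offset         = chunk_row * vol->depth + block % vol->depth;
        pa.num_blocks     = num_blocks;
        return pa;
    }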
With the present invention, it is possible for a logical volume to span across physical storage units ("vertical partitioning"), comprise only a portion of each such physical storage unit ("horizontal partitioning"), and have definable depth and redundancy characteristics.

The details of the preferred embodiment of the present invention are set forth in the accompanying drawings and the description below. Once the details of the invention are known, numerous additional innovations and changes will become obvious to one skilled in the art.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is block diagram of a generalized RAID system in accordance with the present invention.

FIG. 2A is a diagram of a model RAID system, showing a typical physical organization.

FIG. 2B is a diagram of a model RAID system, showing a logical organization of the physical array of FIG. 2A, in which each physical storage unit is configured as two logical storage units.

FIG. 2C is a diagram of a model RAID system

Physical Storage Units

A typical physical storage unit, such as a magnetic or optical disk drive, comprises a set of one or more rotating disks each having at least one read/write transducer head per surface. Data storage areas known as tracks are concentrically arranged on the disk surfaces. A disk storage unit may have, for example, 500 to 2000 tracks per disk surface. Each track is divided into numbered sectors that are commonly 512 bytes in size. Sectors are the smallest unit of storage area that can be accessed by the storage unit (data bits within a sector may be individually altered, but only by reading an entire sector, modifying selected bits, and writing the entire sector back into place). A disk storage unit may have 8 to 50 sectors per track, and groups of tracks may have differing numbers of sectors per track on the same disk storage unit (e.g., smaller circumference inner tracks may have fewer sectors per track, while larger circumference outer tracks may have more sectors per track).

Access to a sector ultimately requires identification of a sector by its axial displacement along the set of rotating disks, radial displacement on a disk, and circumferential displacement around a disk. Two common schemes are used for such identification. One scheme identifies a sector by a surface or head number (axial displacement), a track number (radial displacement), and a sector number (circumferential displacement). The second scheme treats all of the tracks with the same radius on all disks as a "cylinder", with tracks being subsets of a cylinder rather than of a surface. In this scheme, a sector is identified by a cylinder number (radial displacement), a track number (axial displacement), and a sector number (circumferential displacement). The present invention can be implemented using either form of physical identification.
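For illustration only (the patent does not prescribe this arithmetic, and real drives may vary sectors per track from zone to zone), either identification scheme can be flattened into a single linear sector number; for the cylinder/track/sector form with a uniform geometry:

    #include <stdint.h>

    /* Flatten a cylinder/track/sector address into a linear sector number,
       assuming a uniform geometry and sectors numbered starting at 1. With
       one sector per block this is also the block number. */
    static uint32_t cts_to_linear(uint32_t cylinder, uint32_t track, uint32_t sector,
                                  uint32_t tracks_per_cylinder,
                                  uint32_t sectors_per_track)
    {
        return (cylinder * tracks_per_cylinder + track) * sectors_per_track
               + (sector - 1);
    }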
It is possible for a higher level storage controller (or even the CPU) to keep track of the location of data on a storage unit by tracking all involved sectors. This is commonly done with magnetic disk drives following the well-known ST-506 interface standard used in personal computers. Storage units addressed in this manner are known as sector-addressable.

However, it is inconvenient in modern computer systems for a high-level storage controller to keep track of sector addresses by either of the addressing schemes described above. Therefore, in the preferred embodiment of the invention, an alternative form of storage unit addressing is used that maps the sectors of a storage unit to a more tractable form.

This mapping is accomplished by treating one or more sectors as a block, as is known in the art, and addressing each storage unit by block numbers. A block on the storage units used in the preferred embodiment of the inventive system can vary from 512 bytes up to 4096 bytes, but may be of any size (although commonly block sizes are limited to multiples of two bytes, for ease of implementation). The storage units being used must support the specified block size. In addition, such storage units mark defective sectors in such a way that they are not used to form blocks. (Some storage units can also dynamically "map out" defective blocks during operation in order to always present to external devices a set of contiguously numbered blocks). Each storage unit is then considered by a higher level controller to be a "perfect" physical device comprising a set of contiguously numbered logical blocks. Such units are known as block-addressable.

For example, with storage units having a Small Computer System Interface ("SCSI"), each storage unit is considered to be a contiguous set of blocks. An access request to such a unit simply specifies the numbers of the blocks that are to
