`
`RAID: High-Performance, Reliable Secondary Storage
`
`Peter M. Chen
`Computer Science and Engineering Division
`Department of Electrical Engineering and Computer Science
`1301 Beal Avenue
`University of Michigan
`Ann Arbor, MI 48109-2122
`
`Edward K. Lee
`DEC Systems Research Center
`130 Lytton Avenue
`Palo Alto, CA 94301-1044
`
`Garth A. Gibson
`School of Computer Science
`Carnegie Mellon University
`5000 Forbes Avenue
`Pittsburgh, PA 15213-3891
`
`Randy H. Katz
`Computer Science Division
`Department of Electrical Engineering and Computer Science
`571 Evans Hall
`University of California
`Berkeley, CA 94720
`
`David A. Patterson
`Computer Science Division
`Department of Electrical Engineering and Computer Science
`571 Evans Hall
`University of California
`Berkeley, CA 94720
`
`Abstract: Disk arrays were proposed in the 1980s as a way to use parallelism between
`multiple disks to improve aggregate I/O performance. Today they appear in the product
`lines of most major computer manufacturers. This paper gives a comprehensive over-
`view of disk arrays and provides a framework in which to organize current and
`future work. The paper first introduces disk technology and reviews the driving forces
`that have popularized disk arrays: performance and reliability. It then discusses the two
`architectural techniques used in disk arrays: striping across multiple disks to improve per-
formance and redundancy to improve reliability. Next, the paper describes seven disk
array architectures, called RAID (Redundant Arrays of Inexpensive Disks) levels 0-6, and
compares their performance, cost, and reliability. It goes on to discuss advanced research
`and implementation topics such as refining the basic RAID levels to improve performance
`and designing algorithms to maintain data consistency. Last, the paper describes five disk
`array prototypes or products and discusses future opportunities for research. The paper
`includes an annotated bibliography of disk array-related literature.
`
`Content indicators: disk array, RAID, parallel I/O, storage, striping, redundancy
`
`
`
`
`1
`2
`
`3
`
`4
`
`5
`
`6
`
`3.3
`
`3.4
`
`INTRODUCTION ...................................................................................................1
`BACKGROUND .....................................................................................................3
`2.1
`Disk Terminology ................................................................................................................3
`2.2
`Data Paths ............................................................................................................................5
`2.3
`Technology Trends...............................................................................................................7
`DISK ARRAY BASICS...........................................................................................8
`3.1
`Data Striping and Redundancy ............................................................................................8
`3.2
`Basic RAID Organizations ..................................................................................................9
`3.2.1
`Non-Redundant (RAID Level 0) .........................................................................10
`3.2.2 Mirrored (RAID Level 1) ....................................................................................10
`3.2.3 Memory-Style ECC (RAID Level 2) ..................................................................12
`3.2.4
`Bit-Interleaved Parity (RAID Level 3)................................................................12
`3.2.5
`Block-Interleaved Parity (RAID Level 4) ...........................................................13
`3.2.6
`Block-Interleaved Distributed-Parity (RAID Level 5)........................................13
`3.2.7
`P+Q Redundancy (RAID Level 6) ......................................................................14
`Performance and Cost Comparisons..................................................................................15
`3.3.1
`Ground Rules and Observations..........................................................................15
`3.3.2
`Comparisons........................................................................................................17
`Reliability...........................................................................................................................19
`3.4.1
`Basic Reliability ..................................................................................................19
`3.4.2
`System Crashes and Parity Inconsistency ...........................................................21
`3.4.3
`Uncorrectable Bit-Errors .....................................................................................22
`3.4.4
`Correlated Disk Failures......................................................................................23
`3.4.5
`Reliability Revisited ............................................................................................24
`3.4.6
`Summary and Conclusions..................................................................................27
`Implementation Considerations .........................................................................................27
`3.5.1
`Avoiding Stale Data.............................................................................................28
`3.5.2
`Regenerating Parity after a System Crash...........................................................29
`3.5.3
`Operating with a Failed Disk...............................................................................30
`3.5.4
`Orthogonal RAID ................................................................................................31
`ADVANCED TOPICS...........................................................................................32
`4.1
`Improving Small Write Performance for RAID Level 5 ...................................................32
`4.1.1
`Buffering and Caching ........................................................................................32
`4.1.2
`Floating Parity .....................................................................................................34
`4.1.3
`Parity Logging.....................................................................................................34
`4.2
`Declustered Parity..............................................................................................................35
`4.3
`Exploiting On-Line Spare Disks........................................................................................38
`4.4
`Data Striping in Disk Arrays .............................................................................................40
`4.5
`Performance and Reliability Modeling..............................................................................42
`CASE STUDIES....................................................................................................44
`5.1
`Thinking Machines Corporation ScaleArray.....................................................................45
`5.2
`StorageTek Iceberg 9200 Disk Array Subsystem ..............................................................46
`5.3
`TickerTAIP/DataMesh .......................................................................................................47
`5.4
`The RAID-II Storage Server..............................................................................................49
`5.5
`IBM Hagar Disk Array Controller.....................................................................................50
`OPPORTUNITIES FOR FUTURE RESEARCH..................................................50
`6.1
`Experience with Disk Arrays.............................................................................................51
`6.2
`Interaction among New Technologies ...............................................................................51
`
`3.5
`
`October 29, 1993
`
`RAID: High-Performance, Reliable Secondary Storage
`
`i
`
`DHPN-1011 / Page 2 of 65
`
`
`
`6.3
`Scalability, Massively Parallel Computers, and Small Disks ............................................52
`6.4
`Latency...............................................................................................................................52
`CONCLUSIONS ...................................................................................................53
`ACKNOWLEDGEMENTS...................................................................................53
`ANNOTATED BIBLIOGRAPHY.........................................................................53
`
`7
`8
`9
`
`October 29, 1993
`
`RAID: High-Performance, Reliable Secondary Storage
`
`ii
`
`DHPN-1011 / Page 3 of 65
`
`
`
`1 INTRODUCTION
`
`In recent years, interest in RAID, Redundant Arrays of Inexpensive Disks1, has grown explo-
`
`sively. The driving force behind this phenomenon is the sustained exponential improvements in
`
`the performance and density of semiconductor technology. Improvements in semiconductor tech-
`
`nology make possible faster microprocessors and larger primary memory systems which in turn
`
require larger, higher-performance secondary storage systems. More specifically, these improvements in semiconductor technology have both quantitative and qualitative consequences for secondary storage systems.
`
`On the quantitative side, Amdahl’s Law [Amdahl67] predicts that large improvements in
`
`microprocessors will result in only marginal improvements in overall system performance unless
`
`accompanied by corresponding improvements in secondary storage systems. Unfortunately, while
`
`RISC microprocessor performance has been improving 50% or more per year [Patterson94, pg.
`
`27], disk access times, which depend on improvements of mechanical systems, have been improv-
`
`ing less than 10% per year. Disk transfer rates, which track improvements in both mechanical sys-
`
`tems and magnetic media densities, have improved at the faster rate of approximately 20% per
`
`year. Assuming that semiconductor and disk technologies continue their current trends, we must
`
`conclude that the performance gap between microprocessors and magnetic disks will continue to
`
`widen.
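To make the quantitative argument concrete (the numbers in this illustration are ours, not from the measurements cited above): Amdahl’s Law says that if a fraction f of a workload is sped up by a factor s, the overall speedup is

    Speedup = 1 / ((1 - f) + f / s)

So if 90% of a job’s time is computation that a faster microprocessor accelerates tenfold, but the remaining 10% is I/O that does not improve, the overall speedup is only 1 / (0.1 + 0.9/10), or about 5.3, barely half the processor’s improvement.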
`
`In addition to the quantitative effect, a second, perhaps more important, qualitative effect is
`
`driving the need for higher-performance secondary storage systems. As microprocessors become
`
`faster, they make possible new applications and greatly expand the scope of existing applications.
`
`In particular, applications such as video, hypertext and multi-media are becoming common. Even
`
`in existing application areas such as computer-aided design and scientific computing, faster micro-
`
`processors make it possible to tackle new problems requiring larger datasets. This shift in applica-
`
`tions along with a trend toward large, shared, high-performance, network-based storage systems is
`
`causing us to reevaluate the way we design and use secondary storage systems.
`
`1. Because of the restrictiveness of “Inexpensive”, RAID is sometimes said to stand for “Redundant Arrays
`of Independent Disks”.
`
`Disk arrays, which organize multiple independent disks into a large, high-performance logical
`
disk, are a natural solution to the problem. Disk arrays stripe data across multiple disks and access them in parallel to achieve both higher data transfer rates on large data accesses and higher I/O
`
`rates on small data accesses. Data striping also results in uniform load balancing across all of the
`
`disks, eliminating hot spots that otherwise saturate a small number of disks while the majority of
`
`disks sit idle.
`
`Large disk arrays, however, are highly vulnerable to disk failures; a disk array with a hundred
`
`disks is a hundred times more likely to fail than a single disk. An MTTF (mean-time-to-failure) of
`
`200,000 hours, or approximately twenty-three years, for a single disk implies an MTTF of 2000
`
`hours, or approximately three months, for a disk array with a hundred disks. The obvious solution
`
`is to employ redundancy in the form of error-correcting codes to tolerate disk failures. This allows
`
`a redundant disk array to avoid losing data for much longer than an unprotected single disk.
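The arithmetic behind the example above assumes that disks fail independently with exponentially distributed lifetimes, in which case the mean time to failure of the array is that of a single disk divided by the number of disks:

    MTTF of the array = (MTTF of a single disk) / (number of disks) = 200,000 hours / 100 = 2,000 hours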
`
`Redundancy, however, has negative consequences. Since all write operations must update the
`
`redundant information, the performance of writes in redundant disk arrays can be significantly
`
`worse than the performance of writes in non-redundant disk arrays. Also, keeping the redundant
`
`information consistent in the face of concurrent I/O operations and system crashes can be difficult.
`
`A number of different data striping and redundancy schemes have been developed. The com-
`
`binations and arrangements of these schemes lead to a bewildering set of options for users and
`
`designers of disk arrays. Each option presents subtle tradeoffs between reliability, performance
`
`and cost that are difficult to evaluate without understanding the alternatives. To address this prob-
`
`lem, this paper presents a systematic tutorial and survey of disk arrays. We describe seven basic
`
`disk-array organizations along with their advantages and disadvantages and compare their reliabil-
`
`ity, performance and cost. We draw attention to the general principles governing the design and
`
`configuration of disk arrays as well as practical issues that must be addressed in the implementa-
`
`tion of disk arrays. A later section of the paper describes optimizations and variations to the seven
`
`basic disk-array organizations. Finally, we discuss existing research in the modeling of disk arrays
`
`and fruitful avenues for future research. This paper should be of value to anyone interested in disk
`
`arrays, including students, researchers, designers and users of disk arrays.
`
`2 BACKGROUND
`
`This section provides basic background material on disks, I/O datapaths, and disk technology
`
`trends for readers who are unfamiliar with secondary storage systems.
`
`2.1 Disk Terminology
`
`Figure 1 illustrates the basic components of a simplified magnetic disk drive. A disk princi-
`
`pally consists of a set of platters coated with a magnetic medium rotating at a constant angular
`
`velocity and a set of disk arms with magnetic read/write heads which are moved radially across the
`
`platters’ surfaces by an actuator. Once the heads are correctly positioned, data is read and written
`
`in small arcs called sectors on the platters’ surfaces as the platters rotate relative to the heads.
`
`Although all heads are moved collectively, in almost every disk drive, only a single head can read
`
`or write data at any given time. A complete circular swath of data is referred to as a track and each
`
`platter’s surface consists of concentric rings of tracks. A vertical collection of tracks at the same
`
`radial position is logically referred to as a cylinder. Sectors are numbered so that a sequential scan
`
`of all sectors traverses the entire disk in the minimal possible time.
`
Figure 1: Disk Terminology. Heads reside on arms which are positioned by actuators. Tracks are concentric rings on a platter. A sector is the basic unit of reads and writes. A cylinder is a stack of tracks at one actuator position. An HDA (head-disk assembly) is everything in the figure plus the airtight casing. In some devices it is possible to transfer data from multiple surfaces simultaneously, but this is both rare and expensive. The collection of heads that participate in a single logical transfer that is spread over multiple surfaces is called a head group.

Given the simplified disk described above, disk service times can be broken into three primary components: seek time, rotational latency, and data transfer time. Seek time is the amount of
`time needed to move a head to the correct radial position and typically ranges from one to thirty
`
`milliseconds depending on the seek distance and the particular disk. Rotational latency is the
`
amount of time needed for the desired sector to rotate under the disk head. Full rotation times for disks currently vary from eight to twenty-eight milliseconds. The data transfer time is dependent
`
`on the rate at which data can be transferred to/from a platter’s surface and is a function of the plat-
`
`ter’s rate of rotation, the density of the magnetic media, and the radial distance of the head from
`
`the center of the platter—some disks use a technique called zone-bit-recording to store more data
`
`on the longer outside tracks than the shorter inside tracks. Typical data transfer rates range from
`
`one to five megabytes per second. The seek time and rotational latency are sometimes collectively
`
`referred to as the head positioning time. Table 1 tabulates the statistics for a typical high-end disk
`
`available in 1993.
`
Form Factor/Disk Diameter                     5.25 inch
Capacity                                      2.8 GB
Cylinders                                     2627
Tracks Per Cylinder                           21
Sectors Per Track                             ~99
Bytes Per Sector                              512
Full Rotation Time                            11.1 ms
Minimum Seek (single cylinder)                1.7 ms
Average Seek (random cylinder to cylinder)    11.0 ms
Maximum Seek (full stroke seek)               22.5 ms
Data Transfer Rate                            ≈4.6 MB/s

Table 1: Specifications for the Seagate ST43401N Elite-3 SCSI Disk Drive. Average seek in this table is calculated assuming a uniform distribution of accesses. This is the standard way manufacturers report average seek times. In reality, measurements of production systems show that spatial locality significantly lowers the effective average seek distance [Hennessy90, pg. 559].

The slow head positioning time and fast data transfer rate of disks lead to very different performance for a sequence of accesses depending on the size and relative location of each access. Suppose we need to transfer 1 MB from the disk in Table 1, and the data is laid out in two ways: sequential within a single cylinder or randomly placed in 8 KB blocks. In either case the time for
`the actual data transfer of 1 MB is about 200 ms. But the time for positioning the head goes from
`
`about 16 ms in the sequential layout to about 2000 ms in the random layout. This sensitivity to the
`
`workload is why applications are categorized as high data rate, meaning minimal head positioning
`
`via large, sequential accesses, or high I/O rate, meaning lots of head positioning via small, more
`
`random accesses.
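The arithmetic behind these estimates is simple enough to sketch in a few lines of Python. The fragment below is our own illustration using the Table 1 parameters; it assumes each positioning operation costs one average seek plus half a rotation, and the names are invented for the example.

    # Service-time estimates for the 1 MB transfer example, using the
    # Table 1 disk parameters (illustrative only).
    AVG_SEEK_MS = 11.0         # average seek time
    FULL_ROTATION_MS = 11.1    # full rotation time
    TRANSFER_MB_PER_SEC = 4.6  # data transfer rate

    def positioning_ms():
        # One seek plus an average (half-rotation) rotational latency.
        return AVG_SEEK_MS + FULL_ROTATION_MS / 2

    def transfer_ms(megabytes):
        return megabytes / TRANSFER_MB_PER_SEC * 1000

    # Sequential layout: position the head once, then stream 1 MB.
    sequential = positioning_ms() + transfer_ms(1.0)

    # Random layout: 128 separate 8 KB accesses, each paying full positioning.
    random_8kb = 128 * (positioning_ms() + transfer_ms(8.0 / 1024))

    print(f"sequential: {sequential:.0f} ms")  # ~234 ms, of which ~16.6 ms is positioning
    print(f"random: {random_8kb:.0f} ms")      # ~2340 ms, of which ~2100 ms is positioning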
`
`2.2 Data Paths
`
`A hierarchy of industry standard interfaces has been defined for transferring data recorded on
`
`a disk platter’s surface to or from a host computer. In this section we review the complete datapath,
`
from the disk to a user’s application (Figure 2). We assume a read operation for the purposes of
`
`this discussion.
`
`On the disk platter’s surface, information is represented as reversals in the direction of stored
`
`magnetic fields. These “flux reversals” are sensed, amplified, and digitized into pulses by the low-
`
`est-level read electronics. The protocol ST506/412 is one standard that defines an interface to disk
`
`systems at this lowest, most inflexible, and technology-dependent level. Above this level of the
`
`read electronics path, pulses are decoded to separate data bits from timing-related flux reversals.
`
`The bit-level ESDI and SMD standards define an interface at this more flexible, encoding-indepen-
`
dent level. At the higher, most flexible, packet level, these bits are aligned into bytes, error-correcting codes are applied, and the extracted data is delivered to the host as data blocks over a peripheral bus interface such as SCSI (Small Computer System Interface) or IPI-3 (the third level of the
`
`Intelligent Peripheral Interface). These steps are performed today by intelligent on-disk control-
`
`lers, which often include speed matching and caching “track buffers”. SCSI and IPI-3 also include
`
`a level of data mapping: the computer specifies a logical block number and the controller embed-
`
`ded on the disk maps that block number to a physical cylinder, track, and sector. This mapping
`
`allows the embedded disk controller to avoid bad areas of the disk by remapping those logical
`
`blocks that are affected to new areas of the disk.
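As a hypothetical illustration of this mapping (real embedded controllers use vendor-specific layouts and defect tables; the geometry below is borrowed loosely from Table 1 and the names are invented), a simple logical-to-physical translation might look like this in Python:

    # Illustrative logical-block-to-physical mapping with bad-block remapping.
    TRACKS_PER_CYLINDER = 21
    SECTORS_PER_TRACK = 99
    remap_table = {}  # logical block -> spare sector number, for bad areas

    def logical_to_physical(block):
        # Redirect blocks that fall on known-bad areas of the disk.
        sector = remap_table.get(block, block)
        cylinder, rest = divmod(sector, TRACKS_PER_CYLINDER * SECTORS_PER_TRACK)
        track, sector_in_track = divmod(rest, SECTORS_PER_TRACK)
        return cylinder, track, sector_in_track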
`
The topology and devices on the data path between disk and host computer vary widely
`
`depending on the size and type of I/O system. Mainframes have the richest I/O systems, with many
`
`devices and complex interconnection schemes to access them. An IBM channel path, the set of
`
`cables and associated electronics that transfer data and control information between an I/O device
`
`and main memory, consists of a channel, a storage director, and a head of string. The collection of
`
disks that share the same pathway to the head of string is called a string.

Figure 2: Host-to-Device Pathways. Data that is read from a magnetic disk must pass through many layers on its way to the requesting processor. Each dashed line marks a standard interface. The lower interfaces, such as ST506, deal more closely with the raw magnetic fields and are highly technology dependent. Higher layers such as SCSI deal in packets or blocks of data and are more technology independent. A string connects multiple disks to a single I/O controller. (The diagram itself is omitted here; it shows the path from the CPU and DMA through the I/O controller, host-bus adaptor, or channel processor, across IPI-3, SCSI-1, SCSI-2, or DEC CI/MSCP, to the disk controller/storage director and its track buffers, across IPI-2, SCSI-1, DEC SDI, or the IBM channel path at the data-block level, then through the formatter (SMD, ESDI, at the bit level) and clocking (ST506, ST412, at the pulse level) to the magnetic media.)

In the workstation/file
`
server world, the channel processor is usually called an I/O controller or host-bus adaptor (HBA)
`
`and the functionality of the storage director and head of string is contained in an embedded con-
`
`troller on the disk drive. As in the mainframe world, the use of high-level peripheral interfaces
`
such as SCSI and IPI-3 allows multiple disks to share a single peripheral bus or string.
`
` From the host-bus adaptor, the data is transferred via direct memory access, over a system
`
`bus, such as VME, S-Bus, MicroChannel, EISA, or PCI, to the host operating system’s buffers. In
`
`most operating systems, the CPU then performs a memory to memory copy over a high-speed
`
`memory bus from the operating system buffers to buffers in the application’s address space.
`
`2.3 Technology Trends
`
`Much of the motivation for disk arrays comes from the current trends in disk technology. As
`
`Table 2 shows, magnetic disk drives have been improving rapidly by some metrics and hardly at
`
all by other metrics.

                              1993                        Historical Rate of Improvement
Areal Density                 50-150 Mbits/sq. inch       27% per year
Linear Density                40,000-60,000 bits/inch     13% per year
Inter-Track Density           1,500-3,000 tracks/inch     10% per year
Capacity (3.5” form factor)   100-2000 MB                 27% per year
Transfer Rate                 3-4 MB/s                    22% per year
Seek Time                     7-20 ms                     8% per year

Table 2: Trends in Disk Technology. Magnetic disks are improving rapidly in density and capacity, but more slowly in performance. Areal density is the recording density per square inch of magnetic media. In 1989, IBM demonstrated a 1 Gbit/sq. inch density in a laboratory environment. Linear density is the number of bits written along a track. Inter-track density refers to the number of concentric tracks on a single platter.

Smaller distances between the magnetic read/write head and the disk surface, more accurate positioning electronics, and more advanced magnetic media have dramatically increased the recording density on the disks. This increased density has improved disks in two ways. First, it has allowed disk capacities to stay constant or increase, even while disk sizes have
`
`
`
`decreased from 5.25” in 1983 to 3.5” in 1985 to 2.5” in 1991 to 1.8” in 1992 to 1.3” in 1993. Sec-
`
`ond, the increased density, along with an increase in the rotational speed of the disk, has made pos-
`
`sible a substantial increase in the transfer rate of disk drives. Seek times, however, have improved
`
`very little, decreasing from approximately 20 ms in 1980 to 10 ms today. Rotational speeds have
`
`increased at a similar rate from 3600 revolutions per minute in 1980 to 5400-7200 today.
`
`3 DISK ARRAY BASICS
`
`This section examines basic issues in the design and implementation of disk arrays. In partic-
`
`ular, we examine the concepts of data striping and redundancy; basic RAID organizations; perfor-
`
`mance and cost comparisons between the basic RAID organizations; the reliability of RAID-based
`
`systems in the face of system crashes, uncorrectable bit-errors and correlated disk failures; and
`
finally, issues in the implementation of block-interleaved, redundant disk arrays.
`
`3.1 Data Striping and Redundancy
`
`Redundant disk arrays employ two orthogonal concepts: data striping for improved perfor-
`
`mance and redundancy for improved reliability. Data striping transparently distributes data over
`
`multiple disks to make them appear as a single fast, large disk. Striping improves aggregate I/O
`
`performance by allowing multiple I/Os to be serviced in parallel. There are two aspects to this par-
`
`allelism. First, multiple, independent requests can be serviced in parallel by separate disks. This
`
`decreases the queueing time seen by I/O requests. Second, single, multiple-block requests can be
`
`serviced by multiple disks acting in coordination. This increases the effective transfer rate seen by
`
`a single request. The more disks in the disk array, the larger the potential performance benefits.
`
`Unfortunately, a large number of disks lowers the overall reliability of the disk array, as mentioned
`
`before. Assuming independent failures, 100 disks collectively have only 1/100th the reliability of a
`
`single disk. Thus, redundancy is necessary to tolerate disk failures and allow continuous operation
`
`without data loss.
`
`We will see that the majority of redundant disk array organizations can be distinguished based
`
`on the granularity of data interleaving and the method and pattern in which the redundant informa-
`
`tion is computed and distributed across the disk array. Data interleaving can be characterized as
`
`either fine-grained or coarse-grained. Fine-grained disk arrays conceptually interleave data at a rel-
`
atively small unit such that all I/O requests, regardless of their size, access all of the disks in the
`
`disk array. This results in very high data transfer rates for all I/O requests but has the disadvantage
`
`that only one logical I/O request can be in service at any given time and all disks must waste time
`
`positioning for every request. Coarse-grained disk arrays interleave data at a relatively large unit
`
`so that small I/O requests need access only a small number of disks but large requests can access
`
`all the disks in the disk array. This allows multiple small requests to be serviced simultaneously
`
`but still allows large requests the benefits of using all the disks in the disk array.
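The distinction is easy to state in terms of address mapping. Below is a minimal Python sketch of coarse-grained (block-interleaved) striping; the names and parameters are ours, chosen only for illustration. Setting the stripe unit to a single bit or byte instead would give the fine-grained behavior described above, in which every request touches every disk.

    # Coarse-grained striping: map a logical block to (disk, block on disk).
    NUM_DISKS = 4
    STRIPE_UNIT = 8  # blocks per stripe unit

    def map_block(logical_block):
        unit, offset = divmod(logical_block, STRIPE_UNIT)
        disk = unit % NUM_DISKS               # stripe units rotate across disks
        block_on_disk = (unit // NUM_DISKS) * STRIPE_UNIT + offset
        return disk, block_on_disk

    # A small request (a few blocks) touches one disk; a request spanning
    # NUM_DISKS * STRIPE_UNIT or more blocks engages every disk in the array.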
`
`The incorporation of redundancy in disk arrays brings up two somewhat orthogonal prob-
`
`lems. The first problem is selecting the method for computing the redundant information. Most
`
`redundant disk arrays today use parity but there are some that use Hamming codes or Reed-
`
`Solomon codes. The second problem is that of selecting a method for distributing the redundant
`
`information across the disk array. Although there are an unlimited number of patterns in which
`
redundant information can be distributed, we roughly classify these patterns into two different distribution schemes: those that concentrate redundant information on a small number of disks and those that distribute redundant information uniformly across all of the disks. Schemes that uni-
`
`formly distribute redundant information are generally more desirable because they avoid hot spots
`
`and other load balancing problems suffered by schemes that do not uniformly distribute redundant
`
`information. Although the basic concepts of data striping and redundancy are conceptually simple,
`
`selecting between the many possible data striping and redundancy schemes involves complex
`
`tradeoffs between reliability, performance and cost.
`
`3.2 Basic RAID Organizations
`
`This section describes the basic RAID, Redundant Arrays of Inexpensive Disks, organiza-
`
`tions which will be used as the basis for further examinations of performance, cost and reliability
`
of disk arrays. In addition to presenting RAID levels 1 through 5, which first appeared in the landmark paper by Patterson, Gibson and Katz [Patterson88], we present two other RAID organiza-
`
`tions, RAID levels 0 and 6, which have since become generally accepted1. For the benefit of those
`
`unfamiliar with the original numerical classification of RAID, we will use English phrases in pref-
`
`erence to the numerical classifications. It should come as no surprise to the reader that even the
`
`original authors have sometimes been confused as to the disk array organization referred to by a
`
`particular RAID level! Figure 3 schematically illustrates the seven RAID organizations.
`
`3.2.1 Non-Redundant (RAID Level 0)
`
`The non-redundant disk array, or RAID level 0, has the lowest cost of any redundancy
`
`scheme because it does not employ redundancy at all. This scheme offers the best write perfor-
`
`mance since it never needs to update redundant information. Surprisingly, it does not have the best
`
`read performance. Redundancy schemes such as mirroring, which duplicate data, can perform bet-
`
`ter on reads by selectively scheduling requests on the disk with the shortest expected seek and rota-
`
tional delays [Bitton88]. Without redundancy, any single disk failure will result in data loss. Non-
`
`redundant disk arrays are widely used in supercomputing environments where performance and
`
`capacity, rather than reliability, are the primary concerns.
`
`3.2.2 Mirrored (RAID Level 1)
`
`The traditional solution, called mirroring or shadowing, uses twice as many disks as a non-
`
`redundant disk array [Bitton88]. Whenever data is written to a disk the same data is also written to
`
`a redundant disk, so that there are always two copies of the information. When data is read, it can
`
`be retrieved from the disk with the shorter queueing, seek and rotational delays [Chen90a]. If a
`
`disk fails, the second copy is used to service requests. Mirroring is frequently used in database
`
`applications where availability and transaction rate are more important than storage efficiency
`
`[Gray90].
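The read-scheduling idea can be sketched as follows; this is our own illustration, with queue length standing in for the seek and rotational estimates a real scheduler would use, and all names invented for the example.

    # Mirrored (RAID level 1) request routing sketch.
    from collections import deque

    class Disk:
        def __init__(self):
            self.queue = deque()  # pending requests

    def write(block, data, disks):
        # Both copies must be updated to keep the mirror consistent.
        for d in disks:
            d.queue.append(("write", block, data))

    def read(block, disks):
        # Service the read from the replica expected to respond sooner; queue
        # length is a crude stand-in for expected seek and rotational delay.
        target = min(disks, key=lambda d: len(d.queue))
        target.queue.append(("read", block))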
`
`1. Strictly speaking, RAID Level 0 is not a type of redundant array of inexpensive disks since it stores no
`error-correcting codes.
`
`Non-Redundant (RAID Level 0)
`
`Mirrored (RAID Level 1)
`
`Memory-Style ECC (RAID Level 2)
`
`Bit-Interleaved Parity (RAID Level 3)
`
`Block-Interleaved Parity (RAID Level 4)
`
`Block-Interleaved Distributed-Parity (RAID Level 5)
`
`P+Q Redundancy (RAID Level 6)
`
Figure 3: RAID Levels 0 Through 6. All RAID levels are illustrated at a user capacity of four
`disks. Disks with multiple platters indicate block-level striping while disks without multiple
`platters indicate bit-level striping. The shaded platters represent redundant information.
`
`3.2.3 Memory-Style ECC (RAID Level 2)
`
`Memory systems have provided recovery from failed components with much less cost than
`
`mirroring by using Hamming codes [Peterson72]. Hamming codes contain parity for distinct over-
`
`lapping subsets of components. In one version of this scheme, four data disks require three redun-
`
`dant disks, one less than mirroring. Since the number of redundant disks is proportional to the log
`
`of the total number of disks in the system, storage efficiency increases as the number of data disks
`
`increases.
`
`If a single component fails, several of the parity components will have inconsistent values,
`
`and the failed component is the one held in common by each incorrect subset. The lost information
`
`is recovered by reading the other components in a subset, including the parity component, and set-
`
`ting the missing bit to 0 or 1 to create the proper parity value for that subset. Thus, multiple redun-
`
`dant disks are needed to identify the failed disk, but only one is needed to recover the lost
`
`information.
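For four data disks D1-D4 and three parity disks P1-P3, one possible assignment of overlapping subsets (the classic Hamming (7,4) arrangement; the sketch and names below are ours) makes the identification concrete:

    # Hamming-style identification of a failed disk from inconsistent
    # parity subsets (Hamming (7,4) assignment; illustrative only).
    SUBSETS = {
        "P1": {"D1", "D2", "D4"},
        "P2": {"D1", "D3", "D4"},
        "P3": {"D2", "D3", "D4"},
    }

    def identify_failed(inconsistent_parities):
        # The failed data disk is the one common to every inconsistent subset
        # and absent from every consistent one.
        candidates = set.intersection(*(SUBSETS[p] for p in inconsistent_parities))
        for parity, subset in SUBSETS.items():
            if parity not in inconsistent_parities:
                candidates -= subset
        return candidates  # empty set: the inconsistent parity disk itself failed

    # identify_failed({"P1", "P2"}) == {"D1"}
    # identify_failed({"P1", "P2", "P3"}) == {"D4"}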
`
`Readers unfamiliar with parity can think of the redundant disk as having the sum of all the
`
`data in the other disks. When a disk fails, you can subtract all the data on the good disks from the
`
parity disk; the remaining information must be the missing information.
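The same sum-and-subtract intuition is how parity recovery works in practice, with bytewise exclusive-or playing the role of both addition and subtraction. A self-contained sketch follows (the three-disk contents are arbitrary and the names are ours):

    # Parity as a "sum" of the data disks: XOR is its own inverse, so a failed
    # disk's contents can be rebuilt by XORing the survivors with the parity.
    from functools import reduce

    def xor_blocks(blocks):
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

    data_disks = [bytes([1, 2, 3]), bytes([4, 5, 6]), bytes([7, 8, 9])]
    parity = xor_blocks(data_disks)

    # Disk 1 fails; recover its contents from the surviving disks plus parity.
    recovered = xor_blocks([data_disks[0], data_disks[2], parity])
    assert recovered == data_disks[1]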