McDonald et al.
US006098114A
[11] Patent Number: 6,098,114
[45] Date of Patent: Aug. 1, 2000
`
[54] DISK ARRAY SYSTEM FOR PROCESSING AND TRACKING THE
     COMPLETION OF I/O REQUESTS

[75] Inventors: James Arthur McDonald, Palo Alto; John Peter Herz,
     Los Altos; Mitchell Allen Altman, San Francisco; William
     Edward Smith, III, Hayward, all of Calif.
`
5,506,977   4/1996  Jones ............................ 711/155
5,530,897   6/1996  Meritt ............................. 710/9
5,530,960   6/1996  Parks et al. ....................... 710/5
5,548,712   8/1996  Larson et al. ...................... 714/7
5,550,975   8/1996  Ichinomiya et al. ................. 714/51
5,574,662  11/1996  Windrem et al. ................... 709/219
5,574,851  11/1996  Rathunde ........................... 714/7
5,574,882  11/1996  Menon et al. ..................... 711/114
5,581,740  12/1996  Jones ......................... 395/500.46
5,586,248  12/1996  Alexander et al. .................. 714/22
`
`[73] Assignee: 3Ware, Palo Alto, Calif.
`
`
[21] Appl. No.: 09/034,812

[22] Filed: Mar. 4, 1998
`
Related U.S. Application Data
[60] Provisional application No. 60/065,848, Nov. 14, 1997.

[51] Int. Cl.7 ..................................... G06F 13/14
[52] U.S. Cl. ............................ 710/5; 711/114; 714/6
[58] Field of Search ............ 710/3, 5, 9, 27, 57; 709/235;
     711/4, 201, 114; 714/5, 6, 7; 395/500.46

[56] References Cited

U.S. PATENT DOCUMENTS
`
5,210,860   5/1993  Pfeffer et al. .................... 714/42
5,278,838   1/1994  Ng et al. .......................... 714/6
5,283,875   2/1994  Gibson et al. ...................... 711/4
5,297,258   3/1994  Hale et al. ...................... 711/114
5,309,451   5/1994  Noya et al. ...................... 714/766
5,313,585   5/1994  Jeffries et al. .................. 711/201
5,313,626   5/1994  Jones et al. ....................... 714/5
5,315,602   5/1994  Noya et al. ...................... 714/766
5,345,565   9/1994  Jibbe et al. ..................... 710/130
5,367,669  11/1994  Holland et al. ..................... 714/7
5,388,108   2/1995  DeMoss et al. ...................... 714/6
5,390,327   2/1995  Lubbers et al. ..................... 714/7
5,455,934  10/1995  Holland et al. ..................... 711/4
5,473,761  12/1995  Parks et al. ....................... 711/4
5,479,611  12/1995  Oyama ............................. 714/48
5,479,653  12/1995  Jones .............................. 714/5
5,483,641   1/1996  Jones et al. ....................... 710/3
5,487,160   1/1996  Bemis ............................ 711/114
5,499,385   3/1996  Farmwald et al. .................... 710/3
5,502,836   3/1996  Hale et al. ...................... 711/170
`
`Primary Examiner-Thomas C. Lee
`Assistant Examiner-Harold Kim
Attorney, Agent, or Firm-Knobbe, Martens, Olson & Bear, LLP
`
[57] ABSTRACT
`
A high-performance RAID system for a PC comprises a controller card which controls an array of ATA disk drives. The controller card includes an array of automated disk drive controllers, each of which controls one respective disk drive. The disk drive controllers are connected to a microcontroller by a control bus and are connected to an automated coprocessor by a packet-switched bus. The coprocessor accesses system memory and a local buffer. In operation, the disk drive controllers respond to controller commands from the microcontroller by accessing their respective disk drives, and by sending packets to the coprocessor over the packet-switched bus. The packets carry I/O data (in both directions, with the coprocessor filling in packet payloads on I/O writes), and carry transfer commands and target addresses that are used by the coprocessor to access the buffer and system memory. The packets also carry special completion values (generated by the microcontroller) and I/O request identifiers that are processed by a logic circuit of the coprocessor to detect the completion of processing of each I/O request. The coprocessor grants the packet-switched bus to the disk drive controllers using a round robin arbitration protocol which guarantees a minimum I/O bandwidth to each disk drive. This minimum I/O bandwidth is preferably greater than the sustained transfer rate of each disk drive, so that all drives of the array can operate at the sustained transfer rate without the formation of a bottleneck.
`
`23 Claims, 9 Drawing Sheets
`
[Representative drawing: the microcontroller 82 (RAM 98) issues drive-specific controller commands; packets, tokens, and I/O requests flow between the automated controllers, the coprocessor, and system memory 102, which holds the I/O data and token status.]
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1011, p. 1
`
`
`
`
`U.S. PATENT DOCUMENTS
`
5,596,708   1/1997  Weber .............................. 714/6
5,598,549   1/1997  Rathunde ......................... 711/114
5,619,723   4/1997  Jones et al. ....................... 710/3
5,619,728   4/1997  Jones et al. ...................... 710/27
5,651,132   7/1997  Honda et al. ..................... 711/114
5,664,096   9/1997  Ichinomiya et al. ................. 714/48
5,671,349   9/1997  Hashemi et al. .................... 714/48
5,687,390  11/1997  McMillan, Jr. ...................... 710/5
5,717,954   2/1998  Grieff et al. ..................... 710/57
5,720,025   2/1998  Wilkes et al. ...................... 714/6
5,729,705   3/1998  Weber ............................ 710/128
5,734,861   3/1998  Cohn et al. ...................... 711/134
5,742,792   4/1998  Yanai et al. ..................... 711/162
5,784,569   7/1998  Miller et al. .................... 709/235
5,860,091   1/1999  Dekoning et al. .................. 711/114
5,937,428   8/1999  Jantz ............................ 711/114
5,974,502  10/1999  Dekoning et al. .................. 711/114
`
`
`
`
[Sheet 1 of 9, FIG. 1 (Prior Art): the host PC 34 (host processor 38, memory system 40, host PCI bus 42) is coupled through a PCI-to-PCI bridge 44 to the local PCI bus 46 of the array controller 30; SCSI controllers 50 on the local bus each drive multiple SCSI drives 32 over shared cables 52; the array controller also includes a microcontroller 56 and a buffer 58 with XOR logic 60.]
`
`
`
[Sheet 2 of 9, FIG. 2: the disk array system of the preferred embodiment. The host PC contains the host processor 38, device driver 100, and system memory 40 with status table 102 on the host PCI bus 42. The array controller (PCI card) carries the automated array coprocessor 80, the microcontroller with RAM 98, a buffer 94, and automated controllers 84 (numbered 1 to N), each connected by a cable 76 to a respective ATA drive 72.]
`
`
`
[Sheet 3 of 9, FIG. 3: information flow. The device driver 100 issues I/O requests with tokens; the microcontroller 82 (RAM 98, token table 106/108) sends drive-specific controller commands (FIG. 4) to the automated controllers 84; the controllers exchange packets (FIG. 5) with the array coprocessor 80, which transfers I/O data to and from system memory 102 and raises a completion flag with the token.]
`
`
`
[Sheet 4 of 9, FIG. 4: a controller command comprises a command block with a target address and transfer info, and optionally a token, a disk completion value, and a status table address. FIG. 5: a packet comprises a transfer command, a target address, and an optional payload, transmitted sequentially over time.]
`
`
`
[Sheet 5 of 9, FIG. 6: the automated array coprocessor 80 in detail: a PCI interface 120 to the PCI bus, a packet processor 134, an arbitration state machine 136 with REQ/GNT/RDY lines 124/126/130 to the automated controllers AC1-8, buffer control 142, and completion logic (FIG. 8) asserting INT 148. Automated controller AC1 84 contains read and write FIFOs 170/172 and a transfer/command control circuit 176 connecting over data and control lines 178/179 to ATA drive 1.]
`
`
`
[Sheet 6 of 9, FIG. 7: round robin arbitration flow. Starting with N=1, the coprocessor asserts GNTN and receives the packet header (transfer command + target address) (210). If the transfer command is "WRITE PCI COMPLETE" (212), or if a read payload is unavailable (216), GNTN is deasserted to terminate the slot (220); otherwise the payload is transmitted or received and GNTN is then deasserted to end slot N. N is advanced and the sequence repeats.]
`
`
`
[Sheet 7 of 9, FIG. 8: the completion logic circuit 144. The token selects one of sixteen 8-bit entries of a register file 1240; the stored value is combined with the incoming disk completion value by gate 1242, written back (1248), and compared (1244) against the final completion value to assert INT 148.]
`
`
`
[Sheet 8 of 9, FIG. 9: the transfer/command control circuit of an automated controller: transfer command registers 272, a command engine 262 and transfer engine 260 with a START/DONE handshake 264/268, and a control FIFO 276, linking the local control and data lines 86A/90A and the FIFOs 170, 172 to the ATA drive 72 (I/O ready 179B, chip selects 179A, strobes, IRQ 179D).]
`
`
`
[Sheet 9 of 9, FIG. 10: command engine operation: status read (good), command write, then data transfer.]
`
`
`
DISK ARRAY SYSTEM FOR PROCESSING AND TRACKING THE COMPLETION OF I/O REQUESTS
`
`PRIORITY CLAIM
`
`This application claims the benefit of U.S. Provisional
`Appl. No. 60/065,848, filed Nov. 14, 1997, titled HIGH
`PERFORMANCE ARCHITECTURE FOR DISK ARRAY
`SYSTEM.
`
`FIELD OF THE INVENTION
`
The present invention relates to disk arrays and, more particularly, to hardware and software architectures for hardware-implemented RAID (Redundant Array of Inexpensive Disks) and other disk array systems.
`
`BACKGROUND OF THE INVENTION
`
A RAID system is a computer data storage system in which data is spread or "striped" across multiple disk drives. In many implementations, the data is stored in conjunction with parity information such that any data lost as the result of a single disk drive failure can be automatically reconstructed.

One simple type of RAID implementation is known as "software RAID." With software RAID, software (typically part of the operating system) which runs on the host computer is used to implement the various RAID control functions. These control functions include, for example, generating drive-specific read/write requests according to a striping algorithm, reconstructing lost data when drive failures occur, and generating and checking parity. Because these tasks occupy CPU bandwidth, and because the transfer of parity information occupies bandwidth on the system bus, software RAID frequently produces a degradation in performance over single disk drive systems.
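The parity generation and reconstruction functions mentioned above can be illustrated with a short sketch (illustrative only, not part of the patent disclosure; the function names are hypothetical). Even parity stores the byte-wise XOR of the data stripes, so any single lost stripe is the XOR of the surviving stripes and the parity:

```python
def make_parity(stripes):
    """Byte-wise XOR of equal-length data stripes yields the parity stripe."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return bytes(parity)

def reconstruct(surviving_stripes, parity):
    """Rebuild the single lost stripe: XOR of the survivors and the parity."""
    return make_parity(list(surviving_stripes) + [parity])
```

Because XOR is its own inverse, the same routine serves for generating parity on writes and for rebuilding a stripe after a single drive failure.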
Where performance is a concern, a "hardware-implemented RAID" system may be used. With hardware-implemented RAID, the RAID control functions are handled by a dedicated array controller (typically a card) which presents the array to the host computer as a single, composite disk drive. Because little or no host CPU bandwidth is used to perform the RAID control functions, and because no RAID parity traffic flows across the system bus, little or no degradation in performance occurs.
One potential benefit of RAID systems is that the input/output ("I/O") data can be transferred to and from multiple disk drives in parallel. By exploiting this parallelism (particularly within a hardware-implemented RAID system), it is possible to achieve a higher degree of performance than is possible with a single disk drive. The two basic types of performance that can potentially be increased are the number of I/O requests processed per second ("transactional performance") and the number of megabytes of I/O data transferred per second ("streaming performance").
Unfortunately, few hardware-implemented RAID systems provide an appreciable increase in performance. In many cases, this failure to provide a performance improvement is the result of limitations in the array controller's bus architecture. Performance can also be adversely affected by frequent interrupts of the host computer's processor.

In addition, attempts to increase performance have often relied on the use of expensive hardware components. For example, some RAID array controllers rely on the use of a relatively expensive microcontroller that can process I/O data at a high transfer rate. Other designs rely on complex disk drive interfaces, and thus require the use of expensive disk drives.

The present invention addresses these and other limitations in existing RAID architectures.
`
`
SUMMARY OF THE INVENTION

The present invention provides a high-performance architecture for a hardware-implemented RAID or other disk array system. An important benefit of the architecture is that it provides a high degree of performance (both transactional and streaming) without the need for disk drives that are based on expensive or complex disk drive interfaces.

In a preferred embodiment, the architecture is embodied within a PC-based disk array system which comprises an array controller card which controls an array of ATA disk drives. The controller card includes an array of automated ATA disk drive controllers, each of which controls a single, respective ATA drive.

The controller card also includes an automated coprocessor which is connected to each disk drive controller by a packet-switched bus, and which connects as a bus master to the host PC bus. The coprocessor is also connected to a local I/O data buffer of the card. As described below, a primary function of the coprocessor is to transfer I/O data between the disk drive controllers, the system memory, and the buffer in response to commands received from the disk drive controllers. Another function of the coprocessor is to control all accesses by the disk drive controllers to the packet-switched bus, to thereby control the flow of I/O data.

The controller card further includes a microcontroller which connects to the disk drive controllers and to the coprocessor by a local control bus. The microcontroller runs a control program which implements a RAID storage configuration. Because the microcontroller does not process or directly monitor the flow of I/O data (as described below), a low-cost, low-performance microcontroller can advantageously be used.
In operation, the controller card processes multiple I/O requests at a time, and can process multiple I/O requests without interrupting the host computer. As I/O requests are received from the host computer, the microcontroller generates drive-specific sequences of controller commands (based on the particular RAID configuration), and dispatches these controller commands over the local control bus to the disk drive controllers. In addition to containing disk drive commands, these controller commands include transfer commands and target addresses that are (subsequently) used by the coprocessor to transfer I/O data to and from system memory and the local buffer.

Some of the controller commands also include disk completion values and tokens (I/O request identifiers) that are used by the coprocessor to monitor the completion status of pending I/O requests. The disk completion values are generated by the microcontroller such that the application of a specific logic function to all of the disk completion values for a given I/O request produces a final completion value that is known a priori to the coprocessor. As described below, this enables the coprocessor to detect the completion of processing of an I/O request without prior knowledge of the details (number of invoked disk drives, etc.) of the I/O request.
In response to the controller commands, the disk drive controllers access their respective disk drives and send packets to the coprocessor over the packet-switched bus. These packets carry I/O data (in both directions, with the coprocessor filling in packet payloads on I/O writes), and carry transfer commands and target addresses that are used by the coprocessor to access the buffer and system memory.
During this process, the coprocessor grants the packet-switched bus to the disk drive controllers (for the transmission of a single packet) using a round robin arbitration protocol which guarantees a minimum I/O bandwidth to each disk drive. The minimum bandwidth is equal to 1/N of the total I/O bandwidth of the packet-switched bus, where N is the number of disk drive controllers (and disk drives) in the array.

Because this minimum I/O bandwidth is greater than or equal to the sustained transfer rate of each disk drive, all N drives can operate concurrently at the sustained transfer rate indefinitely without the formation of a bottleneck. When the packet-switched bus is not being used by all of the disk drive controllers (i.e., one or more disk drive controllers has no packets to transmit), the arbitration protocol allows other disk drive controllers to use more than the guaranteed minimum I/O bandwidth. This additional I/O bandwidth may be used, for example, to transfer I/O data at a rate higher than the sustained transfer rate when the requested I/O data resides in the disk drive's cache.
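The bandwidth guarantee follows from the structure of the rotation: every controller is offered one packet slot per pass, and a slot that a controller declines passes immediately to the next. A minimal sketch of such a policy (illustrative only, not the patent's hardware arbiter; the per-controller queue representation is hypothetical):

```python
def round_robin_schedule(pending, rotations):
    """Grant one packet slot per controller per rotation.

    pending: list of per-controller packet queues (lists).
    Returns the grant order as (controller_index, packet) pairs.
    A busy controller gets at least one slot per rotation (1/N of the bus);
    controllers with nothing to send are skipped, freeing their slots.
    """
    n = len(pending)
    grants = []
    for _ in range(rotations):
        for ctrl in range(n):      # visit every controller each rotation
            if pending[ctrl]:      # skip idle controllers
                grants.append((ctrl, pending[ctrl].pop(0)))
    return grants
```

In the hardware described later (FIG. 7), the same rotation is realized with REQ/GNT handshake lines rather than queues, but the fairness property is the same.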
The disk drive controllers process their respective sequences of controller commands asynchronously to one another; thus, the disk drive controllers that are invoked by a given I/O request can finish processing the I/O request in any order. When a given disk drive controller finishes processing an I/O request, the controller sends a special completion packet to the coprocessor. This completion packet contains the completion value that was assigned to the disk drive controller, and contains an identifier (token) of the I/O request.

Upon receiving the completion packet, the coprocessor cumulatively applies the logic function to the completion value and all other completion values (if any) that have been received for the same I/O request, and compares the result to the final completion value. If a match occurs, indicating that all disk drives invoked by the I/O request have finished processing the I/O request, the coprocessor uses the token to inform the host computer and the microcontroller of the identity of the completed I/O request. Thus, the microcontroller monitors the completion status of pending I/O requests without directly monitoring the flow of I/O data.
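Assuming XOR as the "specific logic function" (the summary leaves the function abstract at this point), the scheme can be sketched as follows. All names and the 0xFF final value are hypothetical; the microcontroller side chooses per-drive values whose XOR equals the a priori final value, and the coprocessor side accumulates arriving values per token:

```python
import secrets

FINAL = 0xFF  # final completion value known a priori to the coprocessor

def assign_completion_values(num_drives):
    """Microcontroller side: per-drive values whose XOR equals FINAL."""
    values = [secrets.randbelow(256) for _ in range(num_drives - 1)]
    last = FINAL
    for v in values:
        last ^= v                 # choose the last value to force the XOR
    return values + [last]

class CompletionTracker:
    """Coprocessor side: fold in each completion value as it arrives."""
    def __init__(self):
        self.accum = {}           # token -> cumulative XOR so far

    def receive(self, token, value):
        self.accum[token] = self.accum.get(token, 0) ^ value
        return self.accum[token] == FINAL  # True once all drives reported
```

Note how the tracker needs no knowledge of how many drives a request invoked: the match against FINAL occurs exactly when the last assigned value arrives, regardless of arrival order.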
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`
These and other features of the architecture will now be described in further detail with reference to the drawings of the preferred embodiment, in which:
`FIG. 1 illustrates a prior art disk array architecture.
`FIG. 2 illustrates a disk array system in accordance with
`a preferred embodiment of the present invention.
`FIG. 3 illustrates the general flow of information between
`the primary components of the FIG. 2 system.
`FIG. 4 illustrates the types of information included within
`the controller commands.
`FIG. 5 illustrates a format used for the transmission of
`packets.
`FIG. 6 illustrates the architecture of the system in further
`detail.
`FIG. 7 is a flow diagram which illustrates a round robin
`arbitration protocol which is used to control access to the
`packet-switched bus of FIG. 2.
`FIG. 8 illustrates the completion logic circuit of FIG. 6 in
`further detail.
`
`FIG. 9 illustrates the transfer/command control circuit of
`FIG. 6 in further detail.
`FIG. 10 illustrates the operation of the command engine
`of FIG. 9.
`
`DETAILED DESCRIPTION OF THE
`PREFERRED EMBODIMENT
`I. Existing RAID Architectures
To illustrate several of the motivations behind the present invention, a prevalent prior art architecture used within existing PC-based RAID systems will initially be described with reference to FIG. 1. As depicted in FIG. 1, the architecture includes an array controller card 30 ("array controller") that couples an array of SCSI (Small Computer Systems Interface) disk drives 32 to a host computer (PC) 34. The array controller 30 plugs into a PCI (Peripheral Component Interconnect) expansion slot of the host computer 34, and communicates with a host processor 38 and a system memory 40 via a host PCI bus 42. For purposes of this description and the description of the preferred embodiment, it may be assumed that the host processor 38 is an Intel Pentium™ or other X86-compatible microprocessor, and that the host computer 34 is operating under either the Windows™ 95 or the Windows™ NT operating system.
The array controller 30 includes a PCI-to-PCI bridge 44 which couples the host PCI bus 42 to a local PCI bus 46 of the controller 30, and which acts as a bus master with respect to both busses 42, 46. Two or more SCSI controllers 50 (three shown in FIG. 1) are connected to the local PCI bus 46. Each SCSI controller 50 controls the operation of two or more SCSI disk drives 32 via a respective shared cable 52. The array controller 30 also includes a microcontroller 56 and a buffer 58, both of which are coupled to the local PCI bus by appropriate bridge devices (not shown). The buffer 58 will typically include appropriate exclusive-OR (XOR) logic 60 for performing the XOR operations associated with RAID storage protocols.
In operation, the host processor 38 (running under the control of a device driver) sends input/output (I/O) requests to the microcontroller 56 via the host PCI bus 42, the PCI-to-PCI bridge 44, and the local PCI bus 46. Each I/O request typically consists of a command descriptor block (CDB) and a scatter-gather list. The CDB is a SCSI drive command that specifies such parameters as the disk operation to be performed (e.g., read or write), a disk drive logical block address, and a transfer length. The scatter-gather list is an address list of one or more contiguous blocks of system memory for performing the I/O operation.
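The shape of such an I/O request can be sketched as follows (an illustrative model only; the field names are not taken from the SCSI specification or from the patent):

```python
from dataclasses import dataclass, field

@dataclass
class CommandDescriptorBlock:
    opcode: int            # disk operation, e.g. a read or write command code
    logical_block: int     # starting disk drive logical block address
    transfer_length: int   # number of blocks to transfer

@dataclass
class ScatterGatherEntry:
    phys_addr: int         # start of one contiguous system-memory block
    byte_count: int        # length of that block in bytes

@dataclass
class IORequest:
    cdb: CommandDescriptorBlock
    sg_list: list = field(default_factory=list)  # ScatterGatherEntry items

    def total_bytes(self):
        """Total system memory spanned by the scatter-gather list."""
        return sum(e.byte_count for e in self.sg_list)
```

The scatter-gather list is what lets a single disk transfer fill or drain several non-adjacent regions of system memory, which is also why (as noted below) some controllers interrupt the host once per list entry.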
The microcontroller 56 runs a firmware program which translates these I/O requests into component, disk-specific SCSI commands based on a particular RAID configuration (such as RAID 4 or RAID 5), and dispatches these commands to corresponding SCSI controllers 50. For example, if, based on the particular RAID configuration implemented by the system, a given I/O request requires data to be read from every SCSI drive 32 of the array, the microcontroller 56 sends SCSI commands to each of the SCSI controllers 50. The SCSI controllers in turn arbitrate for control of the local PCI bus 46 to transfer I/O data between the SCSI disks 32 and system memory 40. I/O data that is being transferred from system memory 40 to the disk drives 32 is initially stored in the buffer 58. The buffer 58 is also typically used to perform XOR operations, rebuild operations (in response to disk failures), and other operations associated with the particular RAID configuration. The microcontroller 56 also monitors the processing of the dispatched SCSI commands, and interrupts the host processor 38 to notify the device driver of completed transfer operations.
The FIG. 1 architecture suffers from several deficiencies that are addressed by the present invention. One such deficiency is that the SCSI drives 32 are expensive in comparison to ATA (AT Attachment) drives. While it is possible to replace the SCSI drives with less expensive ATA drives (see, for example, U.S. Pat. No. 5,506,977), the use of ATA drives would generally result in a decrease in performance. One reason for the decreased performance is that ATA drives do not buffer multiple disk commands; thus each ATA drive would normally remain inactive while a new command is being retrieved from the microcontroller 56. One goal of the present invention is thus to provide an architecture in which ATA and other low-cost drives can be used while maintaining a high level of performance.
Another problem with the FIG. 1 architecture is that the local PCI bus 46 and the shared cables 52 are susceptible to being dominated by a single disk drive 32. Such dominance can result in increased transactional latency, and a corresponding degradation in performance. A related problem is that the local PCI bus 46 is used both for the transfer of commands and the transfer of I/O data; increased command traffic on the bus 46 can therefore adversely affect the throughput and latency of data traffic. As described below, the architecture of the preferred embodiment overcomes these and other problems by using separate control and data busses, and by using a round-robin arbitration protocol to grant the local data bus to individual drives.
Another problem with the prior art architecture is that because the microcontroller 56 has to monitor the component I/O transfers that are performed as part of each I/O request, a high-performance microcontroller generally must be used. As described below, the architecture of the preferred embodiment avoids this problem by shifting the completion-monitoring task to a separate, non-program-controlled device that handles the task of routing I/O data, and by embedding special completion data values within the I/O data stream to enable such monitoring. This effectively removes the microcontroller from the I/O data path, enabling the use of a lower cost, lower performance microcontroller.
Another problem, in at least some RAID implementations, is that the microcontroller 56 interrupts the host processor 38 multiple times during the processing of a single I/O request. For example, it is common for the microcontroller 56 to interrupt the host processor 38 at least once for each contiguous block of system memory referenced by the scatter-gather list. Because there is significant overhead associated with the processing of an interrupt, the processing of the interrupts significantly detracts from the processor bandwidth that is available for handling other types of tasks. It is therefore an object of the present invention to provide an architecture in which the array controller interrupts the host processor no more than once per I/O request.
A related problem, in many RAID architectures, is that when the array controller 30 generates an interrupt request to the host processor 38, the array controller suspends operation, or at least postpones generating the following interrupt request, until after the pending interrupt request has been serviced. This creates a potential bottleneck in the flow of I/O data, and increases the number of interrupt requests that need to be serviced by the host processor 38. It is therefore an object of the invention to provide an architecture in which the array controller continues to process subsequent I/O requests while an interrupt request is pending, so that the device driver can process multiple completed I/O requests when the host processor eventually services an interrupt request.
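This object can be illustrated with a minimal sketch (hypothetical names; not the patent's implementation): the controller keeps posting completion tokens to a status table while an interrupt is pending, and the driver drains every posted token in a single interrupt service:

```python
class StatusTable:
    """Toy model of a shared completion status table in system memory."""

    def __init__(self):
        self.completed = []         # tokens of completed I/O requests

    def post(self, token):
        """Controller side: record a completion; runs even while an
        earlier interrupt request is still pending at the host."""
        self.completed.append(token)

    def reap(self):
        """Driver side: on one interrupt, collect every completion
        posted so far, so N completions may cost a single interrupt."""
        done, self.completed = self.completed, []
        return done
```

Decoupling posting from servicing is what removes the one-interrupt-per-request bottleneck described above.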
The present invention provides a high performance disk array architecture which addresses these and other problems with prior art RAID systems. An important aspect of the invention is that the primary performance benefits provided by the architecture are not tied to a particular type of disk drive interface. Thus, the architecture can be implemented using ATA drives (as in the preferred embodiment described below) and other types of relatively low-cost drives while providing a high level of performance.
II. System Overview

A disk array system which embodies the various features of the present invention will now be described with reference to the remaining drawings. Throughout this description, reference will be made to various implementation-specific details, including, for example, part numbers, industry standards, timing parameters, message formats, and widths of data paths. These details are provided in order to fully set forth a preferred embodiment of the invention, and not to limit the scope of the invention. The scope of the invention is set forth in the appended claims.
As depicted in FIG. 2, the disk array system comprises an array controller card 70 ("array controller") that plugs into a PCI slot of the host computer 34. The array controller 70 links the host computer to an array of ATA disk drives 72 (numbered 1-N in FIG. 2), with each drive connected to the array controller by a respective ATA cable 76. In one implementation, the array controller 70 includes eight ATA ports to permit the connection of up to eight ATA drives. The use of a separate port per drive 72 enables the drives to be tightly controlled by the array controller 70, as is desirable for achieving a high level of performance. In the preferred embodiment, the array controller 70 supports both the ATA mode 4 standard (also known as Enhanced IDE) and the Ultra ATA standard (also known as Ultra DMA), permitting the use of both types of drives.
As described below, the ability to use less expensive ATA drives, while maintaining a high level of performance, is an important feature of the invention. It will be recognized, however, that many of the architectural features of the invention can be used