United States Patent [19]
McDonald et al.

US006098114A

[11] Patent Number: 6,098,114
[45] Date of Patent: Aug. 1, 2000
[54] DISK ARRAY SYSTEM FOR PROCESSING AND TRACKING THE COMPLETION OF I/O REQUESTS

[75] Inventors: James Arthur McDonald, Palo Alto; John Peter Herz, Los Altos; Mitchell Allen Altman, San Francisco; William Edward Smith, III, Hayward, all of Calif.
[73] Assignee: 3Ware, Palo Alto, Calif.
[21] Appl. No.: 09/034,812

[22] Filed: Mar. 4, 1998
Related U.S. Application Data

[60] Provisional application No. 60/065,848, Nov. 14, 1997.

[51] Int. Cl.7 .............................. G06F 13/14
[52] U.S. Cl. ................. 710/5; 711/114; 714/6
[58] Field of Search ......... 710/3, 5, 9, 27, 57; 709/235; 711/4, 201, 114; 714/6, 5, 7; 395/500.46
[56] References Cited

U.S. PATENT DOCUMENTS
5,210,860  5/1993   Pfeffer et al. ............. 714/42
5,278,838  1/1994   Ng et al. .................. 714/6
5,283,875  2/1994   Gibson et al. .............. 711/4
5,297,258  3/1994   Hale et al. ................ 711/114
5,309,451  5/1994   Noya et al. ................ 714/766
5,313,585  5/1994   Jeffries et al. ............ 711/201
5,313,626  5/1994   Jones et al. ............... 714/5
5,315,602  5/1994   Noya et al. ................ 714/766
5,345,565  9/1994   Jibbe et al. ............... 710/130
5,367,669  11/1994  Holland et al. ............. 714/7
5,388,108  2/1995   DeMoss et al. .............. 714/6
5,390,327  2/1995   Lubbers et al. ............. 714/7
5,455,934  10/1995  Holland et al. ............. 711/4
5,473,761  12/1995  Parks et al. ............... 711/4
5,479,611  12/1995  Oyama ...................... 714/48
5,479,653  12/1995  Jones ...................... 714/5
5,483,641  1/1996   Jones et al. ............... 710/3
5,487,160  1/1996   Bemis ...................... 711/114
5,499,385  3/1996   Farmwald et al. ............ 710/3
5,502,836  3/1996   Hale et al. ................ 711/170
5,506,977  4/1996   Jones ...................... 711/155
5,530,897  6/1996   Meritt ..................... 710/9
5,530,960  6/1996   Parks et al. ............... 710/5
5,548,712  8/1996   Larson et al. .............. 714/7
5,550,975  8/1996   Ichinomiya et al. .......... 714/51
5,574,662  11/1996  Windrem et al. ............. 709/219
5,574,851  11/1996  Rathunde ................... 714/7
5,574,882  11/1996  Menon et al. ............... 711/114
5,581,740  12/1996  Jones ...................... 395/500.46
5,586,248  12/1996  Alexander et al. ........... 714/22
5,596,708  1/1997   Weber ...................... 714/6
5,598,549  1/1997   Rathunde ................... 711/114
5,619,723  4/1997   Jones et al. ............... 710/3
5,619,728  4/1997   Jones et al. ............... 710/27
5,651,132  7/1997   Honda et al. ............... 711/114
5,664,096  9/1997   Ichinomiya et al. .......... 714/48
5,671,349  9/1997   Hashemi et al. ............. 714/48
5,687,390  11/1997  McMillan, Jr. .............. 710/5
5,717,954  2/1998   Grieff et al. .............. 710/57
5,720,025  2/1998   Wilkes et al. .............. 714/6
5,729,705  3/1998   Weber ...................... 710/128
5,734,861  3/1998   Cohn et al. ................ 711/134
5,742,792  4/1998   Yanai et al. ............... 711/162
5,784,569  7/1998   Miller et al. .............. 709/235
5,860,091  1/1999   Dekoning et al. ............ 711/114
5,937,428  8/1999   Jantz ...................... 711/114
5,974,502  10/1999  Dekoning et al. ............ 711/114
Primary Examiner-Thomas C. Lee
Assistant Examiner-Harold Kim
Attorney, Agent, or Firm-Knobbe, Martens, Olson & Bear, LLP
[57] ABSTRACT

A high-performance RAID system for a PC comprises a controller card which controls an array of ATA disk drives. The controller card includes an array of automated disk drive controllers, each of which controls one respective disk drive. The disk drive controllers are connected to a microcontroller by a control bus and are connected to an automated coprocessor by a packet-switched bus. The coprocessor accesses system memory and a local buffer. In operation, the disk drive controllers respond to controller commands from the microcontroller by accessing their respective disk drives, and by sending packets to the coprocessor over the packet-switched bus. The packets carry I/O data (in both directions, with the coprocessor filling in packet payloads on I/O writes), and carry transfer commands and target addresses that are used by the coprocessor to access the buffer and system memory. The packets also carry special completion values (generated by the microcontroller) and I/O request identifiers that are processed by a logic circuit of the coprocessor to detect the completion of processing of each I/O request. The coprocessor grants the packet-switched bus to the disk drive controllers using a round robin arbitration protocol which guarantees a minimum I/O bandwidth to each disk drive. This minimum I/O bandwidth is preferably greater than the sustained transfer rate of each disk drive, so that all drives of the array can operate at the sustained transfer rate without the formation of a bottleneck.

23 Claims, 9 Drawing Sheets
[Representative drawing (from FIG. 3): drive-specific controller commands, packets, tokens, and I/O requests flow between the microcontroller 82 and its RAM 98, the automated controllers, and the array coprocessor, which exchanges I/O data and token status with system memory.]
`"'-'
`~
`~
`....
`00
`
`.... = \0
`
`0--,
`
`\0
`
`'"""' 0 ....,
`~ ....
`'JJ. =(cid:173)~
`
`0
`0
`0
`N
`'"""' ~
`
`$
`
`~ =
`......
`~ ......
`~
`•
`r:JJ.
`d •
`
`46
`
`50
`
`CONTROLLER
`
`ARRAY
`
`CONTROLLER
`
`SCSI
`
`/30
`
`I""' ""I SCSI DRIVE J
`
`32
`
`• • •
`
`,---a(cid:141)-i1 SCSI DRIVE
`
`32
`
`38
`
`42
`
`)-1.P
`HOST
`
`MEMORY
`SYSTEM
`
`40
`
`/r 34
`
`HOST PC
`
`\
`
`BUFFER
`
`60
`
`58
`
`)-1.C
`
`56
`
`PCI-TO-PCI
`
`BRIDGE
`
`HOST PCI ~US
`------------------------
`_______________________ J
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`SCSI
`I
`I
`---------------------~---------1--7
`
`LOCAL PCI BUS
`
`50
`
`CONTROLLER
`
`SCSI
`
`I
`I
`I
`I
`' I
`,-----------
`L __________ _
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`I
`1
`
`44
`
`50
`
`CONTROLLER
`
`52
`
`• • •
`
`-1 SCSI DRIVE J
`
`SCSI DRIVE J 52~ •
`
`1--
`
`52
`
`•
`• •
`
`(PRIOR ARl}
`FIG 1
`
`~---, SCSI DRIVE
`
`...--a~-i, SCSI DRIVE
`
`32
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1011, p. 3
`
`

`

`"'--
`~
`~
`....
`00
`
`.... = \0
`
`0-.,
`
`\0
`0 ....,
`N
`~ ....
`'JJ. =(cid:173)~
`
`0
`0
`0
`N
`'"""' ~
`
`$
`
`~ =
`.....
`~ .....
`~
`•
`rJJ.
`~ •
`
`:
`I
`I
`I
`:
`r----------------------------------1/~
`V
`I ___________________________________ J
`:
`I
`I
`
`I
`
`HOST PC
`
`1----------'--.....,...---,---"--42 __ --.
`
`HOST BUS PCI
`
`J/.P
`HOST
`
`-....L--/ 38
`
`!OO
`
`40
`
`DEVICE DRIVER
`SYSTEM MEMORY
`
`:
`I
`I ___ ......._ _ __.___
`I
`:
`
`102
`
`STATUS TABLE
`
`I
`I
`I
`
`FIG 2
`
`98
`
`---
`
`).)C
`
`____ __,,_...---RAM
`
`COPROCESSOR
`
`AUTOMATED
`
`ARRAY
`
`....__-...J
`
`~---'~
`
`BUFFER
`
`94
`
`1
`
`1
`
`I
`I
`I
`I
`I
`
`I
`I
`I
`1
`
`96
`
`86
`
`I
`I
`I
`1
`
`(PCI CARD)
`
`'"~~~~~~~~~~~~~g<~~~
`~><X'><S8om~~~rnNr""'"'rmwi~(cid:143)
`
`l7i~~
`
`CONTROLLER
`AUTOMATED
`
`N
`
`• • •
`
`84
`
`84
`
`2
`
`1
`
`-
`
`-
`
`CONTROLLER
`AUTOMATED
`,__ _
`__.__........._--.
`r-----r-------i-76--
`
`1
`
`/
`
`-75 84
`
`I
`I
`
`1
`I
`I
`I
`I
`: CONTROLLER
`AUTOMATED
`
`--x;,, ARRAY CON{:c:: I
`
`DRIVE N
`
`ATA L-72
`
`• • •
`
`v,,,.-72
`
`DRIVE 2
`
`ATA
`
`72
`
`DRIVE 1
`
`ATA
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1011, p. 4
`
`

`

`"--
`~
`~
`....
`00
`
`.... = \0
`
`O"I
`
`~
`
`1.0
`0 ....,
`~ ....
`rJJ. =(cid:173)~
`
`0
`0
`0
`N
`:-'
`~
`>
`
`(0 =
`• ;p
`00
`~ •
`
`"""'"
`"""'"
`
`TOKEN 1/0 REQ
`
`106
`
`DR1 DR2 • • • I DRNf I
`
`108
`
`RAM
`
`98
`
`108
`
`FIG 3
`
`I flC
`
`82
`
`1/0 REQ
`
`TOKEN
`
`+
`
`TOKEN
`
`CONTROLLER COMMANDS
`
`DRIVE-SPECIFIC
`
`(FIG, 4)
`
`100
`
`{.
`
`I
`I
`
`DRIVER
`I DEVICE
`
`40
`
`TOK~
`
`,.
`
`,,
`COPROCESSOR
`
`ARRAY
`
`80
`
`(FIG, 5)
`PACKETS
`
`CONTROLLERS
`AUTOMATED
`
`84
`
`I
`COMPLETION
`
`FLAG
`
`/
`
`SYSTEM MEMORY
`
`/
`1/0 DATA
`
`102
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1011, p. 5
`
`

`

[Sheet 4 of 9: FIG. 4 shows the contents of a controller command: a command block, a target address, and transfer info, plus an optional token, disk completion value, and status table address. FIG. 5 shows the packet format: a transfer command, a target address, and an optional payload, transmitted in sequence over time.]
`"'--
`~
`~
`....
`00
`
`.... = \0
`
`0-.,
`
`\0
`0 ....,
`Ul
`~ ....
`'JJ. =(cid:173)~
`
`$
`
`0
`0
`0
`N
`'""" ~
`
`~ =
`.....
`~ .....
`~
`•
`rJJ.
`~ •
`
`FIG 6
`
`))C
`
`1-------0--i INT
`
`80
`
`(TO PCI BUS)
`
`PCI 1/F
`
`INT
`
`148
`
`142
`COPROCESSOR
`
`STATE MACH/NE
`
`ARBITRATION
`
`ARRAY
`124
`
`PROCESSOR
`
`AUTOMATED
`
`PACKET
`
`134
`
`CTRL
`\., .. --··r,-·····-1 BUFFER
`
`136
`
`BUSCLK
`
`PEC,_7
`
`PEC,0
`
`82
`
`7
`
`--ACz-s
`RDY2_8 (FROM)
`
`!JO
`
`130
`
`ACz-s
`
`REO2-s
`
`GNTz-s LJ} TO
`
`126
`
`7
`
`REQ1 /
`
`11
`
`GNT
`
`124
`
`126
`(rgz_s)
`
`'.32
`
`.90A
`
`DATA
`
`(TO AC1-s)
`
`J
`
`120
`
`32
`
`RDY1
`
`180
`
`BUFFER
`
`CMD
`
`XFER/CMD CONTROL
`
`32
`
`172
`
`32
`
`18·
`
`170
`
`FIFO
`WRITE
`
`FIFO
`READ
`
`176
`
`AC1
`
`84
`
`179
`
`16
`
`16
`
`16
`
`16
`
`CNTRL
`DATA
`ATA
`TO ATA DRIVE 1
`
`178,
`
`182
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1011, p. 7
`
`

`

[Sheet 6 of 9, FIG. 7: flow diagram of the round robin arbitration protocol. Starting with N=1, the coprocessor asserts GNT_N and receives the packet header (transfer command plus target address); depending on whether the transfer command is "WRITE PCI COMPLETE" and whether a read payload is unavailable, the coprocessor either deasserts GNT_N to terminate the slot or transmits/receives the payload and then deasserts GNT_N to end slot N, before moving to the next channel.]
[Sheet 7 of 9, FIG. 8: the completion logic circuit 144. A 16-entry, 8-bit register file indexed by token accumulates incoming disk completion values through OR logic; a comparator signals the interrupt INT 148 when the accumulated value for a token matches the final completion value.]
`"'--
`~
`~
`....
`00
`
`.... = \0
`
`0-.,
`
`\0
`0 ....,
`00
`~ ....
`'JJ. =(cid:173)~
`
`0
`0
`0
`N
`'"""' ~
`
`$
`
`~ =
`.....
`~ .....
`~
`•
`rJJ.
`~ •
`
`90A
`
`DATA
`
`LOCAL CNTRL
`
`86A
`
`ROY
`
`130
`
`1
`
`7
`
`-1_75'
`
`,
`
`32
`
`FIFOs
`
`170, 172
`
`16
`
`-_I
`
`-
`
`-
`
`-
`
`I 32
`
`272-
`
`) REGS
`
`XFER CMD
`
`FIG, g
`
`276
`
`CNTRL
`FIFO
`
`16
`
`178
`
`L __ _
`
`1
`
`179C
`179B1 /0 READY
`
`DRIVE
`ATA
`
`~ R
`
`ENGINE
`
`CMD
`
`,,.-262
`
`I
`
`268
`DONE
`
`ENGINE
`XFER
`
`--'
`
`START)
`
`-
`
`264
`
`260
`
`XFER/CMD CONTROL
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`..... -
`
`I
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`-
`
`179A
`CHIP SELECTS
`
`STROBES
`
`IRQ
`
`1
`I
`1--
`
`1790
`
`72
`
`Petitioners Microsoft Corporation and HP Inc. - Ex. 1011, p. 10
`
`

`

[Sheet 9 of 9, FIG. 10: operation of the command engine: a status read (good) is followed by a command write and then the data transfer.]
DISK ARRAY SYSTEM FOR PROCESSING AND TRACKING THE COMPLETION OF I/O REQUESTS
PRIORITY CLAIM

This application claims the benefit of U.S. Provisional Appl. No. 60/065,848, filed Nov. 14, 1997, titled HIGH PERFORMANCE ARCHITECTURE FOR DISK ARRAY SYSTEM.
FIELD OF THE INVENTION

The present invention relates to disk arrays, and more particularly, to hardware and software architectures for hardware-implemented RAID (Redundant Array of Inexpensive Disks) and other disk array systems.
BACKGROUND OF THE INVENTION

A RAID system is a computer data storage system in which data is spread or "striped" across multiple disk drives. In many implementations, the data is stored in conjunction with parity information such that any data lost as the result of a single disk drive failure can be automatically reconstructed.
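For example, under the common XOR-parity arrangement, the parity block is simply the bitwise XOR of the corresponding data blocks, so the block of any single failed drive can be recomputed from the surviving blocks and the parity. A minimal sketch in C of both directions of this computation (the block size and drive count are illustrative, not taken from the patent):

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 512  /* illustrative sector-sized block */

/* Parity generation: the parity block is the XOR of all data blocks. */
void compute_parity(const uint8_t *blocks[], size_t n_drives,
                    uint8_t parity[BLOCK_SIZE]) {
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        uint8_t p = 0;
        for (size_t d = 0; d < n_drives; d++)
            p ^= blocks[d][i];
        parity[i] = p;
    }
}

/* Reconstruction: a lost block is the XOR of the parity block and the
 * corresponding blocks of all surviving drives. */
void rebuild_block(const uint8_t *survivors[], size_t n_survivors,
                   const uint8_t parity[BLOCK_SIZE],
                   uint8_t rebuilt[BLOCK_SIZE]) {
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        uint8_t b = parity[i];
        for (size_t d = 0; d < n_survivors; d++)
            b ^= survivors[d][i];
        rebuilt[i] = b;
    }
}
```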
One simple type of RAID implementation is known as "software RAID." With software RAID, software (typically part of the operating system) which runs on the host computer is used to implement the various RAID control functions. These control functions include, for example, generating drive-specific read/write requests according to a striping algorithm, reconstructing lost data when drive failures occur, and generating and checking parity. Because these tasks occupy CPU bandwidth, and because the transfer of parity information occupies bandwidth on the system bus, software RAID frequently produces a degradation in performance over single disk drive systems.
Where performance is a concern, a "hardware-implemented RAID" system may be used. With hardware-implemented RAID, the RAID control functions are handled by a dedicated array controller (typically a card) which presents the array to the host computer as a single, composite disk drive. Because little or no host CPU bandwidth is used to perform the RAID control functions, and because no RAID parity traffic flows across the system bus, little or no degradation in performance occurs.
One potential benefit of RAID systems is that the input/output ("I/O") data can be transferred to and from multiple disk drives in parallel. By exploiting this parallelism (particularly within a hardware-implemented RAID system), it is possible to achieve a higher degree of performance than is possible with a single disk drive. The two basic types of performance that can potentially be increased are the number of I/O requests processed per second ("transactional performance") and the number of megabytes of I/O data transferred per second ("streaming performance").
Unfortunately, few hardware-implemented RAID systems provide an appreciable increase in performance. In many cases, this failure to provide a performance improvement is the result of limitations in the array controller's bus architecture. Performance can also be adversely affected by frequent interrupts of the host computer's processor.

In addition, attempts to increase performance have often relied on the use of expensive hardware components. For example, some RAID array controllers rely on the use of a relatively expensive microcontroller that can process I/O data at a high transfer rate. Other designs rely on complex disk drive interfaces, and thus require the use of expensive disk drives.

The present invention addresses these and other limitations in existing RAID architectures.
SUMMARY OF THE INVENTION

The present invention provides a high-performance architecture for a hardware-implemented RAID or other disk array system. An important benefit of the architecture is that it provides a high degree of performance (both transactional and streaming) without the need for disk drives that are based on expensive or complex disk drive interfaces.

In a preferred embodiment, the architecture is embodied within a PC-based disk array system which comprises an array controller card which controls an array of ATA disk drives. The controller card includes an array of automated ATA disk drive controllers, each of which controls a single, respective ATA drive.
The controller card also includes an automated coprocessor which is connected to each disk drive controller by a packet-switched bus, and which connects as a bus master to the host PC bus. The coprocessor is also connected to a local I/O data buffer of the card. As described below, a primary function of the coprocessor is to transfer I/O data between the disk drive controllers, the system memory, and the buffer in response to commands received from the disk drive controllers. Another function of the coprocessor is to control all accesses by the disk drive controllers to the packet-switched bus, and thereby control the flow of I/O data.

The controller card further includes a microcontroller which connects to the disk drive controllers and to the coprocessor by a local control bus. The microcontroller runs a control program which implements a RAID storage configuration. Because the microcontroller does not process or directly monitor the flow of I/O data (as described below), a low-cost, low-performance microcontroller can advantageously be used.
In operation, the controller card processes multiple I/O requests at a time, and can process multiple I/O requests without interrupting the host computer. As I/O requests are received from the host computer, the microcontroller generates drive-specific sequences of controller commands (based on the particular RAID configuration), and dispatches these controller commands over the local control bus to the disk drive controllers. In addition to containing disk drive commands, these controller commands include transfer commands and target addresses that are (subsequently) used by the coprocessor to transfer I/O data to and from system memory and the local buffer.
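FIG. 4 (described below) lists the information carried by a controller command. As a rough C sketch of such a command, with field names and widths that are assumptions for illustration rather than the patent's actual layout:

```c
#include <stdint.h>

/* Illustrative layout of a controller command (after FIG. 4); the
 * patent does not specify field widths, so these are assumptions. */
typedef struct {
    uint8_t  command_block[12];     /* disk drive command: operation, LBA, length */
    uint32_t target_address;        /* buffer or system-memory address for the transfer */
    uint32_t transfer_info;         /* e.g., transfer length and direction */
    /* Included with the final command of a drive-specific sequence: */
    uint8_t  token;                 /* I/O request identifier */
    uint8_t  disk_completion_value; /* combines across drives to a known final value */
    uint32_t status_table_address;  /* where completion status is posted */
} controller_command_t;
```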
Some of the controller commands also include disk completion values and tokens (I/O request identifiers) that are used by the coprocessor to monitor the completion status of pending I/O requests. The disk completion values are generated by the microcontroller such that the application of a specific logic function to all of the disk completion values for a given I/O request produces a final completion value that is known a priori to the coprocessor. As described below, this enables the coprocessor to detect the completion of processing of an I/O request without prior knowledge of the details (number of invoked disk drives, etc.) of the I/O request.
In response to the controller commands, the disk drive controllers access their respective disk drives and send packets to the coprocessor over the packet-switched bus. These packets carry I/O data (in both directions, with the coprocessor filling in packet payloads on I/O writes), and carry transfer commands and target addresses that are used by the coprocessor to access the buffer and system memory. During this process, the coprocessor grants the packet-switched bus to the disk drive controllers (for the transmission of a single packet) using a round robin arbitration protocol which guarantees a minimum I/O bandwidth to each disk drive. The minimum bandwidth is equal to 1/N of the total I/O bandwidth of the packet-switched bus, where N is the number of disk drive controllers (and disk drives) in the array.
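As a worked illustration (the numbers are assumptions, not taken from the patent): with a packet-switched bus sustaining roughly 100 MB/s and N = 8 attached drives, each drive is guaranteed at least 100/8 = 12.5 MB/s, comfortably above the sustained media transfer rate of typical ATA drives of the period.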
Because this minimum I/O bandwidth is greater than or equal to the sustained transfer rate of each disk drive, all N drives can operate concurrently at the sustained transfer rate indefinitely without the formation of a bottleneck. When the packet-switched bus is not being used by all of the disk drive controllers (i.e., one or more disk drive controllers has no packets to transmit), the arbitration protocol allows other disk drive controllers to use more than the guaranteed minimum I/O bandwidth. This additional I/O bandwidth may be used, for example, to transfer I/O data at a rate higher than the sustained transfer rate when the requested I/O data resides in the disk drive's cache.
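A minimal software analogue of this grant loop, in C, is shown below; the patent implements the equivalent behavior in an arbitration state machine in hardware (FIG. 7), and the interface names here are hypothetical:

```c
#include <stdbool.h>

#define N_CONTROLLERS 8  /* assumed size of the array */

/* Round robin arbitration: each pass offers every controller one
 * packet slot. A controller with nothing to send is skipped, so its
 * slot is immediately available to the others; a controller that is
 * always ready still gets at least 1/N of the bus. */
void arbitrate(bool (*has_request)(int ch),
               void (*transfer_packet)(int ch)) {
    for (;;) {
        for (int ch = 0; ch < N_CONTROLLERS; ch++) {
            if (has_request(ch))
                transfer_packet(ch); /* assert GNT, move one packet, deassert */
        }
    }
}
```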
The disk drive controllers process their respective sequences of controller commands asynchronously to one another; thus, the disk drive controllers that are invoked by a given I/O request can finish processing the I/O request in any order. When a given disk drive controller finishes processing an I/O request, the controller sends a special completion packet to the coprocessor. This completion packet contains the completion value that was assigned to the disk drive controller, and contains an identifier (token) of the I/O request.
Upon receiving the completion packet, the coprocessor cumulatively applies the logic function to the completion value and all other completion values (if any) that have been received for the same I/O request, and compares the result to the final completion value. If a match occurs, indicating that all disk drives invoked by the I/O request have finished processing the I/O request, the coprocessor uses the token to inform the host computer and the microcontroller of the identity of the completed I/O request. Thus, the microcontroller monitors the completion status of pending I/O requests without directly monitoring the flow of I/O data.
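The summary leaves the logic function abstract, but any order-independent accumulation with a known fixed point works. One scheme with the required property (consistent with the OR-style accumulator suggested by FIG. 8, though the concrete values here are assumptions): give each invoked drive a disjoint subset of the bits of an 8-bit value, so that the OR of the assigned values, arriving in any order, equals a known constant exactly when every drive has reported. A C sketch under those assumptions:

```c
#include <stdint.h>
#include <stdbool.h>

#define MAX_TOKENS  16    /* assumed size of the per-token table */
#define FINAL_VALUE 0xFFu /* final completion value known a priori */

static uint8_t accum[MAX_TOKENS]; /* running result per outstanding request */

/* Called for each completion packet. Returns true once every drive
 * invoked by the request identified by `token` has reported, in any
 * order, because the microcontroller assigned the invoked drives
 * disjoint bit subsets whose OR is FINAL_VALUE. */
bool on_completion_packet(uint8_t token, uint8_t disk_completion_value) {
    accum[token] |= disk_completion_value;
    if (accum[token] == FINAL_VALUE) {
        accum[token] = 0; /* recycle the slot for the next request */
        return true;      /* notify the host and the microcontroller */
    }
    return false;
}
```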
BRIEF DESCRIPTION OF THE DRAWINGS

These and other features of the architecture will now be described in further detail with reference to the drawings of the preferred embodiment, in which:

FIG. 1 illustrates a prior art disk array architecture.

FIG. 2 illustrates a disk array system in accordance with a preferred embodiment of the present invention.

FIG. 3 illustrates the general flow of information between the primary components of the FIG. 2 system.

FIG. 4 illustrates the types of information included within the controller commands.

FIG. 5 illustrates a format used for the transmission of packets.

FIG. 6 illustrates the architecture of the system in further detail.

FIG. 7 is a flow diagram which illustrates a round robin arbitration protocol which is used to control access to the packet-switched bus of FIG. 2.

FIG. 8 illustrates the completion logic circuit of FIG. 6 in further detail.

FIG. 9 illustrates the transfer/command control circuit of FIG. 6 in further detail.

FIG. 10 illustrates the operation of the command engine of FIG. 9.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

I. Existing RAID Architectures

To illustrate several of the motivations behind the present invention, a prevalent prior art architecture used within existing PC-based RAID systems will initially be described with reference to FIG. 1. As depicted in FIG. 1, the architecture includes an array controller card 30 ("array controller") that couples an array of SCSI (Small Computer Systems Interface) disk drives 32 to a host computer (PC) 34. The array controller 30 plugs into a PCI (Peripheral Component Interconnect) expansion slot of the host computer 34, and communicates with a host processor 38 and a system memory 40 via a host PCI bus 42. For purposes of this description and the description of the preferred embodiment, it may be assumed that the host processor 38 is an Intel Pentium™ or other X86-compatible microprocessor, and that the host computer 34 is operating under either the Windows™ 95 or the Windows™ NT operating system.
The array controller 30 includes a PCI-to-PCI bridge 44 which couples the host PCI bus 42 to a local PCI bus 46 of the controller 30, and which acts as a bus master with respect to both busses 42, 46. Two or more SCSI controllers 50 (three shown in FIG. 1) are connected to the local PCI bus 46. Each SCSI controller 50 controls the operation of two or more SCSI disk drives 32 via a respective shared cable 52. The array controller 30 also includes a microcontroller 56 and a buffer 58, both of which are coupled to the local PCI bus by appropriate bridge devices (not shown). The buffer 58 will typically include appropriate exclusive-OR (XOR) logic 60 for performing the XOR operations associated with RAID storage protocols.
In operation, the host processor 38 (running under the control of a device driver) sends input/output (I/O) requests to the microcontroller 56 via the host PCI bus 42, the PCI-to-PCI bridge 44, and the local PCI bus 46. Each I/O request typically consists of a command descriptor block (CDB) and a scatter-gather list. The CDB is a SCSI drive command that specifies such parameters as the disk operation to be performed (e.g., read or write), a disk drive logical block address, and a transfer length. The scatter-gather list is an address list of one or more contiguous blocks of system memory for performing the I/O operation.
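In C terms, such a request might be sketched as follows; the CDB is a standard SCSI structure, but the wrapper layout, names, and sizes here are illustrative assumptions rather than the driver's actual format:

```c
#include <stdint.h>

/* One contiguous region of system memory for the transfer. */
typedef struct {
    uint32_t phys_addr; /* start of the region */
    uint32_t length;    /* bytes in the region */
} sg_entry_t;

/* Illustrative I/O request as delivered to the microcontroller: a
 * SCSI command descriptor block plus a scatter-gather list. */
typedef struct {
    uint8_t    cdb[10];   /* e.g., READ(10): opcode, logical block address, length */
    uint32_t   sg_count;  /* number of scatter-gather entries */
    sg_entry_t sg_list[]; /* the regions, in transfer order */
} io_request_t;
```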
The microcontroller 56 runs a firmware program which translates these I/O requests into component, disk-specific SCSI commands based on a particular RAID configuration (such as RAID 4 or RAID 5), and dispatches these commands to corresponding SCSI controllers 50. For example, if, based on the particular RAID configuration implemented by the system, a given I/O request requires data to be read from every SCSI drive 32 of the array, the microcontroller 56 sends SCSI commands to each of the SCSI controllers 50. The SCSI controllers in turn arbitrate for control of the local PCI bus 46 to transfer I/O data between the SCSI disks 32 and system memory 40. I/O data that is being transferred from system memory 40 to the disk drives 32 is initially stored in the buffer 58. The buffer 58 is also typically used to perform XOR operations, rebuild operations (in response to disk failures), and other operations associated with the particular RAID configuration. The microcontroller 56 also monitors the processing of the dispatched SCSI commands, and interrupts the host processor 38 to notify the device driver of completed transfer operations.
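The translation step amounts to address arithmetic over the array. A sketch of plain striping in C (parity placement for RAID 4/5 is omitted, and the stripe depth and drive count are assumptions for illustration):

```c
#include <stdint.h>

#define N_DATA_DRIVES 4  /* assumed number of data drives */
#define STRIPE_BLOCKS 64 /* assumed stripe depth, in blocks per drive */

typedef struct {
    int      drive;     /* which drive holds the block */
    uint32_t drive_lba; /* block address on that drive */
} drive_block_t;

/* Map an array-wide logical block address to a per-drive address. */
drive_block_t map_block(uint32_t array_lba) {
    uint32_t blocks_per_stripe = STRIPE_BLOCKS * N_DATA_DRIVES;
    uint32_t stripe = array_lba / blocks_per_stripe;
    uint32_t within = array_lba % blocks_per_stripe;
    drive_block_t out;
    out.drive     = (int)(within / STRIPE_BLOCKS);
    out.drive_lba = stripe * STRIPE_BLOCKS + within % STRIPE_BLOCKS;
    return out;
}
```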
The FIG. 1 architecture suffers from several deficiencies that are addressed by the present invention. One such deficiency is that the SCSI drives 32 are expensive in comparison to ATA (AT Attachment) drives. While it is possible to replace the SCSI drives with less expensive ATA drives (see, for example, U.S. Pat. No. 5,506,977), the use of ATA drives would generally result in a decrease in performance. One reason for the decreased performance is that ATA drives do not buffer multiple disk commands; thus each ATA drive would normally remain inactive while a new command is being retrieved from the microcontroller 56. One goal of the present invention is thus to provide an architecture in which ATA and other low-cost drives can be used while maintaining a high level of performance.
Another problem with the FIG. 1 architecture is that the local PCI bus and the shared cables 52 are susceptible to being dominated by a single disk drive 32. Such dominance can result in increased transactional latency, and a corresponding degradation in performance. A related problem is that the local PCI bus 46 is used both for the transfer of commands and the transfer of I/O data; increased command traffic on the bus 46 can therefore adversely affect the throughput and latency of data traffic. As described below, the architecture of the preferred embodiment overcomes these and other problems by using separate control and data busses, and by using a round-robin arbitration protocol to grant the local data bus to individual drives.
Another problem with the prior art architecture is that because the microcontroller 56 has to monitor the component I/O transfers that are performed as part of each I/O request, a high-performance microcontroller generally must be used. As described below, the architecture of the preferred embodiment avoids this problem by shifting the completion-monitoring task to a separate, non-program-controlled device that handles the task of routing I/O data, and by embedding special completion data values within the I/O data stream to enable such monitoring. This effectively removes the microcontroller from the I/O data path, enabling the use of a lower cost, lower performance microcontroller.
Another problem, in at least some RAID implementations, is that the microcontroller 56 interrupts the host processor 38 multiple times during the processing of a single I/O request. For example, it is common for the microcontroller 56 to interrupt the host processor 38 at least once for each contiguous block of system memory referenced by the scatter-gather list. Because there is significant overhead associated with the processing of an interrupt, the processing of the interrupts significantly detracts from the processor bandwidth that is available for handling other types of tasks. It is therefore an object of the present invention to provide an architecture in which the array controller interrupts the host processor no more than once per I/O request.
A related problem, in many RAID architectures, is that when the array controller 30 generates an interrupt request to the host processor 38, the array controller suspends operation, or at least postpones generating the following interrupt request, until after the pending interrupt request has been serviced. This creates a potential bottleneck in the flow of I/O data, and increases the number of interrupt requests that need to be serviced by the host processor 38. It is therefore an object of the invention to provide an architecture in which the array controller continues to process subsequent I/O requests while an interrupt request is pending, so that the device driver can process multiple completed I/O requests when the host processor eventually services an interrupt request.
The present invention provides a high performance disk array architecture which addresses these and other problems with prior art RAID systems. An important aspect of the invention is that the primary performance benefits provided by the architecture are not tied to a particular type of disk drive interface. Thus, the architecture can be implemented using ATA drives (as in the preferred embodiment described below) and other types of relatively low-cost drives while providing a high level of performance.
II. System Overview

A disk array system which embodies the various features of the present invention will now be described with reference to the remaining drawings. Throughout this description, reference will be made to various implementation-specific details, including, for example, part numbers, industry standards, timing parameters, message formats, and widths of data paths. These details are provided in order to fully set forth a preferred embodiment of the invention, and not to limit the scope of the invention. The scope of the invention is set forth in the appended claims.
As depicted in FIG. 2, the disk array system comprises an array controller card 70 ("array controller") that plugs into a PCI slot of the host computer 34. The array controller 70 links the host computer to an array of ATA disk drives 72 (numbered 1-N in FIG. 2), with each drive connected to the array controller by a respective ATA cable 76. In one implementation, the array controller 70 includes eight ATA ports to permit the connection of up to eight ATA drives. The use of a separate port per drive 72 enables the drives to be tightly controlled by the array controller 70, as is desirable for achieving a high level of performance. In the preferred embodiment, the array controller 70 supports both the ATA mode 4 standard (also known as Enhanced IDE) and the Ultra ATA standard (also known as Ultra DMA), permitting the use of both types of drives.
As described below, the ability to use less expensive ATA drives, while maintaining a high level of performance, is an important feature of the invention. It will be recognized, however, that many of the architectural features of the invention can be used
