PCT
INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

(51) International Patent Classification 6: G06F 13/14, 13/00, 13/10, 13/12

(11) International Publication Number: WO 99/26150 A1

(43) International Publication Date: 27 May 1999 (27.05.99)

(21) International Application Number: PCT/US98/21203

(22) International Filing Date: 8 October 1998 (08.10.98)

(30) Priority Data:
60/065,848    14 November 1997 (14.11.97)    US
09/034,247    4 March 1998 (04.03.98)        US
09/034,248    4 March 1998 (04.03.98)        US
09/034,812    4 March 1998 (04.03.98)        US

(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR, BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE, GH, GM, HR, HU, ID, IL, IS, JP, KE, KG, KP, KR, KZ, LC, LK, LR, LS, LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, UA, UG, UZ, VN, YU, ZW, ARIPO patent (GH, GM, KE, LS, MW, SD, SZ, UG, ZW), Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European patent (AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, GW, ML, MR, NE, SN, TD, TG).

(71) Applicant: 3WARE, INC. [US/US]; 420 Waverly Street, Palo Alto, CA 94301 (US).

(72) Inventors: MCDONALD, James, A.; 940 Colonial Lane, Palo Alto, CA 94301 (US). HERZ, John, Peter; 36 Pine Lane, Los Altos, CA 94022 (US). ALTMAN, Mitchell, A.; 572 Hill Street, #Penthouse, San Francisco, CA 94114 (US). SMITH, William, Edward, III; 23797 Thurston Court, Hayward, CA 94568 (US).

(74) Agent: SIMPSON, Andrew, H.; Knobbe, Martens, Olson and Bear, LLP, 16th floor, 620 Newport Center Drive, Newport Beach, CA 92660 (US).

Published
With international search report.
Before the expiration of the time limit for amending the claims and to be republished in the event of the receipt of amendments.
(54) Title: HIGH-PERFORMANCE ARCHITECTURE FOR DISK ARRAY CONTROLLER

(57) Abstract
[Front-page figure: ATA drives 1 through N (72), each connected by a respective cable (76) to an array controller PCI card (70) containing automated controllers (84), an automated coprocessor, a buffer, and RAM, with the card attached to the host PC.]
A high-performance RAID system for a PC comprises a controller card (70) which controls an array of ATA disk drives (72). The controller card (70) includes an array of automated disk drive controllers (84), each of which controls one respective disk drive (72). The disk drive controllers (84) are connected to a microcontroller (82) by a control bus (86) and are connected to an automated coprocessor (80) by a packet-switched bus (90). The coprocessor (80) accesses system memory (40) and a local buffer (94). In operation, the disk drive controllers (84) respond to controller commands from the microcontroller (82) by accessing their respective disk drives (72), and by sending packets to the coprocessor (80) over the packet-switched bus (90). The packets carry I/O data (in both directions, with the coprocessor filling in packet payloads on I/O writes), and carry transfer commands and target addresses that are used by the coprocessor (80) to access the buffer (94) and system memory (40). The packets also carry special completion values (generated by the microcontroller) and I/O request identifiers that are processed by a logic circuit (144) of the coprocessor (80) to detect the completion of processing of each I/O request. The coprocessor (80) grants the packet-switched bus (90) to the disk drive controllers (84) using a round robin arbitration protocol which guarantees a minimum I/O bandwidth to each disk drive (72). This minimum I/O bandwidth is preferably greater than the sustained transfer rate of each disk drive (72), so that all drives of the array can operate at the sustained transfer rate without the formation of a bottleneck.
FOR THE PURPOSES OF INFORMATION ONLY
Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.
AL  Albania
AM  Armenia
AT  Austria
AU  Australia
AZ  Azerbaijan
BA  Bosnia and Herzegovina
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CU  Cuba
CZ  Czech Republic
DE  Germany
DK  Denmark
EE  Estonia
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GH  Ghana
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IL  Israel
IS  Iceland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgyzstan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakstan
LC  Saint Lucia
LI  Liechtenstein
LK  Sri Lanka
LR  Liberia
LS  Lesotho
LT  Lithuania
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
MK  The former Yugoslav Republic of Macedonia
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
MX  Mexico
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SG  Singapore
SI  Slovenia
SK  Slovakia
SN  Senegal
SZ  Swaziland
TD  Chad
TG  Togo
TJ  Tajikistan
TM  Turkmenistan
TR  Turkey
TT  Trinidad and Tobago
UA  Ukraine
UG  Uganda
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
YU  Yugoslavia
ZW  Zimbabwe
HIGH-PERFORMANCE ARCHITECTURE
FOR DISK ARRAY CONTROLLER

FIELD OF THE INVENTION
The present invention relates to disk arrays, and more particularly, relates to hardware and software architectures for hardware-implemented RAID (Redundant Array of Inexpensive Disks) and other disk array systems.

BACKGROUND OF THE INVENTION
A RAID system is a computer data storage system in which data is spread or "striped" across multiple disk drives. In many implementations, the data is stored in conjunction with parity information such that any data lost as the result of a single disk drive failure can be automatically reconstructed.
One simple type of RAID implementation is known as "software RAID." With software RAID, software (typically part of the operating system) which runs on the host computer is used to implement the various RAID control functions. These control functions include, for example, generating drive-specific read/write requests according to a striping algorithm, reconstructing lost data when drive failures occur, and generating and checking parity. Because these tasks occupy CPU bandwidth, and because the transfer of parity information occupies bandwidth on the system bus, software RAID frequently produces a degradation in performance over single disk drive systems.
Where performance is a concern, a "hardware-implemented RAID" system may be used. With hardware-implemented RAID, the RAID control functions are handled by a dedicated array controller (typically a card) which presents the array to the host computer as a single, composite disk drive. Because little or no host CPU bandwidth is used to perform the RAID control functions, and because no RAID parity traffic flows across the system bus, little or no degradation in performance occurs.
One potential benefit of RAID systems is that the input/output ("I/O") data can be transferred to and from multiple disk drives in parallel. By exploiting this parallelism (particularly within a hardware-implemented RAID system), it is possible to achieve a higher degree of performance than is possible with a single disk drive. The two basic types of performance that can potentially be increased are the number of I/O requests processed per second ("transactional performance") and the number of megabytes of I/O data transferred per second ("streaming performance").
Unfortunately, few hardware-implemented RAID systems provide an appreciable increase in performance. In many cases, this failure to provide a performance improvement is the result of limitations in the array controller's bus architecture. Performance can also be adversely affected by frequent interrupts of the host computer's processor. In addition, attempts to increase performance have often relied on the use of expensive hardware components. For example, some RAID array controllers rely on the use of a relatively expensive microcontroller that can process I/O data at a high transfer rate. Other designs rely on complex disk drive interfaces, and thus require the use of expensive disk drives.
The present invention addresses these and other limitations in existing RAID architectures.
SUMMARY OF THE INVENTION
The present invention provides a high-performance architecture for a hardware-implemented RAID or other disk array system. An important benefit of the architecture is that it provides a high degree of performance (both transactional and streaming) without the need for disk drives that are based on expensive or complex disk drive interfaces.
In a preferred embodiment, the architecture is embodied within a PC-based disk array system which comprises an array controller card which controls an array of ATA disk drives. The controller card includes an array of automated ATA disk drive controllers, each of which controls a single, respective ATA drive.
The controller card also includes an automated coprocessor which is connected to each disk drive controller by a packet-switched bus, and which connects as a busmaster to the host PC bus. The coprocessor is also connected to a local I/O data buffer of the card. As described below, a primary function of the coprocessor is to transfer I/O data between the disk drive controllers, the system memory, and the buffer in response to commands received from the disk drive controllers. Another function of the coprocessor is to control all accesses by the disk drive controllers to the packet-switched bus, to thereby control the flow of I/O data.
The controller card further includes a microcontroller which connects to the disk drive controllers and to the coprocessor by a local control bus. The microcontroller runs a control program which implements a RAID storage configuration. Because the microcontroller does not process or directly monitor the flow of I/O data (as described below), a low-cost, low-performance microcontroller can advantageously be used.
In operation, the controller card processes multiple I/O requests at a time, and can process multiple I/O requests without interrupting the host computer. As I/O requests are received from the host computer, the microcontroller generates drive-specific sequences of controller commands (based on the particular RAID configuration), and dispatches these controller commands over the local control bus to the disk drive controllers. In addition to containing disk drive commands, these controller commands include transfer commands and target addresses that are (subsequently) used by the coprocessor to transfer I/O data to and from system memory and the local buffer.
Some of the controller commands also include disk completion values and tokens (I/O request identifiers) that are used by the coprocessor to monitor the completion status of pending I/O requests. The disk completion values are generated by the microcontroller such that the application of a specific logic function to all of the disk completion values for a given I/O request produces a final completion value that is known a priori to the coprocessor. As described below, this enables the coprocessor to detect the completion of processing of an I/O request without prior knowledge of the details (number of invoked disk drives, etc.) of the I/O request.
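
(By way of illustration only: the specification does not fix the logic function or the final value, but assuming hypothetically that the function is exclusive-OR and the a-priori final value is zero, the microcontroller-side assignment could be sketched in C as follows.)

    #include <stdint.h>
    #include <stdlib.h>

    /* Assign one completion value per invoked drive such that the XOR
     * of all assigned values equals the a-priori final value (0 here). */
    static void assign_completion_values(uint8_t *vals, int ndrives)
    {
        uint8_t acc = 0;
        for (int i = 0; i < ndrives - 1; i++) {
            vals[i] = (uint8_t)rand();   /* arbitrary per-drive value */
            acc ^= vals[i];
        }
        vals[ndrives - 1] = acc;         /* forces the total XOR to 0 */
    }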
In response to the controller commands, the disk drive controllers access their respective disk drives and send packets to the coprocessor over the packet-switched bus. These packets carry I/O data (in both directions, with the coprocessor filling in packet payloads on I/O writes), and carry transfer commands and target addresses that are used by the coprocessor to access the buffer and system memory. During this process, the coprocessor grants the packet-switched bus to the disk drive controllers (for the transmission of a single packet) using a round robin arbitration protocol which guarantees a minimum I/O bandwidth to each disk drive. The minimum bandwidth is equal
to 1/N of the total I/O bandwidth of the packet-switched bus, where N is the number of disk drive controllers (and disk drives) in the array.
Because this minimum I/O bandwidth is greater than or equal to the sustained transfer rate of each disk drive, all N drives can operate concurrently at the sustained transfer rate indefinitely without the formation of a bottleneck. When the packet-switched bus is not being used by all of the disk drive controllers (i.e., one or more disk drive controllers has no packets to transmit), the arbitration protocol allows other disk drive controllers to use more than the guaranteed minimum I/O bandwidth. This additional I/O bandwidth may be used, for example, to transfer I/O data at a rate higher than the sustained transfer rate when the requested I/O data resides in the disk drive's cache.
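
(For illustration, a minimal C sketch of a per-packet round robin arbiter of the kind described above; the request-bitmap interface and the drive count are assumptions of the sketch. Because each grant covers exactly one packet, a busy controller can claim at most one slot per round, which yields the 1/N guarantee, while slots declined by idle controllers fall through to the next requester.)

    #include <stdint.h>

    #define N_DRIVES 8

    /* Returns the index of the controller to grant next, or -1 if no
     * controller has a packet pending.  bit i of request_bits is set
     * when automated controller i has a packet ready to send. */
    static int rr_grant(uint32_t request_bits, int last_granted)
    {
        for (int i = 1; i <= N_DRIVES; i++) {
            int cand = (last_granted + i) % N_DRIVES;
            if (request_bits & (1u << cand))
                return cand;     /* one grant = one packet */
        }
        return -1;
    }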
The disk drive controllers process their respective sequences of controller commands asynchronously to one another; thus, the disk drive controllers that are invoked by a given I/O request can finish processing the I/O request in any order. When a given disk drive controller finishes processing an I/O request, the controller sends a special completion packet to the coprocessor. This completion packet contains the completion value that was assigned to the disk drive controller, and contains an identifier (token) of the I/O request.
Upon receiving the completion packet, the coprocessor cumulatively applies the logic function to the completion value and all other completion values (if any) that have been received for the same I/O request, and compares the result to the final completion value. If a match occurs, indicating that all disk drives invoked by the I/O request have finished processing the I/O request, the coprocessor uses the token to inform the host computer and the microcontroller of the identity of the completed I/O request. Thus, the microcontroller monitors the completion status of pending I/O requests without directly monitoring the flow of I/O data.
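
(Continuing the illustrative XOR assumption from the earlier sketch, the coprocessor-side detection could look like the following; the token width and table size are hypothetical.)

    #include <stdint.h>
    #include <stdbool.h>

    #define MAX_TOKENS 64
    static uint8_t acc[MAX_TOKENS];   /* per-token accumulators, start at 0;
                                         must be reset when a token is reused */

    /* Called for each completion packet; returns true when all drives
     * invoked by the I/O request identified by 'token' have finished. */
    static bool on_completion_packet(uint8_t token, uint8_t value)
    {
        acc[token] ^= value;
        if (acc[token] == 0) {        /* matches the a-priori final value */
            /* notify host computer and microcontroller using the token */
            return true;
        }
        return false;
    }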
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features of the architecture will now be described in further detail with reference to the drawings of the preferred embodiment, in which:
Figure 1 illustrates a prior art disk array architecture.
Figure 2 illustrates a disk array system in accordance with a preferred embodiment of the present invention.
Figure 3 illustrates the general flow of information between the primary components of the Figure 2 system.
Figure 4 illustrates the types of information included within the controller commands.
Figure 5 illustrates a format used for the transmission of packets.
Figure 6 illustrates the architecture of the system in further detail.
Figure 7 is a flow diagram which illustrates a round robin arbitration protocol which is used to control access to the packet-switched bus of Figure 2.
Figure 8 illustrates the completion logic circuit of Figure 6 in further detail.
Figure 9 illustrates the transfer/command control circuit of Figure 6 in further detail.
Figure 10 illustrates the operation of the command engine of Figure 9.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
I. Existing RAID Architectures
To illustrate several of the motivations behind the present invention, a prevalent prior art architecture used within existing PC-based RAID systems will initially be described with reference to Figure 1. As depicted in Figure 1, the architecture includes an array controller card 30 ("array controller") that couples an array of SCSI (Small Computer Systems Interface) disk drives 32 to a host computer (PC) 34. The array controller 30 plugs into a PCI (Peripheral Component Interconnect) expansion slot of the host computer 34, and communicates with a host processor 38 and a system memory 40 via a host PCI bus 42. For purposes of this description and the description of the preferred embodiment, it may be assumed that the host processor 38 is an Intel Pentium™ or other X86-compatible microprocessor, and that the host computer 34 is operating under either the Windows™ 95 or the Windows™ NT operating system.
The array controller 30 includes a PCI-to-PCI bridge 44 which couples the host PCI bus 42 to a local PCI bus 46 of the controller 30, and which acts as a bus master with respect to both busses 42, 46. Two or more SCSI controllers 50 (three shown in Figure 1) are connected to the local PCI bus 46. Each SCSI controller 50 controls the operation of two or more SCSI disk drives 32 via a respective shared cable 52. The array controller 30 also includes a microcontroller 56 and a buffer 58, both of which are coupled to the local PCI bus by appropriate bridge devices (not shown). The buffer 58 will typically include appropriate exclusive-OR (XOR) logic 60 for performing the XOR operations associated with RAID storage protocols.
In operation, the host processor 38 (running under the control of a device driver) sends input/output (I/O) requests to the microcontroller 56 via the host PCI bus 42, the PCI-to-PCI bridge 44, and the local PCI bus 46. Each I/O request typically consists of a command descriptor block (CDB) and a scatter-gather list. The CDB is a SCSI drive command that specifies such parameters as the disk operation to be performed (e.g., read or write), a disk drive logical block address, and a transfer length. The scatter-gather list is an address list of one or more contiguous blocks of system memory for performing the I/O operation.
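
(For concreteness, an I/O request of the kind just described, a command block plus a scatter-gather list, might be laid out as in the following C sketch; all field names and widths are illustrative assumptions rather than a defined wire format.)

    #include <stdint.h>

    struct sg_entry {
        uint32_t phys_addr;   /* start of a contiguous system-memory block */
        uint32_t byte_count;  /* length of that block */
    };

    struct io_request {
        uint8_t  opcode;          /* e.g. read or write */
        uint32_t logical_block;   /* starting logical block address */
        uint32_t transfer_length; /* in sectors */
        uint16_t sg_count;        /* number of entries in the list below */
        struct sg_entry sg[1];    /* variable-length scatter-gather list */
    };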
The microcontroller 56 runs a firmware program which translates these I/O requests into component, disk-specific SCSI commands based on a particular RAID configuration (such as RAID 4 or RAID 5), and dispatches these commands to corresponding SCSI controllers 50. For example, if, based on the particular RAID configuration implemented by the system, a given I/O request requires data to be read from every SCSI drive 32 of the array, the microcontroller 56 sends SCSI commands to each of the SCSI controllers 50. The SCSI controllers in turn arbitrate for control of the local PCI bus 46 to transfer I/O data between the SCSI disks 32 and system memory 40. I/O data that is being transferred from system memory 40 to the disk drives 32 is initially stored in the buffer 58. The buffer 58 is also typically used to perform XOR operations, rebuild operations (in response to disk failures), and other operations associated with the particular RAID configuration. The microcontroller 56 also monitors the processing of the dispatched SCSI commands, and interrupts the host processor 38 to notify the device driver of completed transfer operations.
The Figure 1 architecture suffers from several deficiencies that are addressed by the present invention. One such deficiency is that the SCSI drives 32 are expensive in comparison to ATA (AT Attachment) drives. While
it is possible to replace the SCSI drives with less expensive ATA drives (see, for example, U.S. Pat. No. 5,506,977), the use of ATA drives would generally result in a decrease in performance. One reason for the decreased performance is that ATA drives do not buffer multiple disk commands; thus each ATA drive would normally remain inactive while a new command is being retrieved from the microcontroller 56. One goal of the present invention is thus to provide an architecture in which ATA and other low-cost drives can be used while maintaining a high level of performance.
Another problem with the Figure 1 architecture is that the local PCI bus and the shared cables 52 are susceptible to being dominated by a single disk drive 32. Such dominance can result in increased transactional latency, and a corresponding degradation in performance. A related problem is that the local PCI bus 46 is used both for the transfer of commands and the transfer of I/O data; increased command traffic on the bus 46 can therefore adversely affect the throughput and latency of data traffic. As described below, the architecture of the preferred embodiment overcomes these and other problems by using separate control and data busses, and by using a round-robin arbitration protocol to grant the local data bus to individual drives.
Another problem with the prior art architecture is that because the microcontroller 56 has to monitor the component I/O transfers that are performed as part of each I/O request, a high-performance microcontroller generally must be used. As described below, the architecture of the preferred embodiment avoids this problem by shifting the completion monitoring task to a separate, non-program-controlled device that handles the task of routing I/O data, and by embedding special completion data values within the I/O data stream to enable such monitoring. This effectively removes the microcontroller from the I/O data path, enabling the use of a lower cost, lower performance microcontroller.
Another problem, in at least some RAID implementations, is that the microcontroller 56 interrupts the host processor 38 multiple times during the processing of a single I/O request. For example, it is common for the microcontroller 56 to interrupt the host processor 38 at least once for each contiguous block of system memory referenced by the scatter-gather list. Because there is significant overhead associated with the processing of an interrupt, the processing of the interrupts significantly detracts from the processor bandwidth that is available for handling other types of tasks. It is therefore an object of the present invention to provide an architecture in which the array controller interrupts the host processor no more than once per I/O request.
A related problem, in many RAID architectures, is that when the array controller 30 generates an interrupt request to the host processor 38, the array controller suspends operation, or at least postpones generating the following interrupt request, until after the pending interrupt request has been serviced. This creates a potential bottleneck in the flow of I/O data, and increases the number of interrupt requests that need to be serviced by the host processor 38. It is therefore an object of the invention to provide an architecture in which the array controller continues to process subsequent I/O requests while an interrupt request is pending, so that the device driver can process multiple completed I/O requests when the host processor eventually services an interrupt request.
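
(A sketch of the driver-side consequence of this design: a single interrupt can retire every I/O request completed so far. The completion-queue functions used here are hypothetical placeholders, not an interface defined by the specification.)

    #include <stdint.h>

    extern int  poll_done_token(uint32_t *token);   /* 1 if a token was read */
    extern void complete_request(uint32_t token);   /* hand result to the OS */

    /* Interrupt service routine: drain every completed request, so that
     * requests which finished while the interrupt was pending require no
     * additional interrupt of their own. */
    void array_isr(void)
    {
        uint32_t token;
        while (poll_done_token(&token))
            complete_request(token);
    }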
The present invention provides a high performance disk array architecture which addresses these and other problems with prior art RAID systems. An important aspect of the invention is that the primary performance benefits
provided by the architecture are not tied to a particular type of disk drive interface. Thus, the architecture can be implemented using ATA drives (as in the preferred embodiment described below) and other types of relatively low-cost drives while providing a high level of performance.
II. System Overview
A disk array system which embodies the various features of the present invention will now be described with reference to the remaining drawings. Throughout this description, reference will be made to various implementation-specific details, including, for example, part numbers, industry standards, timing parameters, message formats, and widths of data paths. These details are provided in order to fully set forth a preferred embodiment of the invention, and not to limit the scope of the invention. The scope of the invention is set forth in the appended claims.
As depicted in Figure 2, the disk array system comprises an array controller card 70 ("array controller") that plugs into a PCI slot of the host computer 34. The array controller 70 links the host computer to an array of ATA disk drives 72 (numbered 1-N in Figure 2), with each drive connected to the array controller by a respective ATA cable 76. In one implementation, the array controller 70 includes eight ATA ports to permit the connection of up to eight ATA drives. The use of a separate port per drive 72 enables the drives to be tightly controlled by the array controller 70, as is desirable for achieving a high level of performance. In the preferred embodiment, the array controller 70 supports both the ATA mode 4 standard (also known as Enhanced IDE) and the Ultra ATA standard (also known as Ultra DMA), permitting the use of both types of drives.
As described below, the ability to use less expensive ATA drives, while maintaining a high level of performance, is an important feature of the invention. It will be recognized, however, that many of the architectural features of the invention can be used to increase the performance of disk array systems that use other types of drives, including SCSI drives. It will also be recognized that the disclosed array controller 70 can be adapted for use with other types of disk drives (including CD-ROM and DVD drives) and mass storage devices (including FLASH and other solid state memory drives).
In the preferred embodiment, the array of ATA drives 72 is operated as a RAID array using, for example, a RAID 4 or a RAID 5 configuration. The array controller 70 can alternatively be configured through firmware to operate the drives using a non-RAID implementation, such as a JBOD (Just a Bunch of Disks) configuration.
With further reference to Figure 2, the array controller 70 includes an automated array coprocessor 80, a microcontroller 82, and an array of automated controllers 84 (one per ATA drive 72), all of which are interconnected by a local control bus 86 that is used to transfer command and other control information. (As used herein, the term "automated" refers to a data processing unit which operates without fetching and executing sequences of macro-instructions.) The automated controllers 84 are also connected to the array coprocessor 80 by a packet-switched bus 90. As further depicted in Figure 2, the array coprocessor 80 is locally connected to a buffer 94, and the microcontroller 82 is locally connected to a read-only memory (ROM) 96 and a random-access memory (RAM) 98.
The packet-switched bus 90 handles all I/O data transfers between the automated controllers 84 and the array coprocessor 80. All transfers on the packet-switched bus 90 flow either to or from the array coprocessor 80, and all accesses to the packet-switched bus are controlled by the array coprocessor. These aspects of the bus architecture provide for a high degree of data flow performance without the complexity typically associated with PCI and other peer-to-peer type bus architectures.
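
(A plausible C rendering of a packet header consistent with this description, in which every packet names a transfer command and a target address so that the coprocessor can route the payload; all fields here are illustrative assumptions, and Figure 5 of the specification gives the actual transmission format.)

    #include <stdint.h>

    struct bus_packet {
        uint8_t  drive_id;     /* which automated controller sent it */
        uint8_t  cmd;          /* transfer command, e.g. buffer or memory
                                  read/write (hypothetical encoding) */
        uint32_t target_addr;  /* buffer or system-memory address */
        uint16_t length;       /* payload bytes (0 for command-only packets) */
        /* payload follows; filled in by the coprocessor on I/O writes */
    };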
As described below, the packet-switched bus 90 uses a packet-based round robin protocol that guarantees that at least 1/N of the bus's I/O bandwidth will be available to each drive during each round robin cycle (and thus throughout the course of each I/O transfer). Because this amount (1/N) of bandwidth is equal to or exceeds the sustained data transfer rate of each ATA drive 72 (which is typically in the range of 10 Mbytes/sec.), all N drives can operate concurrently at the sustained data rate without the formation of a bottleneck. For example, in an 8-drive configuration, all 8 drives can continuously stream 10 Mbytes/second of data to their respective automated controllers 84, in which case the packet-switched bus 90 will transfer the I/O data to the array coprocessor at a rate of 80 Mbytes/second. When less than N drives are using the packet-switched bus 90, each drive is allocated more than 1/N of the bus's bandwidth, allowing each drive to transfer data at a rate which exceeds the sustained data transfer rate (such as when the requested I/O data resides in the disk drive's cache).
In the preferred embodiment, the array coprocessor 80 is implemented using an FPGA, such as a Xilinx 4000-series FPGA. An application-specific integrated circuit (ASIC) or other type of device may alternatively be used. The general functions performed by the array coprocessor 80 include the following: (i) forwarding I/O requests from the host processor 38 to the microcontroller 82; (ii) controlling arbitration on the packet-switched bus 90; (iii) routing I/O data between the automated controllers 84, the system memory 40, and the buffer 94; (iv) performing exclusive-OR, read-modify-write, and other RAID-related logic operations involving I/O data using the buffer 94; and (v) monitoring and reporting the completion status of I/O requests. With respect to the PCI bus 42 of the host computer 34, the array coprocessor 80 acts as a PCI initiator (a type of PCI bus master) which initiates memory read and write operations based on commands received from the automated controllers 84. The operation of the array coprocessor 80 is further described below.
The buffer 94 is preferably either a 1 megabyte (MB) or 4 MB volatile, random access memory. Synchronous DRAM or synchronous SRAM may be used for this purpose. All data that is written from the host computer 34 to the disk array is initially written to this buffer 94. In addition, the array coprocessor 80 uses this buffer 94 for volume rebuilding (such as when a drive or a drive sector goes bad) and parity generation. Although the buffer 94 is external to the array coprocessor in the preferred embodiment, it may alternatively be integrated into the same chip.
The microcontroller 82 used in the preferred embodiment is a Siemens 163. The microcontroller 82 is controlled by a firmware control program (stored in the ROM 96) that implements a particular RAID or non-RAID storage protocol. The primary function performed by the microcontroller is to translate I/O requests from the host computer 34 into sequences of disk-specific controller commands, and to dispatch these commands over the local control bus 86 to specific automated controllers 84 for processing. As described below, the architecture is such
that the microcontroller 82 does not have to directly monitor the I/O transfers that result from the dispatched controller commands, as this task is allocated to the array coprocessor 80 (using an efficient completion token scheme which is described below). This aspect of the architecture enables a relatively low cost, low performance microcontroller to be used, and reduces the complexity of the control program.
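
(By way of illustration only, the translation step performed by the control program might map a logical block to a drive and a drive address as in the following C sketch, assuming a simple RAID 4-style layout with rotating data drives and a dedicated parity drive; the layout and names are assumptions of the sketch, and real firmware must also split requests at stripe boundaries and schedule parity updates.)

    #include <stdint.h>

    #define N_DRIVES      8
    #define DATA_DRIVES   (N_DRIVES - 1)
    #define PARITY_DRIVE  (N_DRIVES - 1)   /* dedicated parity drive (RAID 4) */

    struct drive_cmd {
        int      drive;   /* which automated controller receives the command */
        uint32_t lba;     /* block address on that drive */
    };

    static struct drive_cmd map_block(uint32_t logical_block)
    {
        struct drive_cmd c;
        c.drive = logical_block % DATA_DRIVES;  /* rotate across data drives */
        c.lba   = logical_block / DATA_DRIVES;  /* row within each drive */
        return c;
    }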
Although the microcontroller 82 is a separate device in the preferred embodiment, the microcontroller could alternatively be integrated into

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket