`High-Bandwidth Network File Service
`
`John H. Hartman
`Peter M. Chen
`Edward K. Lee
`Ann L. Chervenak Drapeau
`Ethan L. Miller
`Randy H. Katz
`Garth A. Gibson
`David A. Patterson
`
`Abstract
`
RAID-II (RAID the second) is a scalable high-bandwidth network file server for heterogeneous
computing environments characterized by a mixture of high-bandwidth scientific, engineering
and multi-media applications and low-latency high-transaction-rate UNIX applications. RAID-II
is motivated by three observations: applications are becoming more bandwidth intensive, the
I/O bandwidth of workstations is decreasing with respect to MIPS, and recent technological
developments in high-performance networks and secondary storage systems make it economical
to build high-bandwidth network storage systems.
Unlike most existing file servers that use a bus as a system backplane, RAID-II achieves
scalability by treating the network as the system backplane. RAID-II is notable because it physically separates file service, the management of file metadata, from storage service, the storage
and transfer of file data; stripes files over multiple storage servers for improved performance and
reliability; provides separate mechanisms for high-bandwidth and low-latency I/O requests; implements a RAID Level 5 storage system; and runs LFS, the Log-Structured File System, which
is specifically designed to support high-bandwidth I/O and RAID Level 5 storage systems.
`
Key words: high-bandwidth network file service, network storage, mass storage system, RAID.
`
`
`
`
`
`
` Motivation
`
RAID-II (RAID the second) is a scalable high-bandwidth network file server being developed at
the University of California at Berkeley as part of a project to study high-performance, large-capacity and highly reliable storage systems. RAID-II is designed for the heterogeneous computing
environments of the future, consisting of diskless supercomputers, visualization workstations, multi-media platforms and UNIX workstations. The primary purpose of RAID-II is to serve as a vehicle
for research on storage architectures and file systems for the future.
RAID-II incorporates the fruits of two novel research projects. First, as the name of the prototype implies, the storage system is based on RAID, the Redundant Arrays of Inexpensive Disks
technology developed by the RAID group at Berkeley. RAID replaces large expensive disks with
many small inexpensive disks to provide higher performance and reliability at a lower cost. Second,
RAID-II will run LFS, the Log-Structured File System, developed by the Sprite operating
system group at Berkeley. LFS is a file system specially optimized to support high-bandwidth I/O
and fast crash recovery. RAID and LFS are synergistic technologies that together obtain far greater
performance than when either is used alone.
The development of RAID-II is motivated by three key observations. First, we notice a trend
toward bandwidth-intensive applications: multi-media, CAD, object-oriented databases and scientific visualization. Even in well-established application areas such as scientific computing, reductions
in the cost of secondary storage and the introduction of faster supercomputers have caused a rapid
growth in the size of datasets, requiring faster I/O systems to transfer the increasing amounts of
data.
Second, experience with RAID-I, our first prototype, shows that most of today's workstation-based file servers are incapable of supporting high-bandwidth I/O. Moreover, the future I/O performance of server workstations is likely to degrade relative to the overall performance of their
client workstations even if applications do not become more I/O intensive. This is because today's
workstations achieve high performance by using large fast caches without significantly improving
the performance of the primary memory and I/O systems. Table 1 is paraphrased from a paper by
Ousterhout that discusses why operating system performance is not scaling with microprocessor
performance. It illustrates the trend of decreasing memory system performance relative to MIPS
in workstations.
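
Ousterhout's measurement copies a block much larger than the CPU cache and reports the sustained
copy rate, which is then normalized by the machine's MIPS rating. The Python sketch below is only
a rough analogue of that bcopy test (the buffer size, iteration count and MIPS figure are arbitrary,
hypothetical choices), included simply to make the MB/s-per-MIPS metric concrete.

    import time

    def copy_bandwidth_mb_per_s(buf_size=64 * 1024 * 1024, iterations=5):
        """Estimate memory-copy bandwidth by copying a buffer much larger
        than any CPU cache, in the spirit of the bcopy test in Table 1."""
        src = bytearray(buf_size)
        best = float("inf")
        for _ in range(iterations):
            start = time.perf_counter()
            dst = bytes(src)                    # one full pass over buf_size bytes
            best = min(best, time.perf_counter() - start)
            del dst
        return (buf_size / (1024 * 1024)) / best

    if __name__ == "__main__":
        mips = 20.0                             # hypothetical MIPS rating
        mb_s = copy_bandwidth_mb_per_s()
        print(f"copy bandwidth: {mb_s:.1f} MB/s")
        print(f"MB/s per MIPS:  {mb_s / mips:.2f}")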
The third key observation is that recent technological developments in networks and secondary
storage systems make it possible to build high-bandwidth supercomputer file servers at workstation
prices. Until recently, anyone wishing to build a high-bandwidth supercomputer I/O system had to
invest millions of dollars in proprietary high-bandwidth network technology and expensive parallel-transfer disks. But with the standardization of high-performance interconnects and networks such
as HIPPI and FDDI, and the commercialization of the RAID technology, high-bandwidth networks
and secondary storage systems have suddenly become affordable. What is lacking, and the point
that RAID-II addresses, is a storage architecture that can exploit these developments.
This paper describes RAID-II, a scalable high-bandwidth network file server for heterogeneous
computing environments. We focus primarily on the hardware components of the system, although
the software is also discussed where relevant. The paper starts with a brief survey of existing file
server architectures. The survey is followed by the architecture and implementation of RAID-II.
We end the paper with our current status, summary and conclusions.
`
`
`
`
`
`
[Table 1 rows: MicroVAX II, Sun, IBM RT-APC, VAX, Sun, SPARCstation, DECstation, H-P,
DECstation and MIPS workstations, with columns giving bcopy throughput in MB/s, the MIPS
rating, and MB/s per MIPS.]

Table 1: Memory Bandwidth vs. MIPS. Throughput for a bcopy procedure when copying a block
of data that is much larger than the CPU cache. Note the decreasing trend in MB/s per MIPS as
MIPS increases.
`
` Existing File Server Architectures
`
In this section, we examine several existing file server architectures. The material in this section
will serve as a background for the discussion of various aspects of RAID-II in the sections to follow.
First we examine RAID-I, a workstation-based file server we modified to provide high-bandwidth
file service. Next we look at the Auspex NS 5000 file server, which is highly successful in providing
scalable high-performance NFS file service. Finally, we examine several mass storage systems (MSS)
currently used by supercomputing centers for high-capacity shared storage.
`
RAID-I
`
RAID-I (RAID the first) was designed to test the ability of workstation-based file servers to
provide access to the high bandwidth and high I/O rates supported by disk arrays. The prototype
was constructed using a Sun workstation, an array of small-diameter SCSI disks and four dual-string
SCSI controllers.
Experiments with RAID-I show that it is good at sustaining small random I/Os. However,
RAID-I has proven woefully inadequate for high-bandwidth I/O: the bandwidth it can sustain to a
user-level application is of the same order as what a single disk on RAID-I can sustain by itself.
There are several reasons why RAID-I is ill-suited for high-bandwidth I/O. The most serious
is the memory contention experienced on the Sun server during I/O operations. The copy
operations performed in moving data between the kernel DMA buffers and buffers in user space
saturate the memory system at a small fraction of the disk array's potential bandwidth. Second,
because all I/O on the Sun goes through the CPU's virtually addressed cache, data transfers
experience interference from cache flushes. Finally, high-bandwidth performance is limited by the
low bandwidth of the Sun's VME system bus, which becomes saturated at a fraction of its nominal
rating.
The above problems are typical of many "CPU-centric" workstations that are designed for good
processor performance but fail to support adequate I/O bandwidth. In such systems, the memory
system is designed so that the CPU has the fastest and highest-bandwidth path to memory. For
busses or backplanes farther away from the CPU, the available bandwidth to memory drops quickly.
This is in sharp contrast to mainframe computers where the memory system is designed specifically
to support high-bandwidth I/O.
To summarize, our experience with RAID-I indicates that the memory systems of workstations
are in general poorly suited for supporting high-bandwidth I/O. In the design of RAID-II, our
second prototype, we have given this careful consideration.
`
Auspex NS 5000
`
The Auspex NS 5000 is designed for high-performance NFS file service. NFS is the most common
network file system protocol for workstation-based computing environments. NFS is designed to
efficiently support operations on small and medium sized files, but because NFS transfers files in
small individual packets, it is inefficient for large files.
In the NS 5000, the network processing, file system management, and disk control are handled
by separate dedicated processors. This functional multiprocessing, in contrast to symmetric multiprocessing, makes synchronization between processes explicit and allows the performance of the
file server to scale easily by adding processors, network attachments and disks. This is in contrast
to a typical NFS file server that performs all functions on a single processor. In such systems,
performance can be scaled to only a very limited degree by adding additional network attachments
and disks because the processor will quickly become a bottleneck.
Although the NS 5000 is good at supporting small low-latency NFS requests, it is unsuitable for
high-bandwidth applications. The use of a single VME bus to connect the networks, disks
and memory significantly limits the aggregate I/O bandwidth of the system. More importantly,
NFS is very inefficient for large files because it always breaks up files into small packets which
are sent individually over the network. This results in fragmentation of the available network
bandwidth and forces the receiving system to handle a large number of interrupts.
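
The cost of this fragmentation is easy to quantify: the number of packets, and therefore interrupts,
a receiver must handle scales inversely with the transfer unit. The sketch below uses hypothetical
packet sizes and an arbitrary 10 MB/s target rate purely for illustration.

    def packets_per_second(bandwidth_mb_s, packet_kb):
        """Packets the receiver must process each second to sustain the given
        bandwidth when data arrives in packets of packet_kb kilobytes."""
        return (bandwidth_mb_s * 1024) / packet_kb

    # Hypothetical comparison: a small NFS-style transfer unit versus a large
    # connection-oriented transfer unit, both sustaining 10 MB/s.
    for packet_kb in (8, 1024):
        rate = packets_per_second(10, packet_kb)
        print(f"{packet_kb:5d} KB packets -> {rate:8.0f} packets (interrupts) per second")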
`
Supercomputer Mass Storage Systems
`
Almost all supercomputer mass storage systems use a mainframe as a high-performance file server.
The mainframe maintains the file system metadata, and provides a high-bandwidth data path
between its channel-based I/O system and supercomputer clients via a high-performance channel
or network interface. Because most supercomputer mass storage systems are designed primarily
for capacity, very few support high data transfer rates. For performance, supercomputer
applications rely on directly attached parallel-transfer disks. Supercomputer mass storage systems
also are not designed to service a large number of small file requests, and are rarely used as primary
storage systems for large numbers of client workstations. The following briefly describes the MSS-II,
NCAR and LSS mass storage systems.
MSS-II, the NASA Ames mass storage system, uses an Amdahl mainframe as a file server. MSS-II
achieves high data transfer rates by striping data over multiple disks and transferring
data over multiple network channels.
The mass storage system at NCAR, the National Center for Atmospheric Research, is implemented using Hyperchannel and an IBM mainframe running MVS. The NCAR mass storage
system is unique in that it provides a direct data path between supercomputers and the IBM
mainframe's channel-based storage controllers.
On a file access, data can bypass the mainframe and be
transferred directly between the storage devices and the supercomputers.
The Lawrence Livermore National Laboratory's LINCS Storage System (LSS) is closely
modeled after the Mass Storage System (MSS) Reference Model. The MSS Reference Model
identifies and defines the function and software interfaces for six elements of a mass storage system:
name server, bitfile client, bitfile server, storage server, physical volume repository and bitfile mover.
A client file access in LSS begins at the name server, which maps a human-readable name
to a bitfile ID. The bitfile server uses the bitfile ID to perform authentication checks and to
maintain and look up metadata related to the file, such as the logical volume and bitstream offsets
at which the bitfile is stored on the storage server. The storage server maps the logical volume and
bitstream offsets to physical device and block offsets and retrieves the requested data. If the data
is stored on unmounted storage media, the physical volume repository is instructed to mount the
media. The bitfile mover then transfers the data between the storage server and the client.
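
The division of labor among these elements can be summarized as a short sequence of calls. The
Python sketch below paraphrases the read path just described; all names, table contents and the
credential check are invented placeholders, not LSS interfaces.

    NAME_TABLE = {"/climate/run42": 1001}            # name server: name -> bitfile ID
    BITFILE_TABLE = {1001: ("vol7", 4096, 65536)}    # bitfile server: ID -> (volume, offset, length)

    def storage_server_read(volume, offset, length):
        """Maps logical volume/bitstream offsets to physical device and block
        offsets (mounting media via the physical volume repository if needed)."""
        return b"\0" * length                        # placeholder file data

    def lss_read(path, credentials="authorized"):
        bitfile_id = NAME_TABLE[path]                # name server lookup
        assert credentials == "authorized"           # bitfile server authentication check
        volume, offset, length = BITFILE_TABLE[bitfile_id]   # bitfile server metadata lookup
        # bitfile mover: data flows storage server -> client, not through the bitfile server
        return storage_server_read(volume, offset, length)

    print(len(lss_read("/climate/run42")), "bytes returned to the client")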
A notable aspect of LSS is that control and data messages are always transmitted independently.
This allows the control and data messages to take different paths through the system. For example,
a control message requesting a write would be sent to the bitfile server but the data itself would
be sent directly to the storage server, bypassing the bitfile server. This is efficient for large data
transfers but the extra overhead of treating control and data independently is likely to degrade the
performance of small NFS-type requests.
`
Summary
`
We have examined the architecture of several existing file servers and discussed why they are unsuitable for heterogeneous computing environments consisting of both high-bandwidth data transfers
and high-transaction-rate NFS-type requests. As RAID-I illustrates, the relatively low performance
of workstation memory systems makes them unsuitable for use as high-bandwidth file servers. Not
surprisingly, specially designed file servers such as the Auspex NS 5000, while supporting large
numbers of NFS clients, provide only marginal performance for high-bandwidth clients.
Although most supercomputer mass storage systems can transfer data at relatively high rates,
this is still not good enough to support diskless supercomputers. Furthermore, they
neglect the performance of small NFS-type requests in order to optimize the performance of high-bandwidth data transfers. Finally, even if the supercomputer mass storage systems were optimized
to support NFS-type requests, it would be economically infeasible to use mainframes as file servers
for workstations. In the following sections, we describe the architecture and implementation of
RAID-II and how it economically supports both high-bandwidth data transfers and low-latency
high-transaction-rate NFS-type requests.
`
` RAID-II Storage Architecture
`
Architecturally, RAID-II resembles the previously described LSS mass storage system and roughly
follows the structure of the Mass Storage System Reference Model. RAID-II, however, is distinctive
in its attention to the low-latency high-transaction-rate NFS-type requests characteristic of UNIX
client workstations. It is a specific design goal of RAID-II to service NFS-type requests as well as,
if not better than, today's workstation-based file servers. Thus, the design of RAID-II is driven
by very different motivations than the design of supercomputer mass storage systems. We think
of RAID-II as more of an evolution of workstation-based file servers than of supercomputer mass
storage systems. Even without supercomputers, architectures like RAID-II are needed due to the
rapid trend in workstation-based computing environments toward larger datasets and bandwidth-intensive applications.
The next section explains the reasoning that led us to the RAID-II Storage Architecture and briefly
describes the architecture. The sections that follow describe the hardware (structural) and software
(functional) aspects of the two main components of the RAID-II Storage Architecture: the RAID-II
storage server and the RAID-II file server. Finally, we describe an innovative file system
called the Log-Structured File System, which runs on the RAID-II file server, and without which
the performance of RAID-II would be no better than that of the average workstation-based file server.
`
` Network as Backplane
`
Today, high-bandwidth file service for supercomputers is provided by mainframe computers. Unfortunately, it is highly inefficient to use mainframes to provide NFS file service for a large number
of client workstations. NFS file service is a highly CPU-intensive task for which mainframes have
little advantage over RISC workstations and which would leave the expensive high-bandwidth I/O
systems of mainframes virtually unutilized. Unfortunately, the other alternative, using workstation-based file servers to serve both workstation and supercomputer clients, is even worse. As our experiences with RAID-I strongly point out, workstations are unlikely to support high-bandwidth I/O
in the near future.
Ideally, we would like to combine the high-bandwidth I/O systems of mainframes with the
inexpensive computing power of RISC workstations. Moreover, we would like to be able to independently scale the bandwidth and CPU power of the network file server to accommodate a wide
range of supercomputer and NFS-type workloads. Clearly a flexible interconnection scheme that
can scale in both bandwidth and the number of attached components is the key to solving the
above problems. With this in mind, we considered a wide array of bus-based and bit-sliced server
architectures. Unfortunately, all of the architectures we considered required large amounts of specialized hardware. In the end, however, the solution was staring us in the face: the network itself
is the most flexible and general interconnect.
RAID-II is radically different from conventional workstation-based network file servers. These
use a bus as the primary system backplane between the network, secondary storage system and
CPU, whereas RAID-II uses the network as the primary system backplane. Instead of connecting the
high-bandwidth secondary storage system to the high-bandwidth network via a low-bandwidth bus,
we connect it directly to the high-bandwidth network and refer to it as the RAID-II storage server.
Similarly, the CPU, now referred to as the RAID-II file server, is also connected to the network.
The result is a high-performance storage system that is more scalable than conventional network
file servers in much the same way networks are more scalable than busses. Figure 1 illustrates the
RAID-II storage architecture. A variety of clients from supercomputers to desktop workstations,
RAID-II storage servers and RAID-II file servers are interconnected over a network composed of
high-bandwidth network switches, FDDI networks, Ethernets and network routers.
`
`
`
`
`
`
[Figure 1 shows a supercomputer, a visualization workstation and client workstations connected
through a high-bandwidth network switch (UltraNet) with HIPPI (1 Gb/s) links, FDDI (100 Mb/s)
rings, Ethernets (10 Mb/s) and a Cisco router to RAID-II storage servers and RAID-II file servers.]

Figure 1: RAID-II Storage Architecture.
`
Storage and File Servers: Independence and Interdependence
`
As mentioned in the previous section, the RAID-II storage server corresponds to the secondary
storage system of conventional network file servers. As such, the RAID-II storage server implements
a logical device-level interface and loosely corresponds to the definition of storage server given in
the MSS Reference Model. Storage on the RAID-II storage server is addressed using a logical
device name and offset within the device. The RAID-II storage server is implemented with custom
hardware and software.
The RAID-II file server combines the functionality of the MSS Reference Model's name server
and bitfile server. In particular, it translates hierarchical human-readable file names to logical
device addresses for use by the RAID-II storage server, manages the metadata associated with each
file, and provides a cache of recently accessed files. The RAID-II file server is an off-the-shelf RISC
workstation running the Log-Structured File System.
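
These two roles can be stated as two narrow interfaces: the storage server exports block reads and
writes on (logical device, offset) pairs, and the file server owns the name-to-address mapping and
the cache of recently accessed files. The following is only a minimal sketch of that split; the class
and method names and the flat metadata table are invented for illustration and do not reflect
RAID-II's actual software.

    class StorageServer:
        """Block-level storage addressed by logical device name and byte offset."""
        def __init__(self, devices):
            self.devices = {name: bytearray(size) for name, size in devices.items()}
        def read(self, device, offset, length):
            return bytes(self.devices[device][offset:offset + length])
        def write(self, device, offset, data):
            self.devices[device][offset:offset + len(data)] = data

    class FileServer:
        """Maps hierarchical file names to (device, offset, length) and caches
        recently accessed files; the file data itself lives on the storage server."""
        def __init__(self, storage):
            self.storage = storage
            self.metadata = {}                 # path -> (device, offset, length)
            self.cache = {}                    # path -> file contents
        def create(self, path, device, offset, data):
            self.storage.write(device, offset, data)
            self.metadata[path] = (device, offset, len(data))
        def read(self, path):
            if path not in self.cache:         # miss: fetch from the storage server
                device, offset, length = self.metadata[path]
                self.cache[path] = self.storage.read(device, offset, length)
            return self.cache[path]

    fs = FileServer(StorageServer({"rdev0": 1 << 20}))
    fs.create("/docs/readme", "rdev0", 0, b"hello")
    print(fs.read("/docs/readme"))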
Separating the RAID-II storage system into storage servers and file servers has several advantages. If we need additional I/O bandwidth, we simply add more storage servers without affecting
the number of file servers. If we find the file servers overutilized, we can add more file servers
without adding more storage servers. To some extent adding storage servers is like adding disks
to a conventional file server, but an important distinction is that when you add a storage server,
you increase the I/O bandwidth available to the network, whereas when you add a disk you only
increase the I/O bandwidth available to the given file server, whose backplane, if the file server is
a typical workstation, is easily saturated.
Additionally, we can increase data transfer bandwidth for servicing individual I/O requests by
striping files over multiple storage servers. That is, we can build a file system on a logical disk
spanning multiple storage servers. Striping over multiple storage servers is also useful for load
balancing. Note that striping over multiple storage servers is conceptually simple (we can treat it
similarly to disk striping) because synchronization and logical file operations for a given file are
still handled by a single file server.
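
Because the mapping is the same address arithmetic used for disk striping, it can be written down
in a few lines. The sketch below assumes a fixed stripe unit and simple round-robin placement;
both are illustrative choices, not RAID-II parameters.

    STRIPE_UNIT = 64 * 1024        # bytes per stripe unit (illustrative value)

    def locate(file_offset, num_servers):
        """Map a byte offset in a striped file to (storage server index, byte
        offset on that server), treating servers like disks in a striped array."""
        unit = file_offset // STRIPE_UNIT          # which stripe unit holds the byte
        server = unit % num_servers                # round-robin placement of units
        local_unit = unit // num_servers           # full units already on that server
        return server, local_unit * STRIPE_UNIT + file_offset % STRIPE_UNIT

    for offset in (0, 64 * 1024, 200_000):
        print(offset, "->", locate(offset, num_servers=4))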
`Being able to scale by striping data over multiple storage servers is important for several reasons.
`First, when designing a shared storage system for a heterogeneous environment, it is uneconomical
to build a storage server capable of supplying the full I/O bandwidth of the fastest supercomputer
`client. The storage server’s capabilities would be wasted on the majority of clients, which are
`desktop workstations. Second, even if one built such a super storage server, it would quickly
`become inadequate when the next generation of supercomputers becomes available. Clearly a
`storage architecture whose bandwidth for individual requests is incrementally scalable is highly
`desirable.
Another advantage of physically separating the storage and file servers is apparent when one
considers redundancy schemes designed to tolerate server failures. Since the storage and file servers
are physically separated, it is easy to protect each with a different redundancy scheme. For example,
since storage capacity is very important for storage servers, we can use a
parity storage server, similar to a parity disk in RAID, to provide access to the contents of a failed
storage server. By computing the parity over a large number of storage servers, we can make the
storage cost of parity arbitrarily small. On the other hand, we can use simpler, higher-performance
redundancy schemes such as mirroring or logging to a backup file server to protect file servers
from failures, where the storage cost for redundancy is less because there is less state to maintain.
Finally, since the storage servers interface directly to the network, it is easy to distribute the storage
servers over a wide geographic area to protect against natural disasters such as earthquakes and
floods.
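
The parity storage server works exactly like RAID parity at a coarser grain: it stores the XOR of
the corresponding blocks on the data storage servers, so any one failed server's block can be rebuilt
as the XOR of all the surviving blocks. A minimal sketch of that recovery, with made-up block
contents:

    def xor_blocks(blocks):
        """Bytewise XOR of equal-length blocks."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    # Corresponding blocks on three data storage servers plus one parity server.
    data_servers = [b"AAAA", b"BBBB", b"CCCC"]
    parity_server = xor_blocks(data_servers)

    # Storage server 1 fails: rebuild its block from the survivors and the parity.
    survivors = [blk for i, blk in enumerate(data_servers) if i != 1]
    recovered = xor_blocks(survivors + [parity_server])
    assert recovered == data_servers[1]
    print("recovered block:", recovered)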
On the negative side, separating the storage and file servers increases the overhead to access
the storage system. For large high-bandwidth requests, however, the overhead is dwarfed by the
data transfer time. We also feel that small requests will not be significantly hurt because most
small read requests can be efficiently cached and LFS performs writes asynchronously in the
background.
Standard and High-Bandwidth Modes
`
In the previous section, we considered several advantages and a disadvantage of the RAID-II hardware architecture. However, the software mechanisms by which requests are serviced have not yet
been described. It is evident at this point that the RAID-II hardware architecture can provide
scalable high-bandwidth file service, but it is not clear whether the software can effectively utilize
the hardware. In particular, the RAID-II software must efficiently support both standard low-latency
requests and high-bandwidth data transfers.
We define low-latency requests as small data transfers and file system operations such as open,
close and fstat, while high-bandwidth requests are defined as large data transfers. Low-latency
requests and high-bandwidth requests obviously have widely different performance characteristics.
The performance of low-latency requests is determined primarily by fixed network and software
overheads; the time spent transferring data is not as significant. In contrast, high-bandwidth
requests spend most of their time transferring data and are insensitive to fixed overheads in the
network or software. Thus, it is unlikely that a network protocol that works well for low-latency
requests would work well for high-bandwidth requests, and vice versa. Low-latency requests are best
handled using protocols with small fixed software overheads that reduce the number of round-trip
messages between clients and servers. In contrast, high-bandwidth requests are best serviced using
the highest-bandwidth network protocols, even if they require more overhead to set up the data
transfers.
RAID-II services low-latency requests in standard mode. In standard mode, data and control
messages are combined and transmitted together to reduce the number of network messages and
software overheads. On reads, the file server returns the data with the acknowledgement, and on
writes, the client sends the data with the write request. For most reads, a client's request will be
satisfied from the file server's cache, resulting in low-latency accesses. When a miss occurs, the file
server treats the storage server as a locally attached disk. The requested data is transferred from
the storage server to the file server's cache and from there to the client. The scenario is exactly the
same as for conventional workstation-based file servers except that the disks are attached to the
network rather than the file server's private backplane.
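
A toy sketch of the standard-mode message shapes may help: control and data ride in the same
message, so a cached read or a write costs a single round trip, and a cache miss sends the file
server to the storage server exactly as it would go to a local disk. Everything below (the stub
class, message format and cache policy) is invented for illustration.

    class FileServerStub:
        """Standard-mode request handling at the file server (illustrative only)."""
        def __init__(self, storage_read):
            self.cache = {}                     # path -> file contents
            self.storage_read = storage_read    # network path to the storage server
        def handle(self, msg):
            if msg["op"] == "write":
                self.cache[msg["path"]] = msg["data"]       # data arrived with the request
                return {"status": "ok"}
            if msg["path"] not in self.cache:               # miss: storage server acts as a disk
                self.cache[msg["path"]] = self.storage_read(msg["path"])
            return {"status": "ok", "data": self.cache[msg["path"]]}  # data returned with the ack

    server = FileServerStub(storage_read=lambda path: b"")
    server.handle({"op": "write", "path": "/etc/motd", "data": b"welcome"})
    print(server.handle({"op": "read", "path": "/etc/motd"}))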
In contrast to the standard mode, the high-bandwidth mode is optimized for large data transfers.
In high-bandwidth mode, the file server processes each data request by setting up data transfers
directly between the storage servers and the client. The data completely bypasses the low-bandwidth
file server. In addition, high-bandwidth accesses use a connection-based protocol between the client
and the storage servers. Typical network file systems use a connectionless protocol that breaks up
data into many small packets sent individually over the network. Each packet contains a protocol
header specifying the file and file offset to which it corresponds. For each packet, the recipient
takes an interrupt, processes the header and reassembles the contents of the packet. For large
high-bandwidth transfers, this results in high processing overhead. The UltraNet, for example,
limits the maximum packet size to tens of kilobytes, so to sustain a high data rate the recipient of
the data must be able to process thousands of packets per second.
The connection-based approach, on the other hand, results in significantly less processing overhead. When a client requests a high-bandwidth data transfer, the file server first performs some
preliminary processing to prepare for the transfer. Then it creates a connection between the storage
servers and client. Once the connection is established, the client transfers data by issuing read and
write requests on the connection. Each request consists of only file data; headers are not needed
since the file and file offset are implicit in the connection. The connection abstraction is easily
supported by the storage server's network interface, so that the server workstation need only be
notified when the entire data transfer is complete.
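
The following sketch makes the connection abstraction concrete: the file and current offset live in
connection state, so each transfer on the connection carries only raw data and the receiver never
parses per-packet headers. The storage stub, sizes and names are hypothetical, not the RAID-II
interface.

    class StorageStub:
        """Stand-in for a storage server's logical device."""
        def __init__(self, size):
            self.blocks = bytearray(size)
        def read(self, offset, nbytes):
            return bytes(self.blocks[offset:offset + nbytes])

    class Connection:
        """Set up by the file server between a storage server and a client for one
        file: the file's location and current offset are implicit in the connection,
        so transfers carry only data, with no header for the receiver to process."""
        def __init__(self, storage, base, length):
            self.storage, self.offset, self.end = storage, base, base + length
        def read(self, nbytes):
            nbytes = min(nbytes, self.end - self.offset)
            data = self.storage.read(self.offset, nbytes)
            self.offset += nbytes
            return data

    conn = Connection(StorageStub(1 << 20), base=4096, length=256 * 1024)
    total = 0
    while chunk := conn.read(64 * 1024):         # client issues reads on the connection
        total += len(chunk)
    print(total, "bytes transferred without per-packet headers")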
A common problem in designing systems to support two very different types of requests is
interference between the two request types. Large high-bandwidth requests should not block the
service of low-latency requests, and small low-latency requests should not waste the bandwidth
available on the high-bandwidth network. Our solution is to provide each RAID-II storage server
with both a high-bandwidth 1 Gb/s HIPPI interface and a low-latency 100 Mb/s FDDI interface.
Each storage server is fully functional with only one of the two network connections, and either
request type may use either network, but performance is enhanced if low-latency requests use the
FDDI network and high-bandwidth requests use the HIPPI network.
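
The resulting routing policy is simple enough to state in a few lines. The cutoff below and the
treatment of metadata operations are purely illustrative assumptions; as noted above, either
network can carry either request type.

    LARGE_TRANSFER = 256 * 1024     # hypothetical cutoff for "high-bandwidth" requests

    def choose_network(op, nbytes=0):
        """Prefer FDDI for small low-latency operations and HIPPI for large data
        transfers, although each storage server works with either network alone."""
        if op in ("open", "close", "fstat") or nbytes < LARGE_TRANSFER:
            return "FDDI"
        return "HIPPI"

    print(choose_network("fstat"))                   # -> FDDI
    print(choose_network("read", 8 * 1024))          # -> FDDI
    print(choose_network("read", 16 * 1024 * 1024))  # -> HIPPI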
`
`
`
`
`
`
Log-Structured File System
`
Even with custom hardware and special network protocols, RAID-II could not support high-bandwidth data transfers and high-transaction-rate NFS-type file requests without the Log-Structured
File System (LFS). Standard UNIX file systems cannot support high-bandwidth I/O because
they store files in small fixed-size blocks that often end up scattered over a disk. Also, because
we are implementing a RAID Level 5 storage system, we are concerned with the throughput and
latency of small writes, which are inefficient in RAID Level 5 systems because each small write
results in four disk accesses: two disk accesses are needed to read the old data and old parity,
followed by two accesses to write the new data and new parity. Standard UNIX file systems, which
perform many small synchronous writes, would greatly degrade the performance of RAID-II.
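
The four accesses follow from the way RAID Level 5 updates parity: the new parity is the old
parity XOR the old data XOR the new data. A worked sketch of that read-modify-write cycle on
a single, made-up parity group:

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def small_write(disks, parity, target, new_data):
        """RAID Level 5 small write: two reads plus two writes per update."""
        old_data = disks[target]          # access 1: read old data
        old_parity = parity[0]            # access 2: read old parity
        new_parity = xor(xor(old_parity, old_data), new_data)
        disks[target] = new_data          # access 3: write new data
        parity[0] = new_parity            # access 4: write new parity

    disks = {0: b"\x01\x01", 1: b"\x02\x02", 2: b"\x04\x04"}
    parity = [b"\x07\x07"]                # XOR of the three data blocks
    small_write(disks, parity, target=1, new_data=b"\x0f\x0f")
    assert parity[0] == xor(xor(disks[0], disks[1]), disks[2])
    print("parity after small write:", parity[0].hex())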
LFS, developed by the Sprite operating system group at Berkeley, solves the above problems
by laying out data in large contiguous segments and grouping small write requests together. More
generally, LFS always appends new data, and the corresponding metadata, in large sequential
segments at the end of a log. Small writes are grouped and buffered in main memory until enough
data to fill a log segment have accumulated. In practice, segments are forced to disk after a certain
time limit even if a segment is only partially full. As disk space fills up, segments that become
partially empty due to the deletion or overwriting of existing files are garbage collected. An added
bonus of LFS is that crash recovery is very quick because data is always written sequentially in a
log-like fashion. It takes less than a second to perform an LFS file system check, compared with
many minutes to check the consistency of a standard UNIX file system.
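
The core of this write path (buffering small writes in main memory and appending them to the
log as large segments, with a time limit that forces out partial segments) can be sketched briefly.
The segment size, flush interval and names below are invented, and a real LFS also logs metadata
and cleans old segments.

    import time

    SEGMENT_SIZE = 512 * 1024      # illustrative log segment size
    FLUSH_INTERVAL = 30.0          # force out a partial segment after this many seconds

    class LogStructuredWriter:
        def __init__(self):
            self.log = []                      # segments already appended to the on-disk log
            self.buffer = bytearray()          # small writes accumulating in main memory
            self.last_flush = time.monotonic()
        def write(self, data):
            self.buffer += data                # group small writes together
            if len(self.buffer) >= SEGMENT_SIZE:
                self._flush()
        def tick(self):
            # called periodically: write a partially empty segment after the time limit
            if self.buffer and time.monotonic() - self.last_flush >= FLUSH_INTERVAL:
                self._flush()
        def _flush(self):
            self.log.append(bytes(self.buffer[:SEGMENT_SIZE]))   # one large sequential write
            del self.buffer[:SEGMENT_SIZE]
            self.last_flush = time.monotonic()

    lfs = LogStructuredWriter()
    for _ in range(200):
        lfs.write(b"x" * 4096)                 # 200 small writes fill one 512 KB segment
    print(len(lfs.log), "segment(s) in the log,", len(lfs.buffer), "bytes still buffered")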
`
Summary
`
The RAID-II storage system treats the network as the primary system backplane that interconnects
the storage servers, file servers and clients. The physical separation of storage and file servers allows
great flexibility in configuring the storage system to meet a wide range of performance requirements.
The separation of storage and file servers also makes it easy to scale peak data transfer rates to
clients by striping data over multiple storage servers. In addition to the method for striping data
described in this section, we are also investigating several alternatives. One of the alternatives is
similar to that used in Swift. In Swift, the file servers are contacted only when files are opened
and closed. Clients perform individual read and write requests directly with the storage servers.
This can result in lower-latency file accesses and a reduction in file server load in exchange for a
more complex storage server.
Unfortunately, the separation of storage and file servers increases the overhead to access physical
storage. However, this increased overhead is insignificant for large transfers. For small data transfers, our recent study indicates that caches combined with asynchronous writes will effectively
mask the additional overhead for most applications.
The RAID-II storage system supports low-latency and high-bandwidth requests by providing
the standard and high-bandwidth modes of access, respectively. During the servicing of high-bandwidth data requests, data are transferred directly to the high-bandwidth client and bypass
the low-bandwidth file server. Providing separate networks for servicing low-latency and high-bandwidth requests allows the bandwidth of the high-bandwidth network to be efficiently utilized
and prevents large high-bandwidth requests from blocking small low-latency requests. Finally,
RAID-II will run LFS, which is optimized to support large high-bandwidth data transfers and
RAID Level 5 storage systems.
`
`
`
`
`
`
` RAID-II Storage Server
`
The RAID-II storage server is based on a chassis from TMC (Thinking Machines Corporation)
and is designed to provide a high-bandwidth path between the secondary storage system and
the network. Figure 2 illustrates the RAID-II storage server architecture. The storage server's
`backplane consists of two high-ban