High-Performance Network and Channel-Based Storage

RANDY H. KATZ, SENIOR MEMBER, IEEE

Invited Paper

Manuscript received October 1, 1991; revised March 24, 1992. This work was supported by the Defense Advanced Research Projects Agency and the National Aeronautics and Space Administration under contract NAG2-591 (Diskless Supercomputers: High Performance I/O for the TeraOp Technology Base). Additional support was provided by the State of California MICRO Program in conjunction with industrial matching support provided by DEC, Emulex, Exabyte, IBM, NCR, and Storage Technology corporations. The author is with the Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA 94720. IEEE Log Number 9203670.

In the traditional mainframe-centered view of a computer system, storage devices are coupled to the system through complex hardware subsystems called I/O channels. With the dramatic shift toward workstation-based computing, and its associated client/server model of computation, storage facilities are now found attached to file servers and distributed throughout the network. In this paper, we discuss the underlying technology trends that are leading to high-performance network-based storage, namely advances in networks, storage devices, and I/O controller and server architectures. We review several commercial systems and research prototypes that are leading to a new approach to high-performance computing based on network-attached storage.

I. INTRODUCTION

The traditional mainframe-centered model of computing can be characterized by small numbers of large-scale mainframe computers, with shared storage devices attached via I/O channel hardware. Today, we are experiencing a major paradigm shift away from centralized mainframes to a distributed model of computation based on workstations and file servers connected via high-performance networks. What makes this new paradigm possible is the rapid development and acceptance of the client/server model of computation. The client/server model is a message-based protocol in which clients make requests of service providers, which are called servers. Perhaps the most successful application of this concept is the widespread use of file servers in networks of computer workstations and personal computers. Even a high-end workstation has rather limited capabilities for data storage. A distinguished machine on the network, customized either by hardware, software, or both, provides a file service. It accepts network messages from client machines containing open/close/read/write file requests and processes these, transmitting the requested data back and forth across the network.

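To make the request/response exchange concrete, the following is a minimal sketch of the kind of message a client might send to such a file server. The field names, sizes, and operation codes are illustrative assumptions, not the wire format of any particular file service.

    /* Hypothetical request header a client might send to a file server.
       A reply would carry a status code and, for reads, the data.      */
    enum fs_op { FS_OPEN, FS_CLOSE, FS_READ, FS_WRITE };

    struct fs_request {
        enum fs_op    op;          /* which operation is requested          */
        unsigned int  file_handle; /* handle returned by an earlier FS_OPEN */
        unsigned long offset;      /* starting byte offset within the file  */
        unsigned long length;      /* number of bytes to read or write      */
        /* for FS_WRITE, 'length' bytes of payload follow this header */
    };

The server's job is simply to map such requests onto its local storage and send the results back over the network.
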
This is in contrast to the pure distributed storage model, in which the files are dispersed among the storage on workstations rather than centralized in a server. The advantages of a distributed organization are that resources are placed near where they are needed, leading to better performance, and that the environment can be more autonomous because individual machines continue to perform useful work even in the face of network failures. While this has been the more popular approach over the last few years, there has emerged a growing awareness of the advantages of the centralized view. That is, every user sees the same file system, independent of the machine they are currently using. The view of storage is pervasive and transparent. Further, it is much easier to administer a centralized system, to provide software updates and archival backups. The resulting organization combines distributed processing power with a centralized view of storage.

Admittedly, centralized storage also has its weaknesses. A server or network failure renders the client workstations unusable, and the network represents the critical performance bottleneck. A highly tuned remote file system on a 10 megabit (Mbit) per second Ethernet can provide perhaps 500K bytes per second to remote client applications. Sixty 8K byte I/O's per second would fully utilize this bandwidth. Obtaining the right balance of workstations to servers depends on their relative processing power, the amount of memory dedicated to file caches on workstations and servers, the available network bandwidth, and the I/O bandwidth of the server. It is interesting to note that today's servers are not I/O limited: the Ethernet bandwidth can be fully utilized by the I/O bandwidth of only two magnetic disks!

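The arithmetic behind these figures is worth making explicit (a back-of-the-envelope restatement of the numbers above, not an additional measurement):

    $$60 \times 8\,\mathrm{KB} = 480\,\mathrm{KB/s} \approx 500\,\mathrm{KB/s}, \qquad 10\,\mathrm{Mbit/s} \div 8 = 1.25\,\mathrm{MB/s}.$$

That is, the useful payload delivered by the remote file system is roughly 40% of the raw capacity of the medium, and two disks sustaining on the order of 1 Mbyte per second each are already enough to saturate it.
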
Meanwhile, other technology developments in processors, networks, and storage systems are affecting the relationship between clients and servers. It is well known that processor performance, as measured in MIPS ratings,
is increasing at an astonishing rate, doubling on the order of once every 18 months to two years. The newest generation of RISC processors has performance in the 50 to 60 MIPS range. For example, a recent workstation announced by the Hewlett-Packard Corporation, the HP 9000/730, has been rated at 72 SPECMarks (1 SPECMark is roughly the processing power of a single Digital Equipment Corporation VAX 11/780 on a particular benchmark set). Powerful shared memory multiprocessor systems, now available from companies such as Silicon Graphics and Solbourne, provide well over 100 MIPS performance. One of Amdahl's famous laws equated one MIPS of processing power with one megabit of I/O per second. Obviously such processing rates far exceed anything that can be delivered by existing server, network, or storage architectures.

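Applying that rule of thumb to the processor numbers just cited gives a rough sense of the mismatch (an illustrative calculation, not a figure from any specific system):

    $$100\ \mathrm{MIPS} \times 1\ \mathrm{Mbit/s\ per\ MIPS} = 100\ \mathrm{Mbit/s} \approx 12.5\ \mathrm{MB/s},$$

roughly twenty-five times the 500 Kbytes per second that a well-tuned remote file system delivers over 10 Mbit/second Ethernet.
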
Unlike processor power, network technology evolves at a slower rate, but when it advances, it does so in order of magnitude steps. In the last decade we have advanced from 3 Mbit/second Ethernet to 10 Mbit/second Ethernet. We are now on the verge of a new generation of network technology, based on fiber-optic interconnect, called FDDI. This technology promises 100 Mbits per second, and at least initially, it will move the server bottleneck from the network to the server CPU or its storage system. With more powerful processors available on the horizon, the performance challenge is very likely to be in the storage system, where a typical magnetic disk can service 30 8K byte I/O's per second and can sustain a data rate in the range of 1 to 3 Mbytes per second. And even faster networks and interconnects, in the gigabit range, are now commercially available and will become more widespread as their costs begin to drop [1].

To keep up with the advances in processors and networks, storage systems are also experiencing rapid improvements. Magnetic disks have been doubling in storage capacity once every three years. As disk form factors shrink from 14 inch to 3.5 inch and below, the disks can be made to spin faster, thus increasing the sequential transfer rate. Unfortunately, the random I/O rate is improving only very slowly, owing to mechanically limited positioning delays. Since I/O and data rates are primarily disk actuator limited, a new storage system approach called disk arrays addresses this problem by replacing a small number of large-format disks by a very large number of small-format disks. Disk arrays maintain the high capacity of the storage system, while enormously increasing the system's disk actuators and thus the aggregate I/O and data rate.

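A simple calculation, using the per-disk figures quoted earlier and an assumed array of 100 small-format disks, illustrates the leverage:

    $$100 \times 30\ \mathrm{I/Os/s} = 3000\ \mathrm{I/Os/s}, \qquad 100 \times (1\text{--}3)\ \mathrm{MB/s} = 100\text{--}300\ \mathrm{MB/s},$$

aggregate rates that no single large-format disk, and indeed no single conventional network or channel, can approach.
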
The confluence of developments in processors, networks, and storage offers the possibility of extending the client/server model so effectively used in workstation environments to higher performance environments, which integrate supercomputers, near supercomputers, workstations, and storage services on a very high performance network. The technology is rapidly reaching the point where it is possible to think in terms of diskless supercomputers in much the same way as we think about diskless workstations. Thus, the network is emerging as the future "backplane" of high-performance systems. The challenge is to develop the new hardware and software architectures that will be suitable for this world of network-based storage.

The emphasis of this paper is on the integration of storage and network services, and the challenges of managing the complex storage hierarchy of the future: file caches, on-line disk storage, near-line data libraries, and off-line archives. We specifically ignore existing mainframe I/O architectures, as these are well described elsewhere (for example, in [2]). The rest of this paper is organized as follows. In the next three sections, we will review the recent advances in interconnect, storage devices, and distributed software, to better understand the underlying changes in network, storage, and software technologies. Section V contains detailed case studies of commercially available high-performance networks, storage servers, and file servers, as well as a prototype high-performance network-attached I/O controller being developed at the University of California, Berkeley. Our summary, conclusions, and suggestions for future research are found in Section VI.

II. INTERCONNECT TRENDS

A. Networks, Channels, and Backplanes

Interconnect is a generic term for the "glue" that interfaces the components of a computer system. Interconnect consists of high-speed hardware interfaces and the associated logical protocols. The former consists of physical wires or control registers. The latter may be interpreted by either hardware or software. From the viewpoint of the storage system, interconnect can be classified as high-speed networks, processor-to-storage channels, or system backplanes that provide ports to a memory system through direct memory access techniques.

Networks, channels, and backplanes differ in terms of the interconnection distances they can support, the bandwidth and latencies they can achieve, and the fundamental assumptions about the inherent unreliability of data transmission. While no statement we can make is universally true, in general, backplanes can be characterized by parallel wide data paths and centralized arbitration, and are oriented toward read/write "memory mapped" operations. That is, access to control registers is treated identically to memory word access. Networks, on the other hand, provide serial data, distributed arbitration, and support more message-oriented protocols. The latter require a more complex handshake, usually involving the exchange of high-level request and acknowledgment messages. Channels fall between the two extremes, consisting of wide data paths of medium distance and often incorporating simplified versions of networklike protocols. These considerations are summarized in Table 1.

Table 1

                   Network          Channel           Backplane
  Distance         >1000 m          10-100 m          1 m
  Bandwidth        10-100 Mb/s      40-1000 Mb/s      320-1000+ Mb/s
  Latency          high (>ms)       medium            low (<us)
  Reliability      low              medium            high
  Data Integrity   Extensive CRC    Byte Parity       Byte Parity

The comparison is based upon the interconnection distance, transmission bandwidth, transmission latency, inherent reliability, and typical techniques for improving data integrity.

Networks typically span more than 1 km, sustain 10 Mbit/second (Ethernet) to 100 Mbit/second (FDDI) and beyond, experience latencies measured in several milliseconds (ms), and the network medium itself is considered to be inherently unreliable. Networks include extensive data integrity features within their protocols,
including CRC checksums at the packet and message levels, and the explicit acknowledgment of received packets. Channels span small 10's of meters, transmit at anywhere from 4.5 Mbytes/second (IBM channel interfaces) to 100 Mbytes/second (HiPPI channels), incur latencies of under 100 us per transfer, and have medium reliability. Byte parity at the individual transfer word is usually supported, although packet-level check-summing might also be supported. Backplanes are about 1 m in length, transfer from 40 (VME) to over 100 (FutureBus) Mbytes/second, incur sub-microsecond latencies, and the interconnect is considered to be highly reliable. Backplanes typically support byte parity, although some backplanes (unfortunately) dispense with parity altogether.

In the remainder of this section, we will look at each of the three kinds of interconnect, network, channel, and backplane, in more detail.

B. Communications Networks and Network Controllers

An excellent overview of networking technology can be found in [3]. For a futuristic view, see [4] and [5]. The decade of the 1980's has seen a slow maturation of network technology, but the 1990's promise much more rapid developments. Today, 10 Mbit/second Ethernets are pervasive, with many environments advancing to the next generation of 100 Mbit/second networks based on the FDDI (Fiber Distributed Data Interface) standard [6]. FDDI provides higher bandwidth, longer distances, and reduced error rates, largely because of the introduction of fiber optics for data transmission. Unfortunately cost, especially for replacing the existing copper wire network with fiber, coupled with disappointing transmission latencies, has slowed the acceptance of these higher speed networks. The latency problems have more to do with FDDI's protocols, which are based on a token passing arbitration scheme, than anything intrinsic in fiber-optic technology.

A network system is decomposed into multiple protocol layers, from the application interface down to the method of physical communication of bits on the network. Figure 1 summarizes the popular seven-layer ISO protocol model. The physical and link levels are closely tied to the
underlying transport medium, and deal with the physical attachment to the network and the method of acquiring access to it. The network, transport, and session levels focus on the detailed formats of communications packets and the methods for transmitting them from one program to another. The presentation and applications layers define the formats of the data embedded within the packets and the application-specific semantics of that data.

Fig. 1. Seven-layer ISO protocol model. The physical layer describes the actual transmission medium, be it coax cable, fiber optics, or a parallel backplane. The link layer describes how stations gain access to the medium; this layer deals with the protocols for arbitrating for and obtaining grant permission to the media. The network layer defines the format of data packets to be transmitted over the media, including destination and sender information as well as any check sums. The transport layer is responsible for the reliable delivery of packets. The session layer establishes communication between the sending program and the receiving program. The presentation layer determines the detailed format of the data embedded within packets. The application layer has the responsibility of understanding how this data should be interpreted within an applications context.

A number of performance measurements of network transmission services point out that the significant overhead is not protocol interpretation (approximately 10% of instructions are spent in interpreting the network headers). The culprits are memory system overheads arising from data movement and operating system overheads related to context switches and data copying [7]-[10]. We will see this again and again in the sections to follow.

The network controller is the collection of hardware and firmware that implements the interface between the network and the host processor. It is typically implemented on a small printed circuit board, and contains its own processor, memory-mapped control registers, an interface to the network, and a small memory to hold messages being transmitted and received. The on-board processor, usually in conjunction with VLSI components within the network interface, implements the physical and link-level protocols of the network.

The interaction between the network controller and the host's memory is depicted in Fig. 2. Lists of blocks containing packets to be sent and packets that have been received are maintained in the host processor's memory. The locations of buffers for these blocks are made known to the network controller, and it will copy packets to and from the request/receive block areas using direct memory access (DMA) techniques. This means that the copy of data across the peripheral bus is under the control of the network controller, and does not require the intervention of the host processor. The controller will interrupt the host whenever a message has been received or sent.

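A rough C sketch of the host-side structures implied by this arrangement follows. The names and fields are assumptions made for illustration; actual controllers differ in register layout and descriptor format.

    /* One transmit (request) or receive block resident in host memory.   */
    struct net_block {
        void             *buffer;  /* host memory holding the packet      */
        unsigned          length;  /* number of valid bytes in the buffer */
        volatile int      owner;   /* OWNER_HOST or OWNER_CONTROLLER      */
        struct net_block *next;    /* blocks are kept on linked lists     */
    };

    /* Memory-mapped control registers of the controller, through which
       the host publishes its request and receive lists and reads status. */
    struct net_ctrl_regs {
        struct net_block * volatile request_list;  /* packets to transmit   */
        struct net_block * volatile receive_list;  /* empty buffers to fill */
        volatile unsigned           command;       /* start, enable interrupts */
        volatile unsigned           status;        /* send/receive completion  */
    };

The controller walks these lists with DMA, copying packets between host memory and its on-board buffer memory, and interrupts the host when an entry has been consumed or filled; the host processor itself does not touch the data during this stage.
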
Fig. 2. Network controller/processor memory interaction. The figure describes the interaction between the network controller and the memory of the network node. The controller contains an on-board microprocessor, various memory-mapped control registers through which service requests can be made and status checked, a physical interface to the network media, and a buffer memory to hold request and receive blocks. These contain network messages to be transmitted or which have been received, respectively. A list of pending requests and messages already received resides in the host processor's memory. Direct memory operations (DMA's), under the control of the node processor, copy these blocks to and from this memory.

While this presents a particularly clean interface between the network controller and the operating system, it points out some of the intrinsic memory system latencies that reduce network performance. Consider a message that will be transmitted to the network. First the contents of the message are created within a user application. A call to the operating system results in a process switch and a data copy from the user's address space to the operating system's area. A protocol-specific network header is then appended to the data to form a packaged network message. This must be copied one more time, to place the message into a request block that can be accessed by the network controller. The final copy is the DMA operation that moves the message within the request block to memory within the network controller.

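The copy count in that path can be made explicit with a schematic transmit routine. This is purely illustrative; the buffer sizes, helper functions, and the routine itself are assumptions, not an actual operating system interface.

    #include <string.h>   /* memcpy */
    #include <stddef.h>   /* size_t */

    #define MAX_MSG  1500  /* assumed maximum payload size  */
    #define HDR_SIZE   32  /* assumed protocol header size   */

    /* Placeholders: fill in a protocol header; hand a block to the controller. */
    void build_header(char *block, size_t payload_len);
    void dma_to_controller(const char *block, size_t len);

    void net_send(const void *user_buf, size_t len)
    {
        static char kernel_buf[MAX_MSG];
        static char request_block[HDR_SIZE + MAX_MSG];

        if (len > MAX_MSG)           /* sketch assumes the message fits */
            return;

        /* Copy 1: user address space -> operating system area,
           accompanying the process switch of the system call.          */
        memcpy(kernel_buf, user_buf, len);

        /* Copy 2: package the message -- header plus data -- into a
           request block that the network controller can see.           */
        build_header(request_block, len);
        memcpy(request_block + HDR_SIZE, kernel_buf, len);

        /* Copy 3: DMA across the peripheral bus into the controller's
           on-board memory (performed by the controller itself).        */
        dma_to_controller(request_block, HDR_SIZE + len);
    }

Each copy consumes memory bandwidth; as the measurements cited above suggest, it is these data movements, not protocol interpretation, that dominate the cost.
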
Data integrity is the aspect of system reliability concerned with the transmission of correct data and the explicit flagging of incorrect data. An overriding consideration of network protocols is their concern with reliable transmission. Because of the distances involved and the complexity of the transmission path, network transmission is inherently lossy. The solution is to append check-sum protection bits to all network packets and to include explicit acknowledgment as part of the network protocols. For example, if the check sum computed at the receiving end does not match the transmitted check sum, the receiver sends a negative acknowledgment to the sender.

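A toy receive-side check shows the idea. The additive checksum is chosen only for brevity (real network protocols use CRCs), and the acknowledgment helpers are placeholders.

    #include <stdint.h>
    #include <stddef.h>

    /* Simple additive checksum over the payload bytes. */
    static uint16_t checksum(const uint8_t *data, size_t len)
    {
        uint32_t sum = 0;
        for (size_t i = 0; i < len; i++)
            sum += data[i];
        return (uint16_t)sum;
    }

    /* Placeholders supplied by the protocol layer. */
    void send_ack(void);
    void send_nak(void);   /* negative acknowledgment: request retransmission */

    void on_packet_received(const uint8_t *payload, size_t len, uint16_t received_sum)
    {
        if (checksum(payload, len) == received_sum)
            send_ack();    /* data accepted                                 */
        else
            send_nak();    /* mismatch: the sender must retransmit the packet */
    }
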
C. Channel Architectures

Channels provide the logical and physical pathways between I/O controllers and storage devices. They are medium-distance interconnect that carry signals in parallel, usually with some parity technique to provide data integrity. In this subsection, we will describe three alternative channel organizations that characterize the opposite ends of the performance spectrum: SCSI (small computer system interface), HIPPI (high-performance parallel interface), and FCS (fibre channel standard).

1) Small Computer System Interface: SCSI is the channel interface most frequently encountered in small form factor (5.25 in. diameter and smaller) disk drives, as well as a wide variety of peripherals such as tape drives, optical disk readers, and image scanners. SCSI treats peripheral devices in a largely device-independent fashion. For example, a disk drive is viewed as a linear byte stream; its detailed structure in terms of sectors, tracks, and cylinders is not visible through the SCSI interface. A SCSI channel can support up to eight devices sharing a common bus with an 8-bit-wide data path. In SCSI terminology, the I/O controller counts as one of these devices, and is called the host bus adapter (HBA). Burst transfers at 4 to 5 Mbytes/s are widely available today. In SCSI terminology, a device that requests service from another device is called the master or the initiator. The device that is providing the service is called the slave or the target.

SCSI provides a high-level message-based protocol for communications between initiators and targets. While this makes it possible to mix widely different kinds of devices on the same channel, it does lead to relatively high overheads. The protocol has been designed to allow initiators to manage multiple simultaneous operations. Targets are intelligent in the sense that they explicitly notify the initiator when they are ready to transmit data or when they need to throttle a transfer.

It is worthwhile to examine the SCSI protocol in some detail, to clearly distinguish what it does from the kinds of messages exchanged on a computer network. The SCSI protocol proceeds in a series of phases, which we summarize below:

Bus Free: No device currently has the bus allocated.

Arbitration: Initiators arbitrate for access to the bus. A device's physical address determines its priority.

Selection: The initiator informs the target that it will participate in an I/O operation.

Reselection: The target informs the initiator that an outstanding operation is to be resumed. For example, an operation could have been previously suspended because the I/O device had to obtain more data.

Command: Command bytes are written to the target by the initiator. The target begins executing the operation.

Data Transfer: The protocol supports two forms of the data transfer phase, Data In and Data Out. The former refers to the movement of data from the target to the initiator. In the latter, data move from the initiator to the target.

Message: The message phase also comes in two forms, Message In and Message Out. Message In consists of several alternatives. Identify identifies the reselected target. Save Data Pointer saves the place in the current data transfer if the target is about to disconnect. Restore Data Pointer restores this pointer. Disconnect notifies the initiator that the target is about to give up the data bus. Command Complete occurs when the target tells the initiator that the operation has completed.
Message Out has just one form: Identify. This is used to identify the requesting initiator and its intended target.

Status: Just before command completion, the target sends a status message to the initiator.

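The phases and message alternatives just listed can be collected into a small C declaration, useful as a mental map when reading the transition diagram of Fig. 3. Only symbolic names are given; the actual bit encodings used on the SCSI bus are not reproduced here.

    /* Bus phases of the SCSI protocol, as summarized above. */
    enum scsi_phase {
        BUS_FREE,      /* no device has the bus allocated        */
        ARBITRATION,   /* initiators compete; address = priority */
        SELECTION,     /* initiator picks its target             */
        RESELECTION,   /* target resumes a suspended operation   */
        COMMAND,       /* command bytes flow to the target       */
        DATA_IN,       /* data moves from target to initiator    */
        DATA_OUT,      /* data moves from initiator to target    */
        MESSAGE_IN,    /* target-to-initiator messages           */
        MESSAGE_OUT,   /* initiator-to-target message (Identify) */
        STATUS         /* completion status to the initiator     */
    };

    /* Alternatives carried during the Message In phase. */
    enum scsi_message_in {
        MSG_IDENTIFY,
        MSG_SAVE_DATA_POINTER,
        MSG_RESTORE_DATA_POINTER,
        MSG_DISCONNECT,
        MSG_COMMAND_COMPLETE
    };
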
To better understand the sequencing among the phases, see Fig. 3. This illustrates the phase transitions for a typical SCSI read operation. The sequencing of an I/O operation actually begins when the host's operating system establishes data and status blocks within its memory. Next, it issues an I/O command to the HBA, passing it pointers to command, status, and data blocks, as well as the SCSI address of the target device. These are staged from host memory to device-specific queues within the HBA's memory using direct memory access techniques.

Now the I/O operation can begin in earnest. The HBA arbitrates for and wins control of the SCSI bus. It then indicates the target device it wishes to communicate with during the selection phase. The target responds by identifying itself during a following message out phase. Now the actual command, such as "read a sequence of bytes," is transmitted to the device.

We assume that the target device is a disk. If the disk must first seek before it can obtain the requested data, it will disconnect from the bus. It sends a disconnect message to the initiator, which in turn gives up the bus. Note that the HBA can communicate with other devices on the SCSI channel, initiating additional I/O operations. Now the device will seek to the appropriate track and will begin to fill its internal buffer with the requested data.

Fig. 3. SCSI phase transitions on a read. The basic phase sequencing for a read (from disk) operation is shown. First the initiator sets up the read command and sends it to the I/O device. The target device disconnects from the SCSI bus to perform a seek and to begin to fill its internal buffer. It then transfers the data to the initiator. This may be interspersed with additional disconnects, as the transfer gets ahead of the internal buffering. A command complete message terminates the operation. This figure is adapted from [40].

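Restated as initiator-side pseudocode in C, the phase ordering of Fig. 3 looks roughly as follows. The helper functions are placeholders standing for HBA operations, and a single disconnect/reconnect cycle is assumed; a real transfer may disconnect several times.

    /* Placeholder HBA operations; signatures are illustrative only. */
    void arbitrate_for_bus(void);
    void select_target(int target_id);
    void message_out_identify(void);
    void send_command(const void *cmd_block);
    void wait_for_disconnect(void);
    void wait_for_reselection(int target_id);
    void receive_data(void *data_block);
    void receive_status(void *status_block);
    void wait_for_command_complete(void);

    /* Phase sequencing of a SCSI read, from the host bus adapter's viewpoint. */
    void scsi_read(int target_id, const void *cmd_block,
                   void *data_block, void *status_block)
    {
        arbitrate_for_bus();             /* Arbitration                        */
        select_target(target_id);        /* Selection                          */
        message_out_identify();          /* Message Out: Identify              */
        send_command(cmd_block);         /* Command: e.g. "read these bytes"   */

        wait_for_disconnect();           /* target seeks; the bus is freed for
                                            other devices in the meantime      */

        wait_for_reselection(target_id); /* Reselection: target has data ready */
        receive_data(data_block);        /* Data In (possibly several bursts)  */
        receive_status(status_block);    /* Status                             */
        wait_for_command_complete();     /* Message In: Command Complete       */
    }
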
