INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

(51) International Patent Classification 5: G06F 15/16                A1

(11) International Publication Number: WO 91/03788

(43) International Publication Date: 21 March 1991 (21.03.91)

(21) International Application Number: PCT/US90/04711

(22) International Filing Date: 20 August 1990 (20.08.90)

(30) Priority data: 404,959    8 September 1989 (08.09.89)    US

(71) Applicant: AUSPEX SYSTEMS, INC. [US/US]; 2952 Bunker Hill Lane, Santa Clara, CA 95054 (US).

(72) Inventors: ROW, Edward, John; 468 Mountain Laurel Court, Mountain View, CA 94064 (US). BOUCHER, Laurence, B.; 20605 Montalvo Heights Drive, Saratoga, CA 95070 (US). PITTS, William, M.; 780 Mora Drive, Los Altos, CA 94022 (US). BLIGHTMAN, Stephen, E.; 115 Salt Lake Drive, San Jose, CA 95133 (US).

(74) Agents: FLIESLER, Martin, C. et al.; Fliesler, Dubb, Meyer & Lovejoy, 4 Embarcadero Center, Suite 400, San Francisco, CA 94111 (US).

(81) Designated States: AT (European patent), AU, BE (European patent), CA, CH (European patent), DE (European patent)*, DK (European patent), ES (European patent), FR (European patent), GB (European patent), IT (European patent), JP, KR, LU (European patent), NL (European patent), SE (European patent).

Published
    With international search report.
    Before the expiration of the time limit for amending the claims and to be republished in the event of the receipt of amendments.

(54) Title: PARALLEL I/O NETWORK FILE SERVER ARCHITECTURE
`
[Front-page figure: block diagram of the file server 100, showing networks connected to network controllers 110a-110d, file controllers 112a-112b, a storage processor 114, and system memory cards 116a-116d over a common bus.]

(57) Abstract
A file server architecture is disclosed, comprising as separate processors a network controller unit (110), a file controller unit (112) and a storage processor unit (114). These units incorporate their own processors, and operate in parallel with a local Unix host processor (118). All networks are connected to the network controller unit (110), which performs all protocol processing up through the NFS layer. The virtual file system is implemented in the file controller unit (112), and the storage processor (114) provides high-speed multiplexed access to an array of mass storage devices. The file controller unit (112) controls file information caching through its own local cache buffer, and controls disk data caching through a large system memory which is accessible on a bus by any of the processors.
`
`* See back of page
`
DESIGNATIONS OF "DE"

Until further notice, any designation of "DE" in any international application whose international filing date is prior to October 3, 1990, shall have effect in the territory of the Federal Republic of Germany with the exception of the territory of the former German Democratic Republic.
`
FOR THE PURPOSES OF INFORMATION ONLY

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT.
`
AT  Austria
AU  Australia
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CM  Cameroon
DE  Germany
DK  Denmark
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GR  Greece
HU  Hungary
IT  Italy
JP  Japan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
LI  Liechtenstein
LK  Sri Lanka
LU  Luxembourg
MC  Monaco
MG  Madagascar
ML  Mali
MR  Mauritania
MW  Malawi
NL  Netherlands
NO  Norway
PL  Poland
RO  Romania
SD  Sudan
SE  Sweden
SN  Senegal
SU  Soviet Union
TD  Chad
TG  Togo
US  United States of America
`
PARALLEL I/O NETWORK FILE SERVER ARCHITECTURE
`
The present application is related to the following U.S. Patent Applications, all filed concurrently herewith:
1. MULTIPLE FACILITY OPERATING SYSTEM ARCHITECTURE, invented by David Hitz, Allan Schwartz, James Lau and Guy Harris;
2. ENHANCED VMEBUS PROTOCOL UTILIZING PSEUDOSYNCHRONOUS HANDSHAKING AND BLOCK MODE DATA TRANSFER, invented by Daryl Starr; and
3. BUS LOCKING FIFO MULTI-PROCESSOR COMMUNICATIONS SYSTEM UTILIZING PSEUDOSYNCHRONOUS HANDSHAKING AND BLOCK MODE DATA TRANSFER, invented by Daryl D. Starr, William Pitts and Stephen Blightman.
The above applications are all assigned to the assignee of the present invention and are all expressly incorporated herein by reference.
`
BACKGROUND OF THE INVENTION

Field of the Invention
The invention relates to computer data networks, and more particularly, to network file server architectures for computer networks.
`
Description of the Related Art
Over the past ten years, remarkable increases in hardware price/performance ratios have caused a startling shift in both technical and office computing environments. Distributed workstation-server networks are displacing the once pervasive dumb terminal attached to a mainframe or minicomputer. To date, however, network I/O limitations have constrained the potential performance available to workstation users. This situation has developed in part because dramatic jumps in microprocessor performance have exceeded increases in network I/O performance.
In a computer network, individual user workstations are referred to as clients, and shared resources for filing, printing, data storage and wide-area communications are referred to as servers. Clients and servers are all considered nodes of a network. Client nodes use standard communications protocols to exchange service requests and responses with server nodes.
Present-day network clients and servers usually run the DOS, Macintosh OS, OS/2, or Unix operating systems. Local networks are usually Ethernet or Token Ring at the high end, Arcnet in the midrange, or LocalTalk or StarLAN at the low end. The client-server communication protocols are fairly strictly dictated by the operating system environment: usually one of several proprietary schemes for PCs (NetWare, 3Plus, Vines, LANManager, LANServer); AppleTalk for Macintoshes; and TCP/IP with NFS or RFS
for Unix. These protocols are all well-known in the industry.
Unix client nodes typically feature a 16- or 32-bit microprocessor with 1-8 MB of primary memory, a 640 x 1024 pixel display, and a built-in network interface. A 40-100 MB local disk is often optional. Low-end examples are 80286-based PCs or 68000-based Macintosh I's; mid-range machines include 80386 PCs, Macintosh II's, and 680X0-based Unix workstations; high-end machines include RISC-based DEC, HP, and Sun Unix workstations. Servers are typically nothing more than repackaged client nodes, configured in 19-inch racks rather than desk-side boxes. The extra space of a 19-inch rack is used for additional backplane slots, disk or tape drives, and power supplies.
Driven by RISC and CISC microprocessor developments, client workstation performance has increased by more than a factor of ten in the last few years. Concurrently, these extremely fast clients have also gained an appetite for data that remote servers are unable to satisfy. Because the I/O shortfall is most dramatic in the Unix environment, the description of the preferred embodiment of the present invention will focus on Unix file servers. The architectural principles that solve the Unix server I/O problem, however, extend easily to server performance bottlenecks in other operating system environments as well. Similarly, the description of the preferred embodiment will focus on Ethernet implementations, though the principles extend easily to other types of networks.
In most Unix environments, clients and servers exchange file data using the Network File System ("NFS"), a standard promulgated by Sun Microsystems and now widely adopted by the Unix community. NFS is defined in a document entitled, "NFS: Network File
System Protocol Specification," Request For Comments (RFC) 1094, by Sun Microsystems, Inc. (March 1989). This document is incorporated herein by reference in its entirety.
While simple and reliable, NFS is not optimal. Clients using NFS place considerable demands upon both networks and NFS servers supplying clients with NFS data. This demand is particularly acute for so-called diskless clients that have no local disks and therefore depend on a file server for application binaries and virtual memory paging as well as data. For these Unix client-server configurations, the ten-to-one increase in client power has not been matched by a ten-to-one increase in Ethernet capacity, in disk speed, or in server disk-to-network I/O throughput.
The result is that the number of diskless clients that a single modern high-end server can adequately support has dropped to between 5-10, depending on client power and application workload. For clients containing small local disks for applications and paging, referred to as dataless clients, the client-to-server ratio is about twice this, or between 10-20.
Such low client/server ratios cause piecewise network configurations in which each local Ethernet contains isolated traffic for its own 5-10 (diskless) clients and dedicated server. For overall connectivity, these local networks are usually joined together with an Ethernet backbone or, in the future, with an FDDI backbone. These backbones are typically connected to the local networks either by IP routers or MAC-level bridges, coupling the local networks together directly, or by a second server functioning as a network interface, coupling servers for all the local networks together.
`
In addition to performance considerations, the low client-to-server ratio creates computing problems in several additional ways:
1. Sharing. Development groups of more than 5-10 people cannot share the same server, and thus cannot easily share files without file replication and manual, multi-server updates. Bridges or routers are a partial solution but inflict a performance penalty due to more network hops.
2. Administration. System administrators must maintain many limited-capacity servers rather than a few more substantial servers. This burden includes network administration, hardware maintenance, and user account administration.
3. File System Backup. System administrators or operators must conduct multiple file system backups, which can be onerously time consuming tasks. It is also expensive to duplicate backup peripherals on each server (or every few servers if slower network backup is used).
4. Price Per Seat. With only 5-10 clients per server, the cost of the server must be shared by only a small number of users. The real cost of an entry-level Unix workstation is therefore significantly greater, often as much as 140% greater, than the cost of the workstation alone.
The widening I/O gap, as well as administrative and economic considerations, demonstrates a need for higher-performance, larger-capacity Unix file servers. Conversion of a display-less workstation into a server may address disk capacity issues, but does nothing to address fundamental I/O limitations. As an NFS server, the one-time workstation must sustain 5-10 or more times the network, disk, backplane, and file system throughput that it was designed to support as a client. Adding larger disks, more network adaptors,
extra primary memory, or even a faster processor do not resolve basic architectural I/O constraints; I/O throughput does not increase sufficiently.
Other prior art computer architectures, while not specifically designed as file servers, may potentially be used as such. In one such well-known architecture, a CPU, a memory unit, and two I/O processors are connected to a single bus. One of the I/O processors operates a set of disk drives, and if the architecture is to be used as a server, the other I/O processor would be connected to a network. This architecture is not optimal as a file server, however, at least because the two I/O processors cannot handle network file requests without involving the CPU. All network file requests that are received by the network I/O processor are first transmitted to the CPU, which makes appropriate requests to the disk I/O processor for satisfaction of the network request.
In another such computer architecture, a disk controller CPU manages access to disk drives, and several other CPUs, three for example, may be clustered around the disk controller CPU. Each of the other CPUs can be connected to its own network. The network CPUs are each connected to the disk controller CPU as well as to each other for interprocessor communication. One of the disadvantages of this computer architecture is that each CPU in the system runs its own complete operating system. Thus, network file server requests must be handled by an operating system which is also heavily loaded with facilities and processes for performing a large number of other, non-file-server tasks. Additionally, the interprocessor communication is not optimized for file server type requests.
In yet another computer architecture, a plurality of CPUs, each having its own cache memory for data and
instruction storage, are connected to a common bus with a system memory and a disk controller. The disk controller and each of the CPUs have direct memory access to the system memory, and one or more of the CPUs can be connected to a network. This architecture is disadvantageous as a file server because, among other things, both file data and the instructions for the CPUs reside in the same system memory. There will be instances, therefore, in which the CPUs must stop running while they wait for large blocks of file data to be transferred between system memory and the network CPU. Additionally, as with both of the previously described computer architectures, the entire operating system runs on each of the CPUs, including the network CPU.
In yet another type of computer architecture, a large number of CPUs are connected together in a hypercube topology. One or more of these CPUs can be connected to networks, while another can be connected to disk drives. This architecture is also disadvantageous as a file server because, among other things, each processor runs the entire operating system. Interprocessor communication is also not optimal for file server applications.
`
SUMMARY OF THE INVENTION
The present invention involves a new, server-specific I/O architecture that is optimized for a Unix file server's most common actions -- file operations. Roughly stated, the invention involves a file server architecture comprising one or more network controllers, one or more file controllers, one or more storage processors, and a system or buffer memory, all connected over a message passing bus and operating in parallel with the Unix host processor. The network controllers each connect to one or more networks, and
provide all protocol processing between the network layer data format and an internal file server format for communicating client requests to other processors in the server. Only those data packets which cannot be interpreted by the network controllers, for example client requests to run a client-defined program on the server, are transmitted to the Unix host for processing. Thus the network controllers, file controllers and storage processors contain only small parts of an overall operating system, and each is optimized for the particular type of work to which it is dedicated.
Client requests for file operations are transmitted to one of the file controllers which, independently of the Unix host, manages the virtual file system of a mass storage device which is coupled to the storage processors. The file controllers may also control data buffering between the storage processors and the network controllers, through the system memory. The file controllers preferably each include a local buffer memory for caching file control information, separate from the system memory for caching file data. Additionally, the network controllers, file processors and storage processors are all designed to avoid any instruction fetches from the system memory, instead keeping all instruction memory separate and local. This arrangement eliminates contention on the backplane between microprocessor instruction fetches and transmissions of message and file data.
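Purely by way of illustration, the following C sketch models the kind of internal file server format a network controller might use to describe a client file request to a file controller over the message passing bus. The structure, its field names and its sizes are assumptions made for this example only; the patent does not define this layout.

```c
/* Hypothetical sketch of an internal request message passed from a
 * network controller to a file controller over the backplane bus.
 * All names and field sizes are illustrative assumptions. */
#include <stdint.h>

enum fs_opcode { FS_READ, FS_WRITE, FS_LOOKUP, FS_GETATTR };

struct fs_request {
    uint16_t src_board;        /* slot of the originating network controller */
    uint16_t dst_board;        /* slot of the target file controller */
    uint32_t opcode;           /* requested file operation (enum fs_opcode) */
    uint8_t  file_handle[32];  /* opaque handle identifying the file */
    uint64_t offset;           /* logical byte offset within the file */
    uint32_t length;           /* number of bytes to read or write */
    uint32_t buffer_addr;      /* data buffer address in shared system memory */
};
```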
`
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be described with respect to particular embodiments thereof, and reference will be made to the drawings, in which:
Fig. 1 is a block diagram of a prior art file server architecture;
`
Fig. 2 is a block diagram of a file server architecture according to the invention;
Fig. 3 is a block diagram of one of the network controllers shown in Fig. 2;
Fig. 4 is a block diagram of one of the file controllers shown in Fig. 2;
Fig. 5 is a block diagram of one of the storage processors shown in Fig. 2;
Fig. 6 is a block diagram of one of the system memory cards shown in Fig. 2;
Figs. 7A-C are a flowchart illustrating the operation of a fast transfer protocol BLOCK WRITE cycle; and
Figs. 8A-C are a flowchart illustrating the operation of a fast transfer protocol BLOCK READ cycle.
`
DETAILED DESCRIPTION
For comparison purposes and background, an illustrative prior-art file server architecture will first be described with respect to Fig. 1. Fig. 1 is an overall block diagram of a conventional prior-art Unix-based file server for Ethernet networks. It consists of a host CPU card 10 with a single microprocessor on board. The host CPU card 10 connects to an Ethernet #1 12, and it connects via a memory management unit (MMU) 11 to a large memory array 16. The host CPU card 10 also drives a keyboard, a video display, and two RS232 ports (not shown). It also connects via the MMU 11 and a standard 32-bit VME bus 20 to various peripheral devices, including an SMD disk controller 22 controlling one or two disk drives 24, a SCSI host adaptor 26 connected to a SCSI bus 28, a tape controller 30 connected to a quarter-inch tape drive 32, and possibly a network #2 controller 34 connected
to a second Ethernet 36. The SMD disk controller 22 can communicate with memory array 16 by direct memory access via bus 20 and MMU 11, with either the disk controller or the MMU acting as a bus master. This configuration is illustrative; many variations are available.
The system communicates over the Ethernets using industry standard TCP/IP and NFS protocol stacks. A description of protocol stacks in general can be found in Tanenbaum, "Computer Networks" (Second Edition, Prentice Hall: 1988). File server protocol stacks are described at pages 535-546. The Tanenbaum reference is incorporated herein by reference.
Basically, the following protocol layers are implemented in the apparatus of Fig. 1:
Network Layer. The network layer converts data packets between a format specific to Ethernets and a format which is independent of the particular type of network used. The Ethernet-specific format which is used in the apparatus of Fig. 1 is described in Hornig, "A Standard For The Transmission of IP Datagrams Over Ethernet Networks," RFC 894 (April 1984), which is incorporated herein by reference.
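For concreteness, the sketch below restates the Ethernet-specific framing that such a network layer adds or strips (per RFC 894): a 14-byte header whose type field, 0x0800, marks the payload as an IP datagram. The struct and constant names are illustrative, not taken from the patent.

```c
/* Ethernet framing handled by the network layer (see RFC 894). */
#include <stdint.h>

#define ETHERTYPE_IP 0x0800      /* payload is an IP datagram */

struct ether_header {
    uint8_t  dest_addr[6];       /* 48-bit destination Ethernet address */
    uint8_t  src_addr[6];        /* 48-bit source Ethernet address */
    uint16_t ether_type;         /* e.g. ETHERTYPE_IP */
    /* 46-1500 bytes of payload and a 32-bit frame check sequence follow */
};
```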
The Internet Protocol (IP) Layer. This layer provides the functions necessary to deliver a package of bits (an internet datagram) from a source to a destination over an interconnected system of networks. For messages to be sent from the file server to a client, a higher level in the server calls the IP module, providing the internet address of the destination client and the message to transmit. The IP module performs any required fragmentation of the message to accommodate packet size limitations of any intervening gateway, adds internet headers to each fragment, and calls on the network layer to transmit the resulting internet datagrams. The internet header
includes a local network destination address (translated from the internet address) as well as other parameters.
For messages received by the IP layer from the network layer, the IP module determines from the internet address whether the datagram is to be forwarded to another host on another network, for example on a second Ethernet such as 36 in Fig. 1, or whether it is intended for the server itself. If it is intended for another host on the second network, the IP module determines a local net address for the destination and calls on the local network layer for that network to send the datagram. If the datagram is intended for an application program within the server, the IP layer strips off the header and passes the remaining portion of the message to the appropriate next higher layer. The internet protocol standard used in the illustrative apparatus of Fig. 1 is specified in Information Sciences Institute, "Internet Protocol, DARPA Internet Program Protocol Specification," RFC 791 (September 1981), which is incorporated herein by reference.
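As a reference point for the fragmentation and header handling just described, the sketch below restates the internet header fields of RFC 791 (shown without options); it is a summary of the cited standard, not code from the patent.

```c
/* Internet datagram header per RFC 791, without options. */
#include <stdint.h>

struct ip_header {
    uint8_t  version_ihl;      /* 4-bit version (4) and 4-bit header length */
    uint8_t  type_of_service;
    uint16_t total_length;     /* header plus data, in bytes */
    uint16_t identification;   /* groups the fragments of one datagram */
    uint16_t flags_fragment;   /* 3 flag bits and 13-bit fragment offset */
    uint8_t  time_to_live;
    uint8_t  protocol;         /* e.g. 17 for UDP, 6 for TCP */
    uint16_t header_checksum;
    uint32_t source_addr;      /* internet address of the sender */
    uint32_t dest_addr;        /* internet address of the destination */
};
```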
TCP/UDP Layer. This layer is a datagram service with more elaborate packaging and addressing options than the IP layer. For example, whereas an IP datagram can hold about 1,500 bytes and be addressed to hosts, UDP datagrams can hold about 64KB and be addressed to a particular port within a host. TCP and UDP are alternative protocols at this layer; applications requiring ordered reliable delivery of streams of data may use TCP, whereas applications (such as NFS) which do not require ordered and reliable delivery may use UDP.
The prior art file server of Fig. 1 uses both TCP and UDP. It uses UDP for file server-related services, and uses TCP for certain other services
which the server provides to network clients. The UDP is specified in Postel, "User Datagram Protocol," RFC 768 (August 28, 1980), which is incorporated herein by reference. TCP is specified in Postel, "Transmission Control Protocol," RFC 761 (January 1980) and RFC 793 (September 1981), which is also incorporated herein by reference.
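The port addressing and size limit mentioned above follow directly from the UDP header of RFC 768, restated in the sketch below; the 16-bit length field is what caps a datagram (header plus data) at 65,535 bytes.

```c
/* UDP header per RFC 768: datagrams are addressed to a 16-bit port
 * within a host, and the 16-bit length field limits a datagram to
 * 65,535 bytes of header plus data. */
#include <stdint.h>

struct udp_header {
    uint16_t source_port;
    uint16_t dest_port;     /* port within the destination host */
    uint16_t length;        /* header plus data, in bytes */
    uint16_t checksum;
};
```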
XDR/RPC Layer. This layer provides functions callable from higher level programs to run a designated procedure on a remote machine. It also provides the decoding necessary to permit a client machine to execute a procedure on the server. For example, a caller process in a client node may send a call message to the server of Fig. 1. The call message includes a specification of the desired procedure, and its parameters. The message is passed up the stack to the RPC layer, which calls the appropriate procedure within the server. When the procedure is complete, a reply message is generated and RPC passes it back down the stack and over the network to the caller client. RPC is described in Sun Microsystems, Inc., "RPC: Remote Procedure Call Protocol Specification, Version 2," RFC 1057 (June 1988), which is incorporated herein by reference.
RPC uses the XDR external data representation standard to represent information passed to and from the underlying UDP layer. XDR is merely a data encoding standard, useful for transferring data between different computer architectures. Thus, on the network side of the XDR/RPC layer, information is machine-independent; on the host application side, it may not be. XDR is described in Sun Microsystems, Inc., "XDR: External Data Representation Standard," RFC 1014 (June 1987), which is incorporated herein by reference.
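To make the call-message structure concrete, the sketch below outlines the fixed words of an RPC call message as defined in RFC 1057; the trailing comment notes where the XDR-encoded credentials and procedure parameters would follow. The struct name is an illustrative assumption.

```c
/* Leading words of an RPC call message per RFC 1057 (XDR-encoded on
 * the wire); program, version and procedure numbers select the remote
 * procedure to run. NFS, for example, is program number 100003. */
#include <stdint.h>

struct rpc_call_header {
    uint32_t xid;          /* transaction id matching a reply to its call */
    uint32_t msg_type;     /* 0 = CALL, 1 = REPLY */
    uint32_t rpc_version;  /* 2 for this version of the specification */
    uint32_t program;      /* remote program number */
    uint32_t version;      /* version of that program */
    uint32_t procedure;    /* procedure within the program to execute */
    /* credentials, verifier and XDR-encoded parameters follow */
};
```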
`
`("network file system•)
`The NPS
`NPS Layer.
`layer is one of the programs available on the server
`which an RPC request can call.
`The combination of
`host address, program number, and procedure number in
`an RPC request can specify one remote NFS procedure to
`be called.
`Remote procedure calls to NFS on the file server of
`Fig. 1 provide transparent, stateless, remote access
`to shared files on the disks 24. NFS assumes a file
`system that is hi~rarchical, with directories as all
`but the bottom level of files. Client hosts can call
`any of about 20
`.NFS procedures
`including
`such
`procedures as reading a specified number of bytes from
`a specified file; writing a specified number of bytes
`to a specified file; creating, renaming and removing
`specified .files; parsing directory trees; creating and
`removing directories; and reading and setting file
`attributes. The location on disk to which and from
`which data is stored and retrieved is always specified
`in logical terms, such as by a file handle or !node
`designation and a byte offset.
`The details of the
`actual data storage are hidden from the client. The
`NFS procedures, together with possible higher level
`modules
`such as Unix VFS
`and UPS, perform all
`conversion of logical data addresses to physical data
`·addresses such as drive, head,
`track and sector
`identification. NFS is specified in sun Microsystems,
`Inc. ,
`•NFs:
`Network
`File
`System
`Protocol
`Specification,a RFC 1094 (March 1989), incorporated
`herein by reference.
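As an example of the purely logical addressing just described, the sketch below restates the arguments of the NFS version 2 READ procedure from RFC 1094: the client identifies the file only by an opaque 32-byte file handle and a byte offset, never by drive, head, track or sector.

```c
/* Arguments of the NFS version 2 READ procedure as defined in RFC 1094. */
#include <stdint.h>

#define NFS_FHSIZE 32              /* size of an opaque NFS file handle */

struct nfs_readargs {
    uint8_t  file[NFS_FHSIZE];     /* opaque handle obtained from LOOKUP */
    uint32_t offset;               /* byte offset within the file */
    uint32_t count;                /* number of bytes to read */
    uint32_t totalcount;           /* unused in version 2 */
};
```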
With the possible exception of the network layer, all the protocol processing described above is done in software, by a single processor in the host CPU card 10. That is, when an Ethernet packet arrives on Ethernet 12, the host CPU 10 performs all the protocol processing in the NFS stack, as well as the protocol
processing for any other application which may be running on the host 10. NFS procedures are run on the host CPU 10, with access to memory 16 for both data and program code being provided via MMU 11. Logically specified data addresses are converted to a much more physically specified form and communicated to the SMD disk controller 22 or the SCSI bus 28, via the VME bus 20, and all disk caching is done by the host CPU 10 through the memory 16. The host CPU card 10 also runs procedures for performing various other functions of the file server, communicating with tape controller 30 via the VME bus 20. Among these are client-defined remote procedures requested by client workstations.
If the server serves a second Ethernet 36, packets from that Ethernet are transmitted to the host CPU 10 over the same VME bus 20 in the form of IP datagrams. Again, all protocol processing except for the network layer is performed by software processes running on the host CPU 10. In addition, the protocol processing for any message that is to be sent from the server out on either of the Ethernets 12 or 36 is also done by processes running on the host CPU 10.
It can be seen that the host CPU 10 performs an enormous amount of processing of data, especially if 5-10 clients on each of the two Ethernets are making file server requests and need to be sent responses on a frequent basis. The host CPU 10 runs a multitasking Unix operating system, so each incoming request need not wait for the previous request to be completely processed and returned before being processed. Multiple processes are activated on the host CPU 10 for performing different stages of the processing of different requests, so many requests may be in process at the same time. But there is only one CPU on the card 10, so the processing of these requests is not accomplished in a truly parallel manner. The
processes are instead merely time sliced. The CPU 10 therefore represents a major bottleneck in the processing of file server requests.
Another bottleneck occurs in MMU 11, which must transmit both instructions and data between the CPU card 10 and the memory 16. All data flowing between the disk drives and the network passes through this interface at least twice.
Yet another bottleneck can occur on the VME bus 20, which must transmit data among the SMD disk controller 22, the SCSI host adaptor 26, the host CPU card 10, and possibly the network #2 controller 34.
`
PREFERRED EMBODIMENT - OVERALL HARDWARE ARCHITECTURE
In Fig. 2 there is shown a block diagram of a network file server 100 according to the invention. It can include multiple network controller (NC) boards, one or more file controller (FC) boards, one or more storage processor (SP) boards, multiple system memory boards, and one or more host processors. The particular embodiment shown in Fig. 2 includes four network controller boards 110a-110d, two file controller boards 112a-112b, two storage processors 114a-114b, four system memory cards 116a-116d for a total of 192 MB of memory, and one local host processor 118. The boards 110, 112, 114, 116 and 118 are connected together over a VME bus 120 on which an enhanced block transfer mode, as described in the ENHANCED VMEBUS PROTOCOL application identified above, may be used. Each of the four network controllers 110 shown in Fig. 2 can be connected to up to two Ethernets 122, for a total capacity of 8 Ethernets 122a-122h. Each of the storage processors 114 operates ten parallel SCSI busses, nine of which can each support up to three SCSI disk drives. The tenth SCSI channel on each of the storage processors
114 is used for tape drives and other SCSI peripherals.
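For convenience, the illustrative C enumeration below gathers the board counts and per-board fan-out of the Fig. 2 embodiment described above; the constant names are assumptions made for this summary, not identifiers from the patent.

```c
/* Illustrative summary of the Fig. 2 configuration described above;
 * the identifier names are assumptions, the values restate the text. */
enum fig2_configuration {
    NUM_NETWORK_CONTROLLERS = 4,  /* boards 110a-110d */
    NUM_FILE_CONTROLLERS    = 2,  /* boards 112a-112b */
    NUM_STORAGE_PROCESSORS  = 2,  /* boards 114a-114b */
    NUM_MEMORY_CARDS        = 4,  /* boards 116a-116d, 192 MB in total */
    NUM_HOST_PROCESSORS     = 1,  /* local host 118 */
    ETHERNETS_PER_NC        = 2,  /* eight Ethernets 122a-122h in total */
    SCSI_CHANNELS_PER_SP    = 10, /* tenth channel reserved for tape, etc. */
    DISKS_PER_SCSI_CHANNEL  = 3   /* on each of the other nine channels */
};
```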
The host 118 is essentially a standard SunOS Unix processor, providing all the standard Sun Open Network Computing (ONC) services except NFS and IP routing. Importantly, all network requests to run a user-defined procedure are passed to the host for execution. Each of the NC boards 110, the FC boards 112 and the SP boards 114 includes its own independent 32-bit microprocessor. These boards essentially off-load from the host processor 118 virtually all of the NFS and disk processing. Since the vast majority of messages to and from clients over the Ethernets 122 involve NFS requests and responses, the processing of these requests in parallel by the NC, FC and SP processors, with minimal involvement by the local host 118, vastly improves file server performance. Unix is explicitly eliminated from virtually all network, file, a