Office de la propriété intellectuelle du Canada
Un organisme d'Industrie Canada

Canadian Intellectual Property Office
An agency of Industry Canada
CA 2066443 C 2003/10/21

(11)(21) 2 066 443

(12) BREVET CANADIEN
     CANADIAN PATENT

(13) C

(86) Date de dépôt PCT/PCT Filing Date: 1990/08/20
(87) Date publication PCT/PCT Publication Date: 1991/03/21
(45) Date de délivrance/Issue Date: 2003/10/21
(85) Entrée phase nationale/National Entry: 1992/03/03
(86) N° demande PCT/PCT Application No.: US 1990/004711
(87) N° publication PCT/PCT Publication No.: 1991/003788
(30) Priorité/Priority: 1989/09/08 (404,959) US

(51) Cl.Int.5/Int.Cl.5 G06F 15/16, H04L 12/28
(72) Inventeurs/Inventors:
     ROW, EDWARD JOHN, US;
     BOUCHER, LAURENCE B., US;
     PITTS, WILLIAM M., US;
     BLIGHTMAN, STEPHEN E., US
(73) Propriétaire/Owner:
     AUSPEX SYSTEMS, INC., US
(74) Agent: GOWLING LAFLEUR HENDERSON LLP

(54) Titre : ARCHITECTURE DE SERVEUR DE FICHIERS POUR RESEAU D'ENTREE-SORTIE PARALLELE
(54) Title: PARALLEL I/O NETWORK FILE SERVER ARCHITECTURE
[Title-page drawing: block diagram of the file server 100, showing network controllers 110a-110d attached to Ethernets 122a-122h, file controllers 112a-112b, storage processors 114a-114b, system memory boards 116a-116d, and a local host 118, all on a common bus.]
(57) Abrégé/Abstract

A file server architecture is disclosed, comprising as separate processors, a network controller unit (110), a file controller unit (112) and a storage processor unit (114). These units incorporate their own processors, and operate in parallel with a local Unix host processor (118). All networks are connected to the network controller unit (110), which performs all protocol processing up through the NFS layer. The virtual file system is implemented in the file controller unit (112) and the storage processor (114) provides high-speed multiplexed access to an array of mass storage devices. The file controller unit (112) controls file information caching through its own local cache buffer, and controls disk data caching through a large system memory which is accessible on a bus by any of the processors.
PCT
WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau
INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)

(51) International Patent Classification 5: G06F 15/16

(11) International Publication Number: WO 91/03788 A1
(43) International Publication Date: 21 March 1991 (21.03.91)

(21) International Application Number: PCT/US90/04711
(22) International Filing Date: 20 August 1990 (20.08.90)

(30) Priority data: 404,959    8 September 1989 (08.09.89)    US

(71) Applicant: AUSPEX SYSTEMS, INC. [US/US]; 2952 Bunker Hill Lane, Santa Clara, CA 95054 (US).

(72) Inventors: ROW, Edward, John; 468 Mountain Laurel Court, Mountain View, CA 94064 (US). BOUCHER, Laurence, B.; 20605 Montalvo Heights Drive, Saratoga, CA 95070 (US). PITTS, William, M.; 780 Mora Drive, Los Altos, CA 94022 (US). BLIGHTMAN, Stephen, E.; 775 Salt Lake Drive, San Jose, CA 95133 (US).

(74) Agents: FLIESLER, Martin, C. et al.; Fliesler, Dubb, Meyer & Lovejoy, 4 Embarcadero Center, Suite 400, San Francisco, CA 94111 (US).

(81) Designated States: AT (European patent), AU, BE (European patent), CA, CH (European patent), DE (European patent)*, DK (European patent), ES (European patent), FR (European patent), GB (European patent), IT (European patent), JP, KR, LU (European patent), NL (European patent), SE (European patent).

Published
    With international search report.
    Before the expiration of the time limit for amending the claims and to be republished in the event of the receipt of amendments.

(54) Title: PARALLEL I/O NETWORK FILE SERVER ARCHITECTURE
[Front-page drawing: the same block diagram of file server 100, showing the system memory, file controller, storage processor and local host boards.]
(57) Abstract

A file server architecture is disclosed, comprising as separate processors, a network controller unit (110), a file controller unit (112) and a storage processor unit (114). These units incorporate their own processors, and operate in parallel with a local Unix host processor (118). All networks are connected to the network controller unit (110), which performs all protocol processing up through the NFS layer. The virtual file system is implemented in the file controller unit (112) and the storage processor (114) provides high-speed multiplexed access to an array of mass storage devices. The file controller unit (112) controls file information caching through its own local cache buffer, and controls disk data caching through a large system memory which is accessible on a bus by any of the processors.

* See back of page
PARALLEL I/O NETWORK FILE SERVER ARCHITECTURE
The present application is related to the following U.S. Patent Applications, all filed concurrently herewith:

1. MULTIPLE FACILITY OPERATING SYSTEM ARCHITECTURE, invented by David Hitz, Allan Schwartz, James Lau and Guy Harris;

2. ENHANCED VMEBUS PROTOCOL UTILIZING PSEUDOSYNCHRONOUS HANDSHAKING AND BLOCK MODE DATA TRANSFER, invented by Daryl Starr; and

3. BUS LOCKING FIFO MULTI-PROCESSOR COMMUNICATIONS SYSTEM UTILIZING PSEUDOSYNCHRONOUS HANDSHAKING AND BLOCK MODE DATA TRANSFER, invented by Daryl D. Starr, William Pitts and Stephen Blightman.

The above applications are all assigned to the assignee of the present invention and are all expressly incorporated herein by reference.
BACKGROUND OF THE INVENTION

Field of the Invention

The invention relates to computer data networks, and more particularly, to network file server architectures for computer networks.
Description of the Related Art

Over the past ten years, remarkable increases in hardware price/performance ratios have caused a startling shift in both technical and office computing environments. Distributed workstation-server networks are displacing the once pervasive dumb terminal attached to a mainframe or minicomputer. To date, however, network I/O limitations have constrained the potential performance available to workstation users. This situation has developed in part because dramatic jumps in microprocessor performance have exceeded increases in network I/O performance.

In a computer network, individual user workstations are referred to as clients, and shared resources for filing, printing, data storage and wide-area communications are referred to as servers. Clients and servers are all considered nodes of a network. Client nodes use standard communications protocols to exchange service requests and responses with server nodes.

Present-day network clients and servers usually run the DOS, Macintosh OS, OS/2, or Unix operating systems. Local networks are usually Ethernet or Token Ring at the high end, Arcnet in the midrange, or LocalTalk or StarLAN at the low end. The client-server communication protocols are fairly strictly dictated by the operating system environment: usually one of several proprietary schemes for PCs (NetWare, 3Plus, Vines, LANManager, LANServer); AppleTalk for Macintoshes; and TCP/IP with NFS or RFS
for Unix. These protocols are all well-known in the industry.

Unix client nodes typically feature a 16- or 32-bit microprocessor with 1-8 MB of primary memory, a 640 x 1024 pixel display, and a built-in network interface. A 40-100 MB local disk is often optional. Low-end examples are 80286-based PCs or 68000-based Macintosh I's; mid-range machines include 80386 PCs, Macintosh II's, and 680X0-based Unix workstations; high-end machines include RISC-based DEC, HP, and Sun Unix workstations. Servers are typically nothing more than repackaged client nodes, configured in 19-inch racks rather than desk sideboxes. The extra space of a 19-inch rack is used for additional backplane slots, disk or tape drives, and power supplies.
Driven by RISC and CISC microprocessor developments, client workstation performance has increased by more than a factor of ten in the last few years. Concurrently, these extremely fast clients have also gained an appetite for data that remote servers are unable to satisfy. Because the I/O shortfall is most dramatic in the Unix environment, the description of the preferred embodiment of the present invention will focus on Unix file servers. The architectural principles that solve the Unix server I/O problem, however, extend easily to server performance bottlenecks in other operating system environments as well. Similarly, the description of the preferred embodiment will focus on Ethernet implementations, though the principles extend easily to other types of networks.

In most Unix environments, clients and servers exchange file data using the Network File System ("NFS"), a standard promulgated by Sun Microsystems and now widely adopted by the Unix community. NFS is defined in a document entitled, "NFS: Network File
System Protocol Specification," Request For Comments (RFC) 1094, by Sun Microsystems, Inc. (March 1989). This document is incorporated herein by reference in its entirety.
While simple and reliable, NFS is not optimal. Clients using NFS place considerable demands upon both networks and NFS servers supplying clients with NFS data. This demand is particularly acute for so-called diskless clients that have no local disks and therefore depend on a file server for application binaries and virtual memory paging as well as data. For these Unix client-server configurations, the ten-to-one increase in client power has not been matched by a ten-to-one increase in Ethernet capacity, in disk speed, or server disk-to-network I/O throughput.

The result is that the number of diskless clients that a single modern high-end server can adequately support has dropped to between 5-10, depending on client power and application workload. For clients containing small local disks for applications and paging, referred to as dataless clients, the client-to-server ratio is about twice this, or between 10-20.
Such low client/server ratios cause piecewise network configurations in which each local Ethernet contains isolated traffic for its own 5-10 (diskless) clients and dedicated server. For overall connectivity, these local networks are usually joined together with an Ethernet backbone or, in the future, with an FDDI backbone. These backbones are typically connected to the local networks either by IP routers or MAC-level bridges, coupling the local networks together directly, or by a second server functioning as a network interface, coupling servers for all the local networks together.
In addition to performance considerations, the low client-to-server ratio creates computing problems in several additional ways:

1. Sharing. Development groups of more than 5-10 people cannot share the same server, and thus cannot easily share files without file replication and manual, multi-server updates. Bridges or routers are a partial solution but inflict a performance penalty due to more network hops.

2. Administration. System administrators must maintain many limited-capacity servers rather than a few more substantial servers. This burden includes network administration, hardware maintenance, and user account administration.

3. File System Backup. System administrators or operators must conduct multiple file system backups, which can be onerously time consuming tasks. It is also expensive to duplicate backup peripherals on each server (or every few servers if slower network backup is used).

4. Price Per Seat. With only 5-10 clients per server, the cost of the server must be shared by only a small number of users. The real cost of an entry-level Unix workstation is therefore significantly greater, often as much as 140% greater, than the cost of the workstation alone.
The widening I/O gap, as well as administrative and economic considerations, demonstrates a need for higher-performance, larger-capacity Unix file servers. Conversion of a display-less workstation into a server may address disk capacity issues, but does nothing to address fundamental I/O limitations. As an NFS server, the one-time workstation must sustain 5-10 or more times the network, disk, backplane, and file system throughput than it was designed to support as a client. Adding larger disks, more network adaptors,
extra primary memory, or even a faster processor do not resolve basic architectural I/O constraints; I/O throughput does not increase sufficiently.
Other prior art computer architectures, while not specifically designed as file servers, may potentially be used as such. In one such well-known architecture, a CPU, a memory unit, and two I/O processors are connected to a single bus. One of the I/O processors operates a set of disk drives, and if the architecture is to be used as a server, the other I/O processor would be connected to a network. This architecture is not optimal as a file server, however, at least because the two I/O processors cannot handle network file requests without involving the CPU. All network file requests that are received by the network I/O processor are first transmitted to the CPU, which makes appropriate requests to the disk I/O processor for satisfaction of the network request.
In another such computer architecture, a disk controller CPU manages access to disk drives, and several other CPUs, three for example, may be clustered around the disk controller CPU. Each of the other CPUs can be connected to its own network. The network CPUs are each connected to the disk controller CPU as well as to each other for interprocessor communication. One of the disadvantages of this computer architecture is that each CPU in the system runs its own complete operating system. Thus, network file server requests must be handled by an operating system which is also heavily loaded with facilities and processes for performing a large number of other, non-file-server tasks. Additionally, the interprocessor communication is not optimized for file server type requests.
In yet another computer architecture, a plurality of CPUs, each having its own cache memory for data and
instruction storage, are connected to a common bus with a system memory and a disk controller. The disk controller and each of the CPUs have direct memory access to the system memory, and one or more of the CPUs can be connected to a network. This architecture is disadvantageous as a file server because, among other things, both file data and the instructions for the CPUs reside in the same system memory. There will be instances, therefore, in which the CPUs must stop running while they wait for large blocks of file data to be transferred between system memory and the network CPU. Additionally, as with both of the previously described computer architectures, the entire operating system runs on each of the CPUs, including the network CPU.

In yet another type of computer architecture, a large number of CPUs are connected together in a hypercube topology. One or more of these CPUs can be connected to networks, while another can be connected to disk drives. This architecture is also disadvantageous as a file server because, among other things, each processor runs the entire operating system. Interprocessor communication is also not optimal for file server applications.
SUMMARY OF THE INVENTION

The present invention involves a new, server-specific I/O architecture that is optimized for a Unix file server's most common actions: file operations. Roughly stated, the invention involves a file server architecture comprising one or more network controllers, one or more file controllers, one or more storage processors, and a system or buffer memory, all connected over a message passing bus and operating in parallel with the Unix host processor. The network controllers each connect to one or more networks, and
provide all protocol processing between the network layer data format and an internal file server format for communicating client requests to other processors in the server. Only those data packets which cannot be interpreted by the network controllers, for example client requests to run a client-defined program on the server, are transmitted to the Unix host for processing. Thus the network controllers, file controllers and storage processors contain only small parts of an overall operating system, and each is optimized for the particular type of work to which it is dedicated.

Client requests for file operations are transmitted to one of the file controllers which, independently of the Unix host, manages the virtual file system of a mass storage device which is coupled to the storage processors. The file controllers may also control data buffering between the storage processors and the network controllers, through the system memory. The file controllers preferably each include a local buffer memory for caching file control information, separate from the system memory for caching file data. Additionally, the network controllers, file processors and storage processors are all designed to avoid any instruction fetches from the system memory, instead keeping all instruction memory separate and local. This arrangement eliminates contention on the backplane between microprocessor instruction fetches and transmissions of message and file data.
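As an illustration of this division of labor, the following C sketch shows one plausible shape for the internal request message that a network controller might pass to a file controller over the message passing bus. The structure, field names and opcode values are hypothetical; the patent does not define this message format.

    /* Hypothetical internal file-server request message, as might be sent
     * over the message passing bus from a network controller to a file
     * controller. All names and layouts are illustrative assumptions. */

    #include <stdint.h>

    enum fs_opcode {               /* a few representative file operations */
        FS_OP_LOOKUP = 1,
        FS_OP_READ   = 2,
        FS_OP_WRITE  = 3
    };

    struct fs_request {
        uint8_t  src_board;        /* originating board, e.g. a network controller */
        uint8_t  dst_board;        /* target board, e.g. a file controller         */
        uint16_t opcode;           /* one of enum fs_opcode                        */
        uint32_t file_handle;      /* logical file identifier                      */
        uint32_t offset;           /* byte offset within the file                  */
        uint32_t count;            /* number of bytes to read or write             */
        uint32_t buf_addr;         /* system-memory address of the data
                                    * buffer, reachable by every board             */
    };

The point of the sketch is that a request carries only logical file coordinates and a system-memory buffer address, so file data and processor instruction fetches never have to share a path through the host.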
BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described with respect to particular embodiments thereof, and reference will be made to the drawings, in which:

Fig. 1 is a block diagram of a prior art file server architecture;
Fig. 2 is a block diagram of a file server architecture according to the invention;
Fig. 3 is a block diagram of one of the network controllers shown in Fig. 2;
Fig. 4 is a block diagram of one of the file controllers shown in Fig. 2;
Fig. 5 is a block diagram of one of the storage processors shown in Fig. 2;
Fig. 6 is a block diagram of one of the system memory cards shown in Fig. 2;
Figs. 7A-C are a flowchart illustrating the operation of a fast transfer protocol BLOCK WRITE cycle; and
Figs. 8A-C are a flowchart illustrating the operation of a fast transfer protocol BLOCK READ cycle.
DETAILED DESCRIPTION

For comparison purposes and background, an illustrative prior-art file server architecture will first be described with respect to Fig. 1. Fig. 1 is an overall block diagram of a conventional prior-art Unix-based file server for Ethernet networks. It consists of a host CPU card 10 with a single microprocessor on board. The host CPU card 10 connects to an Ethernet 12, and it connects via a memory management unit (MMU) 11 to a large memory array 16. The host CPU card 10 also drives a keyboard, a video display, and two RS232 ports (not shown). It also connects via the MMU 11 and a standard 32-bit VME bus 20 to various peripheral devices, including an SMD disk controller 22 controlling one or two disk drives 24, a SCSI host adaptor 26 connected to a SCSI bus 28, a tape controller 30 connected to a quarter-inch tape drive 32, and possibly a network #2 controller 34 connected
to a second Ethernet 36. The SMD disk controller 22 can communicate with memory array 16 by direct memory access via bus 20 and MMU 11, with either the disk controller or the MMU acting as a bus master. This configuration is illustrative; many variations are available.

The system communicates over the Ethernets using industry standard TCP/IP and NFS protocol stacks. A description of protocol stacks in general can be found in Tanenbaum, "Computer Networks" (Second Edition, Prentice Hall: 1988). File server protocol stacks are described at pages 535-546. The Tanenbaum reference is incorporated herein by reference.

Basically, the following protocol layers are implemented in the apparatus of Fig. 1:
Network Layer. The network layer converts data packets between a format specific to Ethernets and a format which is independent of the particular type of network used. The Ethernet-specific format which is used in the apparatus of Fig. 1 is described in Hornig, "A Standard For The Transmission of IP Datagrams Over Ethernet Networks," RFC 894 (April 1984), which is incorporated herein by reference.
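As a minimal sketch of that conversion, assuming nothing beyond what RFC 894 specifies, the following C fragment wraps an IP datagram in an Ethernet frame whose type field carries the 0x0800 code assigned to IP; the structure and function names are illustrative.

    /* Minimal sketch of RFC 894 encapsulation: an IP datagram rides in an
     * Ethernet frame whose type field is 0x0800. Names are illustrative. */

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    #define ETHERTYPE_IP 0x0800

    struct ether_header {
        uint8_t  dst[6];           /* destination station address */
        uint8_t  src[6];           /* source station address      */
        uint16_t type;             /* 0x0800 marks an IP payload  */
    };

    /* Build one frame; returns the total frame length in bytes. */
    size_t ether_encapsulate(uint8_t *frame, const uint8_t dst[6],
                             const uint8_t src[6],
                             const uint8_t *ip_datagram, size_t ip_len)
    {
        struct ether_header hdr;
        memcpy(hdr.dst, dst, 6);
        memcpy(hdr.src, src, 6);
        hdr.type = htons(ETHERTYPE_IP);     /* network byte order */
        memcpy(frame, &hdr, sizeof hdr);
        memcpy(frame + sizeof hdr, ip_datagram, ip_len);
        return sizeof hdr + ip_len;
    }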
The Internet Protocol (IP) Layer. This layer provides the functions necessary to deliver a package of bits (an internet datagram) from a source to a destination over an interconnected system of networks. For messages to be sent from the file server to a client, a higher level in the server calls the IP module, providing the internet address of the destination client and the message to transmit. The IP module performs any required fragmentation of the message to accommodate packet size limitations of any intervening gateway, adds internet headers to each fragment, and calls on the network layer to transmit the resulting internet datagrams. The internet header
includes a local network destination address (translated from the internet address) as well as other parameters.

For messages received by the IP layer from the network layer, the IP module determines from the internet address whether the datagram is to be forwarded to another host on another network, for example on a second Ethernet such as 36 in Fig. 1, or whether it is intended for the server itself. If it is intended for another host on the second network, the IP module determines a local net address for the destination and calls on the local network layer for that network to send the datagram. If the datagram is intended for an application program within the server, the IP layer strips off the header and passes the remaining portion of the message to the appropriate next higher layer. The internet protocol standard used in the illustrative apparatus of Fig. 1 is specified in Information Sciences Institute, "Internet Protocol, DARPA Internet Program Protocol Specification," RFC 791 (September 1981), which is incorporated herein by reference.
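The receive-path decision just described can be sketched in C as follows; route_lookup, net_send and deliver_up are hypothetical stand-ins for the IP module's internals, not functions named in the text.

    /* Sketch of the IP module's receive-path decision: deliver the
     * datagram locally or forward it toward another network. */

    #include <stdint.h>

    struct ip_datagram {
        uint32_t dst_addr;         /* destination internet address */
        uint8_t *payload;          /* data following the header    */
        uint32_t payload_len;
    };

    extern uint32_t my_addr;       /* the server's own internet address */
    extern int  route_lookup(uint32_t dst, int *netif, uint32_t *local_addr);
    extern void net_send(int netif, uint32_t local_addr,
                         struct ip_datagram *dg);    /* hand to network layer */
    extern void deliver_up(struct ip_datagram *dg);  /* strip header, pass up */

    void ip_receive(struct ip_datagram *dg)
    {
        if (dg->dst_addr == my_addr) {
            deliver_up(dg);                /* intended for the server itself */
            return;
        }
        int netif;
        uint32_t local;                    /* local net address, translated
                                            * from the internet address     */
        if (route_lookup(dg->dst_addr, &netif, &local) == 0)
            net_send(netif, local, dg);    /* e.g. out the second Ethernet  */
        /* else: no route; a full stack would return an ICMP error */
    }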
TCP/UDP Layer. This layer is a datagram service with more elaborate packaging and addressing options than the IP layer. For example, whereas an IP datagram can hold about 1,500 bytes and be addressed to hosts, UDP datagrams can hold about 64KB and be addressed to a particular port within a host. TCP and UDP are alternative protocols at this layer; applications requiring ordered reliable delivery of streams of data may use TCP, whereas applications (such as NFS) which do not require ordered and reliable delivery may use UDP.
The prior art file server of Fig. 1 uses both TCP and UDP. It uses UDP for file server-related services, and uses TCP for certain other services
which the server provides to network clients. The UDP is specified in Postel, "User Datagram Protocol," RFC 768 (August 28, 1980), which is incorporated herein by reference. TCP is specified in Postel, "Transmission Control Protocol," RFC 761 (January 1980) and RFC 793 (September 1981), which is also incorporated herein by reference.
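A minimal sketch of such a UDP service endpoint, written with the standard BSD socket calls, follows; the choice of port 2049, the conventional NFS port, is an assumption rather than something stated in the text.

    /* Minimal sketch of a UDP service endpoint of the kind NFS uses;
     * it binds a datagram socket and receives one request. */

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <unistd.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_DGRAM, 0);   /* UDP: no connection state */
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family      = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port        = htons(2049);       /* assumed NFS port */
        if (s < 0 || bind(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("socket/bind");
            return 1;
        }

        char buf[8800];                    /* one request datagram at a time */
        struct sockaddr_in client;
        socklen_t clen = sizeof client;
        ssize_t n = recvfrom(s, buf, sizeof buf, 0,
                             (struct sockaddr *)&client, &clen);
        if (n >= 0)
            printf("received a %zd-byte datagram\n", n);
        close(s);
        return 0;
    }

Because UDP carries self-contained datagrams, the server needs no per-client connection state, which is one reason a stateless service such as NFS can use it.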
XDR/RPC Layer. This layer provides functions callable from higher level programs to run a designated procedure on a remote machine. It also provides the decoding necessary to permit a client machine to execute a procedure on the server. For example, a caller process in a client node may send a call message to the server of Fig. 1. The call message includes a specification of the desired procedure, and its parameters. The message is passed up the stack to the RPC layer, which calls the appropriate procedure within the server. When the procedure is complete, a reply message is generated and RPC passes it back down the stack and over the network to the caller client. RPC is described in Sun Microsystems, Inc., "RPC: Remote Procedure Call Protocol Specification, Version 2," RFC 1057 (June 1988), which is incorporated herein by reference.

RPC uses the XDR external data representation standard to represent information passed to and from the underlying UDP layer. XDR is merely a data encoding standard, useful for transferring data between different computer architectures. Thus, on the network side of the XDR/RPC layer, information is machine-independent; on the host application side, it may not be. XDR is described in Sun Microsystems, Inc., "XDR: External Data Representation Standard," RFC 1014 (June 1987), which is incorporated herein by reference.
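To make the call-message layout concrete, the following C sketch encodes the fixed header of an RFC 1057 call message, in which XDR represents each field as a four-byte big-endian word; the struct and helper names are illustrative.

    /* Fixed header of an RFC 1057 RPC call message: six XDR-encoded
     * 32-bit big-endian words. The credential, verifier and procedure
     * parameters follow on the wire and are omitted here. */

    #include <stdint.h>
    #include <arpa/inet.h>

    #define RPC_CALL    0          /* msg_type 0 = call, 1 = reply */
    #define RPC_VERSION 2

    struct rpc_call_hdr {
        uint32_t xid;              /* transaction id, echoed in the reply */
        uint32_t msg_type;         /* RPC_CALL                            */
        uint32_t rpcvers;          /* always 2                            */
        uint32_t prog;             /* program number, e.g. NFS is 100003  */
        uint32_t vers;             /* program version                     */
        uint32_t proc;             /* procedure number within the program */
    };

    void rpc_encode_hdr(struct rpc_call_hdr *h, uint32_t xid,
                        uint32_t prog, uint32_t vers, uint32_t proc)
    {
        h->xid      = htonl(xid);  /* XDR: four-byte big-endian words */
        h->msg_type = htonl(RPC_CALL);
        h->rpcvers  = htonl(RPC_VERSION);
        h->prog     = htonl(prog);
        h->vers     = htonl(vers);
        h->proc     = htonl(proc);
    }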
NFS Layer. The NFS ("network file system") layer is one of the programs available on the server which an RPC request can call. The combination of host address, program number, and procedure number in an RPC request can specify one remote NFS procedure to be called.

Remote procedure calls to NFS on the file server of Fig. 1 provide transparent, stateless, remote access to shared files on the disks 24. NFS assumes a file system that is hierarchical, with directories as all but the bottom level of files. Client hosts can call any of about 20 NFS procedures including such procedures as reading a specified number of bytes from a specified file; writing a specified number of bytes to a specified file; creating, renaming and removing specified files; parsing directory trees; creating and removing directories; and reading and setting file attributes. The location on disk to which and from which data is stored and retrieved is always specified in logical terms, such as by a file handle or Inode designation and a byte offset. The details of the actual data storage are hidden from the client. The NFS procedures, together with possible higher level modules such as Unix VFS and UFS, perform all conversion of logical data addresses to physical data addresses such as drive, head, track and sector identification. NFS is specified in Sun Microsystems, Inc., "NFS: Network File System Protocol Specification," RFC 1094 (March 1989), incorporated herein by reference.
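The purely logical addressing described above is visible in the arguments of the NFS version 2 READ procedure as RFC 1094 defines them, sketched below in C; the struct names are illustrative, but the fields and sizes follow the RFC.

    /* Arguments of the NFS version 2 READ procedure, per RFC 1094. The
     * client names the data purely logically: an opaque file handle, a
     * byte offset and a count. No drive, head, track or sector appears. */

    #include <stdint.h>

    #define NFS_FHSIZE   32        /* fixed size of an NFS v2 file handle */
    #define NFSPROC_READ 6         /* READ's procedure number             */

    struct nfs_fhandle {
        uint8_t data[NFS_FHSIZE];  /* opaque to the client; only the
                                    * server interprets its contents      */
    };

    struct nfs_readargs {
        struct nfs_fhandle file;   /* which file                          */
        uint32_t offset;           /* byte offset within the file         */
        uint32_t count;            /* bytes requested                     */
        uint32_t totalcount;       /* present in the protocol but unused  */
    };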
With the possible exception of the network layer, all the protocol processing described above is done in software, by a single processor in the host CPU card 10. That is, when an Ethernet packet arrives on Ethernet 12, the host CPU 10 performs all the protocol processing in the NFS stack, as well as the protocol
processing for any other application which may be running on the host 10. NFS procedures are run on the host CPU 10, with access to memory 16 for both data and program code being provided via MMU 11. Logically specified data addresses are converted to a much more physically specified form and communicated to the SMD disk controller 22 or the SCSI bus 28, via the VME bus 20, and all disk caching is done by the host CPU 10 through the memory 16. The host CPU card 10 also runs procedures for performing various other functions of the file server, communicating with tape controller 30 via the VME bus 20. Among these are client-defined remote procedures requested by client workstations.

If the server serves a second Ethernet 36, packets from that Ethernet are transmitted to the host CPU 10 over the same VME bus 20 in the form of IP datagrams. Again, all protocol processing except for the network layer is performed by software processes running on the host CPU 10. In addition, the protocol processing for any message that is to be sent from the server out on either of the Ethernets 12 or 36 is also done by processes running on the host CPU 10.

It can be seen that the host CPU 10 performs an enormous amount of processing of data, especially if 5-10 clients on each of the two Ethernets are making file server requests and need to be sent responses on a frequent basis. The host CPU 10 runs a multitasking Unix operating system, so each incoming request need not wait for the previous request to be completely processed and returned before being processed. Multiple processes are activated on the host CPU 10 for performing different stages of the processing of different requests, so many requests may be in process at the same time. But there is only one CPU on the card 10, so the processing of these requests is not accomplished in a truly parallel manner. The
processes are instead merely time sliced. The CPU 10 therefore represents a major bottleneck in the processing of file server requests.

Another bottleneck occurs in MMU 11, which must transmit both instructions and data between the CPU card 10 and the memory 16. All data flowing between the disk drives and the network passes through this interface at least twice.

Yet another bottleneck can occur on the VME bus 20, which must transmit data among the SMD disk controller 22, the SCSI host adaptor 26, the host CPU card 10, and possibly the network #2 controller 34.
PREFERRED EMBODIMENT - OVERALL HARDWARE ARCHITECTURE

In Fig. 2 there is shown a block diagram of a network file server 100 according to the invention. It can include multiple network controller (NC) boards, one or more file controller (FC) boards, one or more storage processor (SP) boards, multiple system memory boards, and one or more host processors. The particular embodiment shown in Fig. 2 includes four network controller boards 110a-110d, two file controller boards 112a-112b, two storage processors 114a-114b, four system memory cards 116a-116d for a total of 192 MB of memory, and one local host processor 118. The boards 110, 112, 114, 116 and 118 are connected together over a VME bus 120 on which an enhanced block transfer mode as described in the ENHANCED VMEBUS PROTOCOL application identified above may be used. Each of the four network controllers 110 shown in Fig. 2 can be connected to up to two Ethernets 122, for a total capacity of 8 Ethernets 122a-122h. Each of the storage processors 114 operates ten parallel SCSI busses, nine