SWEB: Towards a Scalable World Wide Web Server on Multicomputers

Daniel Andresen    Tao Yang    Vegard Holmedahl    Oscar H. Ibarra
Department of Computer Science
University of California
Santa Barbara, CA
{dandrese, tyang, veho, ibarra}@cs.ucsb.edu
`
`Abstract
`
We investigate the issues involved in developing a scalable World Wide Web (WWW) server on a cluster of workstations and parallel machines, using the Hypertext Transport Protocol (HTTP). The main objective is to strengthen the processing capabilities of such a server by utilizing the power of multicomputers to match huge demands in simultaneous access requests from the Internet. We have implemented a system called SWEB on a distributed memory machine, the Meiko CS-2, and on networked SUN and DEC workstations. The scheduling component of the system actively monitors the usages of CPU, I/O channels and the interconnection network to effectively distribute HTTP requests across processing units to exploit task and I/O parallelism. We present experimental results on the performance of this system. Our results indicate that the system delivers good performance on multicomputers and obtains significant improvements over other approaches.
`
1 Motivation
`
The Scalable Web server (SWEB) project grew out of the needs of the Alexandria Digital Library (ADL) project at UCSB [A+]. Digital library systems, which provide the on-line retrieval and processing of digitized documents through the Internet, have increasingly become a topic of national importance. The Alexandria Project is focused on the design, implementation, and deployment of a distributed digital library for spatially-indexed information. The collections of the library currently involve geographically-referenced materials, such as maps, satellite images, digitized aerial photographs, and associated metadata. The fundamental goal of the Alexandria Project is to permit users who are distributed over the Internet to access broad classes of spatially-indexed materials that may themselves be distributed over the Internet, and to derive useful information from such materials. A rapid prototype system for the Alexandria digital library has already been developed which allows users to browse spatially-indexed maps and images [A+]. To expose the system to
`
`
`
`Petitioner Microsoft Corporation - Ex. 1009, p.1
`
`
`
a broad user community, the next version of the system will be connected to the World Wide Web (WWW) using the Hypertext Transport Protocol (HTTP).
`
Our work is motivated by the fact that the Alexandria digital library WWW server has the potential to become the bottleneck in delivering digitized documents over the high-speed Internet. Popular WWW sites such as Lycos and Yahoo [LYC] receive over one million accesses a day. For example, the weekly access rate for NCSA's HTTP server at UIUC grew from hundreds of thousands to millions of requests. The peak request arrival rate at NCSA is many requests per second, but a single, high-end workstation can handle only a limited number of requests per second [KB]. The
requests to the NCSA server typically involve only simple file retrieval operations. For WWW-based network information systems such as digital libraries, the servers involve much more intensive I/O and heterogeneous CPU activities. This is because such systems process a large amount of digitized images, and content-based searching and multi-resolution image browsing [A+] place large demands on computation speed, disk storage and network bandwidth in the server. To meet such demands, the SWEB project makes use of existing networked computers and inexpensive disk resources to strengthen the processing and storage capabilities of WWW servers. During the last few years, it has become commonplace for local networks to span large numbers of powerful workstations connected with inexpensive disks. Those machines are idle much of the time. Also, the price/performance ratio of parallel machines has dropped rapidly. We expect that recycling the idle cycles of those processing units and retrieving files in parallel from inexpensive disks can significantly improve the scalability of the server in response to large numbers of simultaneous HTTP requests.
`
We have developed a preliminary version of a scalable WWW server (SWEB) [A+] running on a Meiko CS-2 distributed memory machine and a network of workstations (NOW) comprising SUN and DEC machines. Each processing unit is capable of handling a user request following the HTTP protocol. The distinguishing feature of SWEB is effective resource utilization through close collaboration among multiple processing units. Each processing unit is a processor (e.g. a SUN SPARC or a Meiko CS-2 node), possibly linked to a local disk. Several resource constraints affect the performance of the server: the CPU speed and memory size of one processing unit, the existing system load, the transmission bandwidth between the processing unit and its local disk, the network latency and bandwidth between a processing unit and a remote disk when the accessed files are not stored on the local disk, and disk contention when multiple I/O requests access the same disk. Server scalability is achieved by actively monitoring the run-time CPU, disk I/O, and network loads of system resource units, and dynamically scheduling user HTTP requests to a proper node for efficient processing.
`
The paper is organized as follows: Section 2 discusses related work. Section 3 gives the background and problem definition. Section 4 discusses the possible approaches for building a scalable
`
`
`
`
`
`
Web server, the SWEB organization, and its load balancing strategies. Section 5 presents experimental results showing the performance of SWEB and compares our approach with others. Section 6 provides analytic results on a performance bound of SWEB and verifies it against the actual experimental results. Section 7 discusses conclusions and future work.
`
2 Related Work
`
Numerous other initiatives to create high-performance HTTP servers have been reported. NCSA [KB] has built a multi-workstation HTTP server based on round-robin domain name resolution (DNS) to assign requests to workstations. The round-robin technique is effective when HTTP requests access HTML information in relatively uniform size chunks and the loads and computing powers of the workstations are relatively comparable. Our assumption is that the computing powers of workstations and parallel machine resources can be heterogeneous. They can be used for other computing needs, and can leave and join the system resource pool at any time. Thus scheduling techniques which are adaptive to dynamic changes in system load and configuration are desirable. Domain name resolution in a round-robin fashion cannot predict those changes. Another weakness of the technique is the degree of name caching which occurs. DNS caching enables a local DNS system to cache the name-to-IP address mapping, so that recently accessed hosts can quickly be mapped. The downside is that all requests for a period of time from a DNS server's domain will go to a particular IP address. For example, all the requests from UCSB might go to server S, if its IP address happened to be cached in some previous lookup, even though server S might be relatively idle.
`
Recently NCSA has come out with version 1.4 of their httpd software, which in many cases doubles the performance of previous versions by utilizing a pre-forking strategy [NCH]. This technique could be incorporated in our system and would be beneficial. However, we have not done so at this stage, and in this paper we focus on alternative means of achieving scalability through dynamic scheduling and request redirection.
`
Other projects have focused on improving the performance of a single-workstation server. The Apache project has made a set of changes to the early versions of NCSA httpd to make the server more efficient. The CERN server has achieved significant improvements. The Harvest project [BD] has introduced an HTTP cache which can be a major accelerator for pages without a dynamically created component. All these efforts concentrate on the single-workstation server level and can be embodied in our system, while our work focuses on a collaborating multi-workstation server.
`
It should be noted that WWW applications represent only a special class of Internet information
`
`
`
`
`
`
systems. There are other information servers dealing with huge file sizes and large numbers of users, for example multi-media servers [HL]. Our situation has several differences. First, our current system has no real-time processing constraints for displaying digital movies or audio clips. Secondly, many of our users tend to browse different text and image information, rather than focusing on one particular document, such as one film from beginning to end. Thirdly, we assume lossless document delivery, keeping consistency between the source library holding and what is sent to the client.
`
Our dynamic scheduling scheme is closely related to previous work on load balancing in distributed systems [ELZ, BCS, WR]. In these studies, task arrivals may temporarily be uneven among processors, and the goal of load balancing is to correct the imbalance by appropriately transferring tasks from overloaded processors to underloaded processors. The best-known techniques are the receiver-initiated and sender-initiated strategies. In those studies, the criterion for task migration is based on a single system parameter, i.e., the CPU load. We call them single-faceted scheduling strategies. In WWW applications, there are multiple parameters that affect system performance, including CPU loads, interconnection network performance and disk channel usage. The optimal HTTP request assignment to processors does not depend only on CPU loads. Thus we need to develop a multi-faceted scheduling scheme that can effectively utilize the system resources by considering multiple performance parameters.
`
3 Background: The World Wide Web
`
The World Wide Web is based on three critical components: the Uniform Resource Locator (URL), the HyperText Markup Language (HTML), and the HyperText Transfer Protocol (HTTP). The URL defines which resource the user wishes to access, the HTML language allows the information to be presented in a platform-independent but still well-formatted manner, and the HTTP protocol is the application-level mechanism for achieving the transfer of information [HT, BC, BL].
`
• HTML is derived from the Standard Generalized Markup Language (SGML), optimized for presenting hypertext information. It operates on the basis of tags, where each piece of the text is marked to indicate the format to be used. HTML is similar to other markup languages such as LaTeX and RTF. The first HTML version contained little beyond hypertext links and standard formatting functions such as bold, center, etc. Later versions have been extended to include the forms interface, where users can fill in data and submit the form to a server; tables, which mimic the layout of a spreadsheet; and inline images. In addition to the formal standard, many browser companies (e.g. Netscape) have added custom extensions to their implementations.
`
`
`
`
`
`
• The Uniform Resource Locator (URL) serves as an index to a particular service or document, giving the hostname, the access method, and the document/application path in a single concise format. For example, the URL http://www.cs.ucsb.edu/ would fetch the default document (or page) from the WWW server at the Computer Science department at UCSB using the HTTP protocol. ftp://www.cs.ucsb.edu/myfile uses the FTP protocol to fetch the document "myfile" from the same machine. URLs are not limited to simply fetching files, however, but can specify a program to execute. E.g., http://www.cs.ucsb.edu/cgi-bin/uptime causes the "uptime" program to execute on the WWW server and return its results to the client. This ability to transparently select a resource, which might be the result of a dynamic execution or simply a file fetch, makes a number of Internet services possible without adding complexity for the user.
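The URL decomposition just described (access method, hostname, path) can be sketched with Python's standard library; the UCSB URLs are the text's own examples, and `split_url` is a helper name introduced here:

```python
from urllib.parse import urlparse

def split_url(url):
    # Break a URL into the three pieces named above: the access
    # method (scheme), the host to contact, and the document or
    # program path on that host.
    parts = urlparse(url)
    return parts.scheme, parts.hostname, parts.path
```

For instance, `split_url("http://www.cs.ucsb.edu/cgi-bin/uptime")` yields `("http", "www.cs.ucsb.edu", "/cgi-bin/uptime")`.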
`
• The HyperText Transfer Protocol (HTTP) is a stateless, connection-oriented application-level protocol used for networked hypermedia information systems. A number of WWW browsers and servers use this protocol, for example NetCom's NetScape browser and the NCSA and CERN HTTP servers [HT]. A simple HTTP request would typically activate a sequence of events from initiation to completion, as shown in Figure 1. First, the client determines the host name from the URL, and uses the local Domain Name System (DNS) server to determine its IP address. The local DNS may not know the IP address of the destination, and may need to contact the DNS system on the destination side to complete the resolution. After receiving the IP address, the client then sets up a TCP/IP connection to a well-known port on the server where the HTTP process is listening. The request is then passed in through the connection. After parsing the request, the server sends back a response code (e.g., 200 in HTTP stands for "OK. File found." and 404 is "File not found.") followed by the results of the query. The connection is then closed by either the client or the server.
`
[Figure: client C, a local DNS, a remote DNS, and server S; C resolves S's name (S? → 1.1.1.1), then sends request r and receives response f.]

Figure 1: A simple HTTP transaction: Client C looks up the address of server S, sends over request r, and receives response f.
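The sequence of events above can be sketched as a minimal HTTP/1.0 client in Python. This is an illustration of the protocol steps, not the paper's implementation; the helper names are invented here:

```python
import socket

def build_request(host, path):
    # HTTP is stateless: every request carries all the information
    # the server needs to answer it.
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "\r\n").encode("ascii")

def parse_status(response):
    # The first line of the reply carries the response code,
    # e.g. 200 ("OK. File found.") or 404 ("File not found.").
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, *reason = status_line.split(" ")
    return int(code)

def fetch(host, path, port=80):
    # 1) the name is resolved via DNS inside create_connection,
    # 2) a TCP connection is opened to the well-known port,
    # 3) the request is sent, 4) the reply is read until the
    #    server closes the connection.
    with socket.create_connection((host, port)) as s:
        s.sendall(build_request(host, path))
        chunks = []
        while (data := s.recv(4096)):
            chunks.append(data)
    return b"".join(chunks)
```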
`
`
`
`
`
`
4 SWEB: A scalable WWW server
`
We call a WWW server scalable if the system response time for individual requests remains relatively constant as the number of simultaneous HTTP requests increases. We define the response time for a request as the length of the period from when a request is initiated until all the requested information arrives at the client.
`
For a single-workstation server, there is an upper bound on the number of requests per second (rps) that the server can handle. NCSA has performed a number of tests using high-end workstations, and discovered that in their working environment only a limited number of rps could be dealt with by the NCSA httpd server [KB], which cannot match current and future loads (e.g. for a digital library server). Thus multiple servers are needed to achieve scalable performance.
`
We assume that the architecture of our system contains a set of networked workstations. Some of the workstations are connected to SCSI disks or mass-storage RAID subsystems in which HTML files are stored. The disks are included in a file system shared among all processing units. Our current implementation runs on NOWs using Ethernet (we will use ATM in the future) and on a Meiko CS-2 machine, which contains a set of distributed memory SPARC processors connected by a fat-tree based communication network. All files are available to each processor via NFS mounts. The NOW version runs on a set of SUN SPARCstations with the Solaris operating system, and on DEC Alpha-based workstations with OSF/1. Fig. 2 depicts the architecture assumptions of SWEB. In this paper, the terms workstation unit and processor are interchangeable. We also assume that each CPU unit may be used by other applications, and that it can leave and join the resource pool at any time.
`
[Figure: clients on the Internet issue HTTP requests to SWEB, whose scheduler uses load information to distribute them among workstations and an MPP (Meiko CS-2) with attached disks, connected by Ethernet/ATM.]

Figure 2: The computing and storage architecture of SWEB.
`
`The overall objective of the system is to reduce and sustain the response time under large numbers
`
`
`
`
`
`
of simultaneous requests. Goals considered in designing this system are:
`
• To demonstrate how to utilize existing inexpensive commodity network, workstation, and disk resources to build a scalable WWW server. The computing environment can be heterogeneous, with workstation/processor units of different speeds carrying different loads at any time.
`
• To develop sophisticated dynamic scheduling algorithms for exploiting task and I/O parallelism, adaptive to run-time changes in system resource loads and availability. The system needs to provide good resource estimation to assist the scheduler. The scheduler needs to incorporate multiple system performance parameters to assign user HTTP requests to a proper unit for efficient processing. The overhead involved in assessing current resource loads and making scheduling decisions should be minimized.
`
In this section we will first discuss the factors that affect scalability of the system, introduce the functional modules of SWEB, and discuss the approaches for routing an HTTP request to a processor based on the scheduling decision (since such a routing scheme affects the design of the scheduling heuristic). Then we will present our scheduling algorithm for determining the processor assignment of a given HTTP request.
`
4.1 Performance factors
`
There are several performance factors that affect the response time in processing HTTP requests. We discuss them in more detail as follows.

1. Processor load.

An HTTP request is handled through a TCP/IP connection and then subsequently through a forked subprocess. This subprocess may retrieve a file or invoke a CGI program, which requires varying amounts of CPU time. Notice that a processing unit can also be used for applications other than WWW. The load of a processing unit must be monitored so that HTTP requests can be distributed to relatively lightly loaded processors.
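A sketch of such monitoring, assuming each unit reports a Unix-style load average; the function names are introduced here for illustration:

```python
import os

def local_cpu_load():
    # 1-minute load average of this processing unit (Unix only);
    # each unit would report this value to its peers.
    return os.getloadavg()[0]

def pick_lightest(loads):
    # Given {unit name: reported CPU load}, choose the most lightly
    # loaded unit as the candidate for the next HTTP request.
    return min(loads, key=loads.get)
```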
`
2. Disk I/O.

Since most HTTP requests retrieve files from disks, the I/O requirements can present further bottlenecks. For instance, whether the files are on a local disk or fetched over a local network can affect performance substantially. The raw data transfer rate of a single disk is limited to a few MBytes per second, while RAID subsystems can achieve substantially faster performance
`
`
`
`
`
`
(tens of MBytes per second or more). If many requests are accessing the same disk at the same time, I/O performance can degrade significantly due to disk channel contention. If simultaneous user requests are accessing different disks, then parallel I/O leads to higher throughput in delivering documents.
`
3. Interconnection network.

HTTP requests usually access HTML files, but they can also activate the execution of a program. The local interconnection network bandwidth affects the performance of file retrieval, since many files may not reside on the local disk of a processor; remote file retrieval through the network file system will then be involved. Network traffic congestion can dramatically slow HTTP request processing. However, in some cases, given a very fast network and NFS server, it is possible for a remote access to actually be faster than pulling data from the local disk, if the local disk has high channel contention.
`
4. Internet.

The Internet speed significantly affects the performance of information delivery. Currently, Internet bandwidth is increasing rapidly with the advent of ATM, BISDN, and other WAN technologies. However, the NCSA work [KB] and our experiments show that the Internet is not necessarily the bottleneck in a high-traffic system; the server itself is another bottleneck for document delivery.
`
Our goal is to get the requested information out of the server as fast as possible. Thus our scheduler primarily monitors the first three performance factors above. The Internet bandwidth information is used partially in request redirection. Our scheduling algorithm is multi-faceted in the sense that a proper decision on routing HTTP requests needs to aggregate the impact of these multiple system parameters on the overall system response time.
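Such aggregation can be illustrated with a toy cost model. The fields and the additive combination below are assumptions made for illustration only, not SWEB's actual estimate:

```python
from dataclasses import dataclass

@dataclass
class Unit:
    cpu_load: float        # current CPU load on the unit
    disk_bandwidth: float  # bytes/sec from its local disk
    net_bandwidth: float   # bytes/sec over the interconnect
    holds_file: bool       # is the requested file on its local disk?

def estimated_cost(cpu_seconds, file_bytes, u):
    # Multi-faceted estimate: CPU demand stretched by the existing
    # load, plus disk transfer time, plus network transfer time
    # when the file must be fetched from a remote disk.
    cpu = cpu_seconds * (1.0 + u.cpu_load)
    disk = file_bytes / u.disk_bandwidth
    net = 0.0 if u.holds_file else file_bytes / u.net_bandwidth
    return cpu + disk + net

def choose_unit(cpu_seconds, file_bytes, units):
    # Pick the unit with the smallest aggregated estimate, rather
    # than the one with the lowest CPU load alone.
    return min(units, key=lambda u: estimated_cost(cpu_seconds, file_bytes, u))
```

Note that a heavily loaded unit holding the file locally can still win over an idle unit that would pay the network transfer, which is exactly why a single-faceted (CPU-only) policy falls short.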
`
4.2 Distributed vs. centralized scheduler
`
There are two approaches to designing the scheduler. One is to have a centralized scheduler running on one processor, such that all HTTP requests go through this processor. The scheduler monitors the usage of all system resources, makes assignment decisions based on this information, and routes requests to the appropriate processors. We did not take this approach, mainly because the single central distributor becomes a single point of failure, making the entire system more vulnerable.
`
The current version of SWEB uses a distributed scheduler. The user requests are first evenly routed to SWEB processors via the DNS scheme, as shown in Fig. 3. The use of DNS rotation to provide
`
`
`
`
`
`
[Figure: clients contact the DNS, which spreads their requests across a set of collaborative servers.]

Figure 3: Distributed collaborative scheduler.
`
the initial assignment of HTTP requests is used in the NCSA multi-workstation server [KB]. In this scheme, multiple real machines are mapped to the same IP name. As shown in Fig. 4, when a client requests the network ID of the machine name (e.g., www.cs.ucsb.edu), the DNS at the server site rotates the network IDs, picks one (e.g., 1.1.1.1) and sends it back to the client. The rotation over the available workstation network IDs is in round-robin fashion. This functionality is available in current DNS systems. The major advantages of this technique are simplicity, ease of implementation, and reliability [KB].
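In BIND-style DNS configuration, this rotation is obtained simply by listing several address records under one name; a hypothetical zone fragment, reusing the figure's addresses:

```
; Round-robin DNS: several A records share one name, and the
; name server rotates their order on successive lookups.
www.cs.ucsb.edu.    IN  A   1.1.1.1
www.cs.ucsb.edu.    IN  A   1.1.1.2
```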
`
[Figure: client C asks the DNS for S and receives 1.1.1.x; successive lookups return 1.1.1.1 (S0), 1.1.1.2 (S1), and so on; C then sends request r and receives response f.]

Figure 4: DNS rotation: the DNS server returns differing IP addresses for a given name in round-robin fashion from a list of servers.
`
The DNS assigns the requests without consulting dynamically changing system load information. Thus SWEB conducts a further redirection of requests. Each processor in SWEB contains a scheduler, and those processors collaborate with each other to exchange system load information. After a request is routed to a processor via DNS, the scheduler on that processor decides whether to process the request or redirect it to another processor. No HTTP request is allowed to be rerouted more than once, to avoid the ping-pong effect. This architecture is substantially less prone to single-point-of-failure breakdowns, is more distributed, and spreads the overhead of load balancing across a number of nodes.
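The redirect-at-most-once rule can be sketched as follows. The marker string and function names are invented for illustration; SWEB's actual mechanism rewrites the URL via an HTTP redirection response:

```python
MARKER = "sweb_once"   # hypothetical tag carried by a rerouted request

def route(url, me, best):
    # Each SWEB processor decides locally: a request already carrying
    # the marker has been rerouted once and must be served here
    # (preventing the ping-pong effect); otherwise serve it if this
    # processor is the best choice, or redirect it exactly once.
    if MARKER in url or best == me:
        return ("serve", me, url)
    return ("redirect", best, url + "?" + MARKER)
```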
`
The functional structure of the scheduler at each processor is depicted in Fig. 5. It contains an httpd daemon, based on the NCSA httpd code, for handling HTTP requests, with a broker module which determines the best possible processor to handle a given request. The broker consults with two
`
`
`
`
`
`
other modules, the oracle and the loadd. The oracle is a miniature expert system, which uses a user-supplied table to characterize the CPU and disk demands of a particular task. The loadd daemon is responsible for periodically updating the system CPU, network and disk load information (every few seconds), and for marking as unavailable those processors which have not responded within a preset period of time. When a processor leaves or joins the resource pool, the loadd daemon will thus become aware of the change.
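The loadd bookkeeping can be sketched as a table of timestamped reports; the timeout value and class layout here are assumptions, not SWEB's code:

```python
import time

STALE_AFTER = 10.0   # assumed: seconds of silence before a peer is marked down

class LoadTable:
    # loadd keeps, per processor, the last reported load and the
    # time that report arrived.
    def __init__(self):
        self.reports = {}   # name -> (load, time_received)

    def update(self, name, load, now=None):
        self.reports[name] = (load, time.time() if now is None else now)

    def available(self, now=None):
        # A processor that stops reporting silently leaves the pool;
        # one that resumes reporting rejoins automatically.
        now = time.time() if now is None else now
        return {n: load for n, (load, t) in self.reports.items()
                if now - t <= STALE_AFTER}
```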
`
[Figure: the SWEB scheduler modules. The broker manages servers and chooses a server for each request (accept_request(r); choose_server(s); if s != me then reroute_request(s) else handle_request(r)); the oracle characterizes requests; the loadd daemon manages distributed load information alongside httpd.]

Figure 5: The functional modules of a SWEB scheduler in a processor.
`
4.3 Routing HTTP requests
`
After a processor accepts an HTTP request through a TCP connection and the system decides to assign the request to a particular processor, the request needs to be transparently routed to that processor. The best solution would be to modify the UNIX sockets package to change the semantics of the "accept" system call. Currently the HTTP daemon at the server sits listening on a pre-defined port. When a client wishes to connect, the server negotiates another port over which to conduct the transaction, while continuing to listen on the original port. Setting this new port to be on a separate machine is difficult, since it requires a substantial modification of the UNIX operating system kernel.
`
The approach that we eventually decided to implement is based on the fact that the HTTP protocol allows for a response called "URL Redirection". As shown in Fig. 6, when client C sends a request to server S, S returns a rewritten URL r and a response code indicating the infor