SWEB: Towards a Scalable World Wide Web Server on Multicomputers

Daniel Andresen, Tao Yang, Vegard Holmedahl, Oscar Ibarra
Department of Computer Science
University of California
Santa Barbara, CA 93106
{dandrese, tyang, veho, ibarra}@cs.ucsb.edu
Abstract

We investigate the issues involved in developing a scalable World Wide Web (WWW) server on a cluster of workstations and parallel machines. The objective is to strengthen the processing capabilities of such a server by utilizing the power of multicomputers to match the huge demand in simultaneous access requests from the Internet. We have implemented a system called SWEB on a distributed-memory machine, the Meiko CS-2, and on networked workstations. The scheduling component of the system actively monitors the usages of CPU, I/O channels and the interconnection network to effectively distribute HTTP requests across processing units to exploit task and I/O parallelism. We present experimental results on the performance of this system.
1 Motivation

The Scalable Web server (SWEB) project grew out of the needs of the Alexandria Digital Library (ADL) project at UCSB. Digital library systems, which provide the on-line retrieval and processing of digitized documents through the Internet, have increasingly turned into a topic of national importance. The Alexandria Project is focused on the design, implementation and deployment of a distributed digital library for spatially-indexed information. The collections of the library currently involve geographically referenced materials such as maps, satellite images, digitized aerial photographs, and associated metadata. A rapid prototype system for the Alexandria digital library has already been developed which allows users to browse spatially-indexed maps and images. To expose the system to a broad user community, the next version of the system, available in early 1996, will be connected to the World Wide Web (WWW) using the Hypertext Transport Protocol (HTTP).

Our work is motivated by the fact that the Alexandria digital library WWW server has the potential to become the bottleneck in delivering digitized documents over high-speed Internet. Popular WWW sites such as Lycos and Yahoo receive over one million accesses a day. For WWW-based network information systems such as digital libraries, the servers involve much more intensive I/O and heterogeneous CPU activities. To meet such demands, the SWEB project makes use of existing networked computers and inexpensive disk resources to strengthen the processing and storage capabilities of WWW servers.
1063-7133/96 $5.00 © 1996 IEEE. Proceedings of IPPS '96.
We expect that using the idle cycles of those processing units and retrieving files in parallel from inexpensive disks can significantly improve the scalability of the server in response to a large amount of simultaneous HTTP requests.
Numerous other initiatives to create high-performance HTTP servers have been reported. The Inktomi server at UC Berkeley is based on the NOW technology. NCSA has built a multi-workstation HTTP server based on round-robin domain name resolution (DNS) to assign requests to workstations. The round-robin technique is effective when HTTP requests access HTML information of relatively uniform size, and the load and computing powers of workstations are relatively comparable. Our assumption is that the computing powers of workstations and parallel machine resources can be heterogeneous. They can be used for other computing needs, and can leave and join the system resource pool at any time. Thus scheduling techniques which are adaptive to the dynamic change of system load and configuration are desirable. The DNS in a round-robin fashion cannot predict those changes. Another weakness of the technique is the degree of DNS name caching which occurs. DNS caching enables a local DNS system to cache the name-to-IP address mapping, so that the most recently accessed hosts can quickly be mapped. The downside is that all requests from a particular DNS domain will go to one IP address for a period of time.
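To make the round-robin weakness concrete, the rotation can be sketched as follows. This is an illustrative model, not SWEB code, and the server names are invented; the point is that the rotation hands out servers in a fixed order regardless of their current load.

```python
from itertools import cycle

# Hypothetical server pool; the names are illustrative only.
servers = ["ws1", "ws2", "ws3"]

def round_robin_assigner(pool):
    """Yield servers in a fixed rotation, as round-robin DNS would."""
    return cycle(pool)

assigner = round_robin_assigner(servers)
assignments = [next(assigner) for _ in range(6)]
# The rotation ignores per-server load entirely: a heavily loaded
# server receives exactly as many requests as an idle one.
print(assignments)  # ['ws1', 'ws2', 'ws3', 'ws1', 'ws2', 'ws3']
```

A load-adaptive scheduler, by contrast, must consult run-time measurements before assigning each request, which is the approach SWEB takes.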
We have developed the preliminary version of a scalable WWW server, SWEB, running on the Meiko CS-2 distributed memory machine and on a network of workstations (NOW), such as SUN and DEC machines. Each processing unit is capable of handling a user request following the HTTP protocol. The distinguishing feature of SWEB is effective resource utilization by close collaboration of multiple processing units. Server scalability is achieved by actively monitoring the run-time CPU, disk I/O and network loads of system resource units, and dynamically scheduling user HTTP requests to a proper node for efficient processing.

Our dynamic scheduling scheme is closely related to previous work on load balancing in distributed systems, for which a collection of papers is available in the literature. In these studies, task arrivals may temporarily be uneven among processors, and the goal of load balancing is to adjust the imbalance between processors by appropriately transferring tasks from overloaded processors to underloaded processors. The task resource requirements are unknown, and the criteria for task migration are based on a single system parameter, i.e. the CPU load. We call them single-faceted scheduling strategies.
Petitioner IBM – Ex. 1010, p. 1
For WWW applications, there are multiple parameters that affect the system performance, including CPU loads, interconnection network performance, and disk channel usages. The optimal assignment of HTTP requests to processors does not solely depend on CPU loads. Thus we need to develop a multi-faceted scheduling scheme that can effectively utilize the system resources by considering the aggregate impact of multiple parameters on system performance. In related work, resource requirements are predicted and suggested to guide the load sharing; multiple factors are mentioned there, but only the CPU factor is utilized in predicting response times.

The paper is organized as follows. Section 2 gives the background and problem definition. Section 3 discusses the possible approaches for building a scalable Web server, and the SWEB organization and load balancing strategies. Section 4 presents the experimental results showing the performance of SWEB and compares our approach with others. Section 5 discusses conclusions and future work.
2 Background

The World Wide Web is based on three critical components: the Uniform Resource Locator (URL), the HyperText Markup Language (HTML), and the HyperText Transfer Protocol (HTTP). The URL defines which resource the user wishes to access, the HTML language allows the information to be presented in a platform-independent but still well-formatted manner, and the HTTP protocol is the application-level mechanism for achieving the transfer of information.

Figure 1: A simple HTTP transaction. The client looks up the address of the server, sends over a request, and receives a response.

A simple HTTP request would typically activate the sequence of events, from initiation to completion, shown in Figure 1. First, the client determines the host name from the URL and uses the local Domain Name System (DNS) server to determine its IP address. The local DNS may not know the IP address of the destination and may need to contact the DNS system on the destination side to complete the resolution. After receiving the IP address, the client sets up a TCP/IP connection to a well-known port on the server where the HTTP process is listening. The request is then passed in through the connection. After parsing the request, the server sends back a response code (e.g. 200 in the HTTP protocol stands for "OK, file found", and 404 is "File not found"), followed by the results of the query. The connection is then closed by either the client or the server.

3 SWEB: a scalable WWW server

We call a system scalable if the system response time for individual requests is kept as small as theoretically possible when the number of simultaneous HTTP requests increases, while maintaining a low request drop rate and achieving a high peak request rate. We define the response time for a request as the length of the time period from when the request is initiated until all the requested information arrives at the client.

For a single-workstation server there is an upper bound on the number of requests per second (rps) that the server can handle. For example, the NCSA has performed a number of tests using high-end workstations in their working environment and discovered that approximately 5-10 rps could be dealt with using the NCSA httpd server, which cannot match the current and future loads of a digital library server. Thus multiple servers are needed for achieving scalable performance.

Our overall objective for the system is to reduce and sustain the response time under large numbers of simultaneous requests. The goals considered in designing this system are twofold. First, we hope to demonstrate how to utilize existing inexpensive commodity networks, heterogeneous workstations and disks to build a scalable WWW server. Second, we attempt to develop dynamic scheduling algorithms for exploiting task and I/O parallelism, adaptive to the run-time change of system resource loads and availabilities. The scheduler needs to incorporate multiple system performance parameters to assign user HTTP requests to a proper unit for efficient processing, while adding minimal overhead.
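The request/response exchange described above can be sketched in a few lines. This is our illustrative reconstruction, not SWEB code; the host name and path are placeholders, and the reply shown is a canned example rather than real server output.

```python
def build_get_request(host, path):
    """Compose a minimal HTTP/1.0 GET request, as a browser would send
    over the TCP/IP connection after resolving the host name via DNS."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"

def parse_status(response):
    """Extract the numeric response code from the status line
    (200 stands for 'OK, file found'; 404 is 'File not found')."""
    status_line = response.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])

request = build_get_request("server.example.edu", "/index.html")
reply = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
print(parse_status(reply))  # 200
```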
There are several performance factors that affect the response time in processing HTTP requests. These include the processor load, caused by the overhead necessary to send bytes out on the network, properly packetized; the disk I/O, which limits how quickly data can be delivered to the processor; the local network, over which remote data requests must be fetched; and the Internet connection to the client, which often is a severe bottleneck.

Our goal is to get the requested information out of the server as fast as possible. Thus our scheduler primarily monitors the first three performance factors above; the Internet bandwidth information is used partially in request re-direction. Our scheduling algorithm is multi-faceted in the sense that the proper decision on routing HTTP requests needs to aggregate the impact of the above multiple system parameters on the overall system response time. We will present the architecture for SWEB and its scheduling algorithm next.
3.1 SWEB Architecture
There are two approaches to designing the scheduler. One is to have a centralized scheduler running on one processor, such that all HTTP requests go through this processor. The scheduler monitors the usages of all system resources, makes assignment decisions based on this information, and routes requests to appropriate processors. We did not take this approach, mainly because the single central distributor becomes a single point of failure, making the entire system more vulnerable.

The current version of SWEB uses a distributed scheduler. The user requests are first evenly routed to SWEB processors via the DNS rotation, as shown in Figure 2. The rotation on available workstation network IDs is in a round-robin fashion. This functionality is available in current DNS systems. The major advantages of this technique are simplicity, ease of implementation, and reliability.

Figure 2: The computing and storage architecture of SWEB. The DNS directs tasks to servers running a task scheduler.

The DNS assigns the requests without consulting dynamically-changing system load information. Then SWEB conducts a further assignment of requests. Each processor in SWEB contains a scheduler, and those processors collaborate with each other to exchange system load information. After a request is routed to a processor via the DNS, the scheduler in that processor makes a decision regarding whether to process this request or assign it to another processor. Two approaches, URL redirection or request forwarding, could be used to achieve reassignment, and we use the former; request forwarding is very difficult to implement. URL redirection gives us excellent compatibility with current browsers and near-invisibility to users. Any HTTP request is not allowed to be redirected more than once, to avoid the ping-pong effect.

Figure 3: The functional modules of a SWEB scheduler.

The functional structure of the scheduler at each processor is depicted in Figure 3. It contains a httpd daemon, based on the NCSA httpd code, for handling httpd requests, with a broker module which determines the best possible processor to handle a given request. The broker consults with two other modules, the oracle and the loadd. The oracle is a miniature expert system which uses a user-supplied table to characterize the CPU and disk demands for a particular task. The loadd daemon is responsible for updating the system CPU, network and disk load information periodically (every 2-3 seconds), and for marking those processors which have not responded within a preset period of time as unavailable. When a processor leaves or joins the resource pool, the loadd daemon will be aware of the change.
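The broker's per-request decision, combined with the at-most-one-redirection rule that prevents the ping-pong effect, might look like the following sketch. The node names and cost figures are invented for illustration and do not come from the paper.

```python
def choose_server(estimated_costs):
    """Return the server with the minimum estimated completion time."""
    return min(estimated_costs, key=estimated_costs.get)

def handle(request, local_server, estimated_costs):
    """Serve locally, or issue a single URL redirection (HTTP 302).
    A request that was already redirected once is served locally,
    so no request bounces between servers indefinitely."""
    if request.get("redirected"):
        return ("serve", local_server)
    best = choose_server(estimated_costs)
    if best == local_server:
        return ("serve", local_server)
    return ("redirect", best)

# Illustrative per-node cost estimates (seconds) computed by the broker.
costs = {"node1": 0.8, "node2": 0.3, "node3": 0.5}
print(handle({"url": "/map.gif"}, "node1", costs))                      # redirect to node2
print(handle({"url": "/map.gif", "redirected": True}, "node1", costs))  # serve locally
```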
3.2 The multi-faceted scheduling algorithm
In this subsection we present the algorithm that decides where an HTTP request should be routed. As we discussed before, several system load parameters affect such a decision. In a single-faceted scheduling system, a processor can be classified as lightly loaded or heavily loaded based on one parameter (e.g. CPU load). One purpose of such a classification is to update load information only when the classification changes, to reduce unnecessary broadcasting overhead. In our problem context it is hard to classify a processor as heavily or lightly loaded, since there are several load parameters; e.g., a processor could have a light CPU load, but its local disk may receive many access requests from the network file system. Thus each processor in SWEB has a load daemon to detect its own CPU, disk and network load periodically and broadcast such information to other processors. In our experiments we will show that such overhead is insignificant.

We address how multiple system load parameters can be used together in deciding the assignment of HTTP requests. Since our goal is to minimize the response time for each request, we design the heuristic based on the estimated cost for processing each request, using the following formula:

    t = t_redirection + t_data + t_CPU + t_net

Here t_redirection is the cost to redirect the request to another processor, if required; t_data is the time to transfer the required data from the disk drive, or from the remote disk if the file is not local; t_CPU is the time to fork a process and perform disk reading to handle the request, plus any known associated computational cost if the request is a CGI operation; and t_net is the cost of transferring the processing results over the Internet. We discuss and model these individual cost terms as follows.

t_data. If the file is local, the time required to fetch the data is simply the file size divided by the available bandwidth of the local storage system. We also measure the disk channel load f1; if there are many requests, the disk transmission performance degrades accordingly:

    t_data = (requested file size) / b1 × f1

If the data is remote, then the file must be retrieved through the interconnection network, and the local network bandwidth b_net and its load f2 must be incorporated:

    t_data = (requested file size) / min(b1, b_net) × f2

Experimentally, we found on the Meiko approximately a 10% penalty for remote NFS access, while on the SUN workstations connected by Ethernet the cost increases by 50%-70%.
t_CPU. We use

    t_CPU = (number of operations required) / (CPU speed) × CPU_load

The t_CPU term estimates the amount of processing time required to complete the task. This is based on the speed of the destination node, the estimated load on that server (CPU_load), and the estimated number of operations required for the particular task. The computation requirement for a request is estimated by the oracle component of the system (see Figure 3). The parameters for different architectures are saved in a configuration file. It should be noted that some estimated CPU cycles may overlap with network and disk time, so the overall cost may be slightly overestimated, but this conservative estimation works well in our experience.

The load estimation for remote processors is based on the periodic updating of information given by those remote processors. It is possible that a processor P is incorrectly believed to be lightly loaded by other processors, and many requests will be redirected to it. To avoid this unsynchronized overloading, we conservatively increase the CPU load of P by 30%. This strategy is found to be effective in practice.

t_net. We use

    t_net = (response size required) / (network bandwidth)

This term is used to estimate the time necessary to return the results back to the client over the network. When the scheduler compares processors, we assume all processors will have basically the same cost for this term, so it is not estimated. Our research goal is to produce the query result for an HTTP request as fast as possible at the server site.
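Putting the terms together, the broker's estimate t = t_redirection + t_data + t_CPU (with t_net assumed equal across servers and therefore omitted) can be sketched as below. All parameter values are illustrative assumptions rather than measured SWEB numbers; only the roughly 10% remote-access penalty echoes the Meiko figure quoted above.

```python
def estimate_cost(file_size, local, disk_bw, disk_load,
                  cpu_ops, cpu_speed, cpu_load,
                  redirect_cost=0.0, remote_penalty=1.10):
    """Estimated time (seconds) for one candidate server to finish a request.

    file_size / disk_bw * disk_load models t_data; the remote penalty
    stands in for fetching the file over the interconnection network.
    cpu_ops / cpu_speed * cpu_load models t_CPU, growing with server load.
    redirect_cost models t_redirection (zero when served where it arrived).
    """
    t_data = file_size / disk_bw * disk_load
    if not local:
        t_data *= remote_penalty  # ~10% NFS penalty observed on the Meiko
    t_cpu = cpu_ops / cpu_speed * cpu_load
    return redirect_cost + t_data + t_cpu

# Compare serving a 1.5 MB file locally vs. redirecting to a busier node.
local_cost = estimate_cost(1.5e6, True, 5e6, 1.2, 1e6, 40e6, 1.0)
remote_cost = estimate_cost(1.5e6, False, 4.5e6, 1.0, 1e6, 40e6, 2.0,
                            redirect_cost=0.05)
print(local_cost < remote_cost)  # True: stay local in this scenario
```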
t_redirection. We use

    t_redirection = 2 × t_client_server_latency + t_connect

This term measures the time necessary to move an HTTP request from one server to another. It is set to twice the estimated latency of the connection between the server and the client, plus the time t_connect for the new server to set up a connection. The conceptual model is that of a very short reply going back to the client browser, which then automatically issues another request to the new server address. The transfer time is zero if the task is already local to the target server. The estimate of the link latency is available from the TCP/IP implementation, but in the initial implementation it is hand-coded into the server.

Given the arrival of an HTTP request at a processor, the scheduler at that processor goes through the following steps:

1. Preprocess request. The server parses the HTTP commands and completes the pathname given, determining appropriate permissions along the way. It also determines whether the requested document exists, has moved, or is a CGI program to execute.

2. Analyze request. The server then determines whether itself or another server should fulfill the request. It does so by checking the results from the preprocessing phase. If the file does not exist, or the request is not already determined to be a retrieval of information, then the request is always completed locally. Otherwise the request is passed to the broker for analysis. The broker determines the server on whose local disk the file resides, if any, and calculates an estimated time for each available server node to fulfill the request. Having determined the estimated time for each server to fill the request, the broker indicates its choice, determined by the minimum time to completion, to the main process.

3. Redirection. If the chosen server is not the local one, the request is redirected appropriately.

4. Fulfillment. At this point the request is processed in the normal HTTP server manner, with any CGIs executed as needed and any client responses shuttled out the appropriate socket.

It should be noted that accurately modeling the cost associated with processing an HTTP request is not easy, and we still need to investigate further the design of such a function. Our experiments show that the current cost function does reflect the impact of multiple parameters on the overall system response performance, and that SWEB delivers acceptable performance based on such a heuristic function, adapting to dynamically-changing system resource loads. In our experiments we will also show that the overhead of SWEB scheduling is insignificant compared to the system load used for request fulfillment and other activities.

3.3 Performance Analysis

We provide an analysis of the maximum sustained rps achievable within our scheme. We use the following parameters: p is the number of nodes, s is the average file size requested, b1 is the bandwidth of local disk accessing, b2 is the bandwidth of remote disk accessing, r is the average redirection probability, o1 is the overhead of preprocessing a request, o2 is the overhead of redirection, and c is the average processing time for each request. Then we can show that the maximum sustained rps for file fetches is

    rps_max = p / (o1 + c + r × o2 + (1 − r) × s / b1 + r × s / b2)

The full analysis is discussed in the full version of this paper. For example, if b1 = 5MB/s and b2 = 4.5MB/s, the analysis gives a maximum sustained rps of 17.3 for 6 nodes. We will show this is close to our experimental results in Section 4.

SWEB currently focuses on GET and related commands used in the HTTP protocol. Other commands (e.g. POST) are not handled, but SWEB could be extended to do so in the future.
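The sustained-throughput bound in Section 3.3 can be evaluated numerically. In the sketch below, b1 and b2 use the 5 MB/s and 4.5 MB/s disk bandwidths quoted in the text, but the redirection probability r and the overheads o1, o2, c are invented placeholders, so the printed value only illustrates the shape of the formula rather than reproducing the paper's measured figures.

```python
def max_sustained_rps(p, s, b1, b2, r, o1, o2, c):
    """Maximum sustained requests/second for file fetches with p nodes,
    average file size s, local/remote disk bandwidths b1/b2, redirection
    probability r, preprocessing overhead o1, redirection overhead o2,
    and average per-request processing time c."""
    per_request = o1 + c + r * o2 + (1 - r) * s / b1 + r * s / b2
    return p / per_request

# b1 = 5 MB/s, b2 = 4.5 MB/s as in the text; other parameters assumed.
rps = max_sustained_rps(p=6, s=1.5e6, b1=5e6, b2=4.5e6,
                        r=0.3, o1=0.002, o2=0.01, c=0.02)
print(round(rps, 1))  # 17.9 with these assumed parameters
```

Note how the bound is dominated by the disk-transfer terms s/b1 and s/b2 for large files, which is consistent with the bandwidth-limited behavior reported in the experiments.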
4 Experimental Results

Our primary experimental testbed consists of a Meiko CS-2 distributed memory machine at UCSB. Each node has a scalar processing unit, a 40MHz SuperSparc chip with 32MB of RAM, running Solaris 2.3. We mainly use six CS-2 nodes, each of which is connected to a dedicated 1GB hard drive on which the test files reside. Disk service is available to all other nodes via NFS mounts. These nodes are connected via a modified fat-tree network with a peak bandwidth of 40MB/s. Our secondary testbed is a network of SparcStation LXs running Solaris 2.4, connected via standard 10 Mb/s Ethernet. Each LX has a local 525 MB hard drive and 16MB of RAM. The effective bandwidth of this Ethernet is low, since it is shared by other UCSB machines.

The complete SWEB system is based on the NCSA httpd 1.3 source, with modifications to support the distributed functionality. We plan to move to version 1.5 in the immediate future. All software uses the sockets library built on the Solaris TCP/IP streams implementation. The use of the built-in TCP/IP library was a deliberate decision based on several factors, including the following:

- Compatibility: We wanted to use current WWW/HTTP compatible clients for testing purposes, as well as our own programs. Current browsers typically use TCP/IP and sockets or the equivalent.

- Portability: With an implementation built on standard libraries available on virtually any UNIX system, our software can easily be moved to a heterogeneous NOW with no modifications.

There are, of course, disadvantages. The primary disadvantage is certainly performance: we were only able to achieve approximately 5-15% of the peak communication performance on the Meiko, where DMA transfers handled by the built-in Elan communications co-processor can achieve speeds near the peak bandwidth. We also felt that additional code complexity was caused by not using the built-in messaging libraries such as NX/2 or Active Messages.

We ran a series of tests where a burst of requests would arrive nearly simultaneously, simulating the action of a graphical browser such as Netscape, where a number of simultaneous connections are made, one for each graphics image on the page. The clients were primarily situated within UCSB. We also tested SWEB via requests from the East coast of the US (Rutgers University) to examine the benefits of scheduling in a high network latency situation. We concentrate on the UCSB data on the scalability of the system, believing this more accurately reflects the high-bandwidth networks we anticipate.

It should be noted that the results we report are average performances obtained by running the same tests multiple times. The test performance is affected by dynamically-changing system loads, since the machines are shared by many active users at UCSB. We report our experiments to examine the performance of our system in the following aspects: scalability of overall performance, scheduling strategy comparison, and overhead costs.

4.1 Scalability of overall performance

Maximum rps on Meiko CS-2 and NOW. The first experiment was run to determine how many requests per second SWEB could process. This depends on the average file sizes requested and the number of nodes. It has been reported that a high-end workstation running NCSA httpd could fulfill approximately 5-10 rps. We examine how a one-node NCSA httpd 1.3 server performs and compare it with the 6-node SWEB on the Meiko CS-2 and the 4-node SWEB on the NOW. The maximum rps is determined by fixing the average file size and increasing the rps until requests start to fail, which indicates that the system limit is reached. The duration of the test, which simulates the burst of simultaneous requests, also affects the experimental results. The requests coming in a short period can be queued and processed gradually, but the requests continuously generated over a long period cannot be queued without actively processing them, since new requests keep arriving each second. We therefore use two types of tests. One has a short period with a duration of 30 seconds, and at each second a constant number of requests is launched. The long period has a duration of 120 seconds, in order to obtain the sustained maximum rps.

It can be seen from Table 1 that the maximum rps of the single-node server is improved significantly by the multi-node server, and that the maximum rps varies for different file sizes. The maximum rps obtained in a short period is much higher than that in a long period, because requests accumulated in a short period can be queued. For the NOW results with the 1.5MB file size, 11 rps is reached for a duration of 30s, but a lower rate is achieved for the sustained maximum rps, because the maximum disk and Ethernet bandwidth limit is reached in this case. These results have been confirmed via the analysis in Section 3.3, which gave an analytical maximum sustained 17.8 rps for 1.5MB files on the Meiko, consistent with the 16 rps achieved in practice. It should be noted that in practice requests come in periodic bursts. Thus we use a 30-second test period in the rest of the experiments, representing a non-trivial but limited burst of requests.
Response time and drop rate on Meiko CS-2 and NOW. In Table 2 we report the response time, the time from after the client sends a request until the completion of this request, as we vary the number of server nodes. The system starts to drop requests if the server reaches its rps limit. For small file requests (1K), the multi-node server performs much better than the one-node server, but the response remains constant when more processors are used. This is because, for small files, none of the theoretical or practical limits on bandwidth or processing power have been reached. The response time does increase to a certain degree after the number of processors grows, which reflects the overhead required by the scheduling algorithm and the distributed file system.

For relatively large files (1.5MB), the processing time is substantially longer. When the number of processors increases, SWEB provides substantially better performance. These results are consistent with those of NCSA, and also strongly confirm the notion that under heavy loads a distributed server solution can achieve significant speedups. Under especially heavy loads, which would tend to occur during peak hours at popular sites, a single server tends to drop almost half or more of the connections made, whereas a distributed server might have a larger overall average response time but fill every request.
Table 1: Maximum rps for test durations of 30s (short period) and 120s (sustained) on the Meiko CS-2 and NOW, for 1K and 1.5M file sizes. The multi-node SWEB substantially improves the maximum rps over the single-node server; for example, SWEB reaches 11 rps (30s burst) for 1.5MB files on the NOW, and sustains 16 rps for 1.5MB files on the Meiko.

The superlinear speedup we obtain reflects the fact that the total size of memory in SWEB is much larger than on a one-node server, so that the multi-node server accommodates more requests within main memory, while a one-node server spends more time swapping between memory and disk. Additionally, the network overhead is distributed among multiple servers rather than concentrated at a single node.
4.2 Comparison between different scheduling strategies
`is to
`
`results in
`
`Our scheduling strategy takes into consideration the locality of
`files and also the current resource loads We compare
`requested
`one is the NCSA approach
`that uni
`our approach with others
`lousily distributes requests to nodes round robin another
`the file locality by assigning requests to the nodes
`purely exploit
`that own the requested files We present experimental
`set of files with uniform and non-uniform sizes
`retrieving
`Request with non-uniform file size With non-uniform file
`by the initial DNS
`sizes the load distribution between processors
`assignment is heterogeneous We expect
`that the round-robin ap
`proach cannot adapt such load variations We tested the ability of
`the system to handle requests with sizes varying from short ap
`100 bytes to relatively long approximately 1.5MB
`proximately
`shows the actual
`time in seconds for this case
`For lightly loaded systems SWEB performs comparably with the
`20 SWEB has an
`others For heavily loaded systems rps
`advantage of 15-60% over round robin and file locality
`
`Table
`
`response
`
Table 3: Performance under non-uniform requests (file sizes up to 1.5MB, 30 sec duration, 0% drop rate, Meiko CS-2). Response times are in seconds.

    RPS            Round Robin   File Locality   SWEB
    (light load)   1.5           1.7             1.7
    16             2.4           2.5             2.6
    20             5.2           4.4             3.5
    24             6.9           7.9             5.9
    32             15.3          9.3             6.2
It is important to exploit file locality and also to consider CPU load. We performed a skewed test to illustrate the fundamental weakness of the file-locality heuristic, in which each client accessed the same file located on a single server, effectively reducing the parallel system to a single server. In this situation round-robin handily outperforms file locality, with average response times of 3.7s and 81.4s respectively. This test was performed with six servers at a constant rps for 45s, with a file size of 1.5MB.
`server
`
`to
`
`We also ran clients on the east coast Rutgers University New
`performance gain of over 10%
`Jersey arid the tests results show
`using file locality instead of round robin from an Ethernet-linked
`the poor bandwidth and long latency over the
`in spite of
`from the east coast
`to the west coast
`connection
Requests with uniform file sizes. For uniform file sizes, the DNS scheme assigns HTTP requests evenly to nodes, which leads to a homogeneous distribution of CPU load, but the network communication load is not minimized. We expect that the advantages of the re-scheduling will manifest in a high communication cost environment. On the Meiko CS-2 we conducted several tests, and the results show that the three strategies have similar performance. This is because NFS is implemented on the Meiko CS-2 through the fast Elan fat-tree network, and the chance of network contention is much smaller than on Ethernet. In a NOW environment with relatively slow bus-type Ethernet, the advantage of exploiting file locality is more clear. This is verified in the experiment shown in Table 4.
  rps           |   ?   |   ?   |   ?
  --------------+-------+-------+-------
  Round Robin   |  48.1 | 128.3 | 243.7
  File Locality | 42.25 | 124.2 | 219.0
  SWEB          |  41.6 | 124.8 | 210.4

Table: Performance under uniform requests (response times). File size .5MB, 0% drop rate, on NOW. (The rps column headings are illegible in this copy.)
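The even assignment produced by the DNS scheme can be sketched as rotating the address list returned for each lookup (a simplified illustration; real round-robin DNS rotates records inside the name server):

```python
from itertools import islice

def dns_round_robin(addresses):
    """Yield the address list rotated by one position per lookup,
    so successive clients favor different nodes."""
    n = len(addresses)
    i = 0
    while True:
        yield addresses[i % n:] + addresses[:i % n]
        i += 1

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
lookups = list(islice(dns_round_robin(nodes), 3))

# Each lookup puts a different node first, spreading CPU load evenly,
# but the node chosen has no relation to where the requested file lives.
print([l[0] for l in lookups])  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```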
4.3 Overhead of SWEB

Overhead distribution from the client point of view. SWEB code was instrumented to determine the overhead cost per request. The cost-distribution table below shows the case of a 1.5MB file fetched over a fairly heavily loaded system. The results indicate that the overall overhead introduced by the SWEB analysis and scheduling algorithm is insignificant. For small files (approximately 1K) the data transfer time is smaller, but the overhead of SWEB compared to the preprocessing time for parsing HTTP commands is still very small. The direct cost of analysis is typically about 1-4 ms for cost estimation and for generating a redirection if necessary. Indirect costs, which affect the time required to complete a redirection, depend heavily on the latency between client and server. For a client fetching a 1.5M file on the Meiko, of the 5.4 sec total time well over 90% is spent doing data transfer.

Overhead distribution from the server point of view.
855

Petitioner IBM – Ex. 1010, p. 6
Meiko CS-2 (columns: increasing node counts; the node counts are illegible in this copy):

  1K file time     9.8    0.40   0.41   0.44   0.49   0.62
  1K drop rate     0%     0%     0%     0%     0%     0%
  1.5M file time   98.2   72.5   30.7   23.7   16.5   9.9
  1.5M drop rate   37.3%  5.0%   3.5%   3.5%   3.5%   0.0%

NOW:

  1K file time     10.9   0.83   0.40   0.26
  1K drop rate     0%     0%     0%     0%
  .5M file time    600    201.4  170.6  167.6
  .5M drop rate    --     20.5%  0%     0%

Table: Performance in terms of response times and drop rates. On Meiko CS-2, 16 rps, 30 sec duration. On NOW, 16 rps for the 1K file and ? rps for the .5M file; the duration is 30s. Times are in seconds, averaged over all requests. The single-server test timed out after no responses were received.
  Activity            |  Time
  --------------------+---------
  Preprocessing Req.  |  70 msec
  Analysis (SWEB)     |  1-4 msec
  Redirection (SWEB)  |  ? msec
  Data Transfer       |  4.9 sec
  Network Costs       |  0.5 sec
  Total Client Time   |  5.4 sec

Table: Cost distribution in average response time (1.5M file size, Meiko CS-2). Items marked SWEB are introduced by the SWEB system; all other times are due to standard HTTPd functionality.
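The breakdown in the cost-distribution table can be sanity-checked with simple arithmetic, confirming that SWEB's direct overhead is a negligible slice of the total client time (figures taken from the text above):

```python
# Figures from the cost-distribution measurements (1.5MB file, Meiko CS-2).
total_s = 5.4            # total client time
transfer_s = 4.9         # data transfer
network_s = 0.5          # network costs
sweb_analysis_s = 0.004  # upper end of the 1-4 ms analysis cost

transfer_fraction = transfer_s / total_s
sweb_fraction = sweb_analysis_s / total_s

print(f"data transfer: {transfer_fraction:.1%} of total")  # well over 90%
print(f"SWEB analysis: {sweb_fraction:.3%} of total")      # under 0.1%
```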
From the server's point of view, we need to examine how much load SWEB contributes for scheduling decisions and resource information collection. Our data shows that in processing requests
