Computing Practices

NCSA's World Wide Web Server: Design and Performance

Thomas T. Kwan and Robert E. McGrath
National Center for Supercomputing Applications

Daniel A. Reed
University of Illinois

The recent explosion of interest in the World Wide Web (WWW)¹
`
can be traced to the distribution of the CERN (European Laboratory for Particle Physics in Geneva, Switzerland) and NCSA (National Center for Supercomputing Applications) servers and WWW client browsers. In particular, NCSA Mosaic, the graphical user interface for WWW browsing, based on distributed, multimedia hypertext, has spawned several commercial variants and has made the Internet readily accessible to a much larger population than in the past. Network statistics from Merit, the NSFNet backbone management group, show that WWW traffic is the largest and by far the fastest growing segment of the Internet, and growing numbers of government and commercial groups are making hundreds of gigabytes of data available via WWW servers. At the same time, the WWW servers at NCSA have experienced explosive growth in traffic, from 1 million requests per week in February 1994, to 2 million per week in June 1994, 3 million per week in September 1994, nearly 4 million per week in December 1994, and even larger numbers in 1995.²
`
To support continued growth, WWW servers must manage a multigigabyte (in some instances a multiterabyte) database of multimedia information while concurrently serving multiple request streams. This places demands on the servers' underlying operating systems and file systems that lie far outside today's normal operating regime. Simply put, WWW servers must become more adaptive and intelligent. The first step on this path is understanding extant access patterns and responses. On the basis of this understanding, one can then develop more efficient and intelligent server and system file-caching and prefetching strategies.

In this article, we describe extant access patterns and responses at NCSA's WWW server and the implications of that data. But first, we describe the context in which the data was collected: the NCSA WWW server architecture.
`
Explosive Web traffic growth has placed burdens on Web servers that lie far outside today's normal operating regime. This article examines extant Web access patterns with the aim of developing more efficient file-caching and prefetching strategies.

NCSA WWW SERVER ARCHITECTURE

Shortly after NCSA's WWW server was established, it became clear that the volume of WWW traffic would stress operating systems and network implementations in ways not originally envisioned by their designers. At peak times, the NCSA server receives 30-40 new WWW requests per second, and because the Hypertext Transfer Protocol (HTTP) is connectionless, each such request appears to the server as a separate network connection.

Not only were most implementations of the TCP/IP network protocol not designed to accept connections at this sustained rate, even conservative projections of request rate growth showed that no single-processor system could serve all requests. To support the growing request rate, NCSA has developed a scalable WWW architecture that consists of a group of loosely coupled WWW servers. Though the servers operate independently, collectively they provide the illusion of a single server.

Development of the NCSA architecture required resolution of three key problems:
`Computer
`
`0018-9162/95/$4.00
`© 1995 IEEE
`
`Petitioner Microsoft Corporation - Ex. 1068, p.1
`
`

`
`
1. Information addressing. Externally, the NCSA server has a single domain name (www.ncsa.uiuc.edu). Incoming requests addressed to this domain name must be mapped to multiple servers, each with a separate, user-invisible domain name. This mapping allows NCSA to invisibly add servers to accommodate the growing number of incoming requests.
2. Information distribution. Each server must be capable of responding to requests for any portion of the NCSA WWW server database. Otherwise, the servers must be more tightly coupled, an arbiter must distribute requests to servers on the basis of request type, and it is likely that the arbiter will become the bottleneck.
3. Load balancing. The requests must be equally apportioned among the servers. Thus, newly added servers will always share the load and contribute to the scalability of the implementation.

The server architecture is based on three components: a collection of independent servers, a WWW document tree shared among the servers and stored by the Andrew distributed file system (AFS),³ and a round-robin domain name system (DNS) that multiplexes the domain name www.ncsa.uiuc.edu among the constituent servers.

With this architecture, the NCSA WWW service is always the same, although the number and identity of the particular servers may change from day to day. Beginning with one server in February 1994, the architecture grew to four servers in May, eight servers in November, and nine in early 1995. To meet increasing demand, NCSA will continue to add servers as needed.

Below, we briefly describe each of the three server components. For additional details on the server architecture, see Katz et al.⁴ and Kwan et al.⁵

The servers and the network
The NCSA WWW server architecture is flexible enough to accommodate most Unix systems as component servers. The only requirements are that the systems function as AFS clients and support TCP/IP. The servers need not be homogeneous; the particular systems in use vary from time to time and may be a heterogeneous collection of systems.

To date, the backbone of the NCSA WWW service has been a group of dedicated Hewlett-Packard HP 735 workstations. Though these systems are not generally considered "servers," their efficient TCP/IP implementation has made them an effective choice to process WWW requests. In the NCSA configuration, each HP 735 has 96 megabytes of memory and uses its local disk as a moderate-size (130 megabytes) AFS cache. In addition, the local disk stores HTTP server log files and is the backing store for the virtual memory system.

The WWW servers are connected to the AFS file servers via a 100-megabit/second Fiber Distributed Data Interface ring (see Figure 1). The FDDI ring connects to the rest of NCSA and to the Internet via a T3 line.

Figure 1. The NCSA scalable WWW server. (Diagram: AFS file servers and WWW servers, which are AFS clients, connected by the FDDI ring.)

NCSA AFS configuration
All documents provided by the NCSA WWW service are served from the NCSA center-wide Andrew File System environment.⁶ This distributed file system is shared by many hundreds of client workstations and supercomputers, as well as the WWW servers.

AFS provides a single, consistent view of the file system to each WWW server, allowing each server to access the entire WWW document tree. Because AFS clients (that is, the WWW servers) cache recently used files on their local disks, the most frequently accessed documents are generally available locally, without remote disk access. In effect, AFS caching replicates the document tree on each WWW server.

Because AFS manages the shared document tree, the individual WWW servers need not and do not know either the number or identity of the other servers. It is difficult to overemphasize the importance of this point. This allows rapid, "plug-and-play" addition (and removal) of component servers and the use of heterogeneous systems. In practice, we have found that servers can be added or removed from the ensemble in under an hour.
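The local caching behavior described in this section can be illustrated with a toy model. The following is a minimal sketch, assuming a least-recently-used policy bounded by total bytes. The article says only that AFS clients cache recently used files on their local disks, so the class name `LocalDiskCache`, the eviction rule, and the request stream below are illustrative assumptions, not a description of AFS's actual implementation.

```python
from collections import OrderedDict


class LocalDiskCache:
    """Toy byte-bounded LRU cache standing in for a server's local file cache."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.files = OrderedDict()  # path -> size, least recently used first
        self.hits = 0
        self.misses = 0

    def request(self, path, size_bytes):
        """Record one request; return True on a local hit, False on a remote fetch."""
        if path in self.files:
            self.files.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if size_bytes <= self.capacity_bytes:
            # Evict least recently used files until the new file fits.
            while self.used_bytes + size_bytes > self.capacity_bytes:
                _, evicted_size = self.files.popitem(last=False)
                self.used_bytes -= evicted_size
            self.files[path] = size_bytes
            self.used_bytes += size_bytes
        return False

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Hypothetical request stream: two small popular pages and one large video clip.
cache = LocalDiskCache(capacity_bytes=130 * 1024 * 1024)  # 130-megabyte cache, as in the text
stream = [("/index.html", 4_000), ("/logo.gif", 12_000)] * 5
stream.insert(4, ("/demo.mpg", 20_000_000))
for path, size in stream:
    cache.request(path, size)
print(f"hit ratio: {cache.hit_ratio():.2f}")
```

In this model only the first request for each file goes to the shared file server; every repeat is served locally. With a much smaller capacity, the single large file would evict the small popular pages and lower the hit ratio.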
Glossary

Browser: Software that allows users to view documents retrieved from the Internet.

Client: Software responsible for communicating with servers to retrieve necessary documents and files.

Firewall: A computer system and network interface that maintains security for an organization by filtering incoming networking requests.

HTTP: Hypertext Transfer Protocol (HTTP) is a stateless data-transfer protocol used by the WWW.

NCSA Mosaic: A freely available browser developed at NCSA. (NCSA Mosaic is a trademark of the Board of Trustees of the University of Illinois.)

Server: Software responsible for making local documents or files available to other software systems.

World Wide Web (WWW): A global information system providing hypertext-linked access to resources on the Internet. The WWW also incorporates existing network services, such as FTP and Gopher.
`
`November 1995
`
`
Our experience to date has been that AFS's local caching is critical to the success of the NCSA WWW server architecture. Stateless distributed file systems (for example, NFS) cannot exploit the locality inherent in the HTTP request stream by locally caching frequently requested items. Instead, they must repeatedly retrieve those items from a shared file server. Not only does this increase the load on the file server, it is inherently unscalable.

Despite the advantages of local caching, much research remains before we learn how distributed file systems in general, and AFS in particular, support large, less frequently accessed files (for example, 24-bit color images and digital video clips). With standard caching algorithms, access to these files will displace smaller, more frequently accessed files from the local cache. If these large, nontext files are not cached, their access latencies will be large. Data-type-specific caching algorithms are one potential solution.

WWW server performance visualization
To gain insights into the large volume of access and performance data in the WWW logs, we relied on a variety of standard statistical data analysis tools. However, to understand the dynamics of server behavior and the interactions of request patterns with the round-robin DNS system, we exploited the local availability of the CAVE, an immersive, unencumbered virtual environment, and our Avatar visualization software,¹ to create dynamic displays of server behavior. Figures A and B show snapshots of this visualization from a "day in the life" of the NCSA WWW server.

In Figure A, the trajectories of four different servers in the performance metric space are denoted by the four colored ribbons. This snapshot, from near noon on September 7, 1994, shows that the round-robin DNS system effectively balances the server load: the trajectories of all the servers cluster in the same region of the performance metric space; the small variations are due to differing request patterns.

Figure B shows a global view of the origin of the requests. The height of the bar at each geographical location represents the number of bytes requested by that location; the different color segments represent the different data types. Figure B shows the activity at 6 p.m., local time, of a typical workday. Because of the time zone difference, most of the requests at this time are originating from the west coast of the United States. (The bar at the north pole represents sites that cannot be mapped to a specific geographical location.)

For details about the infrastructure of this visualization environment, see Reed et al.¹ in this issue of Computer.

Reference
1. D.A. Reed et al., "Virtual Reality and Parallel Systems Performance Analysis," Computer, Vol. 28, No. 11, Nov. 1995, pp. 57-67.

Figure A. WWW server visualization.

Figure B. Origin of WWW requests at 6 p.m. local time.

Round-robin domain name system
The third and final component of the NCSA scalable WWW server is a modified network name resolver based on the Berkeley Internet Name Domain (BIND) code.⁷ The existing BIND 4.9.2 code has a round-robin option that can associate a single domain name with several IP addresses. In response to requests, these addresses are distributed using a simple rotation algorithm. Because this rotation conflicted with extant software at NCSA, the BIND software was modified to rotate only specific addresses, namely those of the WWW servers (see Katz et al.⁴ for details).

The modified domain name system (DNS) allows a domain name with more than one associated IP address to be specified as "round-robin." Each incoming request for the address of a round-robin domain name is satisfied by the next IP address on the list in a simple rotation. Thus, 1/Nth
`
`
of the DNS requests get each of the N different IP addresses. This allows NCSA to maintain a group of WWW servers aliased by the single domain name www.ncsa.uiuc.edu. Adding a new server to the group is as simple as adding its IP address to the DNS entry for www.ncsa.uiuc.edu.

DATA COLLECTION

If the NCSA WWW service has accomplished nothing else, it has produced copious amounts of performance- and access-pattern data. This data is collected continuously on each server and is permanently archived each day to be available for researchers. Collectively, the files constitute more than 150 megabytes of data each weekday.⁸

On each of the component WWW servers, the data collected includes

• the standard access logs from the NCSA HTTP daemons (httpd),
• the standard error logs from the httpd daemons,
• a custom log of the client browser type (the "user agent") that initiated each request,
• a trace of virtual memory statistics, obtained by recording Unix vmstat data once each minute,
• a trace of packet counts, obtained by recording Unix netstat data once each minute, and
• a count of active processes, sampled with ps once every 5 minutes.

REQUEST PATTERN ANALYSIS

To understand the access pattern and characteristics of NCSA's WWW service, we analyzed the data described above for selected weeks during five different months of 1994. Below, we present the qualitative results with respect to the general access trends, the domain characteristics, and the file type distribution (see Kwan et al.⁵ for details).

General trends
Qualitatively, WWW traffic growth on the Internet is well known. However, the specific characteristics of this growth and the sources of requests are much less well understood. Hence, the initial goal of our analysis was a simple characterization of WWW traffic in terms of request count, request data volume, and request sources (by hardware platform type).

TRAFFIC GROWTH. The number of requests received by the NCSA WWW servers during the period of our analysis grew from about 300,000 per day in May 1994 to about 500,000 per day in September. Thus, the compounded growth rate over the five-month period is roughly 14 percent per month. A scan of NCSA's January 1995 WWW server logs shows that the number of requests has increased to about 690,000 per day. As a result, the compounded growth rate is about 11 percent per month from May 1994 to January 1995. For the rest of 1995, however, the number of requests to the NCSA server has slowly decreased. (See File2 for the latest 1995 statistics.)

CLIENT PLATFORMS. Knowing the platform from which a request originated has great potential value. Information providers can customize documents for different platforms, and servers can exploit this knowledge by tailoring their response to the hardware and software capabilities of the requesting platform. The user agent logs from the first 20 days of December 1994 show that 31 percent of all connections were from X Windows clients, 38 percent from Microsoft Windows clients, 20 percent from Macintoshes, and 21 percent from all other types of clients. This data shows that at least 58 percent of the requests originate from personal computers. As vendors continue to ship new and improved versions of WWW browsers for personal computers, we expect requests from personal computers to grow at a very rapid rate. However, because of the relatively low bandwidth (modem) connections from most personal computers to the Internet, it is becoming increasingly important for WWW servers to adapt to client needs (for example, by sending lower resolution images) and for clients to prefetch selected data to hide the long latency for data retrieval.

Domain characteristics
Much discussion has centered on the commercial potential of the World Wide Web and the increasing accessibility of commercial information. To assess the number and distribution of commercial and other requesting sites, we aggregated domain names into a small number of broad categories: educational, commercial, government, and other. Table 1 summarizes the fraction of requests from the major Internet domains.

Table 1. Server request origins by domain.

Internet domain      Percentage of requests
Education (edu)      26
Commercial (com)     18
Government (gov)     5
Others               51

Although Table 1 shows that the edu domain generates more requests than any other single domain, Figure 2 shows that the number of requests from commercial domains is growing rapidly. (For each month, the figure shows seven data points, corresponding to Sunday through Saturday of the week we analyzed during that month.) This reflects the increasing presence of commercial Internet service providers and the growing use of the Internet by the staff of commercial organizations.

Figure 2. Weekly domain request statistics from May to September 1994. Each data point represents a Sunday-Saturday cycle analyzed during the month. (Plot: number of requests by month, May through September, for the edu, com, and gov domains.)

Although the top 10 educational and government domains (which generate the largest number of requests to NCSA's server) change almost daily, the top 10 commercial domain names change little. Indeed, most of the top 10 commercial domain names on any given day were also among the top 10 domain names throughout the five months of data we analyzed.

The domain names in the com domain are mainly network firewalls for large organizations; they have long connection times and make an unusually large number of requests. Because a firewall acts as a central location for accessing data outside a given organization, it is the ideal location for implementing network caching and proxy servers, a topic to which we will return.

Media distributions
As we noted above, the request rate to the NCSA WWW server is growing at a compounded rate of between 11 and 14 percent per month. In addition to the rate, the characteristics of the growth have important implications for WWW server implementation. For example, satisfying large numbers of requests for small, text-based documents is much easier than responding to large numbers of requests for color images, video clips, or large data files.

Because the HTTPD server logs contain the name of the document being requested, and the file extension can be used to identify the document category, it is possible to determine the relative request frequency for text, images, audio, video, and data. The text category includes Hypertext Markup Language (HTML) documents, plain text, and PostScript files; the image category includes GIF, X bitmap (xbm), JPEG, and RGB files; the audio category includes au, aiff, and aifc files; and the video category includes MPEG and QuickTime files.

Figure 3 shows that text and images account for the majority of the requests. Although audio and video account for only 1 percent of the requests, they represent 28 percent of the bytes transferred. The requests for large audio and video files also lead to more bursty data transfer rates. Interestingly, the temporal distribution of the requests for audio and video is skewed toward later in the day than the distribution of those for text and images. We conjecture that users seek off-peak times to retrieve large items from the server.

Figure 3. File type statistics by rate (left) and volume (right). (Plots: text, images, audio, and video versus time of day in hours.)

One should be chary about projecting access characteristics from this data. The NCSA WWW document tree is dominated by a large number of small objects. As WWW document repositories mature, we expect them to contain a much larger number of large scientific and technical data sets, scientific visualizations and video clips, and audio segments. This shift will accentuate the behavior found in this study: Many of the requests will be for small data items, but an increasing fraction of the data volume will be associated with requests for large, nontext items.

SERVER CACHING

To this point, our focus has been on the characteristics of the request stream. We turn now to an examination of the servers' "response" to the incoming request stream.

Effective, distributed file caching was one of the key design principles in NCSA's WWW server architecture. Local caching at the WWW servers reduces the load on the shared AFS file servers, minimizes file traffic on the FDDI ring, and allows the WWW servers to respond quickly to requests for frequently accessed documents. To measure the effectiveness of the current AFS caching protocols, we analyzed the WWW server logs to identify the characteristics of the most frequently requested documents.

As mentioned above, NCSA serves documents from the AFS distributed file system, which automatically caches the most recently used files in local AFS client caches. The left portion of Figure 4 shows the number of distinct files requested per day during the five months of our analysis; the right portion shows the total size of these same files.

Comparing the two figures shows that although the number of distinct files requested has increased, the total size of all the requested files has remained under 450 megabytes per day. Most of the newly added files have been small text and image files. To date, the AFS client cache hit ratios for the WWW servers have been near 90
`
`
percent, suggesting that AFS caching has worked quite well for the past access patterns.

Figure 4. Request profile: number of distinct files requested (left) and total size of all files requested (right). Each data point represents a Sunday-Saturday cycle analyzed during that month.

Note that not only does the AFS file system cache frequently accessed files on the local disk of the WWW servers, but also the most frequently accessed of those files are cached in the primary memory of the WWW servers. With the observed access patterns to NCSA's WWW servers, less than 60 megabytes of primary memory cache space is needed to satisfy 95 percent of all incoming requests, which corresponds to roughly 800 distinct files. Though most requests are small, a small number of requests retrieve large items. For this reason, satisfying 95 percent of the requests represents only 80 percent of the total data volume.

IMPLICATIONS

As the number of requests to NCSA's and other WWW servers continues to grow, the continued scalability of the server architecture, the efficiency of the HTTP protocol, and the effectiveness of caching strategies become increasingly critical research and implementation issues. Let's examine salient aspects of each issue.

Scalability and persistent state
Although round-robin DNS has allowed NCSA to add WWW servers without piercing the illusion that www.ncsa.uiuc.edu is a single host, the use of round-robin DNS is not an ideal solution to either the decoupling of logical WWW server names from the physical server identity or to request load balancing. With this approach, the distribution of WWW server addresses is divorced from the characteristics and load of the constituent servers.

While the round-robin mechanism equally distributes the IP addresses of the constituent servers, there is no mechanism to limit the number of times an address is used after it is distributed, or to guarantee that the client system will honor the advertised time to live (TTL). For instance, a local DNS service might distribute a single IP address to any number of clients in its domain.

Moreover, envisioned extensions to HTTP include long-lasting state (for example, the results of previous database searches) that must be retained by a WWW server. Supporting such extensions may be difficult for a multiserver architecture that relies on round-robin DNS. A second request may be sent to a different server than the one holding the result of the previous request. Unless the data is shared (for example, via AFS), obtaining the requisite information will require closer server cooperation, with associated overhead.

HTTP protocol extensions
The overriding trend from our data analysis is the continued growth in request rate. Currently, each request from the client uses a separate TCP connection, and the large number of short-lived TCP connections limits the performance of the server. This problem is exacerbated by the fact that a document may be composed of several pieces, each of which is fetched separately, with each fetch requiring a separate TCP/IP connection. Padmanabhan and Mogul⁹ have proposed opening a single TCP connection per HTML document to avoid unnecessary TCP overhead; preliminary experiments show that this reduces document retrieval latency. Spero¹⁰ has proposed a new protocol, HTTP-NG, which dramatically alters HTTP to reduce overhead, allow more parallelism, and efficiently support features such as authentication.

These and related protocol changes will reduce the latency to deliver data and transmit more data over each TCP/IP connection. It will make HTTP servers much more like FTP and other session-oriented services. This may well make much better use of the available network bandwidth and other server resources.

Distributed caching and prefetching
Beyond reducing the network protocol overhead, one can also aggressively cache and prefetch the data. At the moment, various browsers cache data on local client disks to improve performance. Pitkow and Recker¹¹ have shown that caching based on recent rates of past access is an effective technique. However, to design and implement effective prefetching, one must first study and understand the
`
`
extant access patterns. Our data suggests that partitioned caches are a promising alternative. However, prototype implementations and trace-driven simulations are needed to measure the performance benefits that might accrue from this approach.

We noted that the most prolific sites are all commercial gateways. Moreover, about 2 percent of the requests to the NCSA WWW servers are from hosts that make only one request. The most popular of these requests are to the "directory" pages, namely the NCSA Internet Starting Points, the Internet Resources Meta-Index, and the What's New pages. These pages are excellent candidates for replication and caching throughout the Internet, particularly at commercial gateways.

In the future, as audio and video clips play a larger role in conveying multimedia information, audio and video requests will significantly affect network traffic and caching strategies. As we have seen, even a small increase in the use of these data types will dramatically increase the amount of data to be read and transmitted, with a concomitant deleterious effect on the efficiency of server caching strategies.

WE HAVE DESCRIBED THE DESIGN OF NCSA's WWW SERVER and analyzed the access patterns to the server in terms of the user request patterns and the responses of the server. The analysis shows that scalability, protocol efficiency, and effective caching strategies are the major issues for the next generation of WWW servers. In particular, we believe that to improve performance, both clients and servers must aggressively exploit caching and prefetching on the basis of knowledge of request patterns, data types, and hardware capabilities.

Acknowledgments
Our thanks to Eric Katz for providing us with the initial log analysis scripts, to Nancy Yeager, Michelle Butler, and Paul Zawada for providing crucial assistance in understanding the NCSA WWW server, and to Charlie Catlett, without whom this work would not have been possible. Finally, thanks to Will Scullin and Steve Lamm for developing the virtual reality software used to display dynamic server behavior.

Thomas Kwan is supported in part by the National Science Foundation and the Advanced Research Projects

References
3. M. Satyanarayanan, "Scalable, Secure, and Highly Available Distributed File Access," Computer, Vol. 23, No. 5, May 1990, pp. 9-21.
4. E.D. Katz, M. Butler, and R. McGrath, "A Scalable HTTP Server: The NCSA Prototype," Computer Networks and ISDN Systems, Vol. 27, 1994, pp. 155-164.
5. T.T. Kwan, R.E. McGrath, and D.A. Reed, "User Access Patterns to NCSA's World Wide Web Server," Tech. Report UIUCDCS-R-95-1934, Dept. Computer Science, Univ. of Illinois, Urbana-Champaign, Feb. 1995.
6. NCSA AFS Users Guide, National Center for Supercomputing Applications, Univ. of Illinois, Urbana-Champaign, 1994 (http://www.ncsa.uiuc.edu/Pubs/UserGuides/AFSGuide/AFSv2.100.html).
7. P. Albitz and C. Liu, DNS and BIND in a Nutshell, O'Reilly and Associates, Sebastopol, Calif., 1992.
8. R.E. McGrath, What We Do and Don't Know About the Load on the NCSA WWW Server, Sept. 1994 (http://www.ncsa.uiuc.edu/InformationServers/Colloquia/28.Sep.94/Begin.html).
9. V.N. Padmanabhan and J.C. Mogul, "Improving HTTP Latency," Proc. Second Int'l WWW Conf., 1994, pp. 995-1,005 (http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/mogul/HTTPLatency.html).
10. S. Spero, Progress on HTTP-NG, 1994 (http://www7.cern.ch/hypertext/WWW/Protocols/HTTP-NG/http-ng-status.html).
11. J.E. Pitkow and M.M. Recker, "A Simple Yet Robust Caching Algorithm Based on Dynamic Access Patterns," Proc. Second Int'l WWW Conf., 1994, pp. 1,039-1,046 (http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/DDay/pitkow/caching.html).

Thomas Kwan is a doctoral candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign and a graduate research assistant at the National Center for Supercomputing Applications. His research interests include parallel computing, gigabit applications, and World Wide Web technology. He received a BS degree in electrical engineering from the University of Washington and an MS degree in computer science from the University of Illinois at Urbana-Champaign. He is a member of IEEE, ACM, and Tau Beta Pi.

Robert McGrath is a research programmer at the National Center for Supercomputing Applications. His research centers on the architecture and performance of large-scale distributed systems. He is a coauthor of
