The following paper was originally presented at the
Ninth System Administration Conference (LISA '95)
Monterey, California, September 18-22, 1995

Administering Very High Volume Internet Services

Dan Mosedale, William Foss, and Rob McCool
Netscape Communications

For more information about USENIX Association contact:
1. Phone: 510 528-8649
2. FAX: 510 548-5738
3. Email: office@usenix.org
4. WWW URL: http://www.usenix.org
Administering Very High Volume Internet Services

Dan Mosedale, William Foss, and Rob McCool - Netscape Communications

ABSTRACT

Providing WWW or FTP service on a small scale is already a well-solved problem. Scaling this to work at a site that accepts millions of connections per day, however, can easily push multiple machines and networks to the bleeding edge. In this paper, we give concrete configuration techniques that have helped us get the best possible performance out of server resources. Our analysis is mostly centered on WWW service, but much of the information applies equally well to FTP service. Additionally we discuss some of the tools that we use for day-to-day management.

We don't have a lot of specific statistics about exactly how much each configuration change helped us. Rather, this paper represents our many iterations through the "watch the load increase; see various failures; fix what's broken" loop. The intent is to help the reader configure a high-performance, manageable server from the start, and then to supply ideas about what to look for when it becomes overloaded.
Our Site

Netscape Communications runs what we believe to be one of the highest volume web services on the Internet. Our machines currently take a total of between six and eight million HTTP hits per day, and this number continues to grow. Furthermore, we make the Netscape Navigator, our web browser, available for downloading via FTP and HTTP[1]. The web site[2] contains online documentation for the Netscape Navigator, sales and marketing information about our entire product line, many general interest pages including various directory services, as well as home pages for Netscape employees. All of the machines run the Netscape Server, but most of the strategies in this paper should apply to other HTTP and even FTP servers also.

At various times, we have tried out a number of configurations of machines running various operating systems; Figure 1 shows a list.

Each of our WWW servers has an identical content tree uploaded to it (more on this later). Before the Netscape Navigator was released to the Internet community for the first time, we thought about the web pages we intended to serve and debated how we would spread the load across multiple machines when that became necessary. Because of problems reported using DNS round-robin techniques[3], we chose instead to implement a randomization scheme inside of the Netscape Navigator itself. In short, when a copy of the Navigator is accessing home.mcom.com or home.netscape.com, it periodically queries the DNS for a hostname of the form homeX.netscape.com, where X is a random number between 1 and 16. Each of our web servers has a number of the homeX aliases pointing to it. Since this strategy is not something that will be available to most sites, we won't spend more time on it here.
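For readers curious about the mechanics, the following is a minimal sketch of the client-side selection logic described above, not Netscape's actual implementation. The hostname pattern and the 1-16 range come from the text; the function name, the use of Python, and the idea of resolving the alias immediately are illustrative assumptions.

    import random
    import socket

    def pick_mirror(base="netscape.com", count=16):
        """Pick one of the homeX aliases at random, as the Navigator was
        described to do, and resolve it to an address."""
        host = "home%d.%s" % (random.randint(1, count), base)
        addr = socket.gethostbyname(host)   # one DNS query per (re)selection
        return host, addr

    # A client would re-run pick_mirror() periodically rather than caching a
    # single answer forever, so load keeps spreading across the aliases.
    if __name__ == "__main__":
        print(pick_mirror())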
    Sun uniprocessor (60 MHz) SPARCserver 20     Solaris 2.3 & 2.4
    SGI 2-processor Challenge-L                  IRIX 5.2
    SGI Indy (150 MHz)                           IRIX 5.2 & 5.3
    SGI Challenge-S (175 MHz)                    IRIX 5.3
    HP 9000/816                                  HP-UX 9.04
    90 MHz Pentium PC                            Windows NT 3.5
    90 MHz Pentium PC                            BSD/OS 2.0

    Figure 1: Tested hardware/software configurations

Another scheme which we have looked into is nameserver-based load balancing. This scheme depends upon a nameserver periodically polling each content server at the site to find out how loaded they are (though a less functional version could simply use static weighting). The nameserver then resolves the domain name to the IP address of the server which currently has the least load. This has the added benefit of not sending requests to a machine that is overloaded or unavailable, effectively creating a poor man's failover system. It can, however, leave a dead machine in the server pool for as long a duration as the DNS TTL. Two DNS load balancing schemes that we plan to investigate further are RFC 1794[4] and lbnamed[5].
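As a rough illustration of the nameserver-based idea (not of RFC 1794 or lbnamed themselves), the sketch below polls a pool of content servers for a load figure and picks the least-loaded one, falling back to static weights when every probe fails. The addresses, the /load status URL, and the weights are hypothetical; in a real deployment this logic would live inside the nameserver and the probe would be site-specific.

    import urllib.request

    # Hypothetical server pool; the numbers are static fallback weights.
    SERVERS = {
        "10.0.0.1": 6,
        "10.0.0.2": 5,
        "10.0.0.3": 4,
    }

    def probe_load(addr, timeout=2.0):
        """Ask a (hypothetical) status URL for the server's current load.
        Returns None if the server is unreachable, so it drops out of the pool."""
        try:
            with urllib.request.urlopen("http://%s/load" % addr, timeout=timeout) as r:
                return float(r.read().decode().strip())
        except Exception:
            return None

    def least_loaded():
        loads = {a: probe_load(a) for a in SERVERS}
        live = {a: l for a, l in loads.items() if l is not None}
        if live:
            return min(live, key=live.get)    # answer with the least-loaded live server
        return max(SERVERS, key=SERVERS.get)  # all probes failed: fall back to static weighting

    if __name__ == "__main__":
        print(least_loaded())

The same probe result doubles as the "poor man's failover" mentioned above: a dead machine simply stops being a candidate until the next poll succeeds.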
Figures 2 and 3 summarize some of the performance statistics for the various servers on July 26, 1995, along with their assigned relative load. Our setup allows us to give each machine a fraction of the total traffic to our site measured in sixteenths of the total.

Note: In addition to assuming 1/16 of the WWW load, the 150 MHz Indy also wears the aliases www.netscape.com and home.netscape.com, and currently runs all CGI processes for the entire site. Total CGI load for this day accounted for 49,887 of its 1,548,859 HTTP accesses (more on this later).
The Tools

The traffic and content size of our web site grew very quickly, and we collected a number of tools to help us manage the machines and content. We used a number of existing programs, some of which we enhanced, and developed a few internally as well. We'll go over many of these here; the intent is to focus on tools that are directly useful in managing web servers and content. We will avoid discussing HTML authoring programs and utilities; for those who are interested, see http://home.netscape.com/home/how-to-create-web-services.html for pointers to many such tools.
Document Control: CVS

Since our content comes from several different sources within the company, we chose to use CVS[6] to manage the document tree. This has worked moderately well, but is not an ideal solution for our environment.

In some ways, the creation of content resembles a mid-to-large programming environment. A document revision system became necessary to govern the creation of our web site content as multiple "contributing editors" added and deleted material from the content tree. CVS provided a reasonably easy method to retrieve older source or detailed logs of changes made to HTML dating back to the creation of the content tree.

One drawback of CVS is that many of the folks who design our content found it difficult to use and understand due to a lack of experience with UNIX. A cross-platform GUI-based tool would be especially well-suited to this market niche.
    Host type                      Load      Hits     Redirects  Server  Unique  Unique   KB
                                   fraction                      errors  URLs    hosts    transferred
    SGI 150 MHz R4400 Indy         1/16      1548859  68962      7253    4558    120712   12487021
    SGI 175 MHz R4400 Challenge S  6/16      2306930  47154      23      2722    249791   11574007
    SGI 175 MHz R4400 Challenge S  5/16      2059499  43441      43      2681    225055   10626111
    BSD/OS 90 MHz Pentium          4/16      1571804  31919      23      2351    192726   7936917

    Figure 2: WWW server activity for the period between 25/Jul/1995:23:58:04 and 26/Jul/1995:23:58:59
    Host type                      Load      Bytes/Hits   Bytes/Hits   Bytes/Hits   Bytes/Hits   Bytes/Hits
                                   fraction
    SGI 150 MHz R4400 Indy         1/16      430600/82    345132/61    227618/61    224849/60    236678/59
    SGI 175 MHz R4400 Challenge S  6/16      613621/128   646112/119   656412/110   545699/108   520256/107
    SGI 175 MHz R4400 Challenge S  5/16      466244/93    430870/88    358186/84    375964/84    531421/82
    BSD/OS 90 MHz Pentium          4/16      375696/81    256143/77    417655/76    394878/72    298561/70

    Figure 3: Five busiest minutes for the tested hosts
Content Push

Once we had multiple machines serving our WWW content, it became necessary to come up with a reasonable mechanism for getting copies of our master content tree to all of the servers outside our firewall. NCSA distributes their documents among server machines by keeping their content tree on the AFS distributed filesystem[3].

It seemed to us that another natural solution to this problem was rdist[7], a program specifically designed for keeping trees of files in sync with a master copy. However, we felt we couldn't use this unmodified, as its security depended entirely on a .rhosts file, which is a notoriously thin layer of protection. With the help of some other developers, we worked on incorporating SSL[8] into rdist in order to provide for encryption as well as better authentication of both ends. With SSL, we no longer need to rely on a client's IP address for its identity; cryptographic certificates provide that authentication instead.

In the development of our SSLified rdist, we decided that it would be a good idea to use the latest rdist from USC, in part because it has the option of using rsh(1) for its transport rather than rcmd(3). Because it doesn't use rcmd(3), it no longer needs to be setuid root, which is a real security win. One side effect of this is that we now have an SSLified version of rsh(1), which we use to copy log files from our servers back to our internal nets.
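The same "push an identical tree from one master to every server over an authenticated, encrypted channel" model can be sketched with tools available today. The sketch below is not our SSL-enabled rdist; it stands in rsync over ssh as a rough analogue, and the tree path, server names, and destination directory are placeholders.

    import subprocess

    # Hypothetical master tree and server list; the paper's actual transport
    # was an SSL-enabled rdist/rsh, not rsync over ssh.
    MASTER_TREE = "/export/content/"
    SERVERS = ["www1.example.com", "www2.example.com", "www3.example.com"]

    def push_tree(server):
        """Mirror the master content tree to one server, deleting files that
        no longer exist on the master so every host stays identical."""
        cmd = [
            "rsync", "-az", "--delete",
            "-e", "ssh",                        # encrypted, key-authenticated transport
            MASTER_TREE,
            "%s:/var/www/content/" % server,
        ]
        subprocess.run(cmd, check=True)

    if __name__ == "__main__":
        for s in SERVERS:
            push_tree(s)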
Monitoring

During the course of our server growth, we wrote and/or borrowed a number of tools for monitoring our web servers. These include a couple of tools to check response time, a log analyzer, and a program to page us if one of the servers goes down.

The tool to check response time is designed to be run from a machine external to the server being monitored. Every so often, it wakes up and sends a request to an HTTP server for a typical document, such as the home page. It measures the amount of time that it took from start to finish; that is, from before it calls connect() to after it gets a zero from read() indicating that the server has closed the connection. If you choose a relatively small document, this time can give you a good general indication of how long people are unnecessarily waiting for documents (since under ideal conditions a small document should come back nearly instantaneously). In our typical monitoring setup, we run monitor programs from remote, well-connected sites as well as from locally-networked machines. This allows us to see when problems are a result of network congestion as opposed to lossage on the server machines.
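A minimal sketch of such a probe follows, measuring from just before the connect to the point where the server closes the connection, as described above. It assumes a plain HTTP/1.0 exchange; the host, port, and path are placeholders rather than our monitored servers.

    import socket
    import time

    def probe(host="www.example.com", port=80, path="/"):
        """Time one small HTTP request from connect() to server close."""
        start = time.time()
        s = socket.create_connection((host, port), timeout=30)
        try:
            s.sendall(b"GET %s HTTP/1.0\r\nHost: %s\r\n\r\n"
                      % (path.encode(), host.encode()))
            while s.recv(4096):       # read until the server closes (recv returns b"")
                pass
        finally:
            s.close()
        return time.time() - start

    if __name__ == "__main__":
        print("response time: %.3f seconds" % probe())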
The logfile analyzer, which is now a standard part of the Netscape Communications and Commerce server products, provides information about the busiest hours or minutes of the day, and about how much data transfer client document caching has saved our site. The analyzer can be very helpful in determining which hours are peak hours and will require the most attention. Because of the high volume of traffic at our site, we designed it to process large log files quickly.
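The analyzer itself ships with the server, but the "busiest minutes" idea is easy to sketch: bucket each common-log-format entry by its timestamp down to the minute and report the largest buckets. This is an illustration of the technique, not the product's code, and the timestamp pattern assumes standard common log format.

    import collections
    import re
    import sys

    # A common-log-format timestamp looks like [26/Jul/1995:23:58:59 -0700];
    # capture everything up to the minute as the bucket key.
    STAMP = re.compile(r"\[([^:]+:\d\d:\d\d)")

    def busiest_minutes(logfile, top=5):
        counts = collections.Counter()
        with open(logfile) as f:
            for line in f:
                m = STAMP.search(line)
                if m:
                    counts[m.group(1)] += 1
        return counts.most_common(top)

    if __name__ == "__main__":
        for minute, hits in busiest_minutes(sys.argv[1]):
            print("%s  %d hits" % (minute, hits))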
The program to page us when a server becomes unreachable is similar to our response time program. The difference is that when it finds that a server does not respond within a reasonable time frame for three consecutive tries, it sends an e-mail message to us along with a message to our alphanumeric-pager gateway to make sure we know that a server needs attention.
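The "three consecutive failures, then notify" logic is simple enough to sketch as well. The addresses, the local mail relay, and the idea of treating the pager gateway as just another e-mail recipient are assumptions for illustration.

    import smtplib
    import socket
    from email.message import EmailMessage

    FAILURES_BEFORE_ALERT = 3   # three consecutive failed probes, as described above

    def reachable(host, port=80, timeout=30):
        """A single cheap probe: can we open a TCP connection to the server?"""
        try:
            socket.create_connection((host, port), timeout=timeout).close()
            return True
        except OSError:
            return False

    def alert(host):
        # Both addresses are placeholders; the second stands in for the
        # alphanumeric-pager gateway mentioned above.
        msg = EmailMessage()
        msg["Subject"] = "web server %s is not responding" % host
        msg["From"] = "monitor@example.com"
        msg["To"] = "admins@example.com, pager-gateway@example.com"
        msg.set_content("%d consecutive probes of %s failed." % (FAILURES_BEFORE_ALERT, host))
        with smtplib.SMTP("localhost") as s:    # assumes a local mail relay
            s.send_message(msg)

    def check(host, failures):
        """Return the updated consecutive-failure count, alerting at the threshold."""
        if reachable(host):
            return 0
        failures += 1
        if failures == FAILURES_BEFORE_ALERT:
            alert(host)
        return failures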
At times when we knew that we wouldn't be able to come in and reboot the server, we used a UNIX box with an RS232-controlled on/off switch to automatically hard boot any system not responding to three sequential GET requests. A small PC with two serial ports is enough to individually monitor 10 systems and provides recovery for most non-fatal system errors (e.g., most problems other than hardware failure or logfile partitions filling up).
Performance

Previous works [9, 10, 11] have explored HTTP performance and have come to the conclusion that HTTP in its current form and TCP are particularly ill-suited to one another when it comes to performance. The authors of some of these articles have suggested a number of ways to improve the situation via protocol changes. For the present, however, we are more interested in making do with what we have.

More practically speaking, most TCP stacks have never been abused in quite this way before, so it's not too surprising that they don't deal well with this level of load. The standard UNIX model of forking a new server each time a connection opens doesn't scale particularly well either, and the Netscape Server uses a process-pool model for just this reason.

Kernels

The problems that took the most time for us to solve involved the UNIX kernel. Not having sources for most platforms makes it something of a black box. We hope that sharing some hard-won insights in this area will prove especially useful to the reader.
TCP Tuning

There are several kernel parameters which one can tweak that will often improve the performance of a web or FTP server significantly.

The size of the listen queue corresponds to the maximum number of connections pending in the kernel. A connection is considered pending when it has not been fully established, or when it has been established and is waiting for a process to do an accept(2). If the queue size is too small, clients will sometimes see "connection refused" or "connection timed out" messages. If it is too big, results are sporadic: some machines seem to function nicely, while others of similar or identical configuration become hopelessly bogged down. You will need to experiment to find the right size for your listen queue. Version 1.1 of the Netscape Communications and Commerce Servers will never request a listen queue larger than 128.
In kernels that have BSD-based TCP stacks, the size of the listen queue is controlled by the SOMAXCONN parameter. Historically, this has been a #define in the kernel, so if you don't have access to your OS source code, you will probably need to get a vendor patch which will allow you to tune it. In Solaris, this parameter is called tcp_conn_req_max and can be read and written using ndd(1M) on /dev/tcp. Sun has chosen to limit the size to which one can raise tcp_conn_req_max using ndd to 32. Contact Sun to find out how to raise this limit further.
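On the application side, the server simply asks for a backlog when it calls listen(2), and the kernel clamps that request to its own ceiling (SOMAXCONN, tcp_conn_req_max, and so on). A minimal sketch of requesting the same backlog of 128 mentioned above, with Python used purely for illustration:

    import socket

    def make_listener(port=80, backlog=128):
        """Create a listening socket that asks the kernel for a 128-entry
        listen queue; the kernel silently caps this at its configured maximum."""
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("", port))
        s.listen(backlog)      # the request that the kernel tunables above put a ceiling on
        return s

Raising the kernel ceiling itself remains a vendor-specific step: a #define and a rebuild, a vendor patch, or ndd on Solaris, as described above.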
Additionally, the kernel on a server machine needs to have enough memory to buffer all the data that it is sending out. In variants of UNIX that use a BSD-based TCP stack, these buffers are called mbufs. The default number of mbufs in most kernels is way too small for TCP traffic of this nature; reconfiguration is usually required. We have found that trial and error is required to find the right number: if 'netstat -m' shows that requests for memory are being denied, you probably need more mbufs. Under IRIX, the parameter you will need to raise is called nm_clusters and it lives in /var/sysgen/master.d/bsd.
TCP employs a mechanism called keepalive that is designed to make sure that when one host of a TCP connection loses contact with its peer host, and either host is waiting for data from its peer, the waiting system does not wait indefinitely for data to arrive. Under the sockets interface, if the socket the system is waiting for is configured to have the SO_KEEPALIVE option turned on, the system will send a keepalive packet to the remote system after it has been waiting for a certain period of time. It will continue sending a packet periodically, and will give up and close the connection if the system does not respond after a certain number of tries.

Many systems provide a mechanism for changing the interval between TCP keepalive probes. Typically, the period of time before a system will send a keepalive packet is measured in hours. This is to make sure that the system does not send large numbers of keepalive packets to hosts which, for example, have idle telnet sessions that simply don't have data to send for long periods of time.

With a web server, an hour is an awfully long time. If a browser does not send information the server is waiting for within a few minutes, it is likely that the remote machine has become unreachable. In the past, router failures were the typical cause of hosts becoming unreachable. In today's Internet, that problem still exists, while at the same time an increasingly large number of users are using a modem with SLIP or PPP as their connection to the Internet. Our experience has shown that these types of connections are unstable, and cause the most situations where a host suddenly becomes silent and unreachable. Most HTTP servers have a timeout built in so that if they have waited for data from a client for a few minutes, they will forcibly close that connection. The situations where a server is not actively waiting for data are the ones most important for keepalive.

If "netstat -an" shows many idle sockets in the kernel or idle HTTP servers waiting for them, you should first check whether your server software sets the SO_KEEPALIVE socket option. The Netscape Communications and Commerce servers do so. The second thing you should check is whether your system allows you to change the interval between keepalive probes. Many systems such as IRIX and Solaris provide mechanisms for changing the keepalive interval to minutes instead of hours. Most systems we've encountered have a default of two hours; we typically truncate it to 15 minutes. If your system is not a dedicated web server system, you should consider keeping the value relatively high so idle telnet sessions don't cause unnecessary network traffic. The third thing you should check with your vendor is whether their TCP implementation allows sockets to time out during the final stages of a TCP close. Certain versions of the BSD TCP code on which many of today's systems are based do not use keepalive timeouts during close situations. This means that a connection to a system that becomes unreachable before it has fully acknowledged the close can stay in your machine's kernel indefinitely. If you see this situation, contact your vendor for a patch.
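Application code only has to request keepalive; the probe interval itself is the system-wide tunable discussed above. The sketch below turns the option on for an accepted connection and, only where the platform exposes them (modern Linux does; the IRIX and Solaris systems of this era used kernel tunables instead), overrides the interval per socket. The 15-minute figure mirrors the value we settled on.

    import socket

    def enable_keepalive(conn, idle_minutes=15):
        """Ask the kernel to probe an idle peer rather than waiting forever."""
        conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # Per-socket overrides exist on Linux; elsewhere the interval comes from
        # the kernel tunable (e.g., ndd on Solaris, or the config files in
        # /var/sysgen/master.d on IRIX, as described above).
        if hasattr(socket, "TCP_KEEPIDLE"):
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_minutes * 60)
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # probe every minute
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 5)     # give up after 5 probes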
Your Vendor Is Your Friend; Or, The Value of the Patch

In almost every case, we have gotten quite a bit of value from working directly with the vendor. Since very high volume TCP service of the nature we describe is a fairly new phenomenon, OS vendors are only beginning to adapt their kernels for this.

There are patches to allow the system administrator to increase the listen queue size for IRIX 5.2 and 5.3, as well as enabling the TCP keepalive timer while a socket is in closing states. These patches also fix other problems including a few related to multiprocessor correctness and performance. Contact SGI for the current patch numbers; if you are using a WebFORCE system you should already have them. With these patches, most parameters the administrator will need to edit are in /var/sysgen/master.d/bsd.
If you are using Solaris 2.3, you will definitely want to get the most recent release of the kernel jumbo patch, number 101318. In general, we've found Solaris 2.4 able to handle much more traffic than even a 2.3 system with scalability patches installed. If upgrading to 2.4 is an option, we highly recommend it, especially when your traffic starts to reach the range of multiple hundreds of thousands of hits per day. The 2.4 jumbo patch number 101945 is also recommended, both for security as well as stability reasons.
Logging

Generally, the less information that you need to log, the better performance you will get. We found that by turning off logging entirely, we typically realized a performance gain of about 20%. In the future, many servers will offer alternative log file formats to the current "common log format," which will provide better performance as well as record only the information most important to the site administrator.

Many servers offer the ability to perform reverse DNS lookups on the IP addresses of the clients that access your server. While this is very useful information, having your server do it at run time tends to be a performance problem. Since many DNS lookups either time out or are extremely slow, the server then generates extra traffic on the local network, and devotes some (often non-trivial) amount of networking resources to waiting for DNS response packets.
For high-volume logging, syslogd also causes performance problems; we suggest avoiding it. If one is logging 10 connections per second, and each connection causes two pieces of data to be logged (as it does for us), this could mean up to 20 context switches into syslogd and 20 out of it per second. This overhead is in addition to any logging-related I/O and all processing related to actual content service.
On our site, the logs are rotated once every 24 hours and compressed into a staging directory. A separate UNIX machine inside of our firewall uses an SSLified rsh(1) to bring the individual logs to a 4 gigabyte partition where the logs are uncompressed, concatenated and piped to our analysis software. Reverse DNS lookups are done at this point rather than at run time on the server, which allows us to do only one reverse lookup per IP address that connected during that day. Processing and lookups on the logs from all of the machines on our site takes approximately an hour to complete.
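A minimal sketch of the "one lookup per unique address" step described above: read the concatenated common-log-format data on stdin, collect the distinct client addresses, and resolve each one exactly once. The field layout assumption (client address as the first field) matches common log format; the rest of our analysis pipeline is not shown.

    import socket
    import sys

    def resolve_unique_clients(stream):
        """Collect unique client addresses first, then do one reverse lookup
        apiece instead of one per log line."""
        addrs = set()
        for line in stream:
            addrs.add(line.split(" ", 1)[0])   # first field of common log format
        table = {}
        for addr in addrs:
            try:
                table[addr] = socket.gethostbyaddr(addr)[0]
            except OSError:
                table[addr] = addr             # leave unresolvable addresses as-is
        return table

    if __name__ == "__main__":
        for addr, name in resolve_unique_clients(sys.stdin).items():
            print(addr, name)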
A single compressed log file is approximately 70 megabytes, and consequently, we end up with over 250 MB of log files daily. Our method of log manipulation allows for an automated system of backing up and processing a month's worth of data with little or no human intervention. Tape backups are generated onto 8mm tape once monthly.

Overall analysis of the log files shows consistent data supporting the following:

a) Peak loads occur between 12 and 3 o'clock PM, PST. Peak connection rates were between 120-140 connections per second, per machine. A second peak of roughly half the amplitude occurs between 5 and 6 o'clock PM, PST.

b) Wednesday is the highest load day of the week, generating more than both weekend days combined.
Equipment

Networks

We found that one UNIX machine doing high-volume content service was about all that an Ethernet could handle. Putting two such machines on a single Ethernet caused the performance of both machines to degrade badly as the net became saturated with traffic and collisions. More analysis of our network data is still needed. We are finding it to be more cost effective to have one Ethernet per host than to purchase FDDI equipment for all of them.

As an aside, we have found SGI's addition of the "-C" switch to netstat(1) in IRIX to be extremely useful. It displays the data collected by netstat in a full-screen format which is updated dynamically.
Memory

This is fairly simple: get lots of it. You want to have enough memory both for buffering network data and for your filesystem cache to keep most of the frequently accessed files that it serves in memory. The filesystem read cache hit-rate percentage on our web servers is almost always 90% or above. Most modern UNIXes automatically pick a reasonable size for the buffer cache, but some may require manual tuning. Many OS vendors include useful tools for monitoring your cache hit rates: System V derivatives have sar; we also found HP-UX's monitor(1M) and IRIX's osview(1) helpful.
Our Typical Configuration

A typical webserver at our site is a workstation class machine (e.g., Sun SPARC 20, SGI Indy, or Pentium P90) running between 128 and 150 processes. For UNIX machines at least, we have found 128 megabytes of memory to be about the most that our machines can use. With this much memory, we have all the network buffer space we need, we get a high filesystem read-cache hit rate, and usually have a few (or even tens of) megabytes to spare (depending on the UNIX version).

Generally one gigabyte or more of disk is necessary. These days, each of our servers generates over 200 megs of log data per day (before compression), and the amount of HTML content we are housing continues to grow. Data from sar and kernel profiling code suggest that our boxes are spending between 5% and 20% of their time waiting for disk I/O. Given the high read-cache hit rate, we expect that by moving our log files onto a separate fast/wide SCSI drive and experimenting with filesystem parameters, this percentage will decrease fairly significantly.
Miscellaneous Points

In this section, we will discuss a few random things that we have learned during our tenure managing web servers.

A Bit About Security

In addition to the normal security concerns of sites on the Internet[12], web servers have some unique security "opportunities". One of the most notable is CGI programs[13]. These allow the author to add all sorts of interesting functionality to a web site. Unfortunately, they can also be a real security problem: since they generally take data entered by a web user as their input, they need to be very careful about what such data is used for.

If a CGI script takes an email address and hands it off to sendmail on the command line, the script needs to go through and make sure that no unescaped shell characters are given to the shell that might cause it to do something unexpected. Such unexpected interplay between different programs is a common cause of security violations. Since many users who want to provide programmatic functionality on their web pages are not intimately familiar with the ins and outs of UNIX security, one approach to this problem is to simply forbid CGI programs in users' personal web pages.
However, we suggest an alternative: mandate use of taintperl[14] for CGI programs written by users. Perl is already one of the predominant scripting languages used to write CGI programs; it is extremely powerful for manipulating data of all sorts and producing HTML output. taintperl is a version of perl which keeps track of the source of the data that it uses. Any data input by the user is considered by the taintperl interpreter to be tainted, and can't be used for dangerous operations unless explicitly untainted. This means that such scripts can be reasonably easily audited by a security officer by grepping for the untaint command and carefully analyzing the variables on which it is used.
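The sendmail example above can also be handled without taint checking by refusing anything that does not match a conservative pattern and by never putting user data in front of a shell at all. The sketch below is illustrative, not our CGI code; the address pattern, the sendmail path, and the function name are assumptions.

    import re
    import subprocess

    # Deliberately conservative address pattern; anything that does not match is
    # rejected outright rather than "escaped" -- the moral equivalent of
    # taintperl refusing to use tainted data until it is explicitly untainted.
    ADDRESS = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+$")

    def mail_user(address, body):
        if not ADDRESS.match(address):
            raise ValueError("refusing suspicious address: %r" % address)
        # argv-style invocation: no shell is involved, so shell metacharacters
        # in the address could not be interpreted even if they slipped through.
        subprocess.run(["/usr/lib/sendmail", address],   # assumed sendmail path
                       input=body.encode(), check=True)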
Web Server Heterogeneity

An interesting feature of our site is that it is an ideal testbed for ports of our server to new platforms. Since it was clear to us from the beginning that we would be using it this way, we needed to think about how we would deal with CGI programs and their environment. After some thought, we decided that it just wasn't practical to try to port and test all of our CGI content (and force our users to do the same for their home pages) to every new platform that we wanted to test. It turned out that we were very fortunate to have considered this early, as we eventually ended up testing our Windows NT port on our web site.

We designated a single alias to a machine that would run all of our CGI programs. We then created the guideline that all HTML pages should simply point all CGI script references to this alias. A nice side effect is that this type of content is partitioned to its own machine which can be specifically tailored to CGI service. If the machine pool contains many incompatible machines, this setup avoids having to maintain binaries for CGI programs compiled for each machine. An average day at our site shows 50,000 (out of 7.5 million) requests for CGI scripts (note that this ratio is almost certainly very dependent on the type of content served).

An additional experiment would be to partition other types of content (e.g., graphics) to their own machines in the same way. If necessary, DNS-based load-balancing could be used to spread out load across multiple machines of the same type (e.g., gif.netscape.com could be used to refer to multiple machines whose only purpose in life is to serve GIF files).
FTP vs. HTTP

Both FTP and HTTP offer an easy way to handle file transfer, each with relative strengths.

FTP provides a "busy signal", that is, feedback to the user indicating that a site is currently processing too many transactions. That limit is easily set by the site administrator. HTTP provides a mechanism for this as well; however, it is not implemented by many HTTP servers, primarily because it can be very confusing for users. When a user connects to an FTP site, they are allowed to transfer every document they need in that session. Due to HTTP's stateless nature and the fact that it uses a new connection for each file transfer, a user can easily get an HTML document and then be refused service when asking for the document's inlined images. This makes for a very confusing user experience. It is hoped that future work in HTTP development will help to alleviate this problem. Further work in the URL or URN arenas will hopefully provide more formal mechanisms for defining alternative distribution machines.

In a system planning sense, FTP should be considered to be a separate service, and therefore can be cleanly served from a completely different computer. This offers easier log analysis of file delivery vs. HTML served, and also aids in security efforts. A system running only one service, correctly configured, is less likely to be breached, and if breached, does not mean loss of security for our entire site.
Performance gains are also likely. Content typically served by FTP is composed of large files, compared to HTTP-served data, which is typically designed to be small and quickly accessible by modem users; a document and its inlined images are small enough to be delivered in short periods. Mixing the two different types can make it hard to pin down system bottlenecks, especially if FTP and HTTP are being served from a single machine. Many times the two services will compete for the same resources, making it hard to track down problems in both areas.

While running both FTP and HTTP servers on one machine, we found that 128 HTTP daemon processes and an imposed limit of 50 simultaneous FTP connections was about all a workstation-class system would tolerate. Further growth beyond that would cause each service to be periodically denied network resources. Once separated, however, 250 simultaneous FTP connections on a workstation-class machine were handled easily.

HTTP service, on the other hand, offers a method to gather useful information from the requestor via forms before allowing file transfer to take place. This became a necessity at Netscape, as the encryption technology in our software required a certain amount of legal documentation to be agreed to prior to download.
Conclusion and Future Directions

Although sites such as ours are currently the exception, we expect that they will soon become the rule as the Internet continues its exceedingly rapid growth. Additionally, we expect content to become vastly more dynamic in the future, both on the front end (using mechanisms such as server push[14] and Java[15]) and the backend (where using SQL databases and search engines will become even more common). This promises to provide many new challenges, especially in the area of performance measurement and management.

We hope that the techniques and information in this paper will prove helpful to folks who wish to administer sites providing a very high volume of service.
Acknowledgements

Special thanks to Brendan Eich for his toolsmithing and his willingness to get his hands dirty with the kernel, to Jeff Weinstein for SSLrsh and SSLrdist, to Rod Beckwith for general network magic, and to Gene Tran for help with performance measurement and the SGI kernel profiler. Thanks to Bill Earl, Mukesh Kacker, Mike Karels and Vernon Schryver for TCP bug-fixes and general advice. We also appreciate the input from the folks who took the time to read drafts of our paper.

About the Authors

The authors have been responsible for the design, implementation, and babysitting of the Netscape web site since its inception.

Dan Mosedale <dmose@netscape.com> is Lead UNIX Administrator at Netscape. He has been managing UN