The following paper was originally presented at the Ninth System Administration Conference (LISA '95), Monterey, California, September 18-22, 1995.

For more information about the USENIX Association:
Phone: 510 528-8649
FAX: 510 548-5738
Email: office@usenix.org
WWW URL: http://www.usenix.org

Administering Very High Volume Internet Services

Dan Mosedale, William Foss, and Rob McCool
Netscape Communications

ABSTRACT

Providing WWW or FTP service on a small scale is already a well-solved problem. Scaling this to work at a site that accepts millions of connections per day, however, can easily push multiple machines and networks to the bleeding edge. In this paper we give concrete configuration techniques that have helped us get the best possible performance out of server resources. Our analysis is mostly centered on WWW service, but much of the information applies equally well to FTP service. Additionally, we discuss some of the tools that we use for day-to-day management. We don't have a lot of specific statistics about exactly how much each configuration change helped us. Rather, this paper represents our many iterations through the "watch the load increase, see various failures, fix what's broken" loop. The intent is to help the reader configure a high-performance, manageable server from the start, and then to supply ideas about what to look for when it becomes overloaded.
Our Site

Netscape Communications runs what we believe to be one of the highest volume web services on the Internet. Our machines currently take a total of between six and eight million HTTP hits per day, and this number continues to grow. Furthermore, we make our web browser, the Netscape Navigator, available for downloading via FTP and HTTP [1].

The web site [2] contains online documentation for the Netscape Navigator, sales and marketing information about our entire product line, many general interest pages (including various directory services), as well as home pages for Netscape employees. All of the machines run the Netscape Server, but most of the strategies in this paper should apply to other HTTP (and even FTP) servers also.

At various times we have tried out various configurations of machines running the given operating systems; Figure 1 shows a list. Each of our WWW servers has an identical content tree uploaded to it (more on this later).

Before the Netscape Navigator was released to the Internet community for the first time, we thought about the web pages we intended to serve, and debated how we would spread the load across multiple machines when that became necessary. Because of problems reported using DNS round-robin techniques [3], we chose to instead implement a randomization scheme inside of the Netscape Navigator itself. In short, when accessing home.mcom.com or home.netscape.com, a copy of the Navigator periodically queries the DNS for a hostname of the form homeX.netscape.com, where X is a random number between 1 and 16. Each of our web servers has a number of the homeX aliases pointing to it. Since this strategy is not something that will be available to most sites, we won't spend more time on it here.
Another scheme which we have looked into is a nameserver-based load balancing scheme. This depends upon a nameserver periodically polling each content server at a site to find out how loaded they are (though a less functional version could simply use a static weighting). The nameserver then resolves the domain name to the IP address of the server which currently has the least load. This has the added benefit of not sending requests to a machine that is overloaded or unavailable, effectively creating a poor man's failover system. It can, however, leave a dead machine in the server pool for as long a duration as the DNS TTL. Two DNS load balancing schemes that we plan to investigate further are RFC 1794 [4] and lbnamed [5].
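The selection policy such a nameserver would apply fits in a few lines. The sketch below is ours, not a reproduction of RFC 1794, lbnamed, or any scheme from this paper; the server list, its load/weight fields, and the polling that would fill them in are all assumptions:

    # Least-loaded pick with a static-weight fallback and dead-server
    # avoidance, as described in the text. Addresses are from the
    # documentation range; all fields are hypothetical.
    def pick_server(servers):
        live = [s for s in servers if s["reachable"]]
        if not live:
            raise RuntimeError("no content server answered the last poll")
        # Live load figures win; a weight-only nameserver would instead
        # choose proportionally to s["weight"].
        return min(live, key=lambda s: s["load"] / s["weight"])

    pool = [
        {"ip": "192.0.2.1", "load": 0.9, "weight": 6, "reachable": True},
        {"ip": "192.0.2.2", "load": 0.4, "weight": 5, "reachable": True},
        {"ip": "192.0.2.3", "load": 0.1, "weight": 4, "reachable": False},
    ]
    print(pick_server(pool)["ip"])   # 192.0.2.2: the lightest live server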
    Hardware                                 Operating system
    Sun uniprocessor 60 MHz SPARCserver 20   Solaris 2.3, 2.4
    SGI 2-processor Challenge-L              IRIX 5.2
    SGI Indy 150 MHz                         IRIX 5.2, 5.3
    SGI Challenge-S 175 MHz                  IRIX 5.3
    HP 9000/816                              HP-UX 9.04
    90 MHz Pentium PC                        Windows NT 3.5
    90 MHz Pentium PC                        BSD/OS 2.0

Figure 1: Tested hardware/software configurations
Figures 2 and 3 summarize some of the performance statistics for the various servers on July 26, 1995, along with their assigned relative load. Our setup allows us to give each machine a fraction of the total traffic to our site, measured in sixteenths of the total.

Note that, in addition to assuming 1/16 of the WWW load, the 150 MHz Indy also wears the aliases www.netscape.com and home.netscape.com, and currently runs all CGI processes for the entire site. Total CGI load for this day accounted for 49,887 of its 1,548,859 HTTP accesses (more on this later).
The Tools

The traffic and size of our web site and content grew very quickly, and we collected a number of tools to help us manage the machines and content. We used a number of existing programs, some of which we enhanced, and developed a few internally as well. We'll go over many of these here; the intent is to focus on tools that are directly useful in managing web servers and content. We will avoid discussing HTML authoring programs and utilities; those who are interested should see http://home.netscape.com/home/how-to-create-web-services.html for pointers to many such tools.
Document Control: CVS

Since our content comes from several different sources within the company, we chose to use CVS [6] to manage the document tree. This has worked moderately well, but is not an ideal solution for our environment.

In some ways, the creation of content resembles a mid-to-large programming environment: a document revision system became necessary to govern the creation of our web site content, as multiple contributing editors added and deleted material from the content tree. CVS provided a reasonably easy method to retrieve older source or detailed logs of changes made to HTML, dating back to the creation of the content tree.

One drawback of CVS is that many of the folks who design our content found it difficult to use and understand, due to a lack of experience with UNIX. A cross-platform GUI-based tool would be especially well-suited to this market niche.
    Host type                    Load      Hits     Redirects  Server  Unique  Unique   KB
                                 fraction             errors    URLs    hosts   transferred
    SGI 150 MHz R4400 Indy       1/16      1548859  68962      7253    4558    120712   12487021
    SGI 175 MHz R4400 Challenge  6/16      2306930  47154      23      2722    249791   11574007
    SGI 175 MHz R4400 Challenge  5/16      2059499  43441      43      2681    225055   10626111
    BSD/OS 90 MHz Pentium        4/16      1571804  31919      23      2351    192726   7936917

Figure 2: WWW server activity for the period between 25/Jul/1995:23:58:04 and 26/Jul/1995:23:58:59
    Host type                    Load      Bytes/Hits (per second)
                                 fraction
    SGI 150 MHz R4400 Indy       1/16      430600/82   345132/61   227618/61   224849/60   236678/59
    SGI 175 MHz R4400 Challenge  6/16      613621/128  646112/119  656412/110  545699/108  520256/107
    SGI 175 MHz R4400 Challenge  5/16      466244/93   430870/88   358186/84   375964/84   531421/82
    BSD/OS 90 MHz Pentium        4/16      375696/81   256143/77   417655/76   394878/72   298561/70

Figure 3: Five busiest minutes for the tested hosts
Content Push

Once we had multiple machines serving our WWW content, it became necessary to come up with a reasonable mechanism for getting copies of our master content tree to all of the servers outside our firewall. NCSA distributes their documents among server machines by keeping their content tree on the AFS distributed filesystem [3].

It seemed to us that another natural solution to this problem was rdist, a program specifically designed for keeping trees of files in sync with a master copy. However, we felt we couldn't use it unmodified, as its security depended entirely on a .rhosts file, which is a notoriously thin layer of protection. With the help of some other developers, we worked on incorporating SSL [8] into rdist in order to provide for encryption as well as better authentication of both ends. With SSL, we no longer need to rely on a client's IP address for its identity; instead, cryptographic certificates provide that authentication.

In the development of our SSLified rdist, we decided that it would be a good idea to use the latest rdist from USC [7], in part because it has the option of using rsh for its transport rather than rcmd(). Because it doesn't use rcmd(), it no longer needs to be setuid root, which is a real security win. One side effect of this is that we now have an SSLified version of rsh, which we use to copy log files from our servers back to our internal nets.
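Neither the SSL-enabled rdist nor the SSLified rsh is generally available, so as a rough sketch of the same push with stock tools, one might drive rsync over ssh from the master copy; the hostnames and paths below are invented:

    # Replicate one master tree to every front-end server over an
    # encrypted, authenticated channel -- a modern stand-in for the
    # SSLified rdist described above, not the authors' actual tool.
    import subprocess

    MASTER_TREE = "/export/content/"      # trailing slash: copy contents
    SERVERS = ["www1.example.com", "www2.example.com"]

    for host in SERVERS:
        # Only changed files move, and neither end trusts a .rhosts file.
        subprocess.run(
            ["rsync", "-az", "--delete", "-e", "ssh",
             MASTER_TREE, host + ":/export/content/"],
            check=True)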
Monitoring

During the course of our server growth, we wrote and/or borrowed a number of tools for monitoring our web servers. These include a couple of tools to check response time, a log analyzer, and a program to page us if one of the servers goes down.

The tool to check response time is designed to be run from a machine external to the server being monitored. Every so often, it wakes up and sends a request for a typical document (such as the home page) to an HTTP server. It measures the amount of time that it took from start to finish; that is, from just before it calls connect() to just after it gets the read() of zero indicating that the server has closed the connection. If you choose a relatively small document, this time can give you a good general indication of how long people are unnecessarily waiting for documents, since under ideal conditions a small document should come back nearly instantaneously. In our typical monitoring setup, we run monitor programs from remote, well-connected sites as well as from locally-networked machines. This allows us to see when problems are a result of network congestion, as opposed to lossage on the server machines.
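A minimal version of such a probe might look like the following (a sketch, not the authors' tool); the host and path are placeholders:

    # Time one small GET from just before connect() to the zero-length
    # read() that signals the server closed the connection.
    import socket, time

    def probe(host="www.example.com", port=80, path="/"):
        start = time.time()
        s = socket.create_connection((host, port), timeout=60)
        s.sendall(("GET %s HTTP/1.0\r\nHost: %s\r\n\r\n" % (path, host)).encode())
        while s.recv(4096):      # drain until recv() returns b"", i.e.
            pass                 # the server has closed the connection
        s.close()
        return time.time() - start

    if __name__ == "__main__":
        print("response time: %.2f seconds" % probe())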
The logfile analyzer, which is now a standard part of the Netscape Communications and Commerce server products, provides information about the busiest hours or minutes of the day, and about how much data transfer client document caching has saved our site. The analyzer can be very helpful in determining which hours are peak hours and will require the most attention. Because of the high volume of traffic at our site, we designed it to process large log files quickly.
The program to page us when a server becomes unreachable is similar to our response time program. The difference is that when it finds that a server does not respond within a reasonable time frame for three consecutive tries, it sends an e-mail message to us, along with a message to our alphanumeric-pager gateway, to make sure we know that a server needs attention.
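The escalation logic might be sketched as follows, reusing probe() from the previous sketch; the addresses and the mail-to-pager gateway are placeholders:

    # Alert by e-mail (and, via a mail-to-pager gateway, by pager) after
    # three consecutive failed or too-slow probes of one server.
    import smtplib

    def alert(host, reason):
        msg = "Subject: %s needs attention\n\n%s\n" % (host, reason)
        with smtplib.SMTP("localhost") as mta:
            mta.sendmail("monitor@example.com",
                         ["admins@example.com", "pager-gw@example.com"],
                         msg)

    def check(host, tries=3, limit=60.0):
        for _ in range(tries):
            try:
                if probe(host) < limit:   # answered in a reasonable time
                    return
            except OSError:
                pass                      # connect failed: counts as a miss
        alert(host, "no response within %ss on %d consecutive tries"
                    % (limit, tries))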
At times when we knew that we wouldn't be able to come in and reboot the server, we used a UNIX box with an RS-232-controlled on/off switch to automatically hard boot any system not responding to three sequential GET requests. A small PC with two serial ports is enough to individually monitor 10 systems, and provides recovery for most non-fatal system errors (e.g., most problems other than hardware failure or logfile partitions filling up).
Performance

Previous works [10, 11] have explored HTTP performance and have come to the conclusion that HTTP in its current form and TCP are particularly ill-suited to one another when it comes to performance. The authors of some of these articles have suggested a number of ways to improve the situation via protocol changes. For the present, however, we are more interested in making do with what we have.

More practically speaking, most TCP stacks have never been abused in quite this way before, so it's not too surprising that they don't deal well with this level of load. The standard UNIX model of forking a new server each time a connection opens doesn't scale particularly well either, and the Netscape Server uses a process-pool model for just this reason.
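The difference between the two models is easy to sketch. The following is a generic illustration of a process pool, not the Netscape Server's implementation; the port and pool size are arbitrary:

    # Pre-fork pool: fork the workers once, then let each block in
    # accept() on the shared listening socket, instead of forking per
    # connection. UNIX-only, because of os.fork().
    import os, socket

    POOL_SIZE = 32
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", 8080))
    listener.listen(128)            # cf. the listen-queue discussion below

    def worker():
        while True:
            conn, _ = listener.accept()  # the kernel gives each connection
            conn.recv(4096)              # to exactly one waiting worker
            conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok")
            conn.close()

    for _ in range(POOL_SIZE):
        if os.fork() == 0:               # child: serve forever
            worker()
    os.wait()                            # parent sticks around for the pool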
Kernels

The problems that took the most time for us to solve involved the UNIX kernel. Not having sources for most platforms makes it something of a black box. We hope that sharing some hard-won insights in this area will prove especially useful to the reader.
TCP Tuning

There are several kernel parameters which one can tweak that will often improve the performance of a web or FTP server significantly.

The size of the listen queue corresponds to the maximum number of connections pending in the kernel. A connection is considered pending when it has not been fully established, or when it has been established and is waiting for a process to do an accept(). If the queue size is too small, clients will sometimes see "connection refused" or "connection timed out" messages. If it is too big, results are sporadic: some machines seem to function nicely, while others of similar or identical configuration become hopelessly bogged down. You will need to experiment to find the right listen queue size for your site. Version 1.1 of the Netscape Communications and Commerce Servers will never request a listen queue larger than 128.
In kernels that have BSD-based TCP stacks, the size of the listen queue is controlled by the SOMAXCONN parameter. Historically this has been a #define in the kernel, so if you don't have access to your OS source code, you will probably need to get a vendor patch which will allow you to tune it. In Solaris this parameter is called tcp_conn_req_max and can be read and written using ndd(1M) on /dev/tcp. Sun has chosen to limit the size to which one can raise tcp_conn_req_max using ndd to 32; contact Sun to find out how to raise this limit further.
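On the application side, the listen queue is requested through the backlog argument to listen(), and the kernel silently caps that request at its own limit, which is why the kernel parameter matters more than the application's number. A sketch:

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind(("", 8080))
    s.listen(128)    # ask for up to 128 pending connections; the effective
                     # queue is min(128, kernel limit), so raising only this
                     # number changes nothing on a small-SOMAXCONN kernel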
Additionally, the kernel on a server machine needs to have enough memory to buffer all the data that it is sending out. In variants of UNIX that use a BSD-based TCP stack, these buffers are called mbufs. The default number of mbufs in most kernels is way too small for TCP traffic of this nature, so reconfiguration is usually required. We have found that trial and error is required to find the right number; if netstat -m shows that requests for memory are being denied, you probably need more mbufs. Under IRIX, the parameter you will need to raise is called nm_clusters, and it lives in /var/sysgen/master.d/bsd.
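That check can be automated crudely; netstat -m output formats differ across systems, so the string matching below is only a sketch:

    # Flag mbuf-related failures in netstat -m output; non-zero "denied"
    # counts suggest the mbuf limit needs raising.
    import subprocess

    out = subprocess.run(["netstat", "-m"],
                         capture_output=True, text=True).stdout
    for line in out.splitlines():
        if "denied" in line or "delayed" in line:
            print(line)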
TCP employs a mechanism called keepalive that is designed to make sure that when one host of a TCP connection loses contact with its peer host, and either host is waiting for data from its peer, the waiting system does not wait indefinitely for data to arrive. Under the sockets interface, if the socket the system is waiting on is configured to have the SO_KEEPALIVE option turned on, the system will send a keepalive packet to the remote system after it has been waiting for a certain period of time. It will continue sending a packet periodically, and will give up and close the connection if the remote system does not respond after a certain number of tries.

Many systems provide a mechanism for changing the interval between TCP keepalive probes. Typically, the period of time before a system will send a keepalive packet is measured in hours. This is to make sure that the system does not send large numbers of keepalive packets to hosts which, for example, have idle telnet sessions that simply don't have data to send for long periods of time.
With a web server, an hour is an awfully long time; if a browser does not send information the server is waiting for within a few minutes, it is likely that the remote machine has become unreachable. In the past, router failures were the typical cause of hosts becoming unreachable. In today's Internet that problem still exists, while at the same time an increasingly large number of users are using a modem with SLIP or PPP as their connection to the Internet. Our experience has shown that these types of connections are unstable and cause most of the situations where a host suddenly becomes silent and unreachable. Most HTTP servers have a timeout built in, so that if they have waited for data from a client for a few minutes, they will forcibly close that connection. The situations where a server is not actively waiting for data are the ones most important for keepalive.
If netstat -an shows many idle sockets in the kernel, or idle HTTP servers waiting for them, you should first check whether your server software sets the SO_KEEPALIVE socket option (the Netscape Communications and Commerce servers do so). The second thing you should check is whether your system allows you to change the interval between keepalive probes. Many systems, such as IRIX and Solaris, provide mechanisms for changing the keepalive interval to minutes instead of hours. Most systems we've encountered have a default of two hours; we typically truncate it to 15 minutes. If your system is not a dedicated web server system, you should consider keeping the value relatively high, so idle telnet sessions don't cause unnecessary network traffic. The third thing you should check with your vendor is whether their TCP implementation allows sockets to time out during the final stages of a TCP close. Certain versions of the BSD TCP code, on which many of today's systems are based, do not use keepalive timeouts during close. This means that in certain situations, a connection to a system that becomes unreachable before it has fully acknowledged the close can stay in your machine's kernel indefinitely. If you see this situation, contact your vendor for a patch.
Your Vendor Is Your Friend, Or The Value of the Patch
In almost every case, we have gotten quite a bit of value from working directly with the vendor. Since very high volume TCP service of the nature we describe is a fairly new phenomenon, OS vendors are only beginning to adapt their kernels for it. There are patches to allow the system administrator to increase the listen queue size for IRIX 5.2 and 5.3, as well as enabling the TCP keepalive timer while a socket is in closing states. These patches also fix other problems, including a few related to multiprocessor correctness and performance. Contact SGI for the current patch numbers (if you are using a WebFORCE system, you should already have them). With these patches, most parameters an administrator will want to edit are in /var/sysgen/master.d/bsd.
If you are using Solaris 2.3, you will definitely want to get the most recent release of the kernel jumbo patch (number 101318). In general, we've found Solaris 2.4 able to handle much more traffic than even a 2.3 system with scalability patches installed. If upgrading to 2.4 is an option, we highly recommend it, especially when your traffic starts to reach the range of multiple hundreds of thousands of hits per day. The 2.4 jumbo patch (number 101945) is also recommended, both for security as well as stability reasons.
Logging

Generally, the less information that you need to log, the better performance you will get. We found that by turning off logging entirely, we typically realized a performance gain of about 20%. In the future, many servers will offer alternative log file formats to the current common log format, which will provide better performance as well as record only the information most important to the site administrator.

Many servers offer the ability to perform reverse DNS lookups on the IP addresses of the clients that access your server. While it is very useful information, having your server do it at run-time tends to be a performance problem. Since many DNS lookups either time out or are extremely slow, the server then generates extra traffic on the local network, and often devotes a non-trivial amount of networking resources to waiting for DNS response packets.
For high-volume logging, syslogd also causes performance problems; we suggest avoiding it. If one is logging 10 connections per second, and each connection causes two pieces of data to be logged (as it does for us), this could mean up to 20 context switches into syslogd and 20 out of it per second. This overhead is in addition to any processing related to logging I/O and all actual content service.
On our site, the logs are rotated once every 24 hours and compressed into a staging directory. A separate UNIX machine inside of our firewall uses an SSLified rsh to bring the individual logs to a gigabyte partition, where the logs are uncompressed, concatenated, and piped to our analysis software. Reverse DNS lookups are done at this point, rather than at run time on the server, which allows us to do only one reverse lookup per IP address that connected during that day. Processing and lookups on the logs from all of the machines on our site take approximately an hour to complete.
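The one-lookup-per-address pass is straightforward to sketch; the snippet assumes the client address is the first whitespace-separated field of each line, as in the common log format:

    # Batch reverse DNS over a day's logs: each address is resolved at
    # most once, never at request time on the server.
    import socket

    def resolve_logs(lines):
        cache = {}
        for line in lines:
            ip = line.split()[0]
            if ip not in cache:
                try:
                    cache[ip] = socket.gethostbyaddr(ip)[0]
                except OSError:
                    cache[ip] = ip        # lookup failed; keep the address
            yield line.replace(ip, cache[ip], 1)

    with open("access_log") as logs:
        for line in resolve_logs(logs):
            print(line, end="")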
A single compressed log file is approximately 70 megabytes, and consequently we end up with over 250 MB of log files daily. Our method of log manipulation allows for an automated system of backing up and processing a month's worth of data with little or no human intervention. Tape backups are generated onto 8mm tape once monthly.

Overall analysis of the log files shows consistent data supporting the following:

- Peak loads occur between 12 and ... o'clock PM PST.
- Peak connection rates were between 120-140 connections per second per machine.
- A second peak of roughly half the amplitude occurs between ... and ... o'clock PM PST.
- Wednesday is the highest load day of the week, generating more than both weekend days combined.
Equipment

Networks

We found that one UNIX machine doing high-volume content service was about all that an Ethernet could handle. Putting two such machines on a single Ethernet caused the performance of both machines to degrade badly, as the net became saturated with traffic and collisions. More analysis of our network data is still needed. We are finding it to be more cost effective to have one Ethernet per host than to purchase FDDI equipment for all of them.
As an aside, we have found SGI's addition of a -C switch to netstat in IRIX to be extremely useful. It displays the data collected by netstat in a dynamically updated, full-screen format.
Memory

This is fairly simple: get lots of it. You want to have enough memory both for buffering network data and for your filesystem cache, to keep most of the frequently accessed files that the machine serves in memory. The filesystem read cache hit-rate percentage on our web servers is almost always 90% or above. Most modern UNIXes automatically pick a reasonable size for the buffer cache, but some may require manual tuning. Many OS vendors include useful tools for monitoring your cache hit rates: System V derivatives have sar, and we also found HP-UX's monitor(1M) and IRIX's osview(1) helpful.
Our Typical Configuration

A typical webserver at our site is a workstation-class machine (e.g., Sun SPARC 20, SGI Indy, or Pentium P90) running between 128 and 150 server processes. For UNIX machines at least, we have found 128 megabytes of memory to be about the most that our machines can use. With this much memory, we have all the network buffer space we need, we get a high filesystem read-cache hit rate, and usually have a few (or even tens of) megabytes to spare, depending on the UNIX version.
Generally, one gigabyte or more of disk is necessary. These days each of our servers generates over 200 megs of log data per day before compression, and the amount of HTML content we are housing continues to grow. Data from sar and kernel profiling code suggest that our boxes are spending between 5% and 20% of their time waiting for disk I/O. Given the high read-cache hit-rate, we expect that by moving our log files onto a separate fast/wide SCSI drive and experimenting with filesystem parameters, this percentage will decrease fairly significantly.
Miscellaneous Points

In this section, we will discuss a few random things that we have learned during our tenure managing web servers.

A Bit About Security
In addition to the normal security concerns of sites on the Internet [12], web servers have some unique security opportunities. One of the most notable is CGI programs [13]. These allow the author to add all sorts of interesting functionality to a web site. Unfortunately, they can also be a real security problem, since they generally take data entered by a web user as their input, and they need to be very careful about what such data is used for.

If a CGI script takes an email address and hands it off to sendmail on the command line, the script needs to go through and make sure that no unescaped shell characters that might cause something unexpected are given to the shell. Unexpected interplay between different programs is a common cause of security violations. Since many users who want to provide programmatic functionality on their web pages are not intimately familiar with the ins and outs of UNIX security, one approach to this problem is to simply forbid CGI programs in users' personal web pages.
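The sendmail example is worth making concrete. The sketch below uses Python rather than the shell and Perl CGIs of the day: the commented-out variant hands user input to a shell, while the safer variant passes it as a literal argument vector, so shell metacharacters in the "address" never reach a shell at all:

    import subprocess

    address = "victim@example.com; rm -rf /"    # hostile "email address"

    # UNSAFE: the semicolon ends the sendmail command and runs the
    # attacker's command:
    #     os.system("sendmail " + address)

    # Safer: no shell is involved; the whole string is a single argv
    # entry, and sendmail merely rejects it as a bad address.
    subprocess.run(["sendmail", address], check=False)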
However, we suggest an alternative: mandate the use of taintperl [14] for CGI programs written by users. Perl is already one of the predominant scripting languages used to write CGI programs; it is extremely powerful for manipulating data of all sorts and producing HTML output. taintperl is a version of perl which keeps track of the source of the data it uses. Any data input by the user is considered by the taintperl interpreter to be tainted, and can't be used for dangerous operations unless explicitly untainted. This means that such scripts can be reasonably easily audited by a security officer, by grepping for the untaint command and carefully analyzing the variables on which it is used.
Web Server Heterogeneity

An interesting feature of our site is that it is an ideal testbed for ports of our server to new platforms. Since it was clear to us from the beginning that we would be using it this way, we needed to think about how we would deal with CGI programs and their environment. After some thought, we decided that it just wasn't practical to try to port and test all of our CGI content (and force our users to do the same for their home pages) to every new platform that we wanted to test. It turned out that we were very fortunate to have considered this early, as we eventually ended up testing our Windows NT port on our web site.

We designated a single alias to a machine that would run all of our CGI programs. We then created the guideline that all HTML pages should simply point all CGI script references to this alias. A nice side effect is that this type of content is partitioned to its own machine, which can be specifically tailored to CGI service. If the machine pool contains many incompatible machines, this setup avoids having to maintain binaries for CGI programs compiled for each machine. An average day at our site shows 50,000 out of 7.5 million requests for CGI scripts (note that this ratio is almost certainly very dependent on the type of content served).

An additional experiment would be to partition other types of content (e.g., graphics) to their own machines in the same way. If necessary, DNS-based load-balancing could be used to spread out load across multiple machines of the same type (e.g., gif.netscape.com could be used to refer to multiple machines whose only purpose in life is to serve GIF files).
FTP vs. HTTP

Both FTP and HTTP offer an easy way to handle file transfer, each with relative strengths.

FTP provides a busy signal: feedback to the user indicating that the site is currently processing too many transactions; that limit is easily set by the site administrator. HTTP provides a mechanism for this as well; however, it is not implemented by many HTTP servers, primarily due to the fact that it can be very confusing for users. When a user connects to an FTP site, they are allowed to transfer every document they need in that session. Due to HTTP's stateless nature and the fact that it uses a new connection for each file transfer, a user can easily get an HTML document and then be refused service when asking for the inlined images. This makes for a very confusing user experience. It is hoped that future work in HTTP development will help to alleviate this problem. Further work in the URL or URN arenas will hopefully provide more formal mechanisms for defining alternative distribution machines.

In a system planning sense, FTP should be considered to be a separate service, and therefore can be cleanly served from a completely different computer. This offers easier log analysis (file delivery vs. html served) and also aids in security efforts: a system running only one service, correctly configured, is less likely to be breached, and if breached, does not mean a loss of security for our entire site.
Performance gains are also likely. Content served by FTP is typically composed of large files, compared to HTTP-served data, which is typically small and designed to be quickly accessible by modem users (a document and its inlined images are short enough to be delivered in short periods). Mixing the two different types can make it hard to pin down system bottlenecks, especially if FTP and HTTP are being served from a single machine. Many times the two services will compete for the same resources, making it hard to track down problems in both areas.

While running both FTP and HTTP servers on one machine, we found that 128 HTTP daemon processes and an imposed limit of 50 simultaneous FTP connections was about all a workstation-class system would tolerate. Further growth beyond that would cause each service to be periodically denied network resources. Once separated, however, 250 simultaneous FTP connections on a workstation-class machine was handled easily.

HTTP service, on the other hand, offers a method to gather useful information from the requestor (via forms) before allowing file transfer to take place. This became a necessity at Netscape, as the encryption technology in our software required a certain amount of legal documentation to be agreed to prior to download.
Conclusion and Future Directions

Although sites such as ours are currently the exception, we expect that they will soon become the rule as the Internet continues its exceedingly rapid growth. Additionally, we expect content to become vastly more dynamic in the future, both on the front end, using mechanisms such as server push [14] and Java [15], and on the backend, where the use of SQL databases and search engines will become even more common. This promises to provide many new challenges, especially in the area of performance measurement and management.

We hope that the techniques and information in this paper will prove helpful to folks who wish to administer sites providing a very high volume of service.
Acknowledgements

Thanks to Brendan ... We also appreciate the input from the folks who took the time to read drafts of our paper.
About the Authors

The authors have been responsible for the design, implementation, and babysitting of the Netscape web site since its inception.

Dan Mosedale (dmose@netscape.com) is Lead UNIX Administrator at Netscape. He has been managing UNIX boxes for long enough to dislike them thoroughly. Dan likes playing around with weird Internet stuff and wrote a FAQ list about getting connected to the MBONE.

William Foss (bill@netscape.com) is Webmaster at Netscape. His current focus is on how to make the site scale even further in an economical fashion. Prior to Netscape, he had the enviable job of playing with large scale UNIX systems and working for Jim Clark at Silicon Graphics.

Rob McCool (robm@netscape.com) is a member of technical staff at Netscape. He designed and implemented the Netsite Communications and Commerce servers. Prior to Netscape, he designed, implemented, documented, tested, and supported NCSA httpd from its inception through version 1.3.
Bibliography

[1] http://home.netscape.com/comprod/mirror/index.html
[2] http://home.netscape.com/
[3] Kwan, McGrath, and Reed, "User Access Patterns to NCSA's World Wide Web Server," http://www-pablo.cs.uiuc.edu/Papers/WWW.ps.Z
[4] Brisco, "DNS Support for Load Balancing," RFC 1794, USC/Information Sciences Institute, April 1995. ftp://ds.internic.net/rfc/rfc1794.txt
[5] Schemers, Roland, "lbnamed: A Load Balancing Name Server in Perl," LISA IX Conference Proceedings. http://www-leland.stanford.edu/~schemers/docs/lbnamed/lbnamed.html
[6] ftp://prep.ai.mit.edu/pub/gnu/cvs-1.5.tar.gz
[7] Cooper, "Overhauling Rdist for the '90s," LISA VI Conference Proceedings, pp. 175-188. ftp://usc.edu/pub/rdist
