Web Caching
`
`Duane Wessels
`
`O’REILLY*
Beijing - Cambridge - Farnham - Köln - Paris - Sebastopol - Taipei - Tokyo
`
`APPLE 1055
`Apple v. SpaceTime3D, Inc.
`IPR2023-00242
`
`
`

`

`Web Caching
`by Duane Wessels
`Copyright © 2001 O'Reilly & Associates, Inc. All rights reserved.
`Printed in the United States of America
`
`Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.
`
`Editors: Nathan Torkington and Paula Ferguson
`
`Production Editor: Leanne Clarke Soylemez
`
Cover Designer: Edie Freedman
`
Printing History:
June 2001: First Edition.
`
Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers
and sellers to distinguish their products are claimed as trademarks. Where those designations
appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the
designations have been printed in caps or initial caps. The association between the image of
a rock thrush and web caching is a trademark of O'Reilly & Associates, Inc.
While every precaution has been taken in the preparation of this book, the publisher assumes
no responsibility for errors or omissions, or for damages resulting from the use of the
information contained herein.
`
Library of Congress Cataloging-in-Publication Data
Wessels, Duane.
Web Caching/Duane Wessels
p. cm.
ISBN 1-56592-536-X
1. Cache memory. 2. Browsers (Computer programs) 3. Software configuration
management. 4. World Wide Web. I. Title.
TK7895.M4 W45 2001
004.5'3--dc21
2001033173
`
`
`
`
`
Table of Contents

Preface

1. Introduction
1.1 Web Architecture
1.2 Web Transport Protocols
1.3 Why Cache the Web?
1.4 Why Not Cache the Web?
1.5 Types of Web Caches
1.6 Caching Proxy Features
1.7 Meshes, Clusters, and Hierarchies
1.8 Products

2. How Web Caching Works
2.1 HTTP Requests
2.2 Is It Cachable?
2.3 Hits, Misses, and Freshness
2.4 Hit Ratios
2.5 Validation
2.6 Forcing a Cache to Refresh
2.7 Cache Replacement

3. Politics of Web Caching
3.1 Privacy
3.2 Request Blocking
3.3 Copyright
3.4 Offensive Content
3.5 Dynamic Web Pages
3.6 Content Integrity
3.7 Cache Busting and Server Busting
3.8 Advertising
3.9 Trust
3.10 Effects of Proxies

4. Configuring Cache Clients
4.1 Proxy Addresses
4.2 Manual Proxy Configuration
4.3 Proxy Auto-Configuration Script
4.4 Web Proxy Auto-Discovery
4.5 Other Configuration Options
4.6 The Bottom Line

5. Interception Proxying and Caching
5.1 Overview
5.2 The IP Layer: Routing
5.3 The TCP Layer: Ports and Delivery
5.4 The Application Layer: HTTP
5.5 Debugging Interception
5.6 Issues
5.7 To Intercept or Not To Intercept

6. Configuring Servers to Work with Caches
6.1 Important HTTP Headers
6.2 Being Cache-Friendly
6.3 Being Cache-Unfriendly
6.4 Other Issues for Content Providers

7. Cache Hierarchies
7.1 How Hierarchies Work
7.2
7.3
7.4 Optimizing Hierarchies

8. Intercache Protocols
8.1
8.2
8.3
8.4 Cache Digests
8.5 Which Protocol to Use

9. Cache Clusters
9.1 The Hot Spare
9.2 Throughput and Load Sharing
9.3 Bandwidth

10. Design Considerations for Caching Services
10.1 Appliance or Software Solution
10.2 Disk Space
10.3 Memory
10.4 Network Interfaces
10.5 Operating Systems
10.6 High Availability
10.7 Intercepting Traffic
10.8 Load Sharing
10.9 Location
10.10 Using a Hierarchy

11. Monitoring the Health of Your Caches
11.1 What to Monitor?
11.2 Monitoring Tools

12. Benchmarking Proxy Caches
12.1 Metrics
12.2 Performance Bottlenecks
12.3 Benchmarking Tools
12.4 Benchmarking Gotchas
12.5 How to Benchmark a Proxy Cache
12.6 Sample Benchmark Results

A. Analysis of Production Cache Trace Data
B. Internet Cache Protocol
C. Cache Array Routing Protocol
D. Hypertext Caching Protocol
E. Cache Digests
F. HTTP Status Codes
G. U.S.C. 17 Sec. 512. Limitations on Liability Relating to Material Online
H. List of Acronyms
Bibliography
Index
`
`Preface
`
When I first started using the Internet in 1986, my friends and I were obsessed
with anonymous FTP servers. What a wonderful concept! We could download all
sorts of interesting files, such as FAQs, source code, GIF images, and PC shareware.
Of course, downloading could be slow, especially from the busy sites like
the famous WSMR-SIMTEL20.ARMY.MIL archive.

In order to download files to my PC, I would first ftp them to my Unix account
and then use Zmodem to transfer them to my PC through my 1200 bps modem.
Usually, I deleted a file after downloading it, but there were certain files, like
HOSTS.TXT and the “Anonymous FTP List”, that I kept on the Unix system. After
a while, I had some scripts to automatically locate and retrieve a list of files for
later download. Since our accounts had disk quotas, I had to carefully remove old,
unused files and keep the useful ones. Also, I knew that if I had to delete a useful
file, Mark, Mark, Ed, Jay, or Wim probably had a copy in their account.

Although I didn’t realize it at the time, I was caching the FTP files. My Unix
account provided temporary storage for the files I was downloading. Frequently
referenced files were kept as long as possible, subject to disk space limitations.
Before retrieving a file from an FTP server, I often checked my friends’ “caches” to
see if they already had what I was looking for.
Nowadays, the World Wide Web is where it's at, and caching is here too. Caching
makes the Web feel faster, especially for popular pages. Requests for cached information
come back much faster than requests sent to the content provider. Furthermore,
caching reduces network bandwidth usage, which translates directly into cost
savings for many organizations.

In many ways, web caching is similar to the way it was in the Good Ol' Days. The
basic ideas are the same: retrieve and store files for the user. When the cache
becomes full, some files must be deleted. Web caches can cooperate and talk to
each other when looking for a particular file before retrieving it from the source.

Of course, web caching is significantly more sophisticated and complicated than
it was in my early Internet years. Caches are tightly integrated into the web architecture,
often without the user’s knowledge. The Hypertext Transfer Protocol was designed
with caching in mind. This gives users and content providers more control (perhaps
too much) over the treatment of cached data.

In this book, you'll learn how caches work, how clients and servers can take
advantage of caching, what issues are important, how to design a caching service
for your organization, and more.
`
`Audience
The material in this book is relevant to the following groups of people:
Administrators
This book is primarily written for those of you who are, or will be, responsible
for the day-to-day operation of one or more web caches. You might work for
an ISP, a corporation, or an educational institution. Or perhaps you'd like to
set up a web cache for your home computer.
Content providers
I sincerely hope that content providers take a look at this book, and especially
Chapter 6, to see how making their content more “cache aware” can improve
their users’ surfing experiences.
Web developers
Anyone developing an application that uses HTTP needs to understand how
web caching works. Many users today are behind firewalls and caching proxies.
A significant amount of HTTP traffic is automatically intercepted and sent
to web caches. Failure to take caching issues into consideration may adversely
affect the operation of your application.
Web users
Usually, the people who deploy caches want them to be transparent to the
end user. Indeed, users are often unaware that they are using a web cache.
Even so, if you are “only” a user, I hope that you find this book useful and
interesting. It can help you understand why you sometimes see stale web
pages and what you can do about it. If you are concerned about your privacy
on the Internet, be sure to read Chapter 3. If you want to know how to configure
your browser for caching, see Chapter 4.

To benefit from this book, you need to have only a user-level understanding of the
Web. You should know that Netscape Navigator and Internet Explorer are web
browsers, that Apache is a web server, and that http://www.oreilly.com is a URL. If
you have some Unix system administration experience, you can use some of the
examples in later chapters.
`
`What You Will and Won't Find Here
Chapter 1 introduces caching and provides some background material to help the
rest of the book make sense. In addition, companies that provide caching products
are listed here. In Chapter 2, we'll dive into the Hypertext Transfer Protocol and
explore its features for caching. Chapter 3 is relatively nontechnical and discusses
some of the controversies that surround web caching, such as copyrights and privacy.

In Chapter 4, you'll see the various ways to configure user agents (browsers) for
caching, with a focus on Netscape Navigator and Microsoft Internet Explorer. Many
administrators prefer to automatically intercept and divert HTTP connections to a
cache. We'll talk about that in Chapter 5. Then, in Chapter 6, we’ll turn to servers
and see how content providers can make their information cache-friendly.

Chapter 7 and Chapter 8 are about cache hierarchies. First we'll talk about them in
general, including why you should or should not participate in a hierarchy. Then
you'll learn about the protocols caches use to communicate with each other. Chapter 9
is a short chapter about cache clusters. Although clusters have some things in
common with cache hierarchies, it is easier to understand some of the nuances
after you've learned about the intercache protocols.

In Chapter 10, I'll walk you through some of the decisions you'll face in procuring
and building a caching service for your organization. Following that, Chapter 11
offers advice on monitoring the health of your caches once they are operational.
For the Unix-savvy, I'll show how to set up UCD-SNMPD and RRDTool for this
purpose. Chapter 12 is about benchmarking the performance of caches.

I analyze some logfiles from production caches in Appendix A. Here you can see
some sample file size distributions, content types, HTTP headers, and hit ratio simulations.
The next four appendixes are about intercache protocols. Appendix B
describes the technical details of ICP. Appendix D does the same for HTCP,
Appendix C for CARP, and Appendix E for cache digests. Appendix F is a list of
HTTP status codes from RFC 2616. Appendix G contains the text of a U.S. copyright
statute that mentions caching. Finally, in Appendix H, you'll find definitions
for many of the acronyms I use in this book.

The new, hot topics in the caching industry are streaming media and content distribution
networks. This book focuses on HTTP and FTP caching techniques with
proven results, eschewing technology that is still evolving.
`
`
`
`
Caching Resources
Here are a few resources you can use to find additional information about
caching. Caching vendors’ web sites are listed in Section 1.8, “Products.”

Web Sites

See the following web sites for more information about web caching:
http://www.web-caching.com
This well-designed site contains a lot of up-to-date information, including
product information, press releases, links to magazine articles online, industry
events, and job postings. The site also has a bulletin board discussion forum.
The banner ads at the top of every page are mildly annoying.
http://www.caching.com
This professional site is full of web caching information in the form of press
releases, upcoming events, vendor white papers, online magazine articles, and
analyst commentaries. The site is apparently sponsored by a number of the
caching companies.
http://www.web-cache.com
At my own site for caching information, you'll find mostly links to other web
sites, including caching vendors and caching-related services. I also try to
keep up with relevant technical documents (e.g., RFCs) and research papers.
http://dmoz.org/Computers/Software/Internet/Servers/Proxy/Caching/
The Open Directory Project has a decent-sized collection of web caching links
at the above URL.
http://www.wrec.org
The Web Replication and Caching (WREC) working group of the IETF is officially
dead, but this site still has some useful information.
http://www.iwcw.org
This site provides information about the series of annual International Web
Caching Workshops.
`
`Mailing Lists
The following mailing lists discuss various aspects of web caching:
isp-caching
Currently the most active caching-related mailing list. Averages about 2-3
messages per day. Posting here is likely to result in a number of salespeople
knocking on your mailbox. One of the great things about this list is that
replies automatically go back to the list unless the message composer is
careful. On many occasions people have posted messages that they wish they
could take back! For more information, visit http://www.isp-caching.com.
WEBI
WEBI (Web Intermediaries) is a new IETF working group, replacing WREC.
The discussion is bursty, averaging about 1-2 messages per day. This is not an
appropriate forum for discussion of web caching in general; topics should be
related to the working group charter. Currently the group is addressing intermediary
discovery (a la WPAD) and the resource update protocol. For additional
information, including the charter and subscription instructions, visit
http://www.ietf.org/html.charters/webi-charter.html.
HTTP working group
Although the HTTP working group is officially dead, the mailing list still
receives a small amount of traffic. Messages are typically from somebody asking
for clarification about the RFC. For subscription information and access to
the archives, visit http://www.ics.uci.edu/pub/ietf/http/hypermail.
loadbalancing
The people on this list discuss all aspects of load balancing, including hints for
configuring specific hardware, performance issues, and security alerts. Traffic
averages about 3-4 messages per day. For more information, visit
http://www.lbdigest.com.
`
`Conventions Used in This Book
I use the following typesetting conventions in this book:
Italic
Used for emphasis and to signify the first use of a term. Italic is also used for
URLs, host names, email addresses, FTP sites, file and directory names, and
commands.
Constant width
Used for HTTP header names and directives, such as If-Modified-Since and
no-cache.
`
`How To Contact Us
You can contact the author at wessels@packet-pushers.com.
Please address comments and questions concerning this book to the publisher:
O'Reilly & Associates, Inc.
101 Morris Street
Sebastopol, CA 95472
(800) 998-9938 (in the United States or Canada)
(707) 829-0515 (international or local)
(707) 829-0104 (fax)

We have a web page for this book, where we list examples, errata, or any additional
information. You can access this page at:
http://www.oreilly.com/catalog/webcaching/
To comment or ask technical questions about this book, send email to:
bookquestions@oreilly.com

For more information about our books, conferences, software, Resource Centers,
and the O'Reilly Network, see our web site at:
http://www.oreilly.com
`
`Acknowledgments
I am extremely lucky to have been put in the position to write this book. There
are so many people who have helped me along the way. First, I want to thank Dr.
Jon Sauer and Dr. Ken Klingenstein at the University of Colorado for supporting
my graduate work in this field. Huge thanks to Michael Schwartz, Peter Danzig,
and other members of the Harvest project for the most enjoyable job I'll probably
ever have. I don’t think I can ever thank k claffy and Hans-Werner Braun of
NLANR enough for taking me in and allowing me to work on the IRCache project.
I am also in karma-debt to all of my CAIDA friends (Tracie, Amy, Rochell, Jenniffer,
Jambi) for taking care of business in San Diego so I could stay in Boulder.
Thanks to Marla Meehl and the National Center for Atmospheric Research for a
place to sit and an OC-3 connection.

This book has benefited immensely from the attentive eyes of the folks who
reviewed the manuscript: Ittai Gilat (Microsoft), Lee Beaumont (Lucent), Jeff Boote
(NCAR), Reuben Farrelley, and Valery Soloviev (Inktomi). Special thanks also to
Andy Cervantes of The Privacy Foundation.

As usual, the folks at O'Reilly have done a fantastic job. Nat, Paula, Lenny, Erik:
Let’s do it again sometime!

Since I've been working on web caching, I have been very fortunate to work with
many wonderful people. I truly appreciate the support and friendship of Jay Adelson,
Kostas Anagnostakis, Pei Cao, Glenn Chisholm, Ian Cooper, Steve Feldman,
Henry Guillen, Martin Hamilton, Ted Hardie, Solom Heddaya, Ron Lee, Ulana Legedza,
Carlos Maltzahn, John Martin, Ingrid Melve, Wojtek Sylwestrzak, Bill Woodcock,
and Lixia Zhang.

Thanks to my family (Karen, John, Theresa, Roy) for their constant support and
understanding. Despite the efforts of my good friends Ken, Scott, Bronwyn, Gennevive,
and Brian, who tried to tie up all my free time, I finished anyway! My
coworkers, Alex Rousskov and Matthew Weaver, are champs for putting up with an
endless barrage of questions, and for tolerating my odd working hours. A big
thank you to everyone who writes free software, especially the FreeBSD hackers.
But most of all, thanks to all the Squid users and developers out there!
`
`
`Introduction
`
The term cache has French roots and means, literally, to store. As a data processing
term, caching refers to the storage of recently retrieved computer information
for future reference. The stored information may or may not be used again, so
caches are beneficial only when the cost of storing the information is less than the
cost of retrieving or computing the information again.

The concept of caching has found its way into almost every aspect of computing
and networking systems. Computer processors have both data and instruction
caches. Computer operating systems have buffer caches for disk drives and filesystems.
Distributed (networked) filesystems such as NFS and AFS rely heavily on
caching for good performance. Internet routers cache recently used routes.
Domain Name System (DNS) servers cache hostname-to-address and other
lookups.

Caches work well because of a principle known as locality of reference. There are
two flavors of locality: temporal and spatial. Temporal locality means that some
pieces of data are more popular than others. CNN’s home page is more popular
than mine. Within a given period of time, somebody is more likely to request the
CNN page than my page. Spatial locality means that requests for certain pieces of
data are likely to occur together. A request for the CNN home page is usually followed
by requests for all of the page’s embedded graphics. Caches use locality of
reference to predict future accesses based on previous ones. When the prediction
is correct, there is a significant performance improvement. In practice, this technique
works so well that we would find computer systems unbearably slow without
memory and disk caches. Almost all data processing tasks exhibit locality of
reference and therefore benefit from caching.
When requested data is found in the cache, we call it a hit. Similarly, referenced
data that is not cached is known as a miss. The performance improvement that a
cache provides is based mostly on the difference in service times for cache hits
compared to misses. The percentage of all requests that are hits is called the hit
ratio.
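
The relationship between hit ratio and service time can be made concrete with a little arithmetic. The following is a minimal sketch, not anything taken from a real cache: the function names, the request counts, and the timings (0.05 seconds for a hit, 0.5 seconds for a miss) are all assumptions chosen for illustration.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of all requests that were served from the cache."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


def mean_service_time(ratio: float, hit_time: float, miss_time: float) -> float:
    """Expected time per request, weighting hits and misses by how often they occur."""
    return ratio * hit_time + (1.0 - ratio) * miss_time


# Hypothetical workload: 400 hits and 600 misses.
ratio = hit_ratio(400, 600)

# Hypothetical service times in seconds: 0.05 for a hit, 0.5 for a miss.
average = mean_service_time(ratio, hit_time=0.05, miss_time=0.5)

print(f"hit ratio: {ratio:.0%}, mean service time: {average:.2f} s")
```

With these assumed numbers, the average request takes noticeably less time than a miss would, even though fewer than half of the requests are hits. The larger the gap between hit and miss service times, the more each additional hit is worth.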
`
Any system that utilizes caching must have mechanisms for maintaining cache consistency.
This is the process by which cached copies are kept up-to-date with the
originals. We say that cached data is either fresh or stale. Caches can reuse fresh
copies immediately, but stale data usually requires validation. The algorithms that
are used to maintain consistency may be either weak or strong. Weak consistency
means that the cache sometimes returns outdated information. Strong consistency,
on the other hand, means that cached data is always validated before it is used.
CPU and filesystem caches require strong consistency. However, some types of
caches, such as those in routers and DNS resolvers, are effective even if they
return stale information.
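
The difference between weak and strong consistency can be sketched with a toy cache. This is an illustrative model under stated assumptions, not how any particular cache is implemented: the class name `ConsistentCache`, the version numbers used for validation, and the `max_age` freshness rule are all hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Entry:
    body: str
    version: int      # version of the original when the copy was made
    stored_at: float  # time the copy was stored or last validated


class ConsistentCache:
    """Toy cache: fresh copies are reused, stale copies are validated first."""

    def __init__(self, origin: Dict[str, Tuple[int, str]], max_age: float,
                 strong: bool = False) -> None:
        self.origin = origin      # name -> (version, body); stands in for the originals
        self.max_age = max_age    # seconds for which a copy is considered fresh
        self.strong = strong      # strong consistency: validate before every use
        self.entries: Dict[str, Entry] = {}
        self.validations = 0

    def get(self, name: str, now: float) -> str:
        entry = self.entries.get(name)
        if entry is None:
            return self._fetch(name, now)       # miss: copy from the original
        fresh = (now - entry.stored_at) <= self.max_age
        if fresh and not self.strong:
            return entry.body                   # weak: reuse without checking
        # Stale copy, or strong consistency: compare against the original.
        self.validations += 1
        version, _ = self.origin[name]
        if version == entry.version:
            entry.stored_at = now               # still valid; treat as fresh again
            return entry.body
        return self._fetch(name, now)           # original changed; replace the copy

    def _fetch(self, name: str, now: float) -> str:
        version, body = self.origin[name]
        self.entries[name] = Entry(body, version, now)
        return body


origin = {"page": (1, "old")}
weak = ConsistentCache(origin, max_age=60)
strong = ConsistentCache(origin, max_age=60, strong=True)
weak.get("page", now=0)
strong.get("page", now=0)

origin["page"] = (2, "new")                     # the original changes

weak_result = weak.get("page", now=10)          # copy is still fresh, so it is reused
strong_result = strong.get("page", now=10)      # validated first, so the change is seen
late_result = weak.get("page", now=100)         # copy is stale, so it is validated
```

In this sketch the weakly consistent cache returns outdated data while its copy is still considered fresh, and only notices the change once the copy has gone stale. The strongly consistent cache pays for a validation on every request.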
`
We know that caching plays an important role in modern computer memory and
disk systems. Can it be applied to the Web with equal success? Ask different people
and you're likely to get different answers. For some, caching is critical to making
the Web usable. Others view caching as a necessary evil. A fraction probably
consider it just plain evil [Tewksbury, 1998].
In this book, I'll talk about applying caching techniques to the World Wide Web
and try to convince you that web caching is a worthwhile endeavor. We'll see how
web caches work, how they interact with clients and servers, and the role that
HTTP plays. You'll learn about a number of protocols that are used to build cache
clusters and hierarchies. In addition to talking about the technical aspects, I also
spend a lot of time on the issues and politics. The Web presents some interesting
problems due to its highly distributed nature.
`
After you've read this book, you should be able to design and evaluate a caching
proxy solution for your organization. Perhaps you'll install a single caching proxy
on your firewall, or maybe you need many caches located throughout your network.
Furthermore, you should be well prepared to understand and diagnose any
problems that may arise from the operation or failure of your caches. If you're a
content provider, then I hope I'll have convinced you to increase the cachability of
the information you serve.
`
1.1 Web Architecture
Before we can talk more about caching, we need to agree on some terminology.
Whenever possible, I use words and meanings taken from Internet standards documents.
Unfortunately, colloquial usage of web caching terminology is often just
different enough to be confusing.

1.1.1 Clients and Servers
The fundamental building blocks of the Web (and indeed most distributed systems)
are clients and servers. A web server manages and provides access to a set
of resources. The resources might be simple text files and images, or something
more complex, such as a relational database. Clients, also known as user agents,
initiate a transaction by sending a request to a server. The server then processes
the request and sends a response back to the client.

On the Web, most transactions are download operations; the client downloads
some information from the server. In these cases, the request itself is quite small
(about 200 bytes) and contains the name of the resource, plus a small amount of
additional information from the client. The information being downloaded is usually
an image or text file with an average size of about 10,000 bytes. This characteristic
of the Web makes cable- and satellite-based Internet services viable. The
data rates for receiving are much higher than the data rates for sending because
web users mostly receive information.
A small percentage of web transactions are more correctly characterized as upload
operations. In these cases, requests are relatively large and responses are very
small. Examples of uploads include sending an email message and transferring an
image file from your computer to a server.

The most common web clients are called browsers. These are applications such as
Netscape Navigator and Microsoft Internet Explorer. The purpose of a browser is
to render the web content for us to view and interact with. Because of the myriad
of features present in web browsers, they are really very large and complicated
programs. In addition to the GUI-based clients, there are a few simple command-line
client programs, such as Lynx and Wget.

A number of different servers are in widespread use on the Web. The Apache
HTTP server is a popular choice and freely available. Netscape, Microsoft, and
other companies also have server products. Many content providers are concerned
with the performance of their servers. The most popular sites on the Net can
receive ten million requests per day with peak request rates of 1000 per second. At
this scale, both the hardware and software must be very carefully designed to
cope with the load. Many sites run multiple servers in parallel to handle their high
request rates and for redundancy.

Recently, there has been a lot of excitement surrounding peer-to-peer applications,
such as Napster. In these systems, clients share files and other resources (e.g., CPU
cycles) directly with each other. Napster, which enables people to share MP3 files,
does not store the files on its servers. Rather, it acts as a directory and returns
pointers to files so that two clients can communicate directly. In the peer-to-peer
realm, there are no centralized servers; every client is a server.
`
`
The peer-to-peer movement is relatively young but already very popular. It’s likely
that a significant percentage of Internet traffic today is due to Napster alone. However,
I won't discuss peer-to-peer clients in this book. One reason for this is that
Napster uses its own transfer protocol, whereas here we'll focus on HTTP.
`
`1.1.2 Proxies
`
Much of this book is about proxies. A proxy is an intermediary in a web transaction.
It is an application that sits somewhere between the client and the origin
server. Proxies are often used on firewalls to provide security. They allow (and
record) requests from the internal network to the outside Internet.
`
A proxy behaves like both a client and a server. It acts like a server to clients, and
like a client to servers. A proxy receives and processes requests from clients, and
then it forwards those requests to origin servers. Some people refer to proxies as
“application layer gateways.” This name reflects the fact that the proxy lives at the
application layer of the OSI reference model, just like clients and servers. An
important characteristic of an application layer gateway is that it uses two TCP
connections: one to the client and one to the server. This has important ramifications
for some of the topics we'll discuss later.
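
The two-connection behavior can be seen in a small loopback experiment. The following is a minimal sketch, assuming a single request and a made-up fixed response; the function names (`origin_server`, `proxy_server`, `fetch_via_proxy`) are hypothetical, and a real proxy would parse HTTP, handle many clients, and deal with errors and timeouts. Everything runs on 127.0.0.1 using only Python's standard `socket` and `threading` modules.

```python
import socket
import threading


def read_all(sock: socket.socket) -> bytes:
    """Read from a socket until the peer closes its side of the connection."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)


def listen_on_loopback() -> socket.socket:
    """Create a listening TCP socket on an unused loopback port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    return listener


def origin_server(listener: socket.socket) -> None:
    """Stand-in origin server: answer one request with a fixed response."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)  # the request forwarded by the proxy
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello from origin")


def proxy_server(listener: socket.socket, origin_addr) -> None:
    """Relay one request: a server toward the client, a client toward the origin."""
    client_conn, _ = listener.accept()  # TCP connection 1: client <-> proxy
    with client_conn:
        request = client_conn.recv(4096)
        # TCP connection 2: proxy <-> origin server
        with socket.create_connection(origin_addr) as upstream:
            upstream.sendall(request)
            response = read_all(upstream)
        client_conn.sendall(response)


def fetch_via_proxy(request: bytes) -> bytes:
    """Start a one-shot origin and proxy, then send the request through the proxy."""
    origin = listen_on_loopback()
    proxy = listen_on_loopback()
    workers = [
        threading.Thread(target=origin_server, args=(origin,), daemon=True),
        threading.Thread(target=proxy_server,
                         args=(proxy, origin.getsockname()), daemon=True),
    ]
    for worker in workers:
        worker.start()
    with socket.create_connection(proxy.getsockname()) as client:
        client.sendall(request)
        response = read_all(client)
    for worker in workers:
        worker.join(timeout=5)
    origin.close()
    proxy.close()
    return response


reply = fetch_via_proxy(b"GET / HTTP/1.0\r\n\r\n")
print(reply.decode())
```

Note that the client never talks to the origin server directly: the origin sees a connection from the proxy's address, not the client's.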
`
Proxies are used for a number of different things, including logging, access controls,
filtering, translation, virus checking, and caching. We'll talk more about these
and the issues they create in Chapter 3.
`
1.1.3 Web Objects
I use the term object to refer to the entity exchanged between a client and a
server. Some people may use document or page, but these terms are misleading
because they imply textual information or a collection of text and images. “Object”
is generic and better describes the different types of content returned from servers,
such as audio files, ZIP files, and C programs. The standards documents (RFCs)
that describe web components and protocols prefer the terms entity, resource, and
response. My use of object corresponds to their use of entity, where an object
(entity) is a particular response generated from a particular resource. Web objects
have a number of important characteristics, including size (number of bytes), type
(HTML, image, audio, etc.), time of creation, and time of last modification.

In broad terms, web resources can be considered either dynamic or static.
Responses for dynamic resources are generated on the fly when the request is
made. Static responses are pregenerated, independent of client requests. When
people think of dynamic responses, often what comes to mind are stock quotes,
live camera images, and web page counters. Digitized photographs, magazine articles,
and software distributions are all static information. The distinction between
dynamic and static content is not necessarily so clearly defined. Many web
resources are updated at various intervals (perhaps daily) but not uniquely generated
on a per-request basis. The distinction between dynamic and static resources
is important because it has serious consequences for cache consistency.
`
1.1.4 Resource Identifiers
Resource identifiers are a fundamental piece of the architecture of the Web. These
are the names and addresses for web objects, analogous to street addresses and
telephone numbers. Officially, they are called Uniform Resource Identifiers, or
URIs. They are used by both people and computers alike. Caches use them to
identify and index the stored objects. According to the design specification, RFC
2396, URIs must be extensible, printable, and able to encode all current and future
naming schemes. Because of these requirements, only certain characters may
appear in URIs, and some characters have special meanings.

Uniform Resource Locators (URLs) are the most common form of URI in use today.
The URL syntax is described in RFC 1738. Here are some sample URLs:
http://www.zoidberg.net
http://www.oasis-open.org/docbook/index.html
ftp://ftp.freebsd.org/pub/FreeBSD/README.TXT

URLs have a very important characteristic worth mentioning here. Every URL
includes a network host address: either a hostname or an IP address. Thus, a URL
is bound to a specific server, called the origin server. This characteristic has some
negative side effects for caching. Occasionally, the same resource exists on two or
more servers, as occurs with mirror sites. When a resource has more than one
name, it can get cached under different names. This wastes storage space and
bandwidth.
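
The mirror-site problem follows directly from using the URL as the cache index. Here is a minimal sketch of that effect, assuming a cache keyed on the full URL string; the class `UrlKeyedCache` and the two `example` hostnames are hypothetical and stand in for a real site and its mirror. Only the standard `urllib.parse` module is used.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def origin_host(url: str) -> Optional[str]:
    """Return the network host that a URL is bound to."""
    return urlsplit(url).hostname


class UrlKeyedCache:
    """Toy cache that indexes stored objects by their full URL."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def put(self, url: str, body: bytes) -> None:
        self.store[url] = body

    def bytes_stored(self) -> int:
        return sum(len(body) for body in self.store.values())


# Hypothetical primary site and mirror serving the same file.
primary = "ftp://ftp.example.org/pub/README.TXT"
mirror = "ftp://mirror.example.net/pub/README.TXT"
body = b"identical file contents"

cache = UrlKeyedCache()
cache.put(primary, body)
cache.put(mirror, body)

print(origin_host(primary), origin_host(mirror))
print(len(cache.store), "entries,", cache.bytes_stored(), "bytes stored")
```

Because the two URLs name different hosts, the cache has no way to tell that they refer to the same resource, so it stores the same bytes twice and has to download them twice.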
`
Uniform Resource Names (URNs) are similar to URLs, but they refer to resources in
a location-independent manner. RFC 2141 describes URNs, which are also sometimes
called persistent names. Resources named with URNs can be moved from
one server (location) to another without causing problems. Here are some sample
(hypothetical) URNs:
urn:duns:002372413:annual-report-1997
urn:isbn:156592530X
`
In 1995, the World Wide Web Project left its birthplace at CERN in Geneva,
Switzerland, and became the World Wide Web Consortium. In conjunction with
this move, their web site location changed from info.cern.ch to www.w3c.org.
Everyone who used a URL with the old location received a page with a link to the
new location and a reminder to “update your links and hotlist.” Had URNs been
implemented and in use back then, such a problem could have been avoided.
`Another
