Reducing Network Traffic
`
`O'REILLY®
`
`Duane Wessels
`
`AHBLT-2013.001
`
`

`

`Web Caching
`
`AHBLT-2013.002
`
`

`

`Web Caching
`
`Duane Wessels
`
`O'REILLY®
`Beijing · Cambridge · Farnham · Koln · Paris · Sebastopol · Taipei · Tokyo
`
`AHBLT-2013.003
`
`

`

`Web Caching
`by Duane Wessels
`
`Copyright © 2001 O'Reilly & Associates, Inc. All rights reserved.
`Printed in the United States of America.
`
`Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.
`
`Editors: Nathan Tarkington and Paula Ferguson
`
`Production Editor: Leanne Clarke Soylemez
`
`Cover Designer: Edie Freedman
`
`Printing History:
`
`June 2001:
`
`First Edition.
`
`Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
`trademarks of O'Reilly & Associates, Inc. Many of the designations used by manufacturers
`and sellers to distinguish their products are claimed as trademarks. Where those designations
`appear in this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the
`designations have been printed in caps or initial caps. The association between the image of
`a rock thrush and web caching is a trademark of O'Reilly & Associates, Inc.
`
`While every precaution has been taken in the preparation of this book, the publisher assumes
`no responsibility for errors or omissions, or for damages resulting from the use of the
`information contained herein.
`
`Library of Congress Cataloging-in-Publication Data
`
`Wessels, Duane.
`Web Caching/Duane Wessels
`p. cm.
`ISBN 1-56592-536-X
`1. Cache memory. 2. Browsers (Computer programs) 3. Software configuration
`management. 4. World Wide Web. I. Title.
`
`TK7895.M4 W45 2001
`004.5'3--dc21
`
`ISBN: 1-56592-536-X
`[C]
`
`2001033173
`
`
`AHBLT-2013.004
`
`

`

`Table of Contents
`
`Preface ..................................................................................................................... ix
`
`1. Introduction .................................................................................................. 1
`1.1 Web Architecture ........................................................................................ 2
`1.2 Web Transport Protocols ........................................................................... 6
`1.3 Why Cache the Web? ............................................................................... 10
`1.4 Why Not Cache the Web? ........................................................................ 13
`1.5 Types of Web Caches .............................................................................. 14
`1.6 Caching Proxy Features ........................................................................... 17
`1.7 Meshes, Clusters, and Hierarchies .......................................................... 18
`1.8 Products .................................................................................................... 19
`
`2. How Web Caching Works ....................................................................... 21
`2.1 HTTP Requests ......................................................................................... 21
`2.2 Is It Cachable? .......................................................................................... 24
`2.3 Hits, Misses, and Freshness ..................................................................... 34
`2.4 Hit Ratios .................................................................................................. 37
`2.5 Validation ................................................................................................. 38
`2.6 Forcing a Cache to Refresh ..................................................................... 41
`2.7 Cache Replacement ................................................................................. 44
`
`3. Politics of Web Caching ........................................................................... 48
`3.1 Privacy ...................................................................................................... 49
`3.2 Request Blocking ..................................................................................... 55
`3.3 Copyright .................................................................................................. 57
`
`
`AHBLT-2013.005
`
`

`

`
`3.4 Offensive Content .................................................................................... 63
`3.5 Dynamic Web Pages ................................................................................ 64
`3.6 Content Integrity ...................................................................................... 65
`3.7 Cache Busting and Server Busting .......................................................... 66
`3.8 Advertising ............................................................................................... 68
`3.9 Trust .......................................................................................................... 69
`3.10 Effects of Proxies ................................................................................... 70
`
`4. Configuring Cache Clients ..................................................................... 72
`4.1 Proxy Addresses ....................................................................................... 73
`4.2 Manual Proxy Configuration ................................................................... 73
`4.3 Proxy Auto-Configuration Script ............................................................. 77
`4.4 Web Proxy Auto-Discovery ..................................................................... 83
`4.5 Other Configuration Options .................................................................. 84
`4.6 The Bottom Line ...................................................................................... 84
`
`5. Interception Proxying and Caching .................................................. 86
`5.1 Overview .................................................................................................. 87
`5.2 The IP Layer: Routing .............................................................................. 89
`5.3 The TCP Layer: Ports and Delivery ......................................................... 96
`5.4 The Application Layer: HTTP ............................................................... 100
`5.5 Debugging Interception ........................................................................ 101
`5.6 Issues ...................................................................................................... 102
`5.7 To Intercept or Not To Intercept .......................................................... 108
`
`6. Configuring Servers to Work with Caches .................................... 109
`6.1 Important HTTP Headers ...................................................................... 110
`6.2 Being Cache-Friendly ............................................................................ 115
`6.3 Being Cache-Unfriendly ........................................................................ 127
`6.4 Other Issues for Content Providers ...................................................... 128
`
`7. Cache Hierarchies .................................................................................. 132
`7.1 How Hierarchies Work .......................................................................... 132
`7.2 Why Join a Hierarchy? ........................................................................... 134
`7.3 Why Not Join a Hierarchy? .................................................................... 136
`7.4 Optimizing Hierarchies ......................................................................... 142
`
`
`AHBLT-2013.006
`
`

`

`
`8. Intercache Protocols .............................................................................. 144
`8.1 ICP .......................................................................................................... 145
`8.2 CARP ...................................................................................................... 156
`8.3 HTCP ...................................................................................................... 158
`8.4 Cache Digests ........................................................................................ 159
`8.5 Which Protocol to Use .......................................................................... 163
`
`9. Cache Clusters ......................................................................................... 165
`9.1 The Hot Spare ........................................................................................ 166
`9.2 Throughput and Load Sharing .............................................................. 167
`9.3 Bandwidth .............................................................................................. 168
`
`10. Design Considerations for Caching Services ............................... 170
`10.1 Appliance or Software Solution .......................................................... 170
`10.2 Disk Space ........................................................................................... 173
`10.3 Memory ................................................................................................ 175
`10.4 Network Interfaces .............................................................................. 175
`10.5 Operating Systems ............................................................................... 176
`10.6 High Availability .................................................................................. 177
`10.7 Intercepting Traffic .............................................................................. 178
`10.8 Load Sharing ........................................................................................ 179
`10.9 Location ................................................................................................ 180
`10.10 Using a Hierarchy .............................................................................. 180
`
`11. Monitoring the Health of Your Caches .......................................... 182
`11.1 What to Monitor? ................................................................................. 183
`11.2 Monitoring Tools .................................................................................. 186
`
`12. Benchmarking Proxy Caches ............................................................. 191
`12.1 Metrics .................................................................................................. 192
`12.2 Performance Bottlenecks .................................................................... 194
`12.3 Benchmarking Tools ............................................................................ 197
`12.4 Benchmarking Gotchas ....................................................................... 203
`12.5 How to Benchmark a Proxy Cache .................................................... 206
`12.6 Sample Benchmark Results ................................................................. 210
`
`AHBLT-2013.007
`
`

`

`
`A. Analysis of Production Cache Trace Data .................................... 215
`
`B. Internet Cache Protocol ....................................................................... 235
`C. Cache Array Routing Protocol .......................................................... 246
`
`D. Hypertext Caching Protocol ............................................................... 254
`
`E. Cache Digests ........................................................................................... 266
`
`F. HTTP Status Codes ................................................................................ 274
`
`G. U.S.C. 17 Sec. 512. Limitations on Liability
`Relating to Material Online ............................................................... 279
`
`H. List of Acronyms ..................................................................................... 282
`
`Bibliography ..................................................................................................... 288
`
`Index .................................................................................................................... 291
`
`
`AHBLT-2013.008
`
`

`

`Chapter 2: How Web Caching Works
`
`2.6.1 The no-cache Directive
`The no-cache directive notifies a cache that it cannot return a cached copy. Even if
`a fresh copy of the response-with a specific expiration time-is in the cache, the
`client's request must be forwarded to the origin server. RFC 2616 calls such a
`request an "end-to-end validation" (Section 14.9.4). The no-cache directive is sent
`when you click on the Reload button on your browser. In an HTTP request, it
`looks like this:
`
`GET /index.html HTTP/1.1
`Cache-control: no-cache
`
`Recall that the Cache-control header does not exist in the HTTP/1.0 standard.
`Instead, HTTP/1.0 clients use a Pragma header for the no-cache directive:
`
`Pragma: no-cache
`
`no-cache is the only directive defined for the Pragma header in RFC 1945. For
`backwards compatibility, RFC 2616 also defines the Pragma header. In fact, many of
`the recent HTTP/1.1 browsers still use Pragma for the no-cache directive instead of
`the newer Cache-control.
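
Because a request may pass through both HTTP/1.0 and HTTP/1.1 caches, a client can send both headers together. The following is a minimal sketch, not taken from the book, that builds (but does not send) such a request with Python's standard urllib. The function name build_reload_request and the example URL are assumptions made for illustration.

```python
from urllib.request import Request


def build_reload_request(url: str) -> Request:
    """Build (without sending) a GET request carrying the no-cache directive.

    Hypothetical helper for illustration only.
    """
    req = Request(url, method="GET")
    # HTTP/1.1 caches look for the directive in Cache-Control.
    req.add_header("Cache-Control", "no-cache")
    # HTTP/1.0 caches only understand the Pragma header.
    req.add_header("Pragma", "no-cache")
    return req


req = build_reload_request("http://www.example.com/index.html")
for name, value in req.header_items():
    print(f"{name}: {value}")
```

Note that urllib normalizes header names when storing them, so the exact capitalization printed may differ from what was passed in. HTTP header names are case-insensitive, so this does not change the meaning of the request.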
`
`Note that the no-cache directive does not necessarily require the cache to purge its
`copy of the object. The client may generate a conditional request (with
`If-modified-since or another validator), in which case the origin server's response may
`be 304 (Not Modified). If, however, the server responds with 200 (OK), then the
`cache replaces the old object with the new one.
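
The behavior described above can be sketched as a toy cache. This is an illustration under simplifying assumptions, not how any particular product is implemented: the names TinyCache, Response, and example_origin are hypothetical, timestamps are plain integers, and the origin server is a local function rather than a network peer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Response:
    status: int        # 200 (OK) or 304 (Not Modified)
    body: bytes = b""


class TinyCache:
    """Toy cache keyed by URL. 'origin' stands in for the origin server."""

    def __init__(self, origin: Callable[[str, Optional[int]], Response]):
        self.origin = origin
        self.store: Dict[str, bytes] = {}

    def handle_no_cache(self, url: str,
                        if_modified_since: Optional[int] = None) -> Response:
        # no-cache: never answer from the store; always ask the origin.
        reply = self.origin(url, if_modified_since)
        if reply.status == 200:
            # A full response replaces the old object.
            self.store[url] = reply.body
        # On 304 the stored copy is left alone; it is not purged.
        return reply


def example_origin(url: str, if_modified_since: Optional[int]) -> Response:
    """Hypothetical origin: one object, last modified at time 100."""
    last_modified, body = 100, b"new body"
    if if_modified_since is not None and last_modified <= if_modified_since:
        return Response(304)
    return Response(200, body)


cache = TinyCache(example_origin)
cache.store["/index.html"] = b"old body"

first = cache.handle_no_cache("/index.html", if_modified_since=100)
copy_after_304 = cache.store["/index.html"]   # still the old object

second = cache.handle_no_cache("/index.html")  # unconditional request
copy_after_200 = cache.store["/index.html"]   # replaced by the new object
```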
`
`The interaction between no-cache and If-modified-since is tricky and often the
`source of some confusion. Consider, for example, the following sequence of
`events:
`
`1. You are viewing an HTML page in your browser. This page is cached in your
`browser and was last modified on Friday, February 16, 2001, at 12:00:00.
`
`2. The page author replaces the current HTML page with an older, backup copy
`of the page, perhaps with this Unix command:
`
`mv index.html.old index.html
`
`Now there is a "new" version of the HTML page on the server, but it has an
`older modification timestamp.
`
`3. You try to reload the HTML page by using the Reload button. Your browser
`sends this request:
`
`GET http://www.foo.com/index.html
`Pragma: no-cache
`If-Modified-Since: Fri, 16 Feb 2001 09:46:18 GMT
`
`AHBLT-2013.009
`
`

`

`4. The origin server sends a 304 (Not Modified) response and your browser
`displays the same page as before.
`
`You could click on Reload until your mouse wears out and you would never get
`the "new" HTML page. What can you do to see the correct page?
`
`If you are using Netscape Navigator, you can hold the Shift key down while clicking
`on Reload. This instructs Netscape to leave out the If-modified-since header.
`If you use Internet Explorer, hold down the Ctrl key while clicking on Reload.
`Alternatively, you can flush your browser's cache and then press Reload, which
`prevents the browser from sending an If-modified-since header in its request.
`Note that this is a user-agent problem, not a caching proxy problem.
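
To see why the server keeps answering 304, consider a sketch of the comparison an origin server might make. It assumes the server treats the object as unmodified whenever its timestamp is not newer than the date in If-modified-since; real servers may apply other rules. The function name origin_status and the backup file's timestamp (February 10, 2001) are hypothetical values chosen for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def origin_status(file_mtime: datetime,
                  if_modified_since: Optional[str]) -> int:
    """Return the status code a simple origin server might send."""
    if if_modified_since is None:
        # Unconditional request: always send the full object.
        return 200
    ims = parsedate_to_datetime(if_modified_since)
    # "Not newer than the client's date" is treated as "not modified".
    return 304 if file_mtime <= ims else 200


# Hypothetical timestamp of the restored backup copy: older than the
# date the browser sends, even though the content is "new" to the user.
backup_mtime = datetime(2001, 2, 10, 8, 0, 0, tzinfo=timezone.utc)

with_ims = origin_status(backup_mtime, "Fri, 16 Feb 2001 09:46:18 GMT")
without_ims = origin_status(backup_mtime, None)
print(with_ims, without_ims)
```

With the header present, the older timestamp makes the server report that nothing has changed. Leaving the header out, as the Shift and Ctrl key variants of Reload do, turns the request into an unconditional one and the full page is returned.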
`
`In addition to the above problem, the Reload button, as implemented in most web
`browsers, leaves much to be desired. For example, it is not possible to reload a
`single inline image object. Similarly, it is not possible to reload web objects that
`are displayed externally from the browser, such as sound files and other
`"application" content types. If you need to refresh an image, Postscript document, or other
`externally displayed object, you may need to ask the cache administrator to do it
`for you. Some caches may have a web form that allows you to refresh cache
`objects. For this you need to know (and type in) the object's full URL.
`
`Another problem with Reload is that it is often misused simply to rerequest a page.
`When the Web seems slow, we often interrupt a request as the page is being
`retrieved. To request the page again, you might use the Reload button. This, of
`course, sends the no-cache directive. Browsers do not have a button which
`requests a page again without sending no-cache. You can accomplish this by
`simply moving the cursor to the URL location box and pressing the Enter key.
`
`As a cache administrator, you might wonder if caches ever can, or should, ignore
`the no-cache directive. A person who keeps a close watch on bandwidth usage
`might have the impression that the Reload button gets used much more often than
`necessary. Some products, such as Squid, have features that provide special
`treatment for no-cache requests. However, I personally do not recommend enabling
`these features because they violate the HTTP/1.1 protocol and leave users unable
`to get up-to-date information. One Squid option turns a no-cache request into an
`If-modified-since request. Another ignores the no-cache directive entirely.
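
The difference between the two behaviors can be modeled as a rewrite of the request headers before the cache processes them. This sketch is only a model of the idea and does not reproduce Squid's configuration syntax or code. The function apply_policy and the policy labels "honor", "to_ims", and "ignore" are hypothetical names.

```python
from typing import Dict, Optional


def apply_policy(headers: Dict[str, str], policy: str,
                 cached_last_modified: Optional[str] = None) -> Dict[str, str]:
    """Return the request headers a cache would act on under a policy.

    policy: "honor"  - leave the request alone (protocol-compliant)
            "to_ims" - turn no-cache into an If-Modified-Since request
            "ignore" - drop the no-cache directive entirely
    """
    if policy not in ("honor", "to_ims", "ignore"):
        raise ValueError(f"unknown policy: {policy}")

    out = dict(headers)
    is_reload = (out.get("Cache-Control") == "no-cache"
                 or out.get("Pragma") == "no-cache")
    if not is_reload or policy == "honor":
        return out
    if policy == "to_ims" and cached_last_modified is None:
        # Nothing cached to validate against; forward unchanged.
        return out

    # Both non-compliant policies strip the directive.
    out.pop("Cache-Control", None)
    out.pop("Pragma", None)
    if policy == "to_ims":
        out["If-Modified-Since"] = cached_last_modified
    return out


reload_request = {"Pragma": "no-cache"}
honored = apply_policy(reload_request, "honor")
validated = apply_policy(reload_request, "to_ims",
                         "Fri, 16 Feb 2001 09:46:18 GMT")
ignored = apply_policy(reload_request, "ignore")
```

Under the "ignore" policy the request no longer carries anything that distinguishes it from an ordinary request, which is why a user behind such a cache cannot force a fresh copy.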
`
`2.6.2 The max-age Directive
`The max-age directive specifies in seconds the maximum age of a cached response
`that the client is willing to accept. Whereas no-cache means "I won't accept any
`
`AHBLT-2013.010
`
`
