throbber

`
`Reducing Network Traffic
`
`O’RE ILLY*
`
`Duane Wessels
`
`

`

`Web Caching
`
`Duane Wessels
`
`O’REILLY*
`Beijing » Cambridge « Farnbam « Kdln « Parts « Sebastopol - Taipei « Tokyo
`
`

`

`Web Caching
`by Duane Wessels
`Copyright © 2001 C’Reilly & Associates, Inc. All rights reserved.
`Printed in the United States of America.
`
`Published by O'Reilly & Associates, Inc., 101 Morris Street, Sebastopol, CA 95472.
`
`Edifors: Sathan Torkingion and Paula Ferguson
`
`Production Editor: Leanne Clarke Soylemez
`
`Cover Designer: Edie Freedman
`
`Printing History:
`June 2001:
`
`First Edition.
`
`Nutshell Handbook, the Nutshell Handbook logo, and the O'Reilly logo are registered
`trademarks of O'Reilly & Associates, Inc. Manyof the designations used by manufacturers
`and sellers to distinguish their products are claimed as trademarks. Where those designations
`appearin this book, and O'Reilly & Associates, Inc. was aware of a trademark claim, the
`designations have been printed in caps or initial caps. The association berween the image of
`a sock thrush and web caching is a trademark of O'Reilly & Associates, Inc,
`
`While every precaution has been taken in the preparation of this book, the publisher assumes
`no responsibility for emors or omissions, or for damages resulting from the use of the
`information contained herein,
`
`Library ofCongress Cataloging-in-Publication Data
`Wessels, Duane,
`Web Caching/Duane Wessels
`p. cm.
`ISBN 1-56592-536-X.
`1. Cache memory. 2. Browsers (Computer programs) 3. Software configuration
`Management. 4. World Wide Web. L Title.
`TRTS95.M4 W45 2001
`004.5'3—de21
`
`2001035173
`
`ISBN; 1-56592-536-X
`[c]
`
`

`

`
`
`iso Chapter &Intercache Protocols
`
`Rr
`
`P
`
`Average ICP service time from neighbor caches
`
`ICP hit ratio
`
`The average service ime without ICP is simply:
`4
`
`The average service time with ICP is:
`P5,, +01 — PS, + Ra
`We are interested in the case when:
`
`which reduces to:
`
`PS, 401-5, + Ry < 5,
`
`P>
`
`
`Ry
`So7 3h
`In other words, when the ICP hit ratio is larger than the ratio of the ICP service
`time to the difference in the HTTP service times, then the benefits of the neighbor
`hits are greater than the costs of doing the ICP queries. If the ICP hit ratio is
`smaller than the service time ratio, using ICP does not reduce the overall service
`time for user requests.
`Also note that some people use caches primanily for saving bandwidth, not for
`reducing service times. They probably don't mind the slight extra delay if it
`reduces their monthly bandwidth costs.
`
`813.2 Bandwidth
`
`Speaking of bandwidth, you may be wondering how much ofit ICP consumes. It
`is quite easy to calculate the additional bandwidth ICP uses. The size of an ICP
`query is 24 bytes plus the length of the URL. The average URL length seems to be
`about 55 characters, so ICP queries average about 80 bytes. ICP replies are four
`bytes smaller than queries.
`
`Taken together, an ICP query/reply transaction uses about 160 bytes per neighbor
`before including UDP and IP headers. A cache with 3 neighbors uses 480 bytes for
`each cache miss. To put this in perspective, consider that the average web object
`size is about 10 KB. Thus, a cache with three [CP neighbors increases its band-
`width consumption by about 5% on average. Note that this corresponds to a mea-
`surement made at the cache’s network interface. Whether ICP increases your wide
`area bandwidth depends on the location of your neighbor caches.
`
`

`

`ot
`
`Chapter & Intercache Protocols
`
`To predict hits in a neighbor, a cache needs to know which objects ane stored
`there, A neighbor's cache contents can be represented as a database or directory.
`These directories must be exchanged between neighbors. By looking up objects in
`the directory, we can predict cache hits. Since a cache's content changes over
`time, the directories must also be updated. A primary goal of our new protecol is
`to make the transmission and storage of this directory as efficient as possible.
`
`Probably the worst we could do is use a plain list of cached URLs. The average
`URL length is about 55 bytes, or 440 bits. The URL list is poor choice for a direc-
`tory because URLs are an inefficient representation—they use only printable char-
`acters, and some sequences appear very frequently. In information theory terms,
`URLs have a low amount of entropy. A better option is to use a list of hash values
`of the URLs, With hash values, we may have collisions where two or more URLs
`have the sanie hash value. A collision's probability is related to the hash value
`range and the number of URLs in the list. Fora cache of about one million objects,
`we probably need 24 to 32 bits per object. That's a significant reduction in direc-
`tory size compared with 440 bits per object in a URL list. Believe it or not, we can
`do even better. With an algorithm known as a Bloom filter, storage is reduced to
`only a fewbits per object.
`
`&4.1 Bloom Filters
`
`The Bloom filter algorithm, first described by Burton Bloom in 1970, encodes a set
`of input keys with the help of some hash functions. Recall that a hash function is
`simply an algonthm that takes a chunk of data as input (the #ey) and produces an
`integer as output, A good hash function has the characteristic of being random.
`That is, given a sufficiently large number of input keys, the ourput values should
`have an apparently random distribution. Another hash function characteristic is the
`range of its output. Often, the range is expressed as a power of 2, such as 32 bits
`or 22, One of the most often used hash functions is called MD5, which is short for
`Message Digest (Version 5) [Rivest, 1992], The MD5 hash function outputs 128-bit
`values and is used extensively in the current Cache Digest implementation.
`A Bloomfilter is defined by two parameters, K independent hash functions and an
`array of M bits. The bits of the filter are used to efficiently encode a collection of Vv
`items, For a Cache Digest, the collection is the set of object keys (URIs or URLs)
`stored ina cache. To construct a Bloom filter, we iterate over the items in the col-
`lection. Applying the K hash functions to each item gives us A hash values. Each
`hash value represents a bit position in the filter that should be turned on, If a hash
`value is larger than AW, we apply the modulus operator to keep it in range. Note
`that M should be less than the hash function range; otherwise, some bits can never
`be turned on.
`
`

This document is available on Docket Alarm but you must sign up to view it.


Or .

Accessing this document will incur an additional charge of $.

After purchase, you can access this document again without charge.

Accept $ Charge
throbber

Still Working On It

This document is taking longer than usual to download. This can happen if we need to contact the court directly to obtain the document and their servers are running slowly.

Give it another minute or two to complete, and then try the refresh button.

throbber

A few More Minutes ... Still Working

It can take up to 5 minutes for us to download a document if the court servers are running slowly.

Thank you for your continued patience.

This document could not be displayed.

We could not find this document within its docket. Please go back to the docket page and check the link. If that does not work, go back to the docket and refresh it to pull the newest information.

Your account does not support viewing this document.

You need a Paid Account to view this document. Click here to change your account type.

Your account does not support viewing this document.

Set your membership status to view this document.

With a Docket Alarm membership, you'll get a whole lot more, including:

  • Up-to-date information for this case.
  • Email alerts whenever there is an update.
  • Full text search for other cases.
  • Get email alerts whenever a new case matches your search.

Become a Member

One Moment Please

The filing “” is large (MB) and is being downloaded.

Please refresh this page in a few minutes to see if the filing has been downloaded. The filing will also be emailed to you when the download completes.

Your document is on its way!

If you do not receive the document in five minutes, contact support at support@docketalarm.com.

Sealed Document

We are unable to display this document, it may be under a court ordered seal.

If you have proper credentials to access the file, you may proceed directly to the court's system using your government issued username and password.


Access Government Site

We are redirecting you
to a mobile optimized page.





Document Unreadable or Corrupt

Refresh this Document
Go to the Docket

We are unable to display this document.

Refresh this Document
Go to the Docket