`
`Reducing .150!“ mi? )“rqflic
`
`
`
`O’RE ILLY“
`
`Hum“! 11' kcm'rfi-i
`
`
`
`Web Caching
`
`Duane Wessels
`
`O'REILLY'
`Baffin-g - Cambridge - Fambam - Kdl'n - Parts -W - Taipei - Tatum
`
`
`
`mm
`by Dwain: W't'mls
`Cami-ism O 5001 O'Neiliy a Lwiams. Inc. All riglus mien-ed.
`Printed. in 1111- L‘nik'd Slam: of America.
`
`FLIhIL-ihcd by O'Hullly at Associates. Inc. 101 Morris Sum. Sclxmupul. CA SHAH.
`
`HID”: Nathan kainm and Paula. Fergus-on
`
`WHEN”: Luann: Clad“.- Snylcmcz
`
`Damn-Hymn Edit: Freudman
`
`Mutiny "Mani:
`Jum- ZDIJI=
`
`Firs: Edition.
`
`Nutshell Handbuuk. 1J.'|L' Nuxsl-le Kamila-ink logo. and the D'Rcilly Irma m mgislcnsd
`mdmmrks #01121.in 8; Wales. Inc. Many of the migmlians used by manufactums
`and sellers :odL-mnguish lhm'l products art- daimml as lrddtn’ladu. Where mm unsigriauans
`appear in [his funk. and fl'llrJlly & Associates. Inc. was swan: afa Iradmmrk ciaim, U11:
`dcfiignalions have Ewen printed in capr- ot lnilial caps. ‘l‘he- assoc'mmn bum-cm lllt.‘ image of
`:1 rock IJ'LI'II5h and WL‘I) caching LI:
`:1 mdmnnrk nf U'Ilcilly a. mink-s. Inc.
`
`W'hl'lc every pitcnulion has bcen taken Ln the prupalaflnn Dl'iJ'IJs hook, the publimw mules
`no responsibility for arms DI omlfilonsmr for damages muhm [mm m:- use Dime
`inf-unnamn mnuincd hen-in.
`
`may qr01w (2:myngin-Pubmwfau Dara
`Wmlfi. Duane.
`Web Caching-“Duane Wcsacls
`p. cm.
`HEN I-SfiWZ-fl‘G-K
`l. Cadu- mommy. 2. Bum-acts ((1.111qu woman“! 3. Software configuralbun
`management. ii. W'ulld Widc Web. I. Tlllr.
`137395.314 W45 21501
`004.521"ch
`
`1130193318
`
`ISBN: 1-56592-536-X
`[C]
`
`
`
`
`
`J50 amp-'9!- s.- trim-raw: Moods
`
`RN
`
`F
`
`Average ICP service tLrne From neighhor caches
`
`ICP hit nrlirr
`
`The average senior:- time wirhour 161’ is simply:
`5..
`
`The avctajrc serviw rims: with ICP is:
`Ps,.+t1— 959+ R"
`We are inrerEsrL-d in the case when:
`
`which reduces to.-
`
`P5fl+tl —r"l|.ii..+ R" a 5,,
`
`P:-
`
`
`R»
`is},- 5‘...
`In other m, Mien rhe- ICP hit rario is larger than 1hr: rarin of the {CP service
`time to the difference in the HTTP sewicr: times. then the benefits of the neighbor
`hits are greater than rhe costs of doing the ICP queries.
`II‘ the ICP hi: ratio is
`smaller than the: service time ratio. using ICP does not reduce the overall sen-ice
`time for user requests.
`.r'ilsrr note that want: pwplc war: caches primarily Fur saving handWidth. not for
`reducing service times.
`"l'her,’ pro‘rrabl}.r don't mind the slight extra r.lr5la.).r
`if it
`reduces their monthly bandwidth costs.
`
`8. 1.3.2 Bandwidth
`
`Speaking of bandwidth. you may be wondering how much of it ICP comumes. It
`is quire easy to calculate Ihr: addiiiunai Mind-width ICP uses. 111:: size or an ICP
`query.r is 24 bytes plus the length of the URL. The average URL length seems to be
`about 55 characters. so [CP queries average about 80 bytes. ICP replies are four
`bytes smaller than queries.
`
`Taken together. :in IC? queryrreply rransaoirm uses about [60 Iii-res per neighbor
`before including UL]? and IP headers. i‘. cuclrt: Will! 35 neighhcmi use!" 4190 bytes for
`each cache miss. To put this in perspective. consider that the average W'Eb object
`size is about
`It} KB. Thus, 3 cache with three [CP neighbors increases ils haml-
`widlli consumption by about 5% on average. Note that this corresponds to a men-
`SHWMtflI made at the tulle-'5 nerwmk interface. Whether ICP increases ymr wide
`area bandwidth depends on the location of yrmr ricighlmr caches.
`
`
`
`“it?
`
`chapter tr. Jflt‘rfl‘ac‘bl’ me
`
`To pnalic‘l hits in :l. neighimr. a cache needs It: knuw which ()l't'pects are stored
`them. A neighbor's cache contents can be represented] as a database or cliteuory.
`These directories must he exchanged between neighbors. By looking up objects in
`the directory. we can predict cache hits. Since a caches content changes over
`time, the directories must also be updated. A primary goal of our new protocol '15
`to malice the transmission and tfit'tl'dgl‘.‘ of this diretam' as efficient :I.‘i p-tm'i'tl’flc.
`
`Probably the worst we could do is use a plain list of cached. URLs. The aye-rage
`Ul-‘tL length is about ‘55 bytes. or IMO hits. 1110 URI. list is ptxn choice For a tlitec—
`ton' because URLs are an inefficient representation—they use only printable char—
`acters. and me sequences appear very frequently. in infrm‘natinn theory terms,
`Ul-lLs have .1 low amount of entropy. A. better option is tn ust: a. litn of hash values
`of the URLs. With hash values. we may have collisions where two or more URLs
`him-.- the same hash value. A ctilli-titws pniialtility is related to the hash value
`range and the number of L‘Iu'th in the list. Fora cache of about (Int: million alike-ts.
`we probably need 2-5. to 52 bits- pet object. That's a significant reduction in cliree-
`'tt'u‘y size imputed with 44E} hits per [infect in a URL list. Believe it ur not, we can
`do even better. With an algorithm known as a Blwmfilrtm storage is reduced to
`only a Few hits per ohiect.
`
`3.4. I Blown Fitters
`
`The Bit-Joni filter algorithm. first described. by Burton Bloom in 1WD. enctxles it set
`of input keys with the help of some hash Functions. Recall that a hash Function is
`simply an algmithrn that takes a chunk of data as input (the hey) and [Induce—s an
`integer as output. A good hash function has the charaneristic of being random.
`That is. given a sufficiently litl'gt: number of input keys. the output values should
`have an apmtently random distribution. Another hash function characteristic is the
`range of its output. Often. the range is expressed as a power 0F 2, such as 52 hits
`or 253. One nF the most often used hash Functions is call-L13 M05. which is shun Fur
`Meme-age Digest {Version 51 IRivest. 1992l. The MDS hash function (mtpuls lfiflrbit
`values and is used extensively in the current Cache Digest implementation.
`A Bin-nun Filter is defined by [Wu parameters. K independent hash Functions and an
`array of M hits. The hits of the filter are used It! efficiently encode :I collectt'un of N
`items. For a Cache Digest. the collection is the set of ohicd keys [Uliis 0r Uliifil
`stored in a cache. To construct .1 Bloom filter, we iterate over the items in the col—
`Lcuiun. Applying the K lush functions to each item gives us It hash ir'alLlL’ti. Each
`hash value represents a bit position in the filter that should be turned (In. If u hat—sh
`value is Luger than All we apply the modulus operator to keep it in range. Note
`that Msho'ttltl be less than the hash Function range; otherwise. stat-it: itiLs can never
`be turned on.
`
`