`DeSimone et al.
`
`1111111111111111111111 111111111111111111111111111111111111111111111111111
`US005787470A
`[11] Patent Number:
`[451 Date of Patent:
`
`5,787,470
`Jul. 28, 1998
`
`[54]
`
`[75]
`
`INTER-CACHE PROTOCOL FOR
`IMPROVED WEB PERFORMANCE
`
`Inventors: Antonio DeSimone. Ocean; David
`Hilton Shur. Middletown; Sandeep
`Sibal. Matawan. all of N.J.
`
`[73] Assignee: AT&T Corp. Middletown, N.J.
`
`[21] Appl. No.: 733,485
`
`Oct. 18, 19%
`
`[22] Filed:
`Int. CI.6
`...................................................... G06F 12100
`[51]
`[52] U.S. Cl ...................... 7111124; 711/165; 395/200.48;
`395/200.56; 395/200.6; 395/200.68
`[58] Field of Search ..................................... 395/440. 445.
`395/446. 447. 448. 451. 468. 471. 621.
`200.53. 473. 608. 610, 200.3, 200.31. 200.32.
`200.43. 200.46, 200.48. 200.55. 200.6
`
`[56]
`
`References Cited
`
`U.S. PATENI' DOCUMENTS
`
`4,959,777
`5,222,224
`5,226,144
`5,452,447
`5,510,934
`
`9/1990 Holman, Jr. . ........................... 395/468
`6/1993 Flynn et al. . ........................... 395/471
`7/1993 Moriwaki et al ....................... 395/448
`9/1995 Nelson et al.
`. ......................... 395/621
`4/1996 Bremman et al ....................... 395/446
`
`4/1996 Boyles et al ....................... 395/200.53
`5,511,208
`5,586,298 1Vl996 Shah ........................................ 395/473
`5,604.882 Vl997 Hoover et al ........................... 395/448
`5,611,049
`3/1997 Pitts ........................................ 395/608
`5,623,656
`4/1997 Lyons ...................................... 395/610
`
`Primary Examiner-Tod R. Swann
`Assistant Examiner-Fred F. Tzeng
`Attorney, Agent, or Firm-Stephen M. Gurey
`[57]
`ABSTRACT
`
`On the Internet. different caches may contain copies of
`objects that have been copied from originating servers when
`they were accessed by users. Interconnected caches may
`have different objects stored thereon that might at some time
`be requested by a client terminal that is connected to a cache
`other than the one on which the object is stored. Rather than
`awaiting a request for a particular object and then querying
`each neighbor cache to determine whether a copy of the
`requested object is stored thereon. and then downloading the
`requested object if it is found. information about the contents
`of the neighbor caches is exchanged between these caches so
`that when a request for an object is received. the object can
`be retrieved from the cache in which it is stored. In the
`alternative. the object may be retrieved from the originating
`server if. for example, the object stored in a cache is stale
`based on the date and time it was last modified in the cache.
`
`12 Claims, 6 Drawing Sheets
`
`f 104
`INTERNET ACCESS
`SERVICE PROVIDER
`107
`
`-
`
`CACHE
`
`CACHE
`
`103
`
`108
`~
`
`0
`
`0
`
`109
`
`101
`
`102
`
`lZJ
`
`lZJ
`
`0
`
`0
`
`
`
`U.S. Patent
`
`Jul. 28, 1998
`
`Sheet 1of6
`
`5,787,470
`
`FIG. 1
`
`(ff)
`
`0
`
`101
`
`102
`
`; 104
`
`INTERNET ACCESS
`SERVICE PROVIDER
`107
`"-"
`
`CACHE
`
`CACHE
`
`103
`
`108
`
`( ~)
`
`0
`
`( ~) 109
`
`0
`
`
`
`U.S. Patent
`
`Jul. 28, 1998
`
`Sheet 2 of 6
`
`5,787,470
`
`FIG. 2
`
`REQUESTING
`CACHE
`(ROC)
`
`201
`
`RESPONDING
`CACHE
`(RSC)
`
`203
`
`202
`
`r-
`
`REQUEST:
`UST OF URLs:
`lurl -range l
`i IMS date ~
`
`"Rsc-cHECKs-i
`FOR MODIFIED :
`DOCUMENTS
`204
`1
`~--J.-~-~----------~
`r - - - - - - - - - t - - - - - - - i RESPONSE:
`UST OF URLs WITH
`RQC
`1
`: PARSES
`MODIFIED TIMES.
`: RESPONSE
`THEN
`I
`: DECIDES
`l WHAT TO
`GET
`I
`L _______ _
`
`205
`
`11-----'---~
`
`REQUEST:
`GET url HTTP/ 1.0
`
`RESPONSE:
`L-----1 [body of url]
`
`TIME
`
`TIME
`
`
`
`U.S. Patent
`
`Jul. 28, 1998
`
`Sheet 3 of 6
`
`5,787,470
`
`FIG. 3
`
`REQUESTING
`CACHE
`(ROC)
`
`301
`
`RESPONDING
`CACHE
`(RSC)
`
`303
`
`--~ __ __ ._ ___ ________ _J
`
`-----------,
`!
`
`l REGISTRATION
`..___ ____ _)
`
`302
`REGISTER REQUESTS FOR
`UPDATE EVENTS FOR A r-----(cid:173)
`RANGE OF URLs
`304
`
`ACKNOWLEDGE REQUEST
`
`305
`
`1 . - - - - - - - i UPDATE MESSAGE
`ON URL u:rl1
`REQUEST:
`GET url1 HTTP/ 1.0
`
`306
`
`307
`UPDATE MESSAGE
`L------1 ON URL url2
`
`---------:--,
`URL urlt UPDATED 1
`
`....,_ ____ _)
`
`---------:--,
`URL url2 UPDATED1
`
`308
`
`REQUEST:
`GET url3 HTIP /1.0
`
`309
`
`TIME
`
`---------:--,
`URL url3 UPDATED1
`- - - - __ _J
`
`~---------1
`
`UPDATE MESSAGE
`ON URL url3
`
`TIME
`
`
`
`U.S. Patent
`
`Jul. 28, 1998
`
`Sheet 4 of 6
`
`5,787,470
`
`FIG. 4
`
`CACHE 1
`
`401
`
`403
`
`CACHE 2
`
`402
`
`MESSAGE: SET OF
`MODIFIED URLs
`
`1
`
`:
`
`1
`
`----------,
`UPDATE STATE-
`INFORMATION ON
`CACHE 1. SEND
`WHATS NEW THAT i
`IS NEWER THAN
`:
`WHAT CACHE 1
`HAS BASED ON THE:
`STATE-INfORMATION:
`FOR CACHE 1
`404
`1
`....----~~---~-------------~
`MESSAGE: S T OF
`MODIFIED URLs
`
`r----------
`1 UPDATE STATE-
`: INFORMATION ON
`1 CACHE 2. SEND
`J WHATS NEW THAT
`: IS NEWER THAN
`1 WHAT CACHE 2
`I HAS BASED ON THE
`: STATE-INFORMATION
`405
`1 FOR CACHE 2
`L----------r--~~~...._~
`MESSAGE: SET OF
`MODIFIED URLs
`
`1
`
`1
`
`-----------,
`UPDATE STATE-
`INFORMATION ON :
`•
`I
`I
`I
`I
`I
`__________ _J
`
`TIME
`
`TIME-
`
`
`
`U.S. Patent
`
`FIG. 5
`
`Jul. 28, 1998
`
`Sheet 5 of 6
`
`5,787,470
`
`CACHE 1
`
`501
`
`503
`
`MESSAGE: SET OF
`MODIFIED URLs
`
`CACHE 2
`
`502
`
`1
`
`-----------1
`l
`UPDATE STATE-
`INFOR~iA TION OF
`CACHING SYSTE~. :
`SEND WHATS NEW 1
`ON CACHING
`1
`SYSTEM UNLESS
`:
`CACHE 1 HAS THEI
`l
`MOST R£CENT
`VERSION.
`504
`I
`---J------'------------J
`MESSAG : SET OF
`MODIFIED URLs
`
`,----------
`UPDATE STATE(cid:173)
`INFORMATION OF
`CACHING SYSTEM.
`SEND WHATS NEW
`ON CACHING
`SYSTEM UNLESS
`CACHE 2 HAS THE
`SOS
`1 MOST RECENT
`I VERSION.
`L----------,__ __ __.._
`MESSAGE: SET OF
`MODIFIED URLs
`
`1
`
`-----------,
`UPDATE STATE-
`INFORMATION ON i
`
`o
`
`I
`I
`o
`I
`•
`I
`I
`__________ _J
`
`TIME
`
`TIME
`
`
`
`FIG. 6
`
`601
`
`RESPONDING
`CACHE(RSC)
`
`602
`
`603
`
`REQUEST:
`CONTENTS * HTIP / 1.0
`Accept: application/ www-contents
`·r-RSC-FINDS ALC i
`If-Modified-Since: Sat, 29 Oct 1996 19:43:31 GMT
`OBJECTS MATCHINGi
`Range: http://www.nytimes.com/*
`REQUEST
`I
`.----------~-----+- __________ J
`RESPONSE: 201 O.K.
`- . -1 - - - - - - i Content-Type: application/www-contents
`#Version: 1.0
`#Syntax: last-Modified CRLF URL SP Content-Length
`Sat, 29 Oct 1996 19:54:02 GMT
`http://www.nytimes.com/index.html 575
`Sat, 29 Oct 1996 19:56:34 GMT
`http://www.nytimes.com/info/textpath.html 4096
`
`604-
`
`r--RQC-
`l PARSES
`: RESPONSE
`:
`THEN
`1 CHOOSES
`: WHICH
`l OBJECT
`I TO GET
`L------·~-----------------.
`REQUEST:
`GET http://www.nytimes.com/info/textpath.html HTTP /1.0
`
`TIME
`
`605
`
`TIME
`
`Cj
`• rJJ.
`•
`
`~ = """" ~ a
`
`""'" = :-
`~ ....
`~
`QO
`
`g3 a
`
`~
`
`s,
`
`~
`
`01
`.,..
`.......
`QC
`.......
`.,..
`
`~ ....... =
`
`
`
`5.787.470
`
`1
`INTER-CACHE PROTOCOL FOR
`IMPROVED WEB PERFORMANCE
`
`Cross Reference to Related Applications
`
`This application relates to subject matter described in
`co-pending U.S. Pat. Application Ser. No.08n33/486. filed
`simultaneously herewith. for Antonio DeSimone and Sand(cid:173)
`eep Sibal. co-inventors herein. and assigned to the assignee
`hereof.
`
`TECHNICAL FIELD
`
`This invention relates to data communications and com(cid:173)
`puter networking. and more particularly. to the transfer of
`digital information on packet data networks such as the
`Internet. between caches.
`
`BACKGROUND OF THE INVENTION
`
`In a transaction on the World Wide Web between a client
`terminal and a Web server in which the client terminal
`retrieves a Web object from a server connected on the
`Internet. the client terminal normally accesses the Internet
`through an Internet Access Service Provider (IASP). Such
`an object may be one or more pages of textual information, 25
`a picture. a sound clip, a video clip, a JAVA applet or other
`software. any combination or the former. or anything that is
`capable of being transmitted digitally over the Internet to a
`client terminal. The term "object" will be use hereinafter to
`include all of the foregoing. A cache. located within the 30
`IASP network, functions as an intermediary in transactions
`involving the retrieval of such Web objects from servers by
`a client terminal. In particular, in its simplest form. a cache
`within the IASP saves a copy of a retrieved object for itself
`when the object is moved from the server to the requesting 35
`client terminal. This caching operation is transparent to the
`user and, under normal circumstances. does not incur any
`significant delay due to the copying operation which is
`performed simultaneously as the object is retrieved from the
`server and delivered to the client terminal.
`Advantageously. the cache within the IASP network can
`satisfy subsequent requests for those objects that are stored
`therein, thereby obviating the necessity of retrieving the
`object from the originating server on the Internet. This
`reduces the delay as perceived by the user to access the 45
`object and further, saves bandwidth on links that connect the
`IASP network to the Internet. FIG. 1 is a block diagram of
`a prior art network in which plural client terminals. such as
`101 and 102. are connected to a cache 103 within IASP 104.
`Cache 103, in turn. is connected to a server 105 connected 50
`to the Internet 106. By storing an object from server 105 in
`cache 103 when it is first retrieved by client 101, subsequent
`requests for that same object by client 101, or any another
`client connected to cache 103 within IASP 104, such as
`client 102, can be satisfied directly from cache 103. IASP 55
`104 likely also includes a second or more caches, such as
`cache 107. which serves other clients connected to that same
`IASP, such as 108 and 109. The objects stored in this second
`cache 107, or other caches within IASP 104, but not shown,
`could be used to serve the clients attached to the first cache 60
`103, if the caches communicate with each other.
`Generally, in the prior art when a request for an object is
`received, discovery of objects on other than the cache to
`which the requesting client is attached is done by explicit
`querying at the time when the request for a particular object
`is made. That is, whenever a request from a client cannot be
`satisfied by the cache to which the client is connected, a
`
`2
`plurality of requests are sent out to neighboring caches
`asking them whether they have a copy of the desired object
`stored in their memory.
`The steps of querying a plurality of caches. waiting for a
`5 reply. and then downloading a copy of the object from the
`cache that has the object or retrieving a copy of the object
`from the originating server of the object if the object cannot
`be found in a neighboring cache. may impart a delay to the
`retrieval process that is unacceptable to the requesting user.
`10 Furthermore, the flooding of requests to all neighboring
`caches in response to each request for an object can be a
`wasteful use of network bandwidth. as well as draining the
`caches' computing resources. Even furthermore, the copy of
`an object as it exists on a neighboring cache may differ from
`15 the object as it then exists in the server from which it
`originated. This will result when the object in the server is
`modified after the initial request for the object was made and
`a copy of the object stored in the neighboring cache. Thus,
`the object available to a requesting cache from a responding
`20 cache may not be current and may be a stale or outdated
`version of the object as it currently exists in the server. and
`thus not suitable to be supplied to a client requesting that
`object.
`
`SUMMARY OF THE INVENTION
`
`In accordance with the present invention. a cache is
`provided information about what objects its neighboring
`cache carries. and other information about those objects in
`the neighboring cache. such as the times these objects were
`modified in the neighboring cache. the type of content of
`these objects. and the sizes of these objects. The latter
`information may be useful for planning disk space usage.
`This information gathering process is distinct from the
`actual retrieval of an object from a neighboring cache. or in
`the alternative. from the actual server carrying the most
`recent version of the object, and is performed asynchronous
`to a request from any client for the object.
`Web caches update each other about Web objects in their
`40 cache through various mechanisms. In a first mechanism. a
`requesting cache queries a responding cache for information
`about all or a subset of objects in the responding cache that
`have been modified since a given date and time. "Modified"
`as used herein includes objects that may have been updated
`or created in that cache since the given time. In a second
`mechanism. a responding cache informs the requesting
`cache periodically about modifications to objects in which
`that the requesting cache had previously express interest. In
`third and forth more intelligent mechanisms. neighboring
`caches maintain state-information on what objects the other
`caches have based on the messages previously sent to it. In
`particular, in the third mechanism each cache maintains
`state-information on objects in every neighbor cache with
`which it communicates. The fourth mechanism is applicable
`to the case where numerous peering caches exist. and each
`of which is interested in each other cache's content. Infor-
`mation that a cache has collected about the content of other
`caches is then used in together with the content of its own
`cache for compiling the messages sent to any given cache.
`
`BRIEF DESCRIPTION OF THE DRAWINGS
`FIG. 1 is a block diagram of a prior art network showing
`two caches connected within an Internet Access Service
`Provider, interconnecting client terminals and a server
`65 within the Internet;
`FIG. 2 shows the interactions between a requesting cache
`and a responding cache along a time-line in accordance with
`
`
`
`5.787.470
`
`3
`a requesting cache-initiated notification mechanism of the
`present invention;
`FIG. 3 shows the interactions between a requesting cache
`and a responding cache along a time-line in accordance with
`a responding cache-initiated notification mechanism of the 5
`present invention;
`FIG. 4 shows the interaction between two caches in which
`each cache maintains separate state-information for each of
`the other neighbor caches with which it communicates;
`FIG. S shows the interaction between two caches in which 10
`each cache maintains single-state information for the entire
`caching system; and
`FIG. 6 shows the message interaction between a request(cid:173)
`ing cache and a responding cache for a specific irnplemen- 15
`tation of a requesting cache-initiated mechanism in accor(cid:173)
`dance with the present invention.
`
`DEfAlLED DESCRIPTION
`As noted. the present invention is a mechanism by which
`Web caches update each other about what Web objects are
`in their cache. Update notification is distinct from actually
`sending the modified Web objects. The notification typically
`involves information from which the receiving cache can
`determine what Web objects are stored in the neighboring
`cache. but also information such as the time when the object
`in the neighboring cache was last modified. Optionally, the
`notification may contain other information that might be
`useful to the requesting cache such as the size, in bytes, of
`the object to enable the requesting cache to plan its disk
`space usage. and the type of contents of the object.
`To distinguish between the two caches involved in the
`transfer of information about the objects stored in each. the
`cache that desires information about the contents of its
`neighbor cache is referred to hereinafter at the Requesting 35
`Cache (RQC), and the neighbor cache that sends out infor(cid:173)
`mation about what it stores is referred to as the Responding
`Cache (RSC). The mechanism of the present invention is
`operable in an environment where some caches may only act
`as RQCs and others only as RSCs. Specifically in a hierar- 40
`chical topology. a situation can be envisioned where caches
`lower in the hierarchy only request information. and those
`higher up in the hierarchy only respond. "Higher" is used in
`the sense, that the link to the Internet is at the highest level,
`and the Web clients or browsers are at the lowest level. Most 45
`popular commercial browsers. have caches co-located. Such
`a client-cache would typically always be a RQC. It may also
`function as an RSC, if neighboring browsers have the
`permission to "look" at the private caches of other browsers.
`FIG. 2 illustrates the mechanism for a RQC-initiated 50
`notification. In this mechanism the RQC 201 initiates a
`request 202 to the RSC 203 to inform it about all or a subset
`of objects that are stored in RSC 203 and have been updated
`within RSC 203 since a given date and time or are new to
`RSC 203. As previously noted, the term "modified since a 55
`given date and time" as applied to objects shall be used to
`mean those objects that are new to the RSC since that time
`and date, as well as those objects that have been updated
`since that time and date. The address list of URLs in the
`request may include objects with individual URLs, or a 60
`plurality of objects within a range of URLs using a wildcard
`symbol to represent all those objects whose URLs share
`address commonality. As noted. this request may include a
`request for information about both updates to existing
`objects previously stored in RSC 203 as well as objects that 65
`have been newly added in RSC 203. The request 202
`includes an "if modified since" (IMS) date, thereby indicat-
`
`4
`ing to the RSC that information about objects is requested
`only if the object has been modified in the cache or newly
`added to the cache since that IMS date.
`In response to request 202, RSC 203 checks those listed
`URLs to determine whether the object has in fact been
`modified in the cache since the IMS date. The response 204
`thereto is a list of those URLs that in fact have been modified
`in the RSC since the IMS date, together with the times at
`which those URLs were modified. This information may
`also include the size of the requested URL object, as well as
`other information about the object, such as the type of
`content of the object. The RQC 201 then parses the response
`and decides. using the retrieved information about those
`requested URLs, specifically which URLs it desires to
`download to itself. This downloading can in fact be done on
`its own. based on its own set of rules, or may be done at a
`later time in response to an actual request for that object by
`a user who is connected to RQC 201. H. either on its own,
`or in response to a request from a user, RQC decides to
`20 retrieve the object with
`URL url, it makes a request 205 to GEi' the object with
`URL url using the HyperText Transfer Protocol (lfITP),
`which is the predominant World Wide Web Internet proto(cid:173)
`col. The RSC 203. in reply to that GEi' request, sends a
`25 response 206 back to RQC 201 that includes a copy of the
`body of the object with URL url that is has stored.
`FIG. 3 illustrates a mechanism for an RSC-initiated
`scheme in which the RSC tells the RQC every so often about
`changes to objects in which the RSC had previously
`30 expressed interest. In a first mode, information about objects
`that have been modified is transmitted to the RSC on a
`periodic basis. In a second mode. information is sent when(cid:173)
`ever a requested object in fact is updated or created in the
`RQC. Both modes can co-exist. In the second mode, the
`RQC 301 first registers a request 302 with RSC 303 for
`update events for a range of URLs. The RSC 303 then
`registers those requests and transmits an acknowledgment
`304 back to RQC 301. After registration, RSC 303 transmits
`an update message back to RQC 301 whenever one of those
`registered URLs is modified. Thus, when URL url 1 is
`modified, a message 305 containing information about the
`object is transmitted to the RQC 301. In response thereto.
`RQC 301 may decide to make a request 306 to GEi' urll
`using the lfITP/1.0protocol. Alternatively, in response to an
`information message that a particular URL has been
`modified, the RQC 301 may decide not to download it. Thus,
`as noted in FIG. 3, when URL url2 is modified and an update
`message 307 on URL url2 is transmitted to RQC 301, the
`RQC decides not to make a request to download the modi(cid:173)
`fied url2. In a similar manner, when URL url3 is modified
`and a message transmitted to RQC 301. the RQC makes a
`request to download the modified object.
`The motivation of the present invention for simply send(cid:173)
`ing notification messages instead of the Web objects them(cid:173)
`selves is twofold. Firstly, it enables a sense of cache coher(cid:173)
`ency even if the RQC may not have space left on its system
`to copy the Web object itself. Since this is done asynchro(cid:173)
`nous to user requests, and not as a consequence to a request,
`caches know ahead of time what other caches carry, and
`therefore can save delay as perceived by the user (by
`preventing fruitless queries to neighbor caches), as well as
`network and cache resources. Secondly, the invention per(cid:173)
`mits a logical separation between information regarding the
`modification time of an object, and the content of the object
`itself. A cache can therefore choose what objects it would
`like to refresh or cache, before it downloads them. Since the
`information carried can include other aspects of the object,
`
`
`
`5,787.470
`
`5
`such as the size in bytes. the cache is better prepared before
`downloading the object. For example, the size of a particular
`object may be very large and the cache may choose not to
`download it due to insufficient storage capacity.
`Following the exchange of information about the Web
`objects stored in neighboring caches, the new or modified
`Web objects can be retrieved either individually or as a batch
`using multipart messages, Keep-Alive connections, or
`compression. or any other scheme.
`As previously noted. documents that have been updated at
`a cache since a given date and time are with respect to
`activities at the cache. not at the server from which the object
`originates. Thus. a document that has been modified at a
`cache after time tl. might well have been modified at the
`server at a time t2<tl.
`In the previously described RSC-initiated mechanism of
`FIG. 3, every so often the RSC sends to the RQC informa(cid:173)
`tion on all objects it has in its cache that have been modified
`or created at the cache since the last message it had sent to
`the RQC in the previous epoch. This mechanism is extended
`so that the RQC automatically responds by sending back to
`the RSC another message which has a list of all objects it has
`it its cache that have been modified since the last interaction
`it had with the RSC in the previous epoch. This combined
`mechanism is similar to the mechanism of FIG. 3. but is one
`in which both neighbor caches become both requesting and
`responding caches. In this mechanism. the request becomes
`implicit in every response. Since the difference between a
`"request" and a ''response" is now blurred, the term "mes(cid:173)
`sage" will be used instead.
`A more intelligent scenario can be considered. Here,
`caches maintain state-information on what other caches
`have, partly based on the messages sent to it. Once this
`state-information is available, messages sent back and forth
`can be further streamlined. Specifically. it is useless to send
`messages about updated objects at one's cache to a neighbor
`for which is known. from the neighbor's previous message.
`has a more recent or equally recent version of the object.
`This third mechanism, shown in FIG. 4, is only viable
`between caches that can act as both RQCs and RSCs. This
`mechanism is particularly designed for use for the case when
`peering caches within an IASP are interested in knowing
`about all the modified objects on all the caches. While this
`mechanism requires each cache to maintain some state
`information about what is on its neighbors's caches, it
`significantly reduces communication overhead, especially
`when the number of objects in each cache is large.
`In the mechanism of FIG. 4, cache 401 sends neighbor
`cache 402 a message 403 containing information about a set
`of URLs. When cache 402 receives that message, it updates
`the state-information it has accumulated on its neighbor
`cache 401. Cache 402 then, based on the state-information
`that cache 402 has on cache 401, sends a message 404 that
`contains information about a set of modified URLs that is
`newer than what cache 401 has. When message 404 is
`received by cache 401, the state-information cache 401 has
`on cache 402 is updated and. based on that state information,
`information about URLs that is newer than what cache 402
`has is sent to cache 402 in message 405. This algorithm is
`repeated at caches 401 and 402 in response to each received
`message from the other.
`In this mechanism. each cache maintains state(cid:173)
`information (as a table) on every neighbor cache with which
`it communicates. Each table is essentially a list of URLs
`with their modification times at the cache. Each item in the
`list is thus a 2-tuple of the form: (url;, ti) where ti is the
`
`6
`modified time of the object identified by URL url;. It should
`be noted that there may exist a document at the original
`server/site that is newer than what the cache has, with a
`modification time later than t;. However, the table simply
`5 lists the most recent document that the neighboring cache
`has based on knowledge built up from messages that the
`neighboring cache has issued.
`The table is updated as follows. Every message from the
`neighboring cache comprises a lists of URLs with associated
`10 modification times. For each item (url. t) the following
`operation is used. Let (url;. t;) be an entry in the table. If
`url=url1: replace (url;. t1) by (url. t) in the table. If a URL that
`the neighbor has been purged from the cache. that URL is
`also sent, with t set to a negative value (indicating that it has
`15 been purged). Let the list of new URLs (since one's last
`response to the neighboring cache) stored in one's own
`cache consist of a set of 2-tuples in the form (urlx, tx). Each
`of these is sent in the response message unless url_.=url1and
`t_.<=ti. In other words, this means. send (url_.. tx). unless the
`20 table for cache i. represented by (url;. t1) suggests that cache
`i has an equally or more recent version of the document.
`The previously described mechanism of FIG. 4 can be
`further streamlined for the case of an IASPwhere numerous
`peering caches exits. each of which are interested in every
`25 other cache'S s content. In this mechanism. illustrated in
`FIG. 5, information that a cache has collected about content
`on other caches is also used in addition to the content on its
`own cache in compiling messages to any given cache. This
`mechanism further reduces overhead because each cache
`30 does not have to directly exchange messages with all other
`caches. It also needs to only maintain a single state for the
`entire caching system instead of separately maintaining the
`state for each peering cache; the sum of state information on
`all caches being much larger than the state of the entire
`35 caching system. Theoretically, as long as the set of caches
`have direct or indirect connections, this mechanism will be
`able to diffuse information on all caches correctly. In graph(cid:173)
`theoretic terms, as long as the inter-cache communication
`channels link up the caches in a "tree", this mechanism will
`40 work. To prevent against failure of caches and communica(cid:173)
`tion channels, and to reduce end-to-end communication
`delays, a "2-connected" or "3-connected" tree might be
`more appropriated. where a "k-connected" tree is one where,
`if any (k-1) of its links are destroyed. the set of nodes still
`45 remains connected.
`Since this mechanism diffuses information on other
`caches in its messages, the messages transmitted between
`caches need to be expanded from the 2-tuples of the previ(cid:173)
`ously described mechanism. to a 3-tuple of the form (url1, ti.
`50 c1), where c1 is the cache which actually has stored the object
`referenced by url1, and where, as before, t1 is the modified
`time of the object url1• In those cases where multiple caches
`have equally recent objects, the last item in the 3-tuple is
`itself a set of caches. In FIG. 5, every time a message (503
`55 or 504) from either cache 501 or 502 arrives at the other
`consisting of a list of items of the form (url, t, c). if url =url;
`and t>t;. replace (url;. t;. c;) by (url. t, c) in the state table.
`However, if url=url; and t=ti. append c to the set c1 as well,
`if c is not already present in the set c1• Let the list of new
`60 URLs (since one's last response to the neighboring cache)
`one knows about consist of a set of 3-tuples of the form (url.r
`t,.. C,,). Each of these are sent in the response message unless
`Cxec;. Thus, as noted in FIG. 5, the message sent to a
`neighbor cache in response to receiving a set of modified
`65 URLs from that neighbor cache is a list of URLs that are that
`new on the caching system unless the neighbor to which the
`response is directed has the most recent version of the URL.
`
`
`
`5.787.470
`
`l5
`
`8
`7
`to the RSC to specify the format, although the format
`The above-described mechanisms can be implemented in
`definition itself needs to have a specific syntax. The format
`various ways. A possible implementation of the RQC-
`of the contents file contains a sequence of lines containing
`initiated mechanism is described below. This mechanism is
`ASCII characters terminated by either the sequence LF (line
`realized by defining a new method called CONTENTS to the
`Hypeffext Transfer Protocol (lfITP). Alternate designs are 5 feed) or CRLF. Content file generators should follow the line
`possible. that may use a separate protocol suite, outside of
`termination convention for the platform on which they are
`lfITP.
`executed. Each line may contain either a directive or a (part
`FIG. 6 illustrates the protocol mechanism for a RQC-
`of a) entry. Entries consist of a sequence of fields relating to
`a single lfITP object. If a field is unused in a particular
`initiated notification mode between RQC 601 and RSC 602.
`In the Request made to the RSC for information about 10 entry. "-" marks the omitted field. Directives provide infor-
`specific URLs. the HITP/1.0 and HITP/1.1 syntax for the
`mation about the version, as well as header fields of the
`Request-Line is:
`objects that follow. Lines beginning with the # character
`Request-Line=Method SP (space) Request-URL SP
`contain directives. The following directives are defined:
`lfITP-Version CRLF (Carriage Return Line Feed)
`*Version: <integer>.<integer>
`and the syntax for the Request-URL is:
`The version of the extended log file format used.
`Request-URL="*"labsoluteURL I abs-path
`Syntax: [<specifier> ... ]
`This does not permit expressing complete set of URLs.
`Specifies the fields recorded in the Jog. The strings SP and
`"*"is therefore chosen as the Request-URL, which fortu-
`CRLF have special meaning.
`itously means that by default the request pertains to all of the
`Remark: <text>
`contents of the serving cache or RSC. The Request-Line is 20
`Comment infonnation. Data recorded in this field should
`thus:
`be ignored by analysis tools.
`Request-Line="CONTENfS" SP"*" SP lfITP-Version
`The directives Version and Syntax are required. The Syntax
`CRLF
`directive may appear multiple times. with the understanding
`that all entries obey the Syntax directive that is above and
`The If-Modified-Since field in the request header is used
`to specify that only those content changes that took place 25 closest to them. The Syntax directive specifies the data
`after the date specified by the If-Modified-Since field are of
`recorded in the fields of each entry.
`In the response message 604. the line "201 0.K." indi-
`interest. This field is a departure from the way this field is
`normally used in the GEf method. In the latter, the IMS field
`cates that the request was understood and that a valid
`specifies that the document is of interest only if the actual
`response follows. Content-Type on the next line indicates
`modification time is more recent than the IMS field. Here, 30 that the a special document with a certain syntax follows that
`the IMS field is used to indicate interest in the document
`is not just textual in nature. The directive #Version defines
`only if the document has been modified at the cache since
`the type of syntax, specifically ve