Border et al.

(10) Patent No.: US 6,795,848 B1
(45) Date of Patent: Sep. 21, 2004
`
(54) SYSTEM AND METHOD OF READING AHEAD OF OBJECTS FOR DELIVERY TO AN HTTP PROXY SERVER

(75) Inventors: John Border, Germantown, MD (US); Douglas Dillon, Gaithersburg, MD (US); Matt Butehorn, Mt. Airy, MD (US)

(73) Assignee: Hughes Electronics Corporation, El Segundo, CA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 765 days.

(21) Appl. No.: 09/708,134

(22) Filed: Nov. 8, 2000

(51) Int. Cl.: G06F 15/167

(52) U.S. Cl.: 709/213; 709/203; 709/201

(58) Field of Search: 709/201-203, 238, 225, 226, 229, 213; 711/117, 118, 122, 137
`
(56) References Cited

U.S. PATENT DOCUMENTS

5,995,725 A  *  11/1999  Dillon           709/203
6,112,228 A      8/2000  Earl et al.      709/205
6,226,635 B1 *   5/2001  Katariya         707/4
6,282,542 B1 *   8/2001  Carneal et al.   707/10
6,442,651 B2     8/2002  Crow et al.      711/118
6,658,463 B1    12/2003  Dillon et al.    709/219
FOREIGN PATENT DOCUMENTS

DE  198 21 876 A   11/1999
WO  WO 98/53410    11/1998
WO  WO 99/08429     2/1999
`
OTHER PUBLICATIONS

Luotonen, A., "Web Proxy Servers," 1998, pp. 156-170, XP002928629, p. 170, line 12 to p. 170, line 40.
Ari Luotonen, Web Proxy Servers, XP002928629, pp. 156-170, Prentice Hall PTR, Upper Saddle River, NJ.
H. Inoue, et al., "An Adaptive WWW Cache Mechanism in the AI3 Network," Proceedings of INET'97, www.isoc.org/inet97/proceedings/A1/A1_2.HTM, pp. 1-5.
Y. Zhang, et al., "HBX High Bandwidth X for Satellite Internetworking," 10th Annual X Technical Conference, San Jose, CA, Feb. 12-14, 1996 (The X Resource, Issue 17, pp. 85-94), pp. 1-10.
T. Baba, et al., "AI3 Satellite Internet Infrastructure and the Deployment in Asia," IEICE Trans. Commun., vol. E84-B, No. 8, Aug. 2001, pp. 2048-2057.
H. Inoue, "An Adaptive WWW Cache Mechanism in the AI3 Network," INET'97 (Jun. 24-27, 1997), www.ai3.net/pub/inet97/cache_ppt/foils.html, pp. 1-9.
`
`* cited by examiner
`
Primary Examiner: Mehmet B. Geckil
(74) Attorney, Agent, or Firm: John T. Whelan

(57) ABSTRACT
`
A communication system for retrieving web content is disclosed. A downstream proxy server receives a URL request message from a web browser, in which the URL request message specifies a URL content that has an embedded object. An upstream proxy server receives the URL request message from the downstream proxy server. The upstream proxy server selectively forwards the URL request message to a web server and receives the URL content from the web server, wherein the upstream proxy server forwards the URL content to the downstream proxy server and parses the URL content to obtain the embedded object prior to receiving a corresponding embedded object request message initiated by the web browser.
`
`4 Claims, 7 Drawing Sheets
`
[Representative drawing: a web browser and PC connected to a downstream proxy server with a URL object cache and outstanding request table, communicating over a satellite network with an upstream proxy server that has URL object and unsolicited URL caches and accesses a web server.]
`
SYSTEM AND METHOD OF READING AHEAD OF OBJECTS FOR DELIVERY TO AN HTTP PROXY SERVER
`
`CROSS-REFERENCES TO RELATED
`APPLICATION
`This application is related to co-pending U.S. patent
`application Ser. No. 09/498,936, filed Feb. 4, 2000, entitled
`“Satellite Multicast Performance Enhancing Multicast
`HTTP Proxy System and Method,” the entirety of which is
`incorporated herein by reference.
`
`BACKGROUND OF THE INVENTION
`
`1. Field of the Invention
The present invention relates to a communication system, and is more particularly related to retrieving web content using proxy servers.
`2. Discussion of the Background
As businesses and society, in general, become increasingly reliant on communication networks to conduct a variety of activities, ranging from business transactions to personal entertainment, these communication networks continue to experience greater and greater delay, stemming in part from traffic congestion and network latency. For example, the maturity of electronic commerce and acceptance of the Internet, in particular the World Wide Web ("Web"), as a daily tool pose an enormous challenge to communication engineers to develop techniques to reduce network latency and user response times. With the advances in processing power of desktop computers, the average user has grown accustomed to sophisticated applications (e.g., streaming video, radio broadcasts, video games, etc.), which place tremendous strain on network resources. The Web as well as other Internet services rely on protocols and networking architectures that offer great flexibility and robustness; however, such infrastructure may be inefficient in transporting Web traffic, which can result in large user response time, particularly if the traffic has to traverse an intermediary network with a relatively large latency (e.g., a satellite network).
FIG. 6 is a diagram of a conventional communication system for providing retrieval of web content by a personal computer (PC). PC 601 is loaded with a web browser 603 to access the web pages that are resident on web server 605; collectively the web pages and web server 605 denote a "web site." PC 601 connects to a wide area network (WAN) 607, which is linked to the Internet 609. The above arrangement is typical of a business environment, whereby the PC 601 is networked to the Internet 609. A residential user, in contrast, normally has a dial-up connection (not shown) to the Internet 609 for access to the Web. The phenomenal growth of the Web is attributable to the ease and standardized manner of "creating" a web page, which can possess textual, audio, and video content.
Web pages are formatted according to the Hypertext Markup Language (HTML) standard, which provides for the display of high-quality text (including control over the location, size, color and font for the text), the display of graphics within the page, and the "linking" from one page to another, possibly stored on a different web server. Each HTML document, graphic image, video clip or other individual piece of content is identified, that is, addressed, by an Internet address, referred to as a Uniform Resource Locator (URL). As used herein, a "URL" may refer to an address of an individual piece of web content (HTML document,
image, sound-clip, video-clip, etc.) or the individual piece of content addressed by the URL. When a distinction is required, the term "URL address" refers to the URL itself, while the terms "web content," "URL content," or "URL object" refer to the content addressed by the URL.
In a typical transaction, the user enters or specifies a URL to the web browser 603, which in turn requests a URL from the web server 605. The web server 605 returns an HTML page, which contains numerous embedded objects (i.e., web content), to the web browser 603. Upon receiving the HTML page, the web browser 603 parses the page to retrieve each embedded object. The retrieval process often requires the establishment of separate communication sessions (e.g., TCP (Transmission Control Protocol) sessions) to the web server 605. That is, after an embedded object is received, the TCP session is torn down and another TCP session is established for the next object. Given the richness of the content of web pages, it is not uncommon for a web page to possess over 30 embedded objects. This arrangement disadvantageously consumes network resources, but more significantly, introduces delay to the user.
Delay is further increased if the WAN 607 is a satellite network, as the latency of a satellite network is conventionally longer than that of terrestrial networks. In addition, because HTTP utilizes a separate TCP connection for each transaction, the large number of transactions amplifies the network latency. Further, the manner in which frames are created and images are embedded in HTML, requiring a separate HTTP transaction for every frame and URL, compounds the delay.
Based on the foregoing, there is a clear need for improved approaches for retrieval of web content within a communication system.

There is a need to utilize standard protocols to avoid development costs and provide rapid industry acceptance.

There is also a need for a web content retrieval mechanism that makes networks with relatively large latency viable and/or competitive for Internet access.

Therefore, an approach for retrieving web content that reduces user response times is highly desirable.
SUMMARY OF THE INVENTION

According to one aspect of the invention, a communication system for retrieving web content comprises a downstream proxy server that is configured to receive a URL request message from a web browser. The URL request message specifies a URL content that has an embedded object. An upstream proxy server is configured to communicate with the downstream proxy server and to receive the URL request message from the downstream proxy server. The upstream proxy server selectively forwards the URL request message to a web server and receives the URL content from the web server, wherein the upstream proxy server forwards the URL content to the downstream proxy server and parses the URL content to obtain the embedded object prior to the web browser having to issue an embedded object request message. The above arrangement advantageously reduces user response time associated with web browsing.
BRIEF DESCRIPTION OF THE DRAWINGS

A more complete appreciation of the invention and many of the attendant advantages thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
`
FIG. 1 is a diagram of a communication system employing a downstream proxy server and an upstream proxy server for accessing a web server, according to an embodiment of the present invention;

FIG. 2 is a sequence diagram of the process of reading ahead used in the system of FIG. 1;

FIG. 3 is a block diagram of the protocols utilized in the system of FIG. 1;

FIG. 4 is a diagram of a communication system employing a downstream proxy server and an upstream proxy server that maintains an unsolicited URL (Uniform Resource Locator) cache for accessing a web server, according to an embodiment of the present invention;

FIG. 5 is a sequence diagram of the process of reading ahead used in the system of FIG. 4;

FIG. 6 is a diagram of a conventional communication system for providing retrieval of web content by a personal computer (PC); and

FIG. 7 is a diagram of a computer system that can be configured as a proxy server, in accordance with an embodiment of the present invention.
DESCRIPTION OF THE PREFERRED EMBODIMENTS

In the following description, for the purpose of explanation, specific details are set forth in order to provide a thorough understanding of the invention. However, it will be apparent that the invention may be practiced without these specific details. In some instances, well-known structures and devices are depicted in block diagram form in order to avoid unnecessarily obscuring the invention.
The present invention provides a communication system for retrieving web content. A downstream proxy server receives a URL request message from a web browser, in which the URL request message specifies a URL content that has an embedded object. An upstream proxy server receives the URL request message from the downstream proxy server. The upstream proxy server selectively forwards the URL request message to a web server and receives the URL content from the web server. The upstream proxy server forwards the URL content to the downstream proxy server and parses the URL content to obtain the embedded object prior to receiving a corresponding embedded object request message initiated by the web browser.
Although the present invention is discussed with respect to protocols and interfaces that support communication with the Internet, the present invention has applicability to any protocols and interfaces that support a packet switched network, in general.
FIG. 1 shows a diagram of a communication system employing a downstream proxy server and an upstream proxy server for accessing a web server, according to an embodiment of the present invention. Communication system 100 includes a user station 101 that utilizes a standard web browser 103 (e.g., Microsoft Internet Explorer, Netscape Navigator). In this example, the user station 101 is a personal computer (PC); however, any computing platform may be utilized, such as a workstation, web enabled set-top boxes, web appliances, etc. System 100 utilizes two proxy servers 105 and 107, which are referred to as a downstream proxy server 105 and an upstream proxy server 107, respectively. PC 101 connects to downstream server 105, which communicates with upstream server 107 through a network 111. This communication with downstream server 105 may be transparent to PC 101.
According to an embodiment of the present invention, the network 111 is a VSAT (Very Small Aperture Terminal) satellite network. Alternatively, the network 111 may be any type of Wide Area Network (WAN); e.g., an ATM (Asynchronous Transfer Mode) network, a router-based network, a T1 network, etc. The upstream server 107 has connectivity to an IP network 113, such as the Internet, to access web server 109.

Proxy servers 105 and 107, according to an embodiment of the present invention, are Hypertext Transfer Protocol (HTTP) proxy servers with HTTP caches 115 and 117, respectively. These servers 105 and 107 communicate using persistent connections (a feature of HTTP 1.1). Use of persistent connections enables a single TCP connection to be reused for multiple requests of the embedded objects within a web page associated with web server 109. Further, the TCP Transaction Multiplexing Protocol (TTMP) may be utilized. TTMP and persistent TCP are more fully described with respect to FIG. 3.
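By way of illustration only, the following Python sketch reuses a single persistent HTTP/1.1 connection for several object requests in the manner described above; the host name and object paths are hypothetical placeholders, not values from the patent.

# Minimal sketch: reusing one persistent HTTP/1.1 connection for several
# embedded-object requests. Host and paths are hypothetical placeholders.
import http.client

PATHS = ["/index.html", "/logo.gif", "/style.css"]  # hypothetical embedded objects

conn = http.client.HTTPConnection("www.example.com")  # one TCP connection
try:
    for path in PATHS:
        conn.request("GET", path, headers={"Connection": "keep-alive"})
        resp = conn.getresponse()
        body = resp.read()  # drain the body before reusing the connection
        print(path, resp.status, len(body))
finally:
    conn.close()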
Web browser 103 may be configured to either access URLs directly from a web server 109 or from HTTP proxy servers 105 and 107. A web page may refer to various source documents by indicating the associated URLs. As discussed above, a URL specifies an address of an "object" in the Internet 113 by explicitly indicating the method of accessing the resource. A representative format of a URL is as follows: http://www.hns.com/homepage/document.html. This example indicates that the file "document.html" is accessed using HTTP.
HTTP proxy servers 105 and 107 act as intermediaries between one or more browsers and many web servers (e.g., server 109). A web browser 103 requests a URL from the proxy server (e.g., 105), which in turn "gets" the URL from the addressed web server 109. An HTTP proxy 105 itself may be configured to either access URLs directly from a web server 109 or from another HTTP proxy server 107.
According to one embodiment of the present invention, the proxy servers 105 and 107 may support multicast delivery. IP multicasting can be used to transmit information from upstream server 107 to multiple downstream servers (of which only one downstream server 105 is shown). A multicast receiver (e.g., a network interface card (NIC)) for the downstream proxy server 105 operates in one of two modes: active and inactive. In the active mode of operation, the downstream proxy server 105 opens multicast addresses and actively processes the received URLs on those addresses. During the inactive mode, the downstream proxy server 105 disables multicast reception from the upstream proxy server 107. In the inactive state, the downstream proxy server 105 minimizes its use of resources by, for example, closing the cache and freeing its RAM memory (not shown).
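As one possible illustration of such multicast reception (the patent does not prescribe this particular mechanism), a downstream receiver might join and leave an IP multicast group as sketched below in Python; the group address and port are hypothetical.

# Sketch only: joining an IP multicast group to receive pushed URL objects.
# The group address and port are illustrative placeholders.
import socket
import struct

GROUP = "239.1.2.3"   # hypothetical multicast group
PORT = 5000           # hypothetical port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Join the group on the default interface ("active" mode).
mreq = struct.pack("4s4s", socket.inet_aton(GROUP), socket.inet_aton("0.0.0.0"))
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

data, addr = sock.recvfrom(65535)   # blocks until a multicast datagram arrives
print("received", len(data), "bytes from", addr)

# Leaving the group corresponds to the "inactive" mode described above.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_DROP_MEMBERSHIP, mreq)
sock.close()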
For downstream proxy server 105 operating on a general purpose personal computer, the multicast receiver for the downstream proxy server 105 may be configured to switch between the active and inactive states to minimize the proxy server's interfering with user-directed processing. The downstream proxy server 105 utilizes an activity monitor, which monitors user input (key clicks and mouse clicks) to determine when it should reduce resource utilization. The downstream proxy server 105 also monitors for proxy cache lookups to determine when it should go active.
Upon boot up, the multicast receiver is inactive. After a certain amount of time with no user interaction and no proxy cache lookups (e.g., 10 minutes), the downstream proxy server 105 sets the multicast receiver active. The downstream proxy server 105 sets the multicast receiver active immediately upon needing to perform a cache lookup. The
`downstream proxy server 105 sets the multicast receiver
`inactive whenever user activity is detected and the cache 115
`has not had any lookups for a configurable period of time
`(e.g., 5 minutes).
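The activity-monitor policy just described can be pictured with the following Python sketch; it is only an approximation of the described behavior, and the class and method names are hypothetical.

# Approximate sketch of the activity-monitor policy described above:
# go active after 10 minutes with no user input and no cache lookups, or
# immediately on a cache lookup; go inactive when user activity is seen and
# the cache has had no lookups for 5 minutes.
import time

IDLE_BEFORE_ACTIVE = 10 * 60     # seconds, per the example above
QUIET_BEFORE_INACTIVE = 5 * 60   # seconds, per the example above

class MulticastReceiverMonitor:
    def __init__(self):
        self.active = False                 # inactive on boot-up
        self.last_user_input = time.time()
        self.last_cache_lookup = 0.0

    def on_user_input(self):
        self.last_user_input = time.time()

    def on_cache_lookup(self):
        self.last_cache_lookup = time.time()
        self.active = True                  # activate immediately for a lookup

    def tick(self):
        now = time.time()
        if not self.active:
            if (now - self.last_user_input >= IDLE_BEFORE_ACTIVE and
                    now - self.last_cache_lookup >= IDLE_BEFORE_ACTIVE):
                self.active = True
        else:
            recent_user = now - self.last_user_input < QUIET_BEFORE_INACTIVE
            quiet_cache = now - self.last_cache_lookup >= QUIET_BEFORE_INACTIVE
            if recent_user and quiet_cache:
                self.active = False
        return self.active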
`For downstream proxy servers 105 running on systems
`with adequate CPU (central processing unit) resources to
`simultaneously handle URL reception and other
`applications, the user may configure the downstream proxy
`server 105 to set the multicast receiver to stay active
`regardless of user activity. The operation of system 100 in
`the retrieval of web content, according to an embodiment of
`the present invention, is described in FIG. 2, below.
FIG. 2 shows a sequence diagram of the process of reading ahead used in the system of FIG. 1. To retrieve a web page (i.e., an HTML page) from web server 109, the web browser 103 on PC 101 issues an HTTP GET request, which is received by downstream proxy server 105 (per step 1). For the purposes of explanation, the HTML page is addressed as URL "HTML". The downstream server 105 checks its cache 115 to determine whether the requested URL has been previously visited. If the downstream proxy server 105 does not have URL HTML stored in cache 115, the server 105 relays this request, GET URL "HTML", to upstream server 107.
`The HTTP protocol also supports a GET IF MODIFIED
`SINCE request wherein a web server (or a proxy server)
`either responds with a status code indicating that the URL
`has not changed or with the URL content if the URL has
`changed since the requested date and time. This mechanism
`updates cache 115 of proxy server 105 only if the contents
`have changed, thereby saving unnecessary processing costs.
The upstream server 107 in turn searches for the URL HTML in its cache 117; if the HTML page is not found in cache 117, the server 107 issues the GET URL HTML request to the web server 109 for the HTML page. Next, in step 2, the web server 109 transmits the requested HTML page to the upstream server 107, which stores the received HTML page in cache 117. The upstream server 107 forwards the HTML page to the downstream server 105, and ultimately to the web browser 103. The HTML page is stored in cache 115 of the downstream server 105 as well as the web browser's cache (not shown). In step 3, the upstream server 107 parses the HTML page and requests the embedded objects within the HTML page from the web server 109; the embedded objects are requested prior to receiving corresponding embedded object requests initiated by the web browser 103.
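A minimal Python sketch of this read-ahead step is shown below; it is illustrative only, assumes a simple dictionary cache, and scans only a few common HTML tags rather than implementing the patented parser.

# Illustrative read-ahead sketch (not the patented implementation): parse a
# fetched HTML page for embedded-object references and fetch each object
# before the browser asks for it. Only a few common tags are scanned.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class EmbeddedObjectParser(HTMLParser):
    """Collects URLs of objects the browser is likely to request next."""
    WATCHED = {"img": "src", "script": "src", "frame": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.objects = []

    def handle_starttag(self, tag, attrs):
        attr_name = self.WATCHED.get(tag)
        if attr_name:
            for name, value in attrs:
                if name == attr_name and value:
                    self.objects.append(urljoin(self.base_url, value))

def read_ahead(page_url, html_text, cache):
    """Prefetch embedded objects that are not already cached."""
    parser = EmbeddedObjectParser(page_url)
    parser.feed(html_text)
    for obj_url in parser.objects:
        if obj_url not in cache:
            with urlopen(obj_url) as resp:       # GET issued ahead of the browser
                cache[obj_url] = resp.read()
    return parser.objects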
Step 3 may involve the issuance of multiple GET requests; the web page within web server 109 may contain over 30 embedded objects, thus requiring 30 GET requests. In effect, this scheme provides a way to "read ahead" (i.e., retrieve the embedded object) in anticipation of corresponding requests by the web browser 103. The determination to read ahead may be based upon explicit tracking of the content of the downstream server cache 115; only those embedded objects that are not found in the cache 115 are requested. Alternatively, the upstream server 107 may only request those embedded objects that are not in the upstream server cache 117. Further, in an actual implementation wherein multiple web servers exist, the upstream server 107 may track which web servers tend to transmit uncachable objects; for such servers, objects stored therein are read ahead.
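Purely for illustration, the read-ahead alternatives just described might be expressed as the following Python decision function; the cache objects and the uncachable-server set are hypothetical interfaces, not part of the disclosure.

# Illustration only: deciding whether to read ahead a given embedded object,
# following the alternatives described above. The cache objects and the
# uncachable-server tracking are hypothetical interfaces.
def should_read_ahead(obj_url, server_host,
                      downstream_cache=None,
                      upstream_cache=None,
                      uncachable_servers=frozenset()):
    # Alternative 1: explicit tracking of the downstream cache contents.
    if downstream_cache is not None:
        return obj_url not in downstream_cache
    # Alternative 2: only fetch what the upstream cache does not already hold.
    if upstream_cache is not None:
        return obj_url not in upstream_cache
    # Alternative 3: read ahead for servers known to return uncachable objects.
    return server_host in uncachable_servers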
Moreover, if the HTML contains a cookie and the GET HTML request is directed to the same web server, then the upstream server 107 includes the cookie in the read-ahead request to the web server 109 for the embedded objects. A cookie is information that a web server 109 stores on the client system, e.g., PC 101, to identify the client system. Cookies provide a way for the web server 109 to return customized web pages to the PC 101.
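For illustration, a read-ahead request that carries the browser's cookie to the same web server could be sketched in Python as follows; the helper and its arguments are hypothetical.

# Sketch only: carry the cookie from the original page request into read-ahead
# requests that go to the same web server, as described above.
from urllib.parse import urlparse
from urllib.request import Request, urlopen

def read_ahead_with_cookie(obj_url, page_url, cookie_header=None):
    headers = {}
    same_server = urlparse(obj_url).netloc == urlparse(page_url).netloc
    if cookie_header and same_server:
        headers["Cookie"] = cookie_header     # reuse the browser's cookie
    with urlopen(Request(obj_url, headers=headers)) as resp:
        return resp.read()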
In step 4, the web server 109 honors the GET request by transmitting the embedded objects to the upstream server 107. The upstream server 107, as in step 5, then forwards the retrieved objects to the downstream server 105, where the objects are stored until they are requested by the web browser 103. It should be noted that the upstream server 107 forwards the embedded objects prior to being requested to do so by the web browser 103; however, the upstream server 107 performs this forwarding step based on established criteria. There are scenarios in which all the embedded objects that are read ahead may not subsequently be requested by the web browser 103. In such cases, if the upstream server 107 transfers these embedded objects over network 111 to the downstream server 105, the bandwidth of network 111 would be wasted, along with the resources of the downstream server 105. Accordingly, the forwarding criteria need to reflect the trade-off between response time and bandwidth utilization. These forwarding criteria may include the following: (1) object size, and (2) "cachability." That is, upstream server 107 may only forward objects that are of a predetermined size or less, so that large objects (which occupy greater bandwidth) are not sent to the downstream server 105. Additionally, if the embedded object is marked uncacheable, then the object may be forwarded to the downstream server 105, which by definition will not have the object stored. The upstream server 107 may be configured to forward every retrieved embedded object, if bandwidth is not a major concern.
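These forwarding criteria might be captured, as a sketch only, in a small Python function; the size threshold and the always-forward option are placeholders rather than values taken from the patent.

# Illustration of the forwarding criteria discussed above; the size threshold
# and the always_forward flag are placeholders, not values from the patent.
MAX_FORWARD_SIZE = 64 * 1024   # hypothetical threshold in bytes

def should_forward(obj_size, cacheable, always_forward=False):
    """Decide whether the upstream proxy pushes an object downstream."""
    if always_forward:          # e.g., when bandwidth is not a major concern
        return True
    if not cacheable:           # the downstream cache cannot already hold it
        return True
    return obj_size <= MAX_FORWARD_SIZE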
In the scenario in which the embedded objects correspond to a request that contains a cookie, the upstream server 107 provides an indication of whether the embedded objects have the corresponding cookie.
In step 6, the web browser 103 issues a GET request for the embedded objects corresponding to the web page within the web server 109. The downstream server 105 recognizes that the requested embedded objects are stored within its cache 115 and forwards the embedded objects to the web browser 103. Under this approach, the delays associated with network 111 and the Internet 113 are advantageously avoided.
The caching HTTP proxy servers 105 and 107, according to one embodiment of the present invention, store the most frequently accessed URLs. When web server 109 delivers a URL to the proxy servers 105 and 107, the web server 109 may deliver, along with the URL, an indication of whether the URL should not be cached and an indication of when the URL was last modified.
At this point, web browser 103 has already requested URL HTML, and has the URL HTML stored in a cache (not shown) of the PC 101. To avoid stale information, the web browser 103 needs to determine whether the information stored at URL HTML has been updated since the time it was last requested. As a result, the browser 103 issues a GET HTML IF MODIFIED SINCE the last time HTML was obtained. Assuming that URL HTML was obtained at 11:30 a.m. on Sep. 22, 2000, browser 103 issues a GET HTML IF MODIFIED SINCE Sep. 22, 2000 at 11:30 a.m. request. This request is sent to downstream proxy server 105. If downstream proxy server 105 has received an updated version of URL HTML since Sep. 22, 2000 at 11:30 a.m., downstream proxy server 105 supplies the new URL HTML information to the browser 103.
If not, the downstream proxy server 105 issues a GET IF MODIFIED SINCE command to upstream proxy server 107. If upstream proxy server 107 has received an updated URL HTML since Sep. 22, 2000 at 11:30 a.m., upstream proxy server 107 passes the new URL HTML to the downstream proxy server 105. If not, the upstream proxy server 107 issues a GET HTML IF MODIFIED SINCE command to the web server 109. If URL HTML has not changed since Sep. 22, 2000 at 11:30 a.m., web server 109 issues a NO CHANGE response to the upstream proxy server 107. Under this arrangement, bandwidth and processing time are saved: if the URL HTML has not been modified since the last request, the entire contents of URL HTML need not be transferred between web browser 103, downstream proxy server 105, upstream proxy server 107, and the web server 109; only an indication that there has been no change need be exchanged.
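A simplified Python sketch of this conditional-request handling at a proxy is given below; the cache layout and the forward_conditional_get() helper are hypothetical, and standard HTTP would express the same exchange with the If-Modified-Since header and a 304 (Not Modified) status.

# Simplified sketch of the conditional-request handling described above. The
# cache layout and the forward_conditional_get() helper are hypothetical.
from email.utils import parsedate_to_datetime

def handle_conditional_get(url, if_modified_since, cache, forward_conditional_get):
    """Return (status, body) for a GET ... IF MODIFIED SINCE at this proxy.

    Datetimes are assumed to be consistently timezone-aware.
    """
    entry = cache.get(url)                       # {'body': ..., 'last_modified': datetime}
    threshold = parsedate_to_datetime(if_modified_since)
    if entry and entry["last_modified"] > threshold:
        return 200, entry["body"]                # serve the newer copy locally
    # Otherwise ask the next hop (upstream proxy or, at the last hop, the web
    # server), passing the same If-Modified-Since date along.
    status, body, last_modified = forward_conditional_get(url, if_modified_since)
    if status == 200:
        cache[url] = {"body": body, "last_modified": last_modified}
        return 200, body
    return 304, b""                              # NO CHANGE: nothing re-transferred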
`Caching proxy servers 105 and 107 offer both reduced
`network utilization and reduced response time when they are
`able to satisfy requests with cached URLs.
FIG. 3 shows a block diagram of the protocols utilized in the system of FIG. 1. The servers 105, 107, and 109 and PC 101 employ, according to one embodiment of the present invention, a layered protocol stack 300. The protocol stack 300 includes a network interface layer 301, an Internet layer 303, a transport layer 305, and an application layer 307. HTTP is an application level protocol that is employed for information transfer over the Web. RFC (Request for Comments) 2616 specifies this protocol and is incorporated herein in its entirety. In addition, a more detailed definition of URL can be found in RFC 1737, which is incorporated herein in its entirety.
The Internet layer 303 may be the Internet Protocol (IP) version 4 or 6, for instance. The transport layer 305 may include the TCP (Transmission Control Protocol) and the UDP (User Datagram Protocol). According to one embodiment of the present invention, at the transport layer, persistent TCP connections are utilized in the system 100; in addition, the TCP Transaction Multiplexing Protocol (TTMP) may be used.
The TCP Transaction Multiplexing Protocol (TTMP) allows multiple transactions, in this case HTTP transactions, to be multiplexed onto one TCP connection. Thus, transaction multiplexing provides an improvement over a separate connection for each transaction (HTTP 1.0) and pipelining (HTTP 1.1) by preventing a single stalled request from stalling other requests. This is particularly beneficial when the downstream proxy server 105 is supporting simultaneous requests from multiple browsers (of which only browser 103 is shown in FIG. 1).
The downstream proxy server 105 initiates and maintains a TCP connection to the upstream proxy server 107 as needed to carry HTTP transactions. The TCP connection could be set up and kept connected as long as the downstream proxy server 105 is running and connected to the network 111. The persistent TCP connection may also be set up when the first transaction is required and torn down after the connection has been idle for some period.
An HTTP transaction begins with a request header, optionally followed by request content, which is sent from the downstream proxy server 105 to the upstream proxy server 107. An HTTP transaction concludes with a response header, optionally followed by response content. The downstream proxy server 105 maintains a transaction ID sequence number, which is incremented with each transaction.
The downstream proxy server 105 breaks the transaction request into one or more blocks, creates a TTMP header for each block, and sends the blocks with a TTMP header to the upstream proxy server 107. The upstream proxy server 107 similarly breaks a transaction response into blocks and sends the blocks with a TTMP header to the downstream proxy server 105. The TTMP header contains the information necessary for the upstream proxy server 107 to reassemble a complete transaction command and to return the matching transaction response.
In particular, the TTMP header contains the following fields: a Transaction ID field, a Block Length field, a Last Indication field, an Abort Indication field, and a Compression Information field. The transaction ID (i.e., the transaction sequence number) must roll over less frequently than the maximum number of supported outstanding transactions. The Block Length field allows the proxy servers 105 and 107 to determine the beginning and ending of each block. The Last Indication field allows the proxy servers 105 and 107 to determine when the end of a transaction response has been received. The Abort Indication field allows the proxy servers 105 and 107 to abort a transaction when the transaction request or response cannot be completed. Lastly, the Compression Information field defines how to decompress the block.
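Because the header fields are listed above without field widths or a wire layout, the following Python sketch uses an arbitrary, purely illustrative encoding of those fields.

# Speculative sketch of a TTMP block header carrying the fields listed above.
# Field widths and wire layout are not defined here by the patent; the struct
# format below ("!IHBBB") is an arbitrary illustrative choice.
import struct
from dataclasses import dataclass

HEADER_FMT = "!IHBBB"   # transaction ID, block length, last, abort, compression
HEADER_SIZE = struct.calcsize(HEADER_FMT)

@dataclass
class TTMPHeader:
    transaction_id: int      # rolls over less often than max outstanding transactions
    block_length: int        # lets the peer find the end of this block
    last_block: bool         # marks the end of a transaction request/response
    abort: bool              # abort a transaction that cannot be completed
    compression: int         # how to decompress the block payload

    def pack(self) -> bytes:
        return struct.pack(HEADER_FMT, self.transaction_id, self.block_length,
                           int(self.last_block), int(self.abort), self.compression)

    @classmethod
    def unpack(cls, data: bytes) -> "TTMPHeader":
        tid, length, last, abort, comp = struct.unpack(HEADER_FMT, data[:HEADER_SIZE])
        return cls(tid, length, bool(last), bool(abort), comp)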
The use of a single HTTP connection reduces the number of TCP acknowledgements that are sent over the network 111. Reduction in the number of TCP acknowledgements significantly reduces the use of inbound networking resources, which is particularly important when the network 111 is a VSAT system or other wireless system. This reduction of acknowledgements is more significant when techniques, such as those described in U.S. Pat. No. 5,995,725 to Dillon, entitled "Method and Apparatus for Requesting and Retrieving Information for a Source Computer Using Terrestrial and Satellite Interface," issued Nov. 30, 1999 (which is incorporated herein in its entirety), minimize the number of TCP acknowledgements per second per TCP connection.
`Alternatively, downstream proxy server 105, for
`efficiency, may use the User Datagram Protocol (UDP) to
`transmit HTTP GET and GET IF MODIFIED SINCE
`requests to the upstream proxy server 107. This is done by
`placing the HTTP request header into the UDP payload. The
`use of UDP is very efficient as the overhead of establishing,
`maintaining and clearing TCP connections is not incurred. It
`is “best effort” in that there is no guarantee that the UDP
`packets will be delivered.
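As a sketch of this best-effort transport, the Python fragment below places an HTTP GET request header in a UDP payload; the upstream proxy address and port are hypothetical, and no retransmission is attempted since delivery is not guaranteed.

# Sketch of carrying an HTTP GET request header in a UDP payload, as described
# above. The upstream proxy address and port are hypothetical placeholders.
import socket

UPSTREAM_ADDR = ("upstream.proxy.example", 8700)   # hypothetical

request = (
    "GET http://www.hns.com/homepage/document.html HTTP/1.1\r\n"
    "Host: www.hns.com\r\n"
    "\r\n"
)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(request.encode("ascii"), UPSTREAM_ADDR)   # no connection setup or teardown
sock.close()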
FIG. 4 shows a diagram of a communication system employing a downstream proxy server and an upstream proxy serve