INTERNETWORKING with TCP/IP

PRINCIPLES, PROTOCOLS, AND ARCHITECTURES

FOURTH EDITION

DOUGLAS E. COMER
`
`Cisco Systems, Inc.
`Exhibit 1020
`Page 1 of 15
`
`
`
`
`
`Library of Congress Cataloging-in-Publication Data
`
`
`
`Comer, Douglas
Internetworking with TCP/IP / Douglas E. Comer. -- 4th ed.
p. cm.
Includes bibliographical references and index.
`ISBN 0-13-018380-6
`1. Principles, protocols, and architecture. 2. Client/server
`computing. 3. Internetworking (Telecommunications) I. Title
`
`
`Publisher: Alan Apt
`Project Manager: Ana Arias Terry
`Editorial Assistant: Toni Holm
Vice-president and director of production and manufacturing, ESM: David W. Riccardi
`Vice-president and editorial director of ECS: Marcia Horton
`Executive Managing Editor: Vince O’Brien
`Managing Editor: David A. George
`Editorial/production supervision: Irwin Zucker
`Art Director: Heather Scott
`Assistant to Art Director: John Christiana
`Manufacturing Buyer: Pat Brown
`Marketing manager: Danny Hoyt
©2000, 1995 Prentice Hall
Prentice-Hall, Inc.
Upper Saddle River, New Jersey 07458
`
`
`
Prentice Hall books are widely used by corporations and government agencies for training, marketing,
and resale. The publisher offers discounts on this book when ordered in bulk quantities.
For more information, contact Corporate Sales Department, Phone: 800-382-3419;
Fax: 201-236-7141; E-mail: corpsales@prenhall.com
Or write: Prentice Hall PTR, Corp. Sales Dept., One Lake Street, Upper Saddle River, NJ 07458.
UNIX is a registered trademark of UNIX System Laboratories, Incorporated
proNET-10 is a trademark of Proteon Corporation
LSI 11 is a trademark of Digital Equipment Corporation
Microsoft Windows is a trademark of Microsoft Corporation
EUI-64 is a trademark of the Institute of Electrical and Electronics Engineers (IEEE)
All rights reserved. No part of this book may be reproduced, in any form or by any means, without
permission in writing from the publisher.

Printed in the United States of America
10 9 8 7 6 5 4
`
`ISBN 0-13-018380-6
`
`Prentice-Hall International (UK) Limited, London
`Prentice-Hall of Australia Pty. Limited, Sydney
`Prentice-Hall Canada Inc., Toronto
`Prentice-Hall Hispanoamericana, S.A., Mexico
`Prentice-Hall of India Private Limited, New Delhi
`Prentice-Hall of Japan, Inc., Tokyo
`Pearson Education Asia Pte Ltd
`Editora Prentice-Hall do Brasil, Ltda., Rio de Janeiro
`
`Internetworking With TCP/IP
`Vol I:
`Principles, Protocols, and Architecture
`Fourth Edition
`
DOUGLAS E. COMER
`
`Department of Computer Sciences
`Purdue University
`West Lafayette, IN 47907
`
`PRENTICE HALL
`Upper Saddle River, New Jersey 07458
`
28

Applications: World Wide Web (HTTP)
`
`
`
28.1 Introduction

This chapter continues the discussion of applications that use TCP/IP technology
by focusing on the application that has had the most impact: the World Wide Web
(WWW). After a brief overview of concepts, the chapter examines the primary protocol
used to transfer a Web page from a server to a Web browser. The discussion covers
caching as well as the basic transfer mechanism.

28.2 Importance Of The Web

During the early history of the Internet, FTP data transfers accounted for
approximately one third of Internet traffic, more than any other application. From its
inception in the early 1990s, however, the Web had a much higher growth rate. By 1995,
Web traffic overtook FTP to become the largest consumer of Internet backbone bandwidth,
and has remained the leader ever since. By 2000, Web traffic completely overshadowed
other applications.

Although traffic is easy to measure and cite, the impact of the Web cannot be
understood from such statistics. More people know about and use the Web than any other
Internet application. Most companies have Web sites and on-line catalogs; references to
the Web appear in advertising. In fact, for many users, the Internet and the Web are
indistinguishable.
`
28.3 Architectural Components

Conceptually, the Web consists of a large set of documents, called Web pages, that
are accessible over the Internet. Each Web page is classified as a hypermedia
document. The suffix media is used to indicate that a document can contain items other
than text (e.g., graphics images); the prefix hyper is used because a document can
contain selectable links that refer to other, related documents.

Two main building blocks are used to implement the Web on top of the global
Internet. A Web browser consists of an application program that a user invokes to access
and display a Web page. The browser becomes a client that contacts the appropriate
Web server to obtain a copy of the specified page. Because a given server can manage
more than one Web page, a browser must specify the exact page when making a request.

The data representation standard used for a Web page depends on its contents. For
example, standard graphics representations such as Graphics Interchange Format (GIF)
or Joint Photographic Experts Group (JPEG) can be used for a page that contains a single
graphics image. Pages that contain a mixture of text and other items are represented
using HyperText Markup Language (HTML). An HTML document consists of a file that
contains text along with embedded commands, called tags, that give guidelines for
display. A tag is enclosed in less-than and greater-than symbols; some tags come in
pairs that apply to all items between the pair. For example, the two commands
<CENTER> and </CENTER> cause items between them to be centered in the
browser’s window.

28.4 Uniform Resource Locators

Each Web page is assigned a unique name that is used to identify it. The name,
which is called a Uniform Resource Locator (URL)†, begins with a specification of the
scheme used to access the item. In effect, the scheme specifies the transfer protocol; the
format of the remainder of the URL depends on the scheme. For example, a URL that
follows the http scheme has the following form‡:

    http://hostname [:port] /path [;parameters] [?query]

where brackets denote an optional item. For now, it is sufficient to understand that the
hostname string specifies the domain name or IP address of the computer on which the
server for the item operates, :port is an optional protocol port number needed only in
cases where the server does not use the well-known port (80), path is a string that
identifies one particular document on the server, ;parameters is an optional string that
specifies additional parameters supplied by the client, and ?query is an optional string
used when the browser sends a question. A user is unlikely to ever see or use the optional
parts directly. Instead, URLs that a user enters contain only a hostname and path. For
example, the URL:

    http://www.cs.purdue.edu/people/comer/

specifies the author’s Web page. The server operates on computer www.cs.purdue.edu,
and the document is named /people/comer/.

The protocol standards distinguish between the absolute form of a URL illustrated
above, and a relative form. A relative URL, which is seldom seen by a user, is only
meaningful when the server has already been determined. Relative URLs are useful
once communication has been established with a specific server. For example, when
communicating with server www.cs.purdue.edu, only the string /people/comer/ is needed
to specify the document named by the absolute URL above. We can summarize:

    Each Web page is assigned a unique identifier known as a Uniform
    Resource Locator (URL). The absolute form of a URL contains a full
    specification; a relative form that omits the address of the server is
    only useful when the server is implicitly known.

†A URL is a specific type of the more general Uniform Resource Identifier (URI).
‡Some of the literature refers to the initial string, http:, as a pragma.
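
The pieces of an http URL named in this section can be pulled apart with Python's standard urllib.parse module. The sketch below is only an illustration of the URL form described above, not part of any browser. The function names split_http_url and resolve are invented for the example, and the default of 80 is applied by the sketch itself whenever the URL omits a port.

```python
from urllib.parse import urlparse, urljoin

DEFAULT_HTTP_PORT = 80  # well-known port, used when the URL omits :port


def split_http_url(url):
    """Split an absolute http URL into the parts named in the text."""
    parts = urlparse(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles the http scheme")
    return {
        "hostname": parts.hostname,
        "port": parts.port if parts.port is not None else DEFAULT_HTTP_PORT,
        "path": parts.path,
        "parameters": parts.params,
        "query": parts.query,
    }


def resolve(base_url, relative_url):
    """Turn a relative URL into absolute form, given a URL on the known server."""
    return urljoin(base_url, relative_url)


if __name__ == "__main__":
    print(split_http_url("http://www.cs.purdue.edu/people/comer/"))
    print(resolve("http://www.cs.purdue.edu/", "/people/comer/"))
```

The second function mirrors the relative form: the string /people/comer/ only becomes a complete name once the server is already known.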
`
28.5 An Example Document

In principle, Web access is straightforward. All access originates with a URL: a
user either enters a URL via the keyboard or selects an item which provides the browser
with a URL. The browser parses the URL, extracts the information, and uses it to
obtain a copy of the requested page. Because the format of the URL depends on the
scheme, the browser begins by extracting the scheme specification, and then uses the
scheme to determine how to parse the rest of the URL.

An example will illustrate how a URL is produced from a selectable link in a
document. In fact, a document contains a pair of values for each link: an item to be
displayed on the screen and a URL to follow if the user selects the item. In HTML, the
pair of tags <A> and </A> is known as an anchor. The anchor defines a link; a URL
is added to the first tag, and items to be displayed are placed between the two tags. The
browser stores the URL internally, and follows it when the user selects the link. For
example, the following HTML document contains a selectable link:

    <HTML>
    The author of this text is
    <A HREF="http://www.cs.purdue.edu/people/comer">
    Douglas Comer.</A>
    </HTML>

When the document is displayed, a single line of text appears on the screen:

    The author of this text is Douglas Comer.

The browser underlines the phrase Douglas Comer to indicate that it corresponds
to a selectable link. Internally, of course, the browser stores the URL from the <A>
tag, which it follows when the user selects the link.

28.6 Hypertext Transfer Protocol

The protocol used for communication between a browser and a Web server or
between intermediate machines and Web servers is known as the HyperText Transfer
Protocol (HTTP). HTTP has the following set of characteristics:

    Application Level. HTTP operates at the application level. It assumes
    a reliable, connection-oriented transport protocol such as TCP, but does not
    provide reliability or retransmission itself.

    Request/Response. Once a transport session has been established, one
    side (usually a browser) must send an HTTP request to which the other side
    responds.

    Stateless. Each HTTP request is self-contained; the server does not
    keep a history of previous requests or previous sessions.

    Bi-Directional Transfer. In most cases, a browser requests a Web
    page, and the server transfers a copy to the browser. HTTP also allows
    transfer from a browser to a server (e.g., when a user submits a so-called
    “form”).

    Capability Negotiation. HTTP allows browsers and servers to negotiate
    details such as the character set to be used during transfers. A sender
    can specify the capabilities it offers and a receiver can specify the
    capabilities it accepts.

    Support For Caching. To improve response time, a browser caches a
    copy of each Web page it retrieves. If a user requests a page again, HTTP
    allows the browser to interrogate the server to determine whether the
    contents of the page have changed since the copy was cached.

    Support For Intermediaries. HTTP allows a machine along the path
    between a browser and a server to act as a proxy server that caches Web
    pages and answers a browser’s request from its cache.

28.7 HTTP GET Request

In the simplest case, a browser contacts a Web server directly to obtain a page.
The browser begins with a URL, extracts the hostname section, uses DNS to map the
name into an equivalent IP address, and uses the IP address to form a TCP connection
to the server. Once the TCP connection is in place, the browser and Web server use
HTTP to communicate; the browser sends a request to retrieve a specific page, and the
server responds by sending a copy of the page.

A browser sends an HTTP GET command to request a Web page from a server†.
The request consists of a single line of text that begins with the keyword GET and is
followed by a URL and an HTTP version number. For example, to retrieve the Web
page in the example above from server www.cs.purdue.edu, a browser can send the
following request:

    GET http://www.cs.purdue.edu/people/comer/ HTTP/1.1

Once a TCP connection is in place, there is no need to send an absolute URL; the
following relative URL will retrieve the same page:

    GET /people/comer/ HTTP/1.0

To summarize:

    The Hypertext Transfer Protocol (HTTP) is used between a browser
    and a Web server. The browser sends a GET request to which a
    server responds by sending the requested item.
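
As a rough illustration of the request line described above, the following Python sketch builds the text a browser could send, using the relative form of the URL. The function name build_get_request is invented for the example, and the sketch only constructs a string; it does not open a TCP connection. One detail goes beyond this section: version 1.1 of the standard requires a Host header that names the server, so the sketch adds one when that version is requested.

```python
from urllib.parse import urlparse


def build_get_request(url, version="1.0"):
    """Return the text a browser could send to fetch `url` with HTTP GET."""
    parts = urlparse(url)
    # Relative form: everything after the hostname (path, parameters, query).
    target = parts.path or "/"
    if parts.params:
        target += ";" + parts.params
    if parts.query:
        target += "?" + parts.query
    lines = [f"GET {target} HTTP/{version}"]
    if version == "1.1":
        # HTTP/1.1 requires a Host header naming the server.
        lines.append(f"Host: {parts.netloc}")
    # Each line ends with CRLF; an empty line ends the request.
    return "\r\n".join(lines) + "\r\n\r\n"


if __name__ == "__main__":
    print(build_get_request("http://www.cs.purdue.edu/people/comer/"))
```

Keeping the construction separate from any socket code makes it easy to inspect exactly which characters would travel over the connection.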
`
28.8 Error Messages

How should a Web server respond when it receives an illegal request? In most
cases, the request has been sent by a browser, and the browser will attempt to display
whatever the server returns. Consequently, servers usually generate error messages in
valid HTML. For example, one server generates the following error message:

    <HTML>
    <HEAD> <TITLE>400 Bad Request</TITLE>
    </HEAD>
    <BODY>
    <H1>Bad Request</H1> Your browser sent a request
    that this server could not understand.
    </BODY>
    </HTML>

The browser uses the “head” of the document (i.e., the items between <HEAD> and
</HEAD>) internally, and only shows the “body” to the user. The pair of tags <H1>
and </H1> causes the browser to display Bad Request as a heading (i.e., large and
bold), resulting in two lines of output on the user’s screen:

    Bad Request
    Your browser sent a request that this server could not understand.

†The standard uses the object-oriented term method instead of command.
`
28.9 Persistent Connections And Lengths

Early versions of HTTP follow the same paradigm as FTP by using a new TCP
connection for each data transfer. That is, a client opens a TCP connection and sends a
GET request. The server transmits a copy of the requested item, and then closes the
TCP connection. Until it encounters an end of file condition, the client reads data from
the TCP connection. Finally, the client closes its end of the connection.

Version 1.1, which appeared as an RFC in June of 1999, changed the basic HTTP
paradigm in a fundamental way. Instead of using a TCP connection for each transfer,
version 1.1 adopts a persistent connection approach as the default. That is, once a
client opens a TCP connection to a particular server, the client leaves the connection in
place during multiple requests and responses. When either a client or server is ready to
close the connection, it informs the other side, and the connection is closed.

The chief advantage of persistent connections lies in reduced overhead: fewer
TCP connections means lower response latency, less overhead on the underlying
networks, less memory used for buffers, and less CPU time used. A browser using a
persistent connection can further optimize by pipelining requests (i.e., send requests
back-to-back without waiting for a response). Pipelining is especially attractive in
situations where multiple images must be retrieved for a given page, and the underlying
internet has both high throughput and long delay.

The chief disadvantage of using a persistent connection lies in the need to identify
the beginning and end of each item sent over the connection. There are two possible
techniques that handle the situation: either send a length followed by the item, or send a
sentinel value after the item to mark the end. HTTP cannot reserve a sentinel value
because the items transmitted include graphics images that can contain arbitrary sequences
of octets. Thus, to avoid ambiguity between sentinel values and data, HTTP uses the
approach of sending a length followed by an item of that size.

28.10 Data Length And Program Output

It may not be convenient or even possible for a server to know the length of an
item before sending. To understand why, one must know that servers use the Common
Gateway Interface (CGI) mechanism that allows a computer program running on the
server machine to create a Web page dynamically. When a request arrives that
corresponds to one of the CGI-generated pages, the server runs the appropriate CGI
program, and sends the output from the program back to the client as a response. Dynamic
Web page generation allows the creation of information that is current (e.g., a list of the
current scores in sporting events), but means that the server may not know the exact
data size in advance. Furthermore, saving the data to a file before sending it is
undesirable for two reasons: it uses resources at the server and delays transmission. Thus, to
provide for dynamic Web pages, the HTTP standard specifies that if the server does not
know the length of an item a priori, the server can inform the browser that it will close
the connection after transmitting the item. To summarize:

    To allow a TCP connection to persist through multiple requests and
    responses, HTTP sends a length before each response. If it does not
    know the length, a server informs the client, sends the response, and
    then closes the connection.
`
28.11 Length Encoding And Headers

What representation does a server use to send length information? Interestingly,
HTTP borrows the basic format from e-mail, using 822 format and MIME Extensions†.
Like a standard 822 message, each HTTP transmission contains a header, a blank line,
and the item being sent. Furthermore, each line in the header contains a keyword, a
colon, and information. Figure 28.1 lists a few of the possible headers and their
meaning.

    Header              Meaning
    Content-Length      Size of item in octets
    Content-Type        Type of the item
    Content-Encoding    Encoding used for item
    Content-Language    Language(s) used in item

Figure 28.1 Examples of items that can appear in the header sent before an
            item. The Content-Type and Content-Encoding are taken directly
            from MIME.

As an example, consider Figure 28.2 which shows a few of the headers that are
used when an HTML document is transferred across a persistent TCP connection.

    Content-Length: 34
    Content-Language: en
    Content-Encoding: ascii

    <HTML> A trivial example. </HTML>

Figure 28.2 An illustration of an HTTP transfer with header lines used to
            specify attributes, a blank line, and the document itself. A
            Content-Length header is required if the connection is persistent.

†See Chapter 27 for a discussion of e-mail, 822 format, and MIME.
`
In addition to the examples shown in the figure, HTTP includes a wide variety of
headers that allow a browser and server to exchange meta information. For example,
we said that if a server does not know the length of an item, the server closes the
connection after sending the item. However, the server does not act without warning: the
server informs the browser to expect a close. To do so, the server includes a
Connection header before the item in place of a Content-Length header:

    Connection: close

When it receives a connection header, the browser knows that the server intends to
close the connection after the transfer; the browser is forbidden from sending further
requests. The next sections describe the purposes of other headers.
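
The two ways of finding the end of an item can be sketched in a few lines of Python. The function below is an illustration under simplifying assumptions, not a complete HTTP client: the name read_response is invented, the input is any binary file-like object (an in-memory buffer stands in for a TCP connection), and the status line that begins a real response is left out so that the input looks like the header block shown in Figure 28.2.

```python
import io


def read_response(stream):
    """Read one header block plus the item that follows it.

    Returns (headers, body); header keywords are lower-cased for lookup.
    """
    headers = {}
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):  # blank line (or end of file) ends the header
            break
        keyword, _, value = line.decode("ascii").partition(":")
        headers[keyword.strip().lower()] = value.strip()

    if "content-length" in headers:
        # Length known in advance: read exactly that many octets.
        body = stream.read(int(headers["content-length"]))
    else:
        # No length given: the sender will close, so read to end of file.
        body = stream.read()
    return headers, body


if __name__ == "__main__":
    # Two items sent back-to-back over one simulated connection.
    wire = (
        b"Content-Length: 34\r\nContent-Language: en\r\n\r\n"
        b"<HTML> A trivial example. </HTML>\n"
        b"Connection: close\r\n\r\n"
        b"output of unknown length"
    )
    connection = io.BytesIO(wire)
    print(read_response(connection))
    print(read_response(connection))
```

Because the first item carries a length, the reader stops after exactly 34 octets and the connection remains usable for the second item; the second item has no length, so nothing can follow it.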
`
28.12 Negotiation

In addition to specifying details about an item being sent, HTTP uses headers to
permit a client and server to negotiate capabilities. The set of negotiable capabilities
includes a wide variety of characteristics about the connection (e.g., whether access is
authenticated), representation (e.g., whether graphics images in jpeg format are acceptable
or which types of compression can be used), content (e.g., whether text files must be in
English), and control (e.g., the length of time a page remains valid).

There are two basic types of negotiation: server-driven and agent-driven (i.e.,
browser-driven). Server-driven negotiation begins with a request from a browser. The
request specifies a list of preferences along with the URL of the desired item. The
server selects, from among the available representations, one that satisfies the browser’s
preferences. If multiple items satisfy the preferences, the server makes a “best guess.”
For example, if a document is stored in multiple languages and a request specifies a
preference for English, the server will send the English version.

Agent-driven negotiation simply means that a browser uses a two-step process to
perform the selection. First, the browser sends a request to the server to ask what is
available. The server returns a list of possibilities. The browser selects one of the
possibilities, and sends a second request to obtain the item. The disadvantage of
agent-driven negotiation is that it requires two server interactions; the advantage is that a
browser retains complete control over the choice.

A browser uses an HTTP Accept header to specify which media or representations
are acceptable. The header lists names of formats with a preference value assigned to
each. For example,

    Accept: text/html, text/plain; q=0.5, text/x-dvi; q=0.8

specifies that the browser is willing to accept the text/html media type, but if that does
not exist, the browser will accept text/x-dvi, and, if that does not exist, text/plain. The
numeric values associated with the second and third entry can be thought of as a
preference level, where no value is equivalent to q=1, and a value of q=0 means the type is
unacceptable. For media types where “quality” is meaningful (e.g., audio), the value
of q can be interpreted as a willingness to accept a given media type if it is the best
available after other forms are reduced in quality by q percent.

A variety of Accept headers exist that correspond to the Content headers described
earlier. For example, a browser can send any of the following:

    Accept-Encoding:
    Accept-Charset:
    Accept-Language:

to specify which encodings, character sets, and languages the browser is willing to
accept.

To summarize:

    HTTP uses MIME-like headers to carry meta information. Both
    browsers and servers send headers that allow them to negotiate
    agreement on the document representation and encoding to be used.
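
To make the preference values concrete, the following Python sketch shows one way a server could rank the entries of an Accept header and pick a representation during server-driven negotiation. It is a simplified illustration: the function names parse_accept and choose are invented, and the sketch ignores wildcard entries and any parameter other than q.

```python
def parse_accept(header_value):
    """Return a list of (media_type, q) pairs, highest preference first."""
    prefs = []
    for entry in header_value.split(","):
        fields = [field.strip() for field in entry.split(";")]
        media_type, q = fields[0], 1.0  # no value is equivalent to q=1
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((media_type, q))
    # sorted() is stable, so equal preferences keep their header order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose(header_value, available):
    """Pick the most preferred acceptable type that the server can supply."""
    for media_type, q in parse_accept(header_value):
        if q > 0 and media_type in available:  # q=0 means unacceptable
            return media_type
    return None


if __name__ == "__main__":
    accept = "text/html, text/plain; q=0.5, text/x-dvi; q=0.8"
    print(parse_accept(accept))
    print(choose(accept, {"text/plain", "text/x-dvi"}))
```

With the header from the text, a server that holds only text/plain and text/x-dvi versions would select text/x-dvi, matching the order of preference described above.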
`
`28.13 Conditional Requests
`
`
`
HTTP allows a sender to make a request conditional. That is, when a browser
sends a request, it includes a header that qualifies conditions under which the request
should be honored. If the specified condition is not met, the server does not return the
requested item. Conditional requests allow a browser to optimize retrieval by avoiding
unnecessary transfers. The If-Modified-Since request specifies one of the most
straightforward conditionals: it allows a browser to avoid transferring an item unless the item
has been updated since a specified date. For example, a browser can include the
header:

    If-Modified-Since: Sat, 01 Jan 2000 05:00:01 GMT

with a GET request to avoid a transfer if the item has not been modified since
05:00:01 GMT on January 1, 2000.
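
The server side of this check amounts to a date comparison, which the Python sketch below illustrates. The function name should_send_item and the idea of passing the item's modification time as an argument are assumptions made for the example; a real server would obtain that time from wherever it stores the page. When the item is unchanged, the HTTP standard has the server answer with status 304 (Not Modified) instead of the item.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def should_send_item(if_modified_since, last_modified):
    """Return True if the item changed after the date given in the header.

    if_modified_since: header value, e.g. "Sat, 01 Jan 2000 05:00:01 GMT"
    last_modified:     timezone-aware datetime of the item's last change
    """
    threshold = parsedate_to_datetime(if_modified_since)
    return last_modified > threshold


if __name__ == "__main__":
    header = "Sat, 01 Jan 2000 05:00:01 GMT"
    changed_in_june = datetime(2000, 6, 1, tzinfo=timezone.utc)
    print(should_send_item(header, changed_in_june))
```

The date format in the header is the same one used in e-mail, which is why the e-mail utilities in the standard library can parse it.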
`
`28.14 Support For Proxy Servers
`
`Proxy servers are an important part of the Web architecture because they provide
an optimization that decreases latency and reduces the load on servers. However,
proxies are not transparent: a browser must be configured to contact a local proxy instead
of the original source, and the proxy must be configured to cache copies of Web pages.
`For example, a corporation in which many employees use the Internet may choose to
`have a proxy server. The corporation configures all its browsers to send requests to the
`proxy. The first time a user in the corporation accesses a given Web page, the proxy
`must obtain a copy from the server that manages the page. The proxy places the copy
`in its cache, and returns the page as the response to the request. The next time a user
`accesses the same page, the proxy extracts the data from its cache without sending a re-
`quest across the Internet. Consequently, traffic from the site to the Internet is signifi-
`cantly reduced.
`To guarantee correctness, HTTP includes explicit support for proxy servers. The
`protocol specifies exactly how a proxy handles each request, how headers should be in-
`terpreted by proxies, how a browser negotiates with a proxy, and how a proxy nego-
`tiates with a server. Furthermore, several HTTP headers have been designed specifical-
`ly for use by proxies. For example, one header allows a proxy to authenticate itself to a
`server, and another allows each proxy that handles an item to record its identity so the
ultimate recipient receives a list of all intermediate proxies. Finally, HTTP allows a
server to control how proxies handle each Web page, and allows the sender of a request
to limit how far proxies forward it. For example, the standard defines the Max-Forwards
header for requests (specifically, for the TRACE and OPTIONS methods) to limit the
number of proxies that forward a request on its way to a server. If the sender
specifies a count of one, as in:

    Max-Forwards: 1

at most one proxy can forward the request along the path to the server. A count of
zero prohibits any proxy from forwarding the request; the proxy that receives it must
respond itself.
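
The caching behavior of a proxy described earlier in this section can be modeled in a few lines of Python. The class below is a toy, assuming a caller-supplied function fetch_from_origin that stands in for a real HTTP transfer from the server; the class name and attribute names are invented for the example. It deliberately ignores expiration and revalidation, which Section 28.15 discusses.

```python
class CachingProxy:
    """Toy proxy that answers repeat requests from a local cache."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin(url) -> bytes stands in for a real HTTP transfer.
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}
        self.origin_requests = 0

    def get(self, url):
        if url not in self.cache:
            # First access: obtain a copy from the server that manages the page.
            self.origin_requests += 1
            self.cache[url] = self.fetch_from_origin(url)
        # Later accesses are answered without sending a request across the Internet.
        return self.cache[url]


if __name__ == "__main__":
    proxy = CachingProxy(lambda url: b"copy of " + url.encode("ascii"))
    proxy.get("http://www.cs.purdue.edu/people/comer/")
    proxy.get("http://www.cs.purdue.edu/people/comer/")
    print(proxy.origin_requests)
```

Counting the requests that reach the origin shows the effect on traffic: however many users ask for the same page, only the first request leaves the site.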
`
28.15 Caching

The goal of caching is improved efficiency: a cache reduces both latency and
network traffic by eliminating unnecessary transfers. The most obvious aspect of caching
is storage: when a Web page is initially accessed, a copy is stored on disk, either by the
browser, an intermediate proxy, or both. Subsequent requests for the same page can
short-circuit the lookup process and retrieve a copy of the page from the cache instead
of the server.

The central question in all caching schemes concerns timing: how long should
an item be kept in a cache? On one hand, keeping a cached copy too long results in the
copy becoming stale, which means that changes to the original are not reflected in the
cached copy. On the other hand, if the cached copy is not kept long enough,
inefficiency results because the next request must go back to the server.

HTTP allows caching to be controlled in two ways. First, when it answers a
request for a page, a server can specify caching details, including whether the page can be
cached at all, whether a proxy can cache the page, the community with which a cached
copy can be shared, the time at which the cached copy must expire, and limits on
transformations that can be applied to the copy. Second, HTTP allows a browser to
force revalidation of a page. To do so, the browser sends a request for the page, and
uses a header to specify that the maximum “age” (i.e., the time since a copy of the
page was stored) cannot be greater than zero. No copy of the page in a cache can be
used to satisfy the request because the copy will have a nonzero age. Thus, only the
original server will answer the request. Intermediate proxies along the way will receive
a fresh copy for their cache as will the browser that issued the request.

To summarize:

    Caching is key to the efficient operation of the Web. HTTP allows
    servers to control whether and how a page can be cached as well as
    its lifetime; a browser can force a request for a page to bypass caches
    and obtain a fresh copy from the server that owns the page.

28.16 Summary

The World Wide Web consists of hypermedia documents stored on a set of Web
servers and accessed by browsers. Each document is assigned a URL that uniquely
identifies it; the URL specifies the protocol used to retrieve the document, the location
of the server, and the path to the document on that server.

The HyperText Markup Language, HTML, allows a document to contain text
along with embedded commands that control formatting. HTML also allows a
document to contain links to other documents.

A browser and server use the HyperText Transfer Protocol, HTTP, to
communicate. HTTP is an application-level protocol with explicit support for negotiation, proxy
servers, caching, and persistent connections.
`
`FOR FURTHER STUDY
`
Berners-Lee et al. [RFC 1738] defines URLs. A variety of RFCs contain
proposals for extensions. Daniel and Mealling [RFC 2168] considers how to store URLs in
the Domain Name System.

Berners-Lee and Connolly [RFC 1866] contains the standard for version 2 of
HTML. Nebel and Masinter [RFC 1867] specifies HTML form upload, and Raggett
[RFC 1942] gives the standard for tables in HTML.

Fielding et al. [RFC 2616] specifies version 1.1 of HTTP, which adds many
features, including additional support for persistence and caching, to the previous
version. Franks et al. [RFC 2617] considers access authentication in HTTP.
`
EXERCISES

28.1  Read the standard for URLs. What does a pound sign (#) followed by a string mean at
      the end of a URL?
28.2  Extend the previous exercise. Is it legal to send the pound sign suffix on a URL to a
      Web server? Why or why not?
28.3  How does a browser distinguish between a document that contains HTML and a
      document that contains arbitrary text? To find out, experiment by using a browser to read
      from a file. Does the browser use the name of the file or the contents to decide how to
      interpret the file?
28.4  What is the purpose of an HTTP TRACE command?
28.5  What is the difference between an HTTP PUT command and an HTTP POST command?
      When is each useful?
28.6  When is an HTTP Keep-Alive header used?
28.7  Can an arbitrary Web server function as a proxy? To find out, choose an arbitrary Web
      server and configure your browser to use it as a proxy. Do the results surprise you?
28.8  Read about HTTP’s must-revalidate cache control directive. Give an example of a Web
      page that would use such a directive.
28.9  If a browser does not send an HTTP Content-Length header before a request, how does a
      server respond?