Apple Inc.
APL1009
U.S. Patent No. 8,724,622

Illustrated Guide
to HTTP

PAUL S. HETHMON

MANNING

Greenwich
(74° w. long.)

For electronic browsing and ordering of this book, see http://www.browsebooks.com.

The publisher offers discounts on this book when ordered in quantity. For more
information, please contact:

Special Sales Department
Manning Publications Co.
3 Lewis Street
Greenwich, CT 06830

Fax: (203) 661-9018
email: orders@manning.com

©1997 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by means electronic, mechanical, photocopying, or
otherwise, without prior written permission of the publisher.

Recognizing the importance of preserving what has been written, it is
Manning's policy to have the books it publishes printed on acid-free paper, and
we exert our best efforts to that end.

Many of the designations used by manufacturers and sellers to distinguish their
products are claimed as trademarks. Where those designations appear in the book,
and Manning Publications was aware of a trademark claim, the designations have
been printed in initial caps or all caps.
`
Library of Congress Cataloging-in-Publication Data
Hethmon, Paul S.
Illustrated guide to HTTP / Paul S. Hethmon.
p. cm.
Includes bibliographical references and index.
ISBN 1-884777-37-6
1. Hypertext systems. 2. HTTP (Computer network protocol)
I. Title.
QA76.76.H94H484 1997
004.6'2--dc21
97-1596
CIP

Manning Publications Co.
3 Lewis Street
Greenwich, CT 06830

Copyeditor: Maggie Mitchell
Typesetter: Dorothy Marsico
Cover designer: Leslie Haimes

Printed in the United States of America
123456789 10-CR- 060 99 98 97
`
chapter 2

HTTP overview

2.1 What is the World Wide Web?
2.2 General operation
2.3 A bit of history
2.4 HTTP/1.1
2.5 Finishing

2.1 What is the World Wide Web?

Just what is the World Wide Web? During the last few years, just about
everybody has defined what it is (and isn't). I'm not going to add another
definition here, but if you are reading this book you should be familiar enough
with it. Disregarding any definition, the World Wide Web has become one of the
most important information technologies of the nineties.

2.1.1 The client/server model

From a programmer's viewpoint, the World Wide Web is the largest client/server
system implemented to date. It is made up of innumerable clients and servers,
all exchanging information. In a typical client/server system, a proprietary
client talks to a proprietary server to accomplish some task. The task might
be a sales order system for a mail order firm, or a data mining system for
corporate executives. The Web changes things a bit, making them more
complicated and simple at the same time. The simple part comes from the open,
well-defined protocols used between the clients and the servers. The
complicated part comes from the loss of extensive programmer-defined protocols.

Let me explain the latter a little more thoroughly. If you were given the task
of writing an application to handle order entry, you would typically define the
types of transactions to occur between your client and server. A typical
exchange might be to look up a description of an item in the catalog. The
client would make a connection to the server, send a request which might be
binary or plain text, and then would receive the reply which would typically be
plain text. The reply might contain binary data also, such as a picture. Given
a TCP/IP environment using sockets, the client would make a connection to a
port on which the server is listening. Then it would send a packet of
information to the server. In order to make interpreting the data easier, you
might have defined a structure for request packets that consists of 4 bytes for
a numerical request code. The server then knows to read 4 bytes from the socket
and then interpret accordingly. When the server sends the response to the
client, the client knows to expect a certain type of reply (see Figure 2.1). In
this case, you've defined a header of 4 bytes that contains the length of the
description (in plain text), and the description immediately follows the
header. If data follows the description, then the 4 bytes after it are the
length of the binary data, the picture of the item. Once the binary data has
been received, the server closes the connection and the transaction is
finished.

Figure 2.1 Client/server transaction (diagram label: 4-byte request packet)
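To make the home-grown protocol above concrete, here is a minimal sketch in Python of the packet layout it describes: a 4-byte request code, and a reply made of a 4-byte length, the plain-text description, and an optional second 4-byte length followed by the picture bytes. The function names, the network byte order, and the ASCII encoding are assumptions made for illustration; the text does not specify them.

```python
import struct

# Assumed layout: unsigned 32-bit integers in network byte order.
CODE_FORMAT = "!I"
LENGTH_FORMAT = "!I"


def pack_request(code: int) -> bytes:
    """Build the 4-byte request packet carrying a numerical request code."""
    return struct.pack(CODE_FORMAT, code)


def pack_reply(description: str, picture: bytes = b"") -> bytes:
    """Build a reply: length + description, then optional length + picture."""
    text = description.encode("ascii")
    reply = struct.pack(LENGTH_FORMAT, len(text)) + text
    if picture:
        reply += struct.pack(LENGTH_FORMAT, len(picture)) + picture
    return reply


def unpack_reply(reply: bytes):
    """Split a reply back into (description, picture)."""
    (text_len,) = struct.unpack_from(LENGTH_FORMAT, reply, 0)
    offset = 4
    description = reply[offset:offset + text_len].decode("ascii")
    offset += text_len
    picture = b""
    if offset < len(reply):  # binary data follows the description
        (picture_len,) = struct.unpack_from(LENGTH_FORMAT, reply, offset)
        offset += 4
        picture = reply[offset:offset + picture_len]
    return description, picture
```

Because both sides agree on the layout in advance, neither has to parse anything: each simply reads a fixed number of bytes and then the number of bytes that header announced.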
`
In this scenario, you, as the programmer, had the utmost flexibility. You
were able to define the exact messages and the format of the replies to them.
Being able to do this makes your code very efficient. You don't have to
interpret the transactions to any extent. You are able to minimize the amount
of network traffic you generate and maximize the amount of data in each
transaction. Continuing on with your application, you can quickly define and
implement all of the transactions your client and server need to know for
proper response.

But a couple of months down the road, the word comes down from the IS
department that your nifty client/server application also needs to run under
Windows 95 and OS/2 as well as the Mac client you originally wrote. So now
you've got to go back and program two new clients and have the possibility of
doing more in the future. It would have been nice to write a single client
which would run on all of the possible operating systems. This is where HTTP
comes into play. Instead of writing clients for every possible operating
system, you can use a Web client such as Netscape Navigator, along with a Web
server, to build your client/server system.

Routines are a bit different in the Internet world, however. In your original
client/server application you had the freedom to define your own messaging
standards. Now, someone else is going to give you the blueprint to work from in
the form of an RFC. As mentioned previously, RFC is short for Request For
Comments. RFCs are the technical documents which describe the protocols in
use on the Internet. HTTP is the protocol used to send and receive messages
between Web clients and servers. HTML is the markup language used to create the
Web pages sent as the data resource of the HTTP message. The two are closely
related but distinct. The latest RFCs are on the CD-ROM accompanying this book.
The principal US repository for RFCs is held at the InterNIC, the agency
responsible for domain registrations, among other functions. The Web site is
www.internic.net. From the main page, follow the prompts to the Directory
and Database Services and from there to the RFC information.
`
2.2 General operation

HTTP is a request-response type of protocol. The client application sends a
request to the server and then the server responds to the request. In HTTP/0.9
and HTTP/1.0, this was generally accomplished by making a new connection
for each request. HTTP/1.1 introduces persistent connections as the default
behavior. With persistent connections, the client and server maintain the
connection, exchanging multiple requests and responses until the connection is
explicitly closed by one of them. Even with persistent connections, HTTP
remains a stateless protocol. No information is retained by the server between
requests.

There are three general request-response chains in which HTTP operates.
The first is when a user agent makes a request directly to the origin server,
as shown in Figure 2.2. In this scenario, the user agent makes a connection
directly to the origin server on the default port of 80 (unless otherwise
specified) and sends its request. The server will be listening for incoming
connections and will start a new thread or process to serve the new request.
Once the request has been processed, the server sends the response back over
the connection.

Figure 2.2 Basic client to server HTTP operation (diagram labels: User agent,
Request message, TCP port 80, HTTP server)

The second request-response chain involves a proxy or cache agent as an
intermediary. In this scenario, the user agent makes its request to the proxy
instead of to the origin server (see Figure 2.3). The proxy then makes the
request to the origin server on behalf of the client. The server replies to the
proxy, and then the proxy relays this to the user agent, thus fulfilling the
request. This type of operation is mostly seen in firewall environments where
the local LAN is isolated from the Internet. An alternative to this procedure
is for the intermediate agent to also serve as a caching agent.

When a request is made through a cache agent, the cache agent tries to serve
the response from its internal cache of resources. The cache itself saves any
response it receives, if the response is a cachable one. This shortens the
request-response chain, improves response time, and reduces network load. Most
proxy agents are also caching agents.

Figure 2.3 Client to proxy to server HTTP operation (diagram labels: User
agent, Request to proxy, Response to proxy)

The final scenario is one involving an intermediate agent acting as a tunnel.
A tunnel blindly funnels requests and responses between two HTTP applications.
As shown in Figure 2.4, it is, in essence, providing a path for the user agent
to the server.

Figure 2.4 Client to server via tunnel HTTP operation

A tunnel is different from a proxy in how it operates. A tunnel is simply a
mechanism via which the user agent sends requests to and receives responses
from an origin server. The tunnel itself does nothing to the requests, unlike a
proxy, which may rewrite certain headers or require authentication from the
user before providing services. A tunnel would be used most often to route HTTP
traffic over a non-TCP/IP link.

Past the three basic request-response chains, anyone can put together any
combination of intermediate agents. It is entirely reasonable for a user agent
to send a request to a proxy, which sends it through a tunnel which reaches
another proxy, and finally makes it to the origin server. Through all of this,
the basic idea still maintains the request-response paradigm, although it may
make many contortions along the way. Next, we will need to look in depth at the
specific operation of HTTP.
`
2.3 A bit of history

Before we delve into HTTP/1.1, a bit of background is in order. In this section
we'll examine the previous versions of HTTP: HTTP/0.9 and HTTP/1.0.
HTTP/1.1 is a response to those established previous versions, to both their
strengths and their shortcomings.

2.3.1 HTTP/0.9

The first implementation of HTTP is now known as HTTP/0.9. The entire
description of that protocol encompasses only a few pages. In HTTP/0.9, a
client program makes a connection to the server on TCP port 80. The client
then sends its request in the following form:

GET document.html CRLF

The request starts with the word GET. No other methods are supported. A
space character is then sent, followed by the document name. The document
name may be fully qualified and is not allowed to have any spaces. To end the
line, the client should send a carriage return line feed combination. The
specification mentions that servers should be tolerant of clients that transmit
only the line feed.

One other option is allowed for the document name. The client may send a
search request by appending a question mark, followed by a search term.
Multiple search terms may be specified by putting a plus sign between each.
This type of request should only be generated when the document specified
contains the ISINDEX HTML tag. This allows a request of:

GET document.html?help+me CRLF

For the reply, the server returns the contents of the document. There is no
content information, MIME type, or any other information returned to the
client. The protocol is, in fact, restricted to sending only HTML text
documents. When the document has been sent, the server closes the connection to
signify the end of the document. This is necessary since no length information
is exchanged between the server and client. When sending the document, the
server delimits each line by an optional carriage return, which is then
followed by a mandatory line feed character.

As can be seen from this description, implementing the HTTP/0.9 protocol
can be done in a few dozen lines of code. The problem, however, was the
limitation it imposed. Only text documents could be served and there was no
method for the client to submit information to the server.
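As a rough illustration of how little code the request side needs, the following Python sketch parses an HTTP/0.9 request line as described above: the word GET, a space, a document name without spaces, an optional question mark with plus-separated search terms, and a CRLF (or a bare line feed, which is tolerated). The function name and the error handling are assumptions for the example, not part of any specification.

```python
def parse_simple_request(line: bytes):
    """Parse an HTTP/0.9 request line into (document, search_terms)."""
    text = line.decode("ascii")
    # Accept CRLF, but be tolerant of clients that send only a line feed.
    if text.endswith("\r\n"):
        text = text[:-2]
    elif text.endswith("\n"):
        text = text[:-1]
    else:
        raise ValueError("request line is not terminated")

    method, space, target = text.partition(" ")
    if method != "GET" or not space:
        raise ValueError("only 'GET <document>' is supported")
    if not target or " " in target:
        raise ValueError("document name is missing or contains spaces")

    # Optional search request: document?term1+term2
    document, _, query = target.partition("?")
    terms = query.split("+") if query else []
    return document, terms
```

For example, `parse_simple_request(b"GET document.html?help+me\r\n")` yields the document name `document.html` and the two search terms `help` and `me`.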
`
2.3.2 HTTP/1.0

The HTTP/1.0 protocol was developed from 1992 to 1996. It appeared as an
Informational RFC, RFC 1945, only as recently as May 1996. Before that point,
HTTP/1.0 was based on what the major Web servers and clients did. Since
RFC 1945 is only an informational RFC, it does not actually specify an official
standard of the Internet. It does, however, describe the common usage of
HTTP/1.0 and provides the reference for our server's later implementation via
the enclosed CD.

HTTP/1.0 developed from the need to exchange more than simple text
information. It became a way to build a distributed hypermedia information
system adapted to many needs and purposes. From 1994 to 1997, the Web
developed from a forum in which computer science departments could showcase
their research into a center where everyone has a Web page. In fact, half of
the television commercials today include a URL. In order for this to happen,
HTTP expanded tremendously from its original specification.

The first major change from the HTTP/0.9 specification was the use of
MIME-like headers in request messages and in response messages. On the client
side, the request message grew from the one-line request to a structured,
stable multi-line request:

Full-Request = Request-Line
               *( General-Header
                | Request-Header
                | Entity-Header )
               CRLF
               [ Entity-Body ]

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

The added headers resulted from the need to transmit more information in
the request. For clients, this information included sending preferences for the
type of information desired. This was expressed in terms of MIME media types:
terms such as text/html and image/gif were introduced so clients and servers
could send information each could understand and use. The additional headers
also let clients implement conditional retrievals using the If-Modified-Since
header. This header allows the client to request that the resource be returned
only if it has changed since the given date. With this, clients could cache
frequently requested pages and update them only when necessary, thus saving
valuable time and bandwidth.
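The request grammar and the conditional retrieval described above can be sketched in Python as follows. This is only an illustration: the helper names `build_request` and `conditional_get` are hypothetical, and the choice of an Accept header of text/html is an assumption. The date is formatted with the standard library's `email.utils.formatdate`, which produces the usual HTTP date form when `usegmt=True`.

```python
from email.utils import formatdate


def build_request(method: str, uri: str, headers, body: bytes = b"") -> bytes:
    """Assemble Request-Line, header lines, the blank CRLF line, and the body."""
    lines = [f"{method} {uri} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def conditional_get(uri: str, cached_at: float) -> bytes:
    """Ask for the resource only if it changed after `cached_at` (epoch seconds)."""
    stamp = formatdate(cached_at, usegmt=True)
    return build_request(
        "GET",
        uri,
        [("Accept", "text/html"), ("If-Modified-Since", stamp)],
    )
```

A cache that stored a page at time `t` would send `conditional_get("/index.html", t)` and keep its copy if the server reports that nothing has changed.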

On the server side, the server was finally allowed to send back content
information along with the resource. In HTTP/0.9, only the resource was sent.
With the expanded response syntax, the server could now tell the client exactly
what type of information was in the resource and, finally, send substantially
more than HTML documents:

Full-Response = Status-Line
                *( General-Header
                 | Response-Header
                 | Entity-Header )
                CRLF
                [ Entity-Body ]

Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

The addition of the Content-Type header allowed the server to include the
media type of the resource. Along with the original HTML documents, images
and audio files became popular and commonplace as forms of information to
present on a Web site.

The next HTTP change was the definition of new request methods. Along
with the original GET request, HEAD and POST were now allowed. The HEAD
request allows a client application to request a resource and receive all of
the information about the resource without actually receiving the resource.
This had uses for Web robots and spiders, which traverse links to gather update
information and detect broken links. The POST method is what brought real
interactivity to the Web. Now clients had a way to send substantial information
to a server for processing. The GET method had been used at first as a way to
transmit information to a server, but was limited by the amount of information
a server would accept as part of the request-URI.* Now with POST, virtually
unlimited entity bodies could be sent in a request message. With this came the
use of the Web for inputting information: order forms, surveys, and requests
could be made from a Web page.

Servers also gained the ability to respond with a status code to the client's
request. The infamous 404 Not Found status code could now be sent whenever
the resource was not present. Beyond this, the server could also respond with
200 to indicate a general success response, 302 to indicate a resource had
moved temporarily to a different location, 401 to indicate authorization was
required, or 500 to indicate a general server error while trying to fulfill the
request.

* Uniform Resource Identifiers (URIs) are covered in Chapter 3.
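The Status-Line grammar given earlier is simple enough to take apart with one split. The Python sketch below assumes the line arrives as text and that all three fields are present; the function name is made up for the example.

```python
def parse_status_line(line: str):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' into its parts."""
    # Split on the first two spaces only: the reason phrase may contain spaces.
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if not version.startswith("HTTP/"):
        raise ValueError("not an HTTP status line")
    return version, int(code), reason
```

A client would typically branch on the numeric code (for instance 302 or 401) and show the reason phrase to the user only as a hint.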
`
The 401 Unauthorized status code leads us into the final point to make
about HTTP/1.0. It introduced the idea of restricted access to resources. A
server could require a client to supply a username and password before
returning certain resources. The idea of basic authentication allowed someone
to build a Web site with private information. Information could be restricted
to a certain person or group of people. This also allowed a Web site to track a
person throughout his visit. This ability permits a site to create a shopping
cart for a user to track the items he wishes to purchase through multiple
pages. At the end of the visit, the server can supply the complete list of
items the user has selected. Given the stateless nature of HTTP, this allows
commerce to flourish much more easily on the Web.
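For readers who want to see what the client actually sends under basic authentication, here is a small Python sketch. It relies on a detail not spelled out above: in the Basic scheme the username and password are joined with a colon and base64-encoded into the Authorization request header. The function names are illustrative only.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the Authorization header line for the Basic scheme."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic(header_line: str):
    """Recover (username, password) from a Basic Authorization header line."""
    token = header_line.split("Basic ", 1)[1]
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password
```

Note that base64 is an encoding, not encryption: anyone who sees the header can run the equivalent of `decode_basic` and read the password.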

From these enhancements to the protocol, HTTP developed from a simple
information retrieval system into a general purpose transaction system capable
of building quite complex systems with standard applications across multiple
platforms. With this success came problems. Users demanded faster loading of
pages, which led to clients making multiple connections to a single server. The
higher number of connections led to bandwidth and server overload at times.
Problems also appeared as more vanity servers appeared on the Internet. Servers
which host multiple virtual domains on a single machine required a unique IP
address for each virtual domain to identify each to the software. This has
caused the finite supply of IP addresses to dwindle just a bit faster. Problems
also arose as caching agents were introduced. Servers did not have a good way
to specify what could and could not be safely cached, which led many sites to
use cache-busting techniques, which prohibit a cache agent from being able to
cache a particular response. Throughout 1995 and 1996, the IETF/HTTP Working
Group worked to develop HTTP/1.1 to build upon HTTP/1.0, improve HTTP's general
capabilities, and fix some of the problems which had appeared.
`
2.4 HTTP/1.1

In operation, HTTP/1.1 closely resembles HTTP/1.0. It still consists of the
request-response paradigm and is highly compatible with HTTP/1.0
applications. There are seven areas we'll discuss here about how HTTP/1.1
differs from HTTP/1.0:

• New request methods
• Persistent connections
• Chunked encoding
• Byte range operations
• Content negotiation
• Digest Authentication
• Caching
`
2.4.1 New request methods

The HTTP/1.1 specification has defined two new methods which are highly
beneficial to the end user: PUT and DELETE. The PUT method allows a user agent
to request a server to accept a resource and store it as the request-URI given
by the client. This method allows a user agent to update or create a new
resource on a server. In use, an HTML editor might implement this as a way for
the user to maintain pages on a Web site. The user could create the pages and
have them automatically updated by the editor. Notice that this behavior is
different from the previously available POST method. Using POST, the user agent
was requesting the resource identified by the request-URI to accept the entity
sent by the client. In essence, the entity was viewed as subordinate to the
request-URI. The PUT method is asking the server to accept the entity as the
request-URI. Another use of this method might include implementing an HTTP
based revision control system.

The DELETE method is self-explanatory: the user agent is requesting that the
request-URI be removed from the server. Along with PUT, there is now a
standard method to implement Web based editing. The protocol specification
specifically allows the server to defer the actual deletion of a resource when
it receives a request. It should move the resource to a nonaccessible location,
however. This relaxation allows a server to save deleted resources in a safe
place for review before final deletion and should probably be implemented in
this way by any server. Both the DELETE and PUT methods allow a user agent to
create, replace, and delete resources on a server. Because of this, access to
both methods should be controlled in some manner, either using IP address based
restrictions or via one of the authentication methods within HTTP.

The OPTIONS method is used to query a server about the capabilities of a
specific resource or about the server in general. A user agent can make an
OPTIONS request against a specific resource to find out which methods the
server supports when accessing the resource. The response returned by the
server should include any communications related information about the
resource. Typical information in the response would include an Allow header
listing the supported methods when requesting the resource. A user agent may
also make a general OPTIONS request of the server and receive the same
information as it applies to the server as a whole.

The final method added, TRACE, is used for debugging purposes at the
application level. A client program can use the method to have its original
request echoed back to it. Using this information, the client can debug
problems which might occur on the way to an origin server when several
intermediate agents handle its request. In use, an HTTP traceroute can be
accomplished by letting the request advance one server at a time, checking the
response back from each.
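One way to picture the HTTP traceroute is as a series of TRACE requests, each allowed to travel one hop further than the last. The Python sketch below assumes this is done with the HTTP/1.1 Max-Forwards request header, which the text above does not name; the function name and the use of "/" as the request target are likewise assumptions.

```python
def trace_probes(host: str, hops: int):
    """Build TRACE requests that may be forwarded 0, 1, ..., `hops` times."""
    probes = []
    for forwards in range(hops + 1):
        request = (
            "TRACE / HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            # Assumed mechanism: each intermediary decrements this value and
            # answers the TRACE itself once it reaches zero.
            f"Max-Forwards: {forwards}\r\n"
            "\r\n"
        )
        probes.append(request.encode("ascii"))
    return probes
```

Sending the probes in order and comparing the echoed requests shows what each intermediate agent along the chain did to the message.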
`
2.4.2 Persistent connections

As mentioned a bit earlier, in the quest for user satisfaction, Web browsers
began making multiple connections to origin servers in order to speed up
response times. Unfortunately, this led to some major congestion since a few
clients could quickly bog down a slow link. The practice also suffered from the
inherent mechanisms of making TCP connections, where setup time can consume a
good portion of the total connection cycle. Starting with HTTP/1.1, the
protocol implements, as a default behavior, the practice of persistent
connections. This means that once a client and server open a connection, the
connection remains open until one or the other specifically requests that it be
closed. While open, the client can send multiple, but separate, requests and
the server can respond to them in order. Clients are also free to send multiple
requests without waiting for the responses, basically pipelining the requests.
In practice, a client might do this when requesting all of the graphic images
from a particular page. It can make the requests for the images, one after the
other, and then finally listen for the responses from the server. Implemented
well, responsiveness to the user will be high, without the inefficiencies of
individual requests.
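Pipelining amounts to writing several complete requests to the connection before reading anything back. The Python sketch below only builds the bytes a client would send; it assumes GET requests with a Host header, and it marks the last request with "Connection: close" as one way of explicitly asking for the connection to be closed. The function name is hypothetical.

```python
def build_pipeline(host: str, paths) -> bytes:
    """Concatenate one GET request per path, closing after the last one."""
    requests = []
    for index, path in enumerate(paths):
        lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
        if index == len(paths) - 1:
            # Only the final request asks the server to close the connection.
            lines.append("Connection: close")
        requests.append("\r\n".join(lines) + "\r\n\r\n")
    return "".join(requests).encode("ascii")
```

After sending the result of `build_pipeline("www.example.com", ["/a.gif", "/b.gif"])` in a single write, the client reads the responses back in the same order as the requests.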
`
2.4.3 Chunked encoding

One problem arises for servers when persistent connections become the default
behavior: they must now return a proper Content-Length header with each
response. Previously, servers could signify the end of the entity body by
simply closing the connection. With persistent connections, the server can no
longer do this and must be able to determine the length of any entity it sends
to the client. For most resources, this is not a problem. The length of HTML
and image files can be determined through the operating system. Where trouble
arises is in dynamically generated responses.

Fortunately, HTTP/1.1 also provides a solution: chunked encoding. Using
chunked encoding, a server or CGI process can send back an entity body of
unknown initial length by sending it back in chunks of known length. We'll
discuss the details in a later chapter, but Figure 2.5 shows the basic format.

As shown, the server sends the size of the upcoming chunk in bytes and then
the actual chunk of data. This is repeated until all the data is sent. Once all
of the data is sent, a final size of 0 is sent, indicating the end of the data.
Following this, the server may optionally send footers, or header fields which
are allowed to be sent after the entity body. With this method, it becomes easy
for a server to send dynamically generated data and easy for the client to
decode it.

Figure 2.5 Chunked encoding format
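The size-then-data pattern can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: chunk sizes are written in hexadecimal followed by CRLF (as the HTTP/1.1 specification requires), each chunk is followed by CRLF, and no footers are sent after the final zero-size chunk. The function names are made up for the example.

```python
def encode_chunked(pieces) -> bytes:
    """Encode an iterable of byte strings as a chunked entity body."""
    out = bytearray()
    for piece in pieces:
        if piece:  # a zero-length chunk would signal the end, so skip empties
            out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # final size of 0, no footers
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Decode a chunked body (ignoring any footers after the last chunk)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # Anything after ';' on the size line is an extension; ignore it.
        size = int(data[pos:eol].split(b";")[0].decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)
```

A CGI process could hand each piece of output to the encoder as soon as it is produced, without knowing the total length in advance.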
`
2.4.4 Byte range operations

Another optimization and convenience introduced is byte range operations. I'm
sure almost everyone has experienced trying to download the latest beta
software from a favorite vendor, only to have the connection fail with 100
bytes to go (out of 5 MB, of course). At that point, the download is attempted
again, hoping for the best. Now, the user agent can just ask for the last 100
bytes of the resource instead of asking for the entire resource again. This can
improve both the user's mood and the response time. When requesting a byte
range, a client makes a request as normal, but includes a Range header
specifying the byte range the server is to return. The client may also specify
multiple byte ranges within a single request if it so desires. In this case,
the server returns the resource as a multipart/byteranges media type.

The use of byte ranges is not limited to recovery of failed transfers. Certain
clients may wish to limit the number of bytes downloaded prior to committing to
a full request. A client with limited memory, disk space, or bandwidth can
request the first so-many bytes of a resource to let the user decide whether to
finish the download. Servers are not required to implement byte range
operations, but it is a recommended part of the protocol.
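The failed-download case maps directly onto a Range header. The Python sketch below assumes the `bytes=` range syntax of HTTP/1.1, including the suffix form `bytes=-N` for the last N bytes, and handles only a single range per request; both function names are hypothetical.

```python
def resume_header(total_size: int, received: int) -> str:
    """Build a Range header asking only for the bytes still missing."""
    missing = total_size - received
    return f"Range: bytes=-{missing}"  # suffix form: the last `missing` bytes


def slice_range(data: bytes, spec: str) -> bytes:
    """Return the part of `data` selected by a single-range spec.

    Supports 'bytes=first-last', 'bytes=first-', and 'bytes=-suffix'.
    """
    unit, _, byte_range = spec.partition("=")
    if unit != "bytes" or "," in byte_range:
        raise ValueError("only a single bytes range is supported here")
    first, _, last = byte_range.partition("-")
    if first == "":  # suffix range: the last N bytes
        count = int(last)
        return data[-count:] if count else b""
    start = int(first)
    end = int(last) if last else len(data) - 1  # both positions are inclusive
    return data[start:end + 1]
```

For the download that failed 100 bytes short of 5,000,000, `resume_header(5_000_000, 4_999_900)` produces `Range: bytes=-100`.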
`
2.4.5 Content negotiation

There are times when a server may hold several different representations of a
single resource in order to serve clients better. The alternate representations
may be national language versions of a page or a resource which is available
both in its regular media type and as a gzipped version. In order to provide to
the client the best representation, content negotiation may be performed. This
can take the form of server-driven, agent-driven, or transparent negotiation.

The first form, server-driven negotiation, is performed on the origin server,
based on the client's request. The server will inspect the various Accept-*
headers a client may send and, using this information plus other optional
information, send the best response to the client. This allows the client to
send Accept, Accept-Charset, Accept-Language, or any combination of the
Accept-* headers, stating its preference for responses. When servers perform
this negotiation, they must then send a Vary header to the client stating over
which parameters the server chose the particular resource. The Vary header is
required to be returned in order to provide caches with enough information to
properly determine which future requests may be satisfied by the response.
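As a rough sketch of server-driven negotiation, the Python code below picks a language version from an Accept-Language header. It assumes the comma-separated list with optional `q=` quality values that HTTP/1.1 uses for the Accept-* headers, matches language tags exactly (no prefix matching), and uses function names invented for the example.

```python
def parse_accept(value: str):
    """Parse an Accept-* header value into a list of (name, quality) pairs."""
    preferences = []
    for item in value.split(","):
        name, _, params = item.strip().partition(";")
        quality = 1.0  # a missing q parameter means full preference
        for param in params.split(";"):
            key, _, val = param.strip().partition("=")
            if key == "q" and val:
                quality = float(val)
        if name.strip():
            preferences.append((name.strip().lower(), quality))
    return preferences


def choose_language(accept_language: str, available):
    """Return the available language the client prefers most, or None."""
    best, best_quality = None, 0.0
    for language, quality in parse_accept(accept_language):
        if language in available and quality > best_quality:
            best, best_quality = language, quality
    return best
```

A server that answers based on `choose_language` would also send `Vary: Accept-Language`, so a cache knows the response depends on that request header.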

The second form of content negotiation is agent-driven. In this approach,
the server provides to the user agent the information needed to pick the best
representation of the resource. This may come in the form of the optional
Alternates header or in the entity body of the initial response. The
Alternates header is mentioned in the appendices to the HTTP specification, but
the exact definition will be provided in a later specification. Using either
approach allows the server to provide a list of choices to the user agent. The
user agent may then automatically, or with user input, pick the best
representation.

The final form is called transparent negotiation. In transparent negotiation,
an intermediate cache provides server-driven negotiation, based on the
agent-driven information from the server. In more concrete terms, the cache has
the agent-driven negotiation information from the server for a particular
resource with multiple representations. Assuming the cache understands all of
the ways in which the representations vary, it may pick the best response when
a client request is received. This allows an off-loading of server duties onto
cache agents and improves response time to clients while providing accurate
responses.
`
2.4.6 Digest Authentication

Digest Authentication is included in HTTP/1.1 as a replacement for Basic
Authentication. Basic Authentication suffers from the problem of passing the
user's password in clear text across the network. With Digest Authentication,
the password is kept as a shared secret between the client and server. The
server and client compute a digest value, using the MD5* (Message Digest 5)
algorithm over a concatenation of the secret password and a few other values.
This digest is then sent across the network. Since only the client and server
know the secret password, the client can compute the digest value, send it to
the server, and then the server can verify it against the information it holds.
Since no one else knows the secret password, authenticity is more secure. This
algorithm is similar to the POP3 protocol's APOP method of authentication.

Digest Authentication is still only a reasonably secure method, however. It
still requires an outside mode of exchanging the password between clients and
servers. Digest Authentication, therefore, is meant solely as a replacement for
Basic Authentication.

* MD5 is detailed in RFC 1321.
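The digest computation can be sketched with Python's `hashlib`. The text above says only that the digest covers the password "and a few other values"; the sketch assumes the original Digest scheme, in which those values are the username, realm, request method, request-URI, and a server-supplied nonce, combined as shown. Treat the exact combination as an assumption to be checked against the specification.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of `text` as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    """Compute the digest the client sends instead of the password."""
    secret_part = md5_hex(f"{username}:{realm}:{password}")  # involves the secret
    request_part = md5_hex(f"{method}:{uri}")                # ties it to this request
    return md5_hex(f"{secret_part}:{nonce}:{request_part}")
```

The server, which knows the same password and issued the nonce, runs the same computation and compares the results; the password itself never crosses the network.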
`
2.4.7 Caching

The caching model in HTTP/1.1 allows the server a great deal of control over
the caching of responses. First, the specification makes it clear what is
cachable and what is not. Generally speaking, only GET or HEAD responses are
cachable; responses to any other method must be explicitly marked as cachable
by the server. The protocol uses the Cache-Control header to transmit caching
instructions from servers and clients to caches.

For servers, the cache control directives can be segregated into five groups:
what is cachable, what is not cachable, how old it can be, don't serve anything
past its age, and don't transform. In the first group are directives which
allow an origin server to explicitly mark something as cachable when it
normally would not be. This can be used to allow caching of authenticated
responses or responses to POST requests. An example of a cachable POST request
might be the results of a search engine on a Web site. Under many
circumstances, the results from a search would remain valid for several hours
or even a few days. If the response is cachable and serves one other client
request, the server has off-loaded some work onto cache agents.

The what is not cachable group of directives includes the no-cache and
no-store directives. Basically, these directives instruct the cache agents to
never save a response which includes the directive. The no-cache directive
applies to responses only, while the no-store directive applies to both the
request and response messages. The no-store directive can be thought of as the
stronger. It instructs caches to remove the request/response from volatile
storage (i.e., memory) as soon as possible and to never store it in nonvolatile
storage (i.e., hard disk).
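A cache has to read the Cache-Control header before it can act on any of these groups. The Python sketch below parses a header value into its directives and applies the simple reading given above, in which either no-cache or no-store marks a response as not to be saved. It assumes the comma-separated `name` or `name=value` syntax of the header; the function names are invented, and a real cache would need to follow the specification's full definition of each directive.

```python
def parse_cache_control(value: str):
    """Parse a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for item in value.split(","):
        name, equals, argument = item.strip().partition("=")
        if name:
            directives[name.lower()] = argument.strip('"') if equals else None
    return directives


def is_marked_uncachable(value: str) -> bool:
    """True if the header carries a 'what is not cachable' directive."""
    directives = parse_cache_control(value)
    return "no-cache" in directives or "no-store" in directives
```

For example, `parse_cache_control("no-cache, max-age=3600")` gives a dictionary with a `no-cache` entry that has no argument and a `max-age` entry whose argument is the string `3600`.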

A server that wishes to control how long a response may be cached will use
the max-age directive. This directive sets a time limit from when it is served
to when the response is considered stale. A client may still request that a
cache return a response, even though it has become stale. In these situations,
the server can include a directive from the don't serve anything past its age
group. These directives (must-revalidate and proxy-revalidate) instruct a cache
to revalidate a response with the origin server to make certain it is still