`you need to
`master Web
`development!
`
`
`
`
`
`
`
`
`
Understand the Web development
process: planning, analysis,
design, implementation, and
maintenance
`
Learn to use tools like HTML,
VRML, and Java to create enticing
Web content
`
Master advanced CGI gateway
programming techniques with C,
Perl, and REXX
`
Sams.net
`
`
`
`UNLEASHED
`
`
`Starbucks Corp. Exhibit 1019
`
`
`
`
`Unleash the power of HTML & CGI!
• Learn the thought process and planning
involved with Web site design

• Master advanced HTML, forms, multimedia,
and image maps

• Learn gateway programming techniques with
C, Perl, and REXX

• Explore future technologies like VRML and
Java

• Get full coverage of planning, analysis,
design, HTML implementation, and gateway
programming

• Study real-world applications including a Web
coloring book and hypertext news interface
`
Written by a team of experts, HTML & CGI
Unleashed is your complete guide to professional
Web content development. Every phase and tool of
the development process is covered, as well as every
environment.

John December is co-author of the best-selling book
World Wide Web Unleashed and publishes widely
used and frequently accessed World Wide Web-based
documents and publications on the Internet and Web.
Mark Ginsburg is responsible for the development
and daily operation of New York University's EDGAR
Web server, which disseminates publicly traded
corporate filings.
HTML & CGI Unleashed includes
several contributed chapters written by experts in
key Web development fields.
`
ISBN 0-672-30745-6
`
`
`
`
`
`
`UNLEASHED
`
John December and Mark Ginsburg
`
`
`
`
For my grandparents, Isabelle and Joseph December, and Aili and
Arthur Hill.
`
`Copyright © 1995 by Sams.net Publishing
`
`FIRST EDITION
`
All rights reserved. No part of this book shall be reproduced, stored in a
retrieval system, or transmitted by any means, electronic, mechanical,
photocopying, recording, or otherwise, without written permission from the
publisher. No patent liability is assumed with respect to the use of the
information contained herein. Although every precaution has been taken in
the preparation of this book, the publisher and author assume no
responsibility for errors or omissions. Neither is any liability assumed for damages
resulting from the use of the information contained herein. For
information, address Sams.net Publishing, 201 W. 103rd St., Indianapolis, IN
46290.
`
`International Standard Book Number: 0-672-30745-6
`
`Library of Congress Catalog Card Number: 95-69417
`
`98 97 96 95
`
`4 3 2
`
`Interpretation of the printing code: the rightmost double-digit number is
`the year of the book's printing; the rightmost single digit, the number of the
`book's printing. For example, a printing code of 95-1 shows that the first
`printing of the book occurred in 1995.
`
`Composed in AGaramond, Futura, and MCPdigital by Macmillan Computer
`Publishing
`
`Printed in the United States of America
`
`Trademarks
`
`All terms mentioned in this book that are known to be trademarks or service
`marks have been appropriately capitalized. Sams.net Publishing cannot
`attest to the accuracy of this information. Use of a term in this book should
`not be regarded as affecting the validity of any trademark or service mark.
`
President, Sams Publishing Richard K. Swadley
Publisher, Sams.net Publishing George Bond
Managing Editor Cindy Morrow
Marketing Manager
John Pierce
`
`Acquisitions Editor
`Mark Taber
`
`Development Editor
`Dean Miller
`
`Software Development
`Specialist
`Steve Flatt
`
`Production Editor
`Katharine Stuart Ewing
`
`Copy Editors
`Susan Christophersen
`Mitzi Foster Gianokos
`Chuck Hutchinson
`
`Technical Reviewer
`Alan Richmond
`
`Editorial Coordinator
`Bill Whitmer
`
`Technical Edit Coordinator
`Lynette Quinn
`
`Formatter
`Frank Sinclair
`
`Editorial Assistant
`Carol Ackerman
`
Cover Designer
Jason Grisham
`
`Book Designer
`Alyssa Yesh
`
`Production Team Supervisor
`Brad Chinn
`
`Production
`Carol Bowers, Mona Brown,
`Terrie Deemer, Cheryl
`Dietsch, Michael Dietsch,
`Greg Eldred, Michael Henry,
`Ayanna Lacey, Kevin Laseau,
`Paula Lowell, Steph Mineart,
`Nancy C. Price, Brian-Kent
`Proffitt, SA Springer, Tim
`Taylor, Mark Walchle
`
`
`
`
Principles of
Gateway
Programming
`
IN THIS CHAPTER

• Transmission Control Protocol-Internet Protocol (TCP-IP)

• Why Do We Need HTTP?

• A Closer Look at the Hypertext Transfer Protocol (HTTP)

• What Is the Common Gateway Interface?

• The Flow of Data Using the Common Gateway Interface

• A Brief Introduction to Data Passing and Methods

• CGI: An Expanding Horizon

• Typical Hardware and Server Software Platforms
`
`
`
`
`Gateway Programming
`Part IV
`
In this chapter, I start with principles, including a brief description of the Internet protocols
that enable the World Wide Web in general and gateway programming in particular:
Transmission Control Protocol-Internet Protocol (TCP-IP) and the Hypertext Transfer Protocol
(HTTP).

The Web can be thought of as a distributed information system. It is capable of supporting,
seamlessly and globally, rapid and efficient multimedia information transfer between
information content sites ("servers") and information content requesters ("clients"). The servers are
distributed in the truest sense of the word because there is no geographic constraint
whatsoever on their location. The reader should pay particular attention to three critical properties of
HTTP: its statelessness, its built-in mechanisms for an arbitrarily rich set of data
representations (that is, its extensibility), and its use of the connectionless TCP-IP backbone for data
communication.

The chapter then moves on to the Common Gateway Interface (CGI). Important fundamental
terminology is introduced, such as the "methods" that HTTP supports. The advantages
that the CGI environment affords both information requesters and information providers are
discussed and illustrated with short Perl programs.
`
`Finally, typical hardware and software choices for Web sites are reviewed and the stage is set
`for the examples that I present in Chapters 20 to 25.
`
Transmission Control Protocol-Internet
Protocol (TCP-IP)

It's not necessary to be a "propeller-head" (although it helps!) to grasp the essentials of
TCP-IP. From the standpoint of the Web developer, here's what you really have to know:
`
• TCP guarantees end-to-end transmission of data from the Internet sender to the
Internet recipient. Big data streams are broken up into smaller "packets" and
reassembled when they arrive at the recipient's site. Mercifully, this breakdown and
reassembly are transparent to Internet users.
• IP gives you the familiar addressing scheme of four numbers, separated by periods. For
example, the NYU EDGAR development site has an IP address of 128.122.197.196. If
the user always had to type in these numbers to invoke an Internet service, the world
would be a gloomy place, but of course the Internet provides Domain Name Service
(DNS), and so the EDGAR machine has a friendlier name, edgar.stern.nyu.edu.
• TCP-IP is connectionless at the packet-routing (IP) level. This means that the route of
data from the sender to the recipient is not predetermined. Along the way, the packets
of data may well encounter numerous routing machines that use algorithmic methods for
determining the next "packet hop": each packet makes its own way from router to router
until the final destination is reached.
`
`
`
`
`Principles of Gateway Programming
`Chapter 19
`
`
• TCP is an open protocol (that is, it's not proprietary or for-profit). Openness means
that Internet users are not beholden to a commercial vendor for supporting or
enhancing the TCP-IP standard. There are well-established standards review
procedures, participating engineering groups such as the IETF (Internet Engineering Task
Force), and draft standards online (known as "Requests for Comment," or RFCs) that
are freely available to all. 1
`
`
The concept of openness lies at the very heart of the Internet and gives it an inimitable
charm. Openness means accelerated standards development, cooperation among
vendors, and a win-win situation for developers and for users. The ideals of
cooperation and interoperability will be addressed again in this chapter's section on the
Hypertext Transfer Protocol (HTTP).
`
`Therefore, the Internet can adapt to network congestion by rerouting data packets around
`problem areas. Again, end users do not have to know the nitty-gritty details (but they do have
`to suffer the consequences of peak usage, slowing everybody's packets down!).
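
The breakdown and reassembly of a data stream described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the function names packetize and reassemble are hypothetical, each packet carries a simple counter as its sequence number, and nothing here models acknowledgment, retransmission, or real routing. It shows only that packets arriving in any order can be put back together when each one carries its position in the stream.

```python
import random


def packetize(data: bytes, size: int) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, chunk) packets of at most `size` bytes."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]


def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the original stream by ordering packets on their sequence number."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    return b"".join(chunk for _, chunk in ordered)


message = b"A big data stream, broken up into smaller packets."
packets = packetize(message, size=8)
random.shuffle(packets)      # each packet may take its own route and arrive out of order
restored = reassemble(packets)
print(restored == message)
```

The sender and recipient never need to agree on a route in advance; the sequence numbers alone are enough to restore the order at the destination.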
`
The aspiring, ambitious Web developer should immerse himself or herself in the
nitty-gritty of TCP-IP standards, both the current state of affairs and possible future
directions. 2 For example, the Internet Multicasting Service has a very interesting online
section titled "New and Trendy Protocols" that makes for fascinating reading and may
well be a portent of things to come. 3 If you're an employee at a large installation, my
advice is to show healthy curiosity and ask the system administrators to fill you in on
the infrastructure and Internet connectivity at your firm. Be careful, though: sometimes
the sys admins bite!
`
1 Internet Requests for Comments, RFCs, may sound like dry stuff, but the first two mentioned in this chapter are a
must read for the Web developer. By the way, it's very handy to know about the complete RFC Index (about
500KB), at http://www.cis.ohio-state.edu/htbin/rfc/ and a searchable RFC site (by number or keyword) at
http://www.tohoku.ac.jp/RFC.html.
2 There is a variety of excellent texts describing the TCP-IP protocol. Some are more detailed than others. One set
that I've enjoyed is Stevens's The Illustrated TCP-IP Volumes I and II, Addison-Wesley, 1994.
3 The Internet Multicasting Service home page is at http://www.town.hall.org/ and you can find their
discussion of "New and Trendy Protocols" at http://www.town.hall.org/trendy/trendy.html.
`
`
Why Do We Need HTTP?
`
We don't need a World Wide Web to perform some of the more basic tasks on the Internet.
For example, I can transfer ASCII files, or binary images, from one machine to another using
file transfer protocol (FTP). I can log on to a remote machine using telnet, rlogin, or rsh. Or, I
can browse hierarchically based (menued) data using gopher. Most machines support standard
e-mail as well, via Simple Mail Transfer Protocol (SMTP), and if a site subscribes to USENET,
the newsgroups are accessible using Network News Transfer Protocol (NNTP).
`
On a UNIX-based machine, the basic services are enumerated in the file /etc/services. Each
service corresponds to a standard port. For example, telnet is mapped to port 23, and FTP is
mapped to port 21. All ports below 1024 are privileged, meaning that only the system
administrator(s) who can become root on the machine are able to manipulate the service and
port mapping.
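
The service-to-port mapping can be pictured as a small lookup table. The sketch below is illustrative only: the table is a hand-written stand-in for a few lines of /etc/services rather than a reading of the real file, the smtp and http entries are well-known assignments added for the example, and the helper name is_privileged is hypothetical.

```python
# A hand-written stand-in for a few entries of /etc/services (not read from disk).
WELL_KNOWN_PORTS = {
    "ftp": 21,
    "telnet": 23,
    "smtp": 25,
    "http": 80,
}


def is_privileged(port: int) -> bool:
    """Ports below 1024 can be bound only by a process running as root."""
    return 0 < port < 1024


for service, port in sorted(WELL_KNOWN_PORTS.items(), key=lambda item: item[1]):
    kind = "privileged" if is_privileged(port) else "unprivileged"
    print(f"{service:<8}{port:>5}  {kind}")

print(is_privileged(8000))   # a port an ordinary user could bind
```

On a machine with a services database, Python's socket.getservbyname("telnet", "tcp") performs the same lookup against the system's own table.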
`
Figure 19.1 shows a typical File Transfer Protocol session.
`
FIGURE 19.1.
User at New York University asks for a documentation file from the Internet
Multicasting Service using File Transfer Protocol (FTP).

[Diagram: Sequence of Events in a Typical FTP Session, between edgar.stern.nyu.edu (requestor) and town.hall.org (FTP server).]
`
Note that after steps 1 through 5 the connection is not dropped; it is still active. I can continue requesting documents from the town.hall.org FTP server
indefinitely. However, I cannot issue new FTP commands if any requests are still outstanding; in other words, requests cannot be entered in
parallel.

The connection stays active until the requestor quits the session, the server closes the connection (usually because the requestor has been inactive
for, for example, one hour), or hardware problems occur on the requestor side, the server side, or somewhere in between.
`
`The important thing to realize about basic services such as FTP or telnet is that they establish
`what potentially might be a long-lasting connection. The user can stay connected indefinitely;
`for example, FTPing one file after another from an FTP site or logging on all day on a remote
`machine via telnet. The problem, of course, is that when a user is in a terminal FTP session
`and he or she wants to telnet to a different machine, or FTP from a different FTP site, it's
`necessary to close the current connection and start a new one.
`
`Theoretically, a hardy soul might build an interesting hypermedia resource site by FTPing
`interesting images, video, and so on from archives around the world. He or she might also
`accumulate a great amount of textual information content in a similar fashion. Yet, in the "bad
`old days," there was no way to publish the resource base to the global Internet community.
`The only recourse would be to write about the site on the USENET newsgroups, and then
`allow anonymous FTP to support other users to mirror some or all of the files. The hypermedia
`would be viewable only to a privileged set of local users.
`
What is missing? None of these services, alone or in combination, affords the possibility of
allowing machines around the world to collaborate in a rich hypermedia environment. When
the '90s started, it was virtually unimaginable that the efficient sharing of text, video, and
audio resources was just around the corner. One way to think of the problem is to consider that
it was impossible, just a few short years ago, to "request" hypermedia data for local viewing
from a remote machine using a TCP-IP pipe. There simply was no standard to support the
"request" or the "answer."
`
`Filling the Collaborative Vacuum
`The global Internet community was blessed, in 1991, by Tim Berners-Lee's implementation
`of the HTTP protocol at CERN, the European Center for High-Energy Physics in Geneva,
`Switzerland. Another way to look at "collaboration" in this context is the ability to publish the
`hypermedia resource base locally and have it viewable globally, and the ability to swiftly and
`easily transfer the hypermedia resources, annotate them, and republish them on another site.
`HTTP is the powerful engine enabling hypermedia remote collaboration, and stands at the
`very essence of the World Wide Web.
`
A Closer Look at the Hypertext Transfer
Protocol (HTTP)
`
`The HTTP protocol can be thought of as "sitting on top" of the network. In other words, the
`HTTP specification (HTTP) presupposes the existence of a backbone network connecting all
`the machines (in the case of the Internet, TCP-IP), and all the packets flowing from client to
`server and vice versa take advantage of the standard TCP-IP protocol. It encompasses several
`broad areas:
`
• A comprehensive addressing scheme. When an HTML hyperlink is composed, the
URL (Uniform Resource Locator) is of the general form
http://machine-name:port-number/path/file.html. Note that the machine name conforms to the IP addressing
scheme; it may be of the form aaa.bbb.ccc.ddd.edu or, using DNS lookup, the
machine's "English" equivalent may be used. Note further that the path is not the
absolute path on the server machine; rather, it is a relative path to the server's
document root directory. More generally, a URL reference is of the type
service://machine/file.file-extension and, in this way, the HTTP protocol can subsume the
more basic Internet services.4 For example, to create a hyperlink to
an Edgar NYU research paper, one can code
<A HREF="ftp://edgar.stern.nyu.edu/pub/papers/edgar.ps">
By subsume, I mean that a non-HTTP request is fulfilled in the Web environment;
hence, a request for an FTP file results in that file being cached locally with the usual
Web browser operations available (Save, Print, and so on) without sacrificing the
essential flexibility of being able to jump to the next URL.
• An extensible and open representation for data types. When the client sends a
transaction to the server, headers are attached that conform to standard Internet e-mail
specifications (RFC822). 5 At this time, the client can limit the representation schemes
that are deemed acceptable, or throw the doors wide open and allow any
representation (possibly one of which the client is not aware). Normally, from the standpoint of
gateway programming, most client requests expect an answer either in plain text or
HTML. It's not at all necessary that developers know the full specification of client
request headers, but full details are available online.6 When the HTTP server transmits
information back to the client, it includes a MIME (Multipurpose Internet Mail
Extensions) header to "tell" the client what kind of data follows the header. The server does
not have to have the capability to parse or interpret a data type; it can pass the data
back to the client, and translation then depends on the client possessing the
appropriate utility (image viewer, movie player, and so on) corresponding to that data type.
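
The addressing scheme in the first item above can be taken apart mechanically. The following is a minimal sketch using Python's standard urllib.parse module; the helper name describe and the small table of default ports are assumptions made for the example (port 80 is the standard HTTP port and port 21 the standard FTP port).

```python
from urllib.parse import urlsplit

# Assumed defaults for the example: used when a URL names no explicit port.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def describe(url: str) -> dict:
    """Break a URL of the form service://machine:port/path into its parts."""
    parts = urlsplit(url)
    return {
        "service": parts.scheme,
        "machine": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
    }


print(describe("http://machine-name:8000/path/file.html"))
print(describe("ftp://edgar.stern.nyu.edu/pub/papers/edgar.ps"))
```

Note that the path component is whatever follows the machine name; how it maps onto the server's document root is decided by the server, not by the URL.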
`
`NOTE
`
The MIME specification, originally developed for e-mail attachments, has been
adapted in a very important way for the Web.7 MIME will be discussed further in
Chapter 20, "Gateway Programming Fundamentals." For now, it's enough to
remember that the HTTP protocol requires that data flowing back to the client has a properly
formatted set of header lines.
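
As a rough illustration of what "a properly formatted set of header lines" means, the sketch below prefixes a body with a Content-type header and a blank separator line, then shows a client picking a helper by data type. The function names and the VIEWERS table are hypothetical, and a real response carries more header lines than the single one shown.

```python
def with_header(mime_type: str, body: str) -> str:
    """Prefix the body with a Content-type header and the blank line that ends the headers."""
    return f"Content-type: {mime_type}\r\n\r\n{body}"


def split_response(response: str) -> tuple[dict, str]:
    """Separate the header lines from the data that follows them."""
    head, _, body = response.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n"):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


# Hypothetical client-side table: which utility handles which data type.
VIEWERS = {
    "text/html": "browser",
    "image/gif": "browser (inline image)",
    "video/mpeg": "movie player",
}


def choose_viewer(headers: dict) -> str:
    """Fall back to saving the data when the client has no utility for the type."""
    return VIEWERS.get(headers.get("content-type", ""), "save to disk")


headers, body = split_response(with_header("text/html", "<H1>Hello</H1>"))
print(headers, body, choose_viewer(headers))
```

The server side never inspects the body here; it only labels it, which mirrors the division of labor described in the text.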
`
4 T. Berners-Lee, L. Masinter, M. McCahill, Uniform Resource Locators (URL). 12/20/1994 at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1738.html.
5 Internet Request for Comments.
6 The basic HTTP specification is online at http://info.cern.ch/hypertext/WWW/Protocols/HTTP/HTTP2.html
courtesy of Tim Berners-Lee. The HTTP overview is online at
http://info.cern.ch/hypertext/WWW/Protocols/Overview.html and the Internet Engineering Task Force HTTP Working Group's current activities
are viewable at http://www.ics.uci.edu/pub/ietf/http/.
7 The MIME specification is addressed in several RFCs; here are the two basic ones: RFC 1521, N. Borenstein, N.
Freed, "MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing
the Format of Internet Message Bodies," 09/23/1993 available in ASCII text and PostScript; and RFC 1522, K.
Moore, "MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII
Text," 09/23/1993, available in ASCII text.
`
`
The HTTP protocol also has several important properties:

• It is stateless. Statelessness means that after the server has responded to the client's
request, the connection between client and server is dropped. This has important
ramifications and is in direct contrast to the basic Internet services such as FTP or
telnet, which were discussed previously. In an FTP session, if I request a file from a
remote site, I am still logged on (the connection is still there) until I explicitly quit, or
I am logged off by the remote machine (an inactivity time-out). Statelessness also
means, from the standpoint of the Web developer, that there is no "memory" between
client connections. In the pure HTTP server implementation, there is no trace of
recent activity from a given client address, and the server treats every request as if it
were brand-new; that is, without context. Throughout Part IV, I will be presenting
workarounds that maintain or alter state that in effect keep the client-server
connection alive for more than one cycle.
• It is rapid. In short: the client requests, the server responds, the end. Berners-Lee's
stated goal of a hypermedia response-answer cycle on the order of 100 milliseconds
has definitely been met "on a clear net day." The perceived delay ("This site is so slow
today!") can be blamed, usually, on general network congestion.
`
CAUTION
`
`It's up to the Web developer to avoid adding to the congestion woes of the client!
`Throughout Part IV, I stress ways to plan data structures, and accesses to these data
`structures, in efficient ways.
`
`• There are portable implementation solutions. Thanks to Tim Berners-Lee, Henrik
`Frystyk, Roy Fielding, and many others, the Internet research community has been
`involved from the outset in implementing solutions for HTTP servers and HTTP
`browsers.
`
`NOTE
`
On UNIX boxes, the standard HTTP port is port 80 and the server daemon is called
"httpd." The httpd program can run stand-alone, waiting for a request at port 80, or it
can run "off of the inetd" (consulting the system files /etc/services and /etc/inetd.conf
when a port 80 request is received). The httpd daemon also can be started on a
nonprivileged port, such as port 8000, but then of course the client must specify a
URL such as http://machine-name:8000/path/file.html and it's up to the server to
publicize the oddball port! Not a happy task. If the port is not specified, port 80 is
assumed.
`
• Its future direction will be open. Peripatetic Mr. Berners-Lee now heads the World
Wide Web Consortium, or W3C, which provides an open forum for development in
many different arenas.8 For example, the Netscape Communications Corporation has
developed a security scheme, called the "Secure Sockets Layer," and has published the
SSL specifications for all to see (I talk more on security in Chapter 24, "Transaction
Security and Security Administration"). The W3C is evaluating this as well as a
commercial competitor's ideas for "Secure HTTP," or SHTTP, in a rigorous and
impartial manner. An organization such as the W3C is a fantastic resource for the
Web development community: top engineers and theorists can enter an open forum
and freely discuss ideas and new directions for the protocol.
`
`It's a great idea for the budding Web developer to closely follow the ideas that are
`being bandied about by the W3C. One important idea is the "Uniform Resource
`Identifier," which is Request for Comment (RFC) 1630.9 Currently, users often
`encounter the frustration of clicking a hypertext link only to find that the URL is no
`longer valid. The URI specs allow for the possibility of encoding a forwarding address,
`in a manner of speaking, when a link moves. The list of ideas goes on and on; the more
`the developer knows today, the more he or she is ready tomorrow when the concept
`becomes a practical reality. And if the time and resources exist, a trip to one of the
`WWW conferences is highly recommended to keep up with the latest initiatives. 10
`
8 The W3C Consortium is hosted in Europe by the French information research agency INRIA (budgetary
considerations caused CERN to bow out at the end of 1994) and in the United States by the Massachusetts
Institute of Technology. Their stated objective (and one well worth noting) is to "ensure the evolution of the
World Wide Web (W3) protocols into a true information infrastructure in such a fashion that smooth transitions
will be assured both now and in the future. Toward this goal, the MIT Consortium team will develop, support,
test, disseminate W3 protocols and reference implementations of such protocols and be a vendor-neutral
convenor of the community developing W3 products. In this latter role, the team will act as a coordinator for W3
development to ensure maximum possible standardization and interoperability." More information is available at
http://info.cern.ch/hypertext/WWW/Consortium/Prospectus/FAQ.html.
9 T. Berners-Lee, "Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and
Addresses of Objects on the Network as used in the World-Wide Web," 06/09/1994, at
http://www.cis.ohio-state.edu/htbin/rfc/rfc1630.html.
10 The next World Wide Web conference will be in Boston, Massachusetts, December 1995. Look at
http://www.w3.org/hypertext/Conferences/WWW4/ for more details.

• Its weaknesses are known and are being addressed. In one intriguing and noteworthy
example, the current HTTP 1.0 often causes performance problems on the server side,
and on the network, because it sets up a new connection for every request. Simon
Spero has published a progress report on what the W3C calls "HTTP Next Generation,"
or HTTP-NG. As Spero states, HTTP-NG "divides up the connection (between
client and server) into lots of different channels ... each object is returned over its
own channel." Spero further points out that HTTP-NG protocol will permit complex
data types such as video to redirect the URL to a video transfer protocol, and only
then will the data be fetched for the client. HTTP-NG also keeps a Session ID thus
bestowing "state." Again, the Web developer should make a point of keeping abreast
of developments in HTTP-NG, Secure HTTP, Netscape SSL, and other hot industry
issues. 11
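
The contrast between a stateless exchange and one that carries a session identifier can be sketched in a few lines. This is a generic illustration of the idea of bestowing "state," not the HTTP-NG design itself; the class SessionStore, the dictionary-shaped requests, and the field name session_id are all hypothetical.

```python
import uuid


def stateless_handler(request: dict) -> str:
    """Every request is treated as brand-new: the answer depends on the request alone."""
    return f"Here is {request['path']}"


class SessionStore:
    """A workaround that remembers clients by an identifier they send back each time."""

    def __init__(self) -> None:
        self._request_counts: dict[str, int] = {}

    def handle(self, request: dict) -> dict:
        # Issue a fresh identifier on first contact; reuse the client's own afterward.
        session_id = request.get("session_id") or uuid.uuid4().hex
        count = self._request_counts.get(session_id, 0) + 1
        self._request_counts[session_id] = count
        return {"session_id": session_id,
                "body": f"Request {count} for {request['path']}"}


store = SessionStore()
first = store.handle({"path": "/index.html"})
second = store.handle({"path": "/news.html", "session_id": first["session_id"]})
print(stateless_handler({"path": "/index.html"}))
print(first["body"])
print(second["body"])
```

The connection itself can still be dropped after each response; the "memory" lives in the store on the server side and in the identifier the client returns.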
`
Let's imagine now the state of the world just after the HTTP protocol was introduced (and
yes, it was an instant and smashing success) but before the advent of our next topic, the
Common Gateway Interface. In 1991, we had our accustomed TCP-IP Internet connectivity, and
then there was the HTTP protocol in operation. That means that we had many HTML coders
integrating text, video, and audio at their server sites, and many more clients anxious to get at
the servers' delights. Remote collaboration was achieved: clients could request hypermedia data
from a remote server and view it locally. Consider, though, one such client session. Without
the Common Gateway Interface, clients can only navigate from one hypertext link to the next;
each one containing text, audio, video, or some other data type. This inefficient means of
browsing a large information store would consist of nothing more than the actions shown in
Figure 19.2.
`
FIGURE 19.2.
Without the Common Gateway Interface, an inefficient browsing session.

[Diagram: Without CGI, a Web client sends a request at port 80 (http) to a server holding a large data store (HTML pages, images, audio, organized by category); the server responds and the connection closes. Request number 2 and server response #2 follow the same pattern.]
`
The client relies on visual cues from the server's HTML and image organization for direction in what to
request next. There is no ad-hoc query ability; there is no full-text indexing of the underlying information
store. The responses from the server to the client are discrete pre-composed data items. This is a
non-interactive model where the session consists of click-response, click-response, and so on.
`
`The drawbacks of navigating serially from link to link, with each link producing one discrete
`pre-existing data item, are potentially severe at some server locations. For the user, it would be
`annoying to browse numerous links at a large server site to find a specific item of interest. For
`the Web developer, there would be no way to provide an ad-hoc mechanism for querying data
`(of any type), nor would it be possible to build HTML documents dynamically at request time.
`Naturally, some sites can fully stand on their own, without gateway-supplied interactivity.
`
11 Simon Spero, at UNC Sunsite/EIT, discusses his proof of concept implementation of HTTP-NG and the
basic HTTP-NG architecture at http://info.cern.ch/hypertext/WWW/Protocols/http-ng-status.html.
`
`
`What Is the Common Gateway Interface?
`
The Common Gateway Interface, or CGI, is a means for the HTTP server to "talk" to
programs on your, or someone else's, machine. The name was very aptly chosen:

• Common: The idea is that each server and client program, regardless of the operating
system platform, adheres to the same standard mechanisms for the flow of data
between client, server, and gateway program. This enables a high level of portability
between a wide variety of machines and operating systems.
• Gateway: Although a CGI program can be a stand-alone program, it can also act as a
mediator between the HTTP server and any other program that can accept at runtime
some form of command line input (for example, standard input, stdin, or environment
variables). This means that, for example, an SQL database program that has no
built-in means for talking to an HTTP server can be accessed by a "gateway" program.
The gateway program can usually be developed in any number of languages,
irrespective of the external program.
• Interface: The standard mechanisms provide a complete environment for developers.
There is no necessity for a developer to learn the nuts and bolts of the HTTP server
source code. Once you understand the interface, you can develop gateway programs;
all you need to know in terms of the HTTP protocol is how the data flows in and out.
`
CGI programs go beyond the static model of a client issuing one HTML request after another.
Instead of passively reading server data content one pre-written screen at a time, the CGI
specification allows the information provider to serve up different documents depending on the
client's request. The CGI spec also allows the gateway program to create new documents on
the fly; that is, at the time that the client makes the request. For example, a current Table of
Contents HTML document, listing all HTML documents in a directory, can easily be
composed by a CGI program. I demonstrate this useful program in Chapter 20.
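
To give a flavor of a document created on the fly, here is a rough sketch of the table-of-contents idea. It is not the Chapter 20 program: it is written in Python rather than Perl, the function name table_of_contents is hypothetical, and the demonstration builds a throwaway directory instead of reading a real document root. The output begins with a Content-type header line followed by a blank line, as a gateway program's output would.

```python
import html
import os
import tempfile


def table_of_contents(directory: str) -> str:
    """Compose, at request time, an HTML page listing the HTML files in a directory."""
    names = sorted(name for name in os.listdir(directory)
                   if name.lower().endswith((".html", ".htm")))
    items = "\n".join(
        f'<LI><A HREF="{html.escape(name, quote=True)}">{html.escape(name)}</A>'
        for name in names
    )
    return (
        "Content-type: text/html\n\n"          # header, then the blank separator line
        "<HTML><HEAD><TITLE>Table of Contents</TITLE></HEAD>\n"
        f"<BODY><H1>Table of Contents</H1>\n<UL>\n{items}\n</UL></BODY></HTML>\n"
    )


# Demonstration on a temporary directory holding two HTML files and one text file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("intro.html", "chapter20.html", "notes.txt"):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write("placeholder")
    page = table_of_contents(demo_dir)

print(page)
```

Because the listing is built when the request arrives, a file added to the directory shows up in the next response without anyone editing an index page by hand.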
`
Note particularly the synergies between organizations permitted by the capability of CGI
programs to call each other across the Internet. By mutual agreement, companies can feed each
other parameters to perform ad-hoc queries on proprietary data stores. I'll be showing an
example of such interaction in Chapter 21, "Gateway Programming I: Programming Libraries
and Databases," in the discussion of the Stock Ticker Symbol Application.
`
`The Flow of Data Using the Common
`Gateway Interface
`
Recall Figure 19.2, which illustrated a schematic data flow without the advantages of the
Common Gateway Interface. Adding in CGI, our picture now looks like the one depicted in
Figure 19.3.
`
FIGURE 19.3.
A schematic overview of data flow using CGI.

[Diagram: Typical HTTP Client-Server Interaction. A client machine running a Web browser exchanges a request and a response with the server; the CGI program is the "black box" behind the server, and the response carries a header specifying the data type.]
`
After the exchange of request (a) and response (d) the connection is dropped (the HTTP property
of statelessness). The client is then responsible for visualizing the response. Web browsers have
built-in capability to view HTML or inline GIF images but may require third-party software packages
to visualize more arcane data types.

Notes:
The CGI "program" might be a collection of programs, all written in different languages.
The CGI program is responsible for prefixing the correct MIME type on the response.
Client browsers, if they receive an unknown data format, usually prompt the user to save the
response to disk.
`
`The first step is data being transmitted from a client to a server (1). The server then hands the
`request to the CGI program for execution (2). Output (if any) is passed back to the server (3).
`The outpu