`
`
`
`
`
`
`
`
`
`Vital Information for Apache
`
`
`Programmers & Administrators
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Ben Laurie & Peter Laurie
`
`The Definitive Guide
`
`
`
`O'REILLY
`
Google Exhibit 1052
Google v. Valtrus
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`Apache: The Definitive Guide
`
`
`
`Ben Laurie & Peter Laurie
`
`
`
`Second Edition, February 1999, updated February 2000
`
`ISBN: 1-56592-528-9, 388 pages
`
`
`
`
`
`Written and reviewed by key members of the Apache group, this book is the only complete
`guide on the market that describes how to obtain, set up, and secure the Apache software on
`both Unix and Windows systems.
`
`The second edition fully describes Windows support and all the other Apache 1.3 features.
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
Preface (page 1)
  Who Wrote Apache, and Why?
  Conventions Used in This Book
  Organization of This Book
  Acknowledgments

1 Getting Started (page 7)
  1.1 How Does Apache Work?
  1.2 What to Know About TCP/IP
  1.3 How Does Apache Use TCP/IP?
  1.4 What the Client Does
  1.5 What Happens at the Server End?
  1.6 Which Unix?
  1.7 Which Apache?
  1.8 Making Apache Under Unix
  1.9 Apache Under Windows
  1.10 Apache Under BS2000/OSD and AS/400

2 Our First Web Site (page 24)
  2.1 What Is a Web Site?
  2.2 Apache's Flags
  2.3 site.toddle
  2.4 Setting Up a Unix Server
  2.5 Setting Up a Win32 Server

3 Toward a Real Web Site (page 37)
  3.1 More and Better Web Sites: site.simple
  3.2 Butterthlies, Inc., Gets Going
  3.3 Block Directives
  3.4 Other Directives
  3.5 Two Sites and Apache
  3.6 Controlling Virtual Hosts on Unix
  3.7 Controlling Virtual Hosts on Win32
  3.8 Virtual Hosts
  3.9 Two Copies of Apache
  3.10 HTTP Response Headers
  3.11 Options
  3.12 Restarts
  3.13 .htaccess
  3.14 CERN Metafiles
  3.15 Expirations

4 Common Gateway Interface (CGI) (page 59)
  4.1 Turning the Brochure into a Form
  4.2 Writing and Executing Scripts
  4.3 Script Directives
  4.4 Useful Scripts
  4.5 Debugging Scripts
  4.6 Setting Environment Variables
  4.7 suEXEC on Unix
  4.8 Handlers
  4.9 Actions

5 Authentication (page 79)
  5.1 Authentication Protocol
  5.2 Authentication Directives
  5.3 Passwords Under Unix
  5.4 Passwords Under Win32
  5.5 New Order Form
  5.6 Order, Allow, and Deny
  5.7 Digest Authentication
  5.8 Anonymous Access
  5.9 Experiments
  5.10 Automatic User Information
  5.11 Using .htaccess Files
  5.12 Overrides

6 MIME, Content and Language Negotiation (page 98)
  6.1 MIME Types
  6.2 Content Negotiation
  6.3 Language Negotiation
  6.4 Type Maps
  6.5 Browsers and HTTP/1.1
`
`
`
`
`
`
`
7 Indexing (page 104)
  7.1 Making Better Indexes in Apache
  7.2 Making Our Own Indexes
  7.3 Imagemaps

8 Redirection (page 116)
  8.1 ScriptAlias
  8.2 ScriptAliasMatch
  8.3 Alias
  8.4 AliasMatch
  8.5 UserDir
  8.6 Redirect
  8.7 RedirectMatch
  8.8 Rewrite
  8.9 Speling

9 Proxy Server (page 125)
  9.1 Proxy Directives
  9.2 Caching
  9.3 Setup

10 Server-Side Includes (page 131)
  10.1 File Size
  10.2 File Modification Time
  10.3 Includes
  10.4 Execute CGI
  10.5 Echo
  10.6 XBitHack
  10.7 XSSI

11 What's Going On? (page 136)
  11.1 AddModuleInfo
  11.2 Status
  11.3 Server Status
  11.4 Server Info
  11.5 Logging the Action

12 Extra Modules (page 144)
  12.1 Authentication
  12.2 Blocking Access
  12.3 Counters
  12.4 Faster CGI Programs
  12.5 FrontPage from Microsoft
  12.6 Languages and Internationalization
  12.7 Server-Side Scripting
  12.8 Throttling Connections
  12.9 URL Rewriting
  12.10 Miscellaneous
  12.11 MIME Magic
  12.12 DSO

13 Security (page 151)
  13.1 Internal and External Users
  13.2 Apache's Security Precautions
  13.3 Binary Signatures, Virtual Cash
  13.4 Firewalls
  13.5 Legal Issues
  13.6 Secure Sockets Layer: How to Do It
  13.7 Apache-SSL's Directives
  13.8 Cipher Suites
  13.9 SSL and CGI

14 The Apache API (page 173)
  14.1 Pools
  14.2 Per-Server Configuration
  14.3 Per-Directory Configuration
  14.4 Per-Request Information
  14.5 Access to Configuration and Request Information
  14.6 Functions

15 Writing Apache Modules (page 220)
  15.1 Overview
  15.2 Status Codes
  15.3 The Module Structure
  15.4 A Complete Example
  15.5 General Hints
`
`
`
`
`
`
`
A Support Organizations (page 245)

B The echo Program (page 246)

C NCSA and Apache Compatibility (page 248)

D SSL Protocol (page 249)
  D.1 Handshake Protocol
  D.2 Protecting Application Data
  D.3 Final Notes

E Sample Apache Log (page 253)

Colophon (page 259)
`
`
`
`
`
`
`The freeware Apache web server runs on about half of the world's existing web sites, and it is rapidly
`increasing in popularity. Apache: The Definitive Guide, written and reviewed by key members of the Apache
`Group, is the only complete guide on the market today that describes how to obtain, set up, and secure the
`Apache software.
`
`Apache was originally based on code and ideas found in the most popular HTTP server of the time: NCSA httpd
`1.3 (early 1995). It has since evolved into a far superior system that can rival (and probably surpass) almost
`any other UNIX-based HTTP server in terms of functionality, efficiency, and speed. The new version now
`includes support for Win32 systems. This new second edition of Apache: The Definitive Guide fully describes
`Windows support and all the other Apache 1.3 features. Contents include:
`
• The history of the Apache Group
• Obtaining and compiling the server
• Configuring and running Apache on UNIX and Windows, including such topics as directory structures,
virtual hosts, and CGI programming
• The Apache 1.3 Module API
• Apache security
• A complete list of configuration directives
• A complete demo of a sample web site
`
`With Apache: The Definitive Guide, web administrators new to Apache can get up to speed more quickly than
`ever before by working through the tutorial demo. Experienced administrators and CGI programmers, and web
`administrators moving from UNIX to Windows, will find the reference sections indispensable. Apache: The
`Definitive Guide is the definitive documentation for the world's most popular web server.
`
`
`Preface
`
`Apache: The Definitive Guide is principally about the Apache web server software. We explain what a web
`server is and how it works, but our assumption is that most of our readers have used the World Wide Web and
`understand in practical terms how it works, and that they are now thinking about running their own servers to
`offer material to the hungry masses.
`
`This book takes the reader through the process of acquiring, compiling, installing, configuring, and modifying
`Apache. We exercise most of the package's functions by showing a set of example sites that take a reasonably
`typical web business - in our case, a postcard publisher - through a process of development and increasing
`complexity. However, we have deliberately not tried to make each site more complicated than the last. Most of
the chapters refer to an illustrative site that is as simple as we could make it. Each site is pretty well
self-contained so that the reader can refer to it while following the text without having to disentangle the meat
`there from extraneous vegetables. If desired, it is perfectly possible to install and run each site on a suitable
`system.
`
`Perhaps it is worth saying what this book is not. It is not a manual, in the sense of formally documenting every
`command - such a manual exists on the Apache site and has been much improved with Version 1.3; we
`assume that if you want to use Apache, you will download it and keep it at hand. Rather, if the manual is a
`roadmap that tells you how to get somewhere, this book tries to be a tourist guide that tells you why you
`might want to make the journey.
`
`It also is not a book about HTML or creating web pages, or one about web security or even about running a
`web site. These are all complex subjects that should either be treated thoroughly or left alone. A compact,
`readable book that dealt thoroughly with all these topics would be most desirable.
`
`A webmaster's library, however, is likely to be much bigger. It might include books on the following topics:
`
`• The Web and how it works
`• HTML - what you can do with it
`• How to decide what sort of web site you want, how to organize it, and how to protect it
`• How to implement the site you want using one of the available servers (for instance, Apache)
`• Handbooks on Java, Perl, and other languages
`• Security
`
`Apache: The Definitive Guide is just one of the six or so possible titles in the fourth category.
`
`Apache is a versatile package and is becoming more versatile every day, so we have not tried to illustrate
`every possible combination of commands; that would require a book of a million pages or so. Rather, we have
`tried to suggest lines of development that a typical webmaster should be able to follow once an understanding
`of the basic concepts is achieved.
`
`As with the first edition, writing the book was something of a race with Apache's developers. We wanted to be
`ready as soon as Version 1.3 was stable, but not before the developers had finished adding new features.
`Unfortunately, although 1.3 was in "feature freeze" from early 1998 on, we could not be sure that new features
`might not become necessary to fix newly discovered problems.
`
`In many of the examples that follow, the motivation for what we make Apache do is simple enough and
`requires little explanation (for example, the different index formats in Chapter 7). Elsewhere, we feel that the
`webmaster needs to be aware of wider issues (for instance, the security issues discussed in Chapter 13) before
`making sensible decisions about his or her site's configuration, and we have not hesitated to branch out to deal
`with them.
`
`
`Who Wrote Apache, and Why?
`
Apache gets its name from the fact that it consists of some existing code plus some patches. The FAQ[1] thinks
`that this is cute; others may think it's the sort of joke that gets programmers a bad name. A more responsible
`group thinks that Apache is an appropriate title because of the resourcefulness and adaptability of the
`American Indian tribe.
`
`You have to understand that Apache is free to its users and is written by a team of volunteers who do not get
`paid for their work. Whether or not they decide to incorporate your or anyone else's ideas is entirely up to
`them. If you don't like this, feel free to collect a team and write your own web server.
`
`The first web server was built by the British physicist Tim Berners-Lee at CERN, the European Centre for
`Nuclear Research at Geneva, Switzerland. The immediate ancestor of Apache was built by the U.S. government
`in the person of NCSA, the National Center for Supercomputing Applications. This fine body is not to be
`confused with the National Computing Security Agency or the North Carolina Schools Association. Because this
`code was written with (American) taxpayers' money, it is available to all; you can, if you like, download the
`source code in C from www.ncsa.uiuc.edu, paying due attention to the license conditions.
`
`There were those who thought that things could be done better, and in the FAQ for Apache (at
`http://www.apache.org) we read:
`
`...Apache was originally based on code and ideas found in the most popular HTTP server of the time, NCSA
`httpd 1.3 (early 1995).
`
`That phrase "of the time" is nice. It usually refers to good times back in the 1700s or the early days of
`technology in the 1900s. But here it means back in the deliquescent bogs of a few years ago!
`
`While the Apache site is open to all, Apache is written by an invited group of (we hope) reasonably good
`programmers. One of the authors of this book, Ben, is a member of this group.
`
`Why do they bother? Why do these programmers, who presumably could be well paid for doing something
`else, sit up nights to work on Apache for our benefit? There is no such thing as a free lunch, so they do it for a
`number of typically human reasons. One might list, in no particular order:
`
• They want to do something more interesting than their day job, which might be writing stock
control packages for BigBins, Inc.
• They want to be involved on the edge of what is happening. Working on a project like this is a
pretty good way to keep up-to-date. After that comes consultancy on the next hot project.
• The more worldly ones might remember how, back in the old days of 1995, quite a lot of the people
working on the web server at NCSA left for a thing called Netscape and became, in the passage of
the age, zillionaires.
• It's fun. Developing good software is interesting and amusing and you get to meet and work with
other clever people.
• They are not doing the bit that programmers hate: explaining to end users why their treasure isn't
working and trying to fix it in 10 minutes flat. If you want support on Apache you have to consult
one of several commercial organizations (see Appendix A), who, quite properly, want to be paid for
doing the work everyone loathes.
`
`
[1] FAQ is netspeak for Frequently Asked Questions. Most sites/subjects have an FAQ file that tells you what the thing is, why it is,
and where it is going. It is perfectly reasonable for the newcomer to ask for the FAQ to look up anything new to him or her, and
indeed this is a sensible thing to do, since it reduces the number of questions asked. Apache's FAQ can be found at
http://www.apache.org/docs/FAQ.html.
`
`
`Conventions Used in This Book
`
`This section covers the various conventions used in this book.
`
`Typographic Conventions
`
`Constant Width
`
`Used for HTTP headers, status codes, MIME content types, directives in configuration files, commands,
`options/switches, functions, methods, variable names, and code within body text
`
`Constant Width Bold
`
`Used in code segments to indicate input to be typed in by the user
`
`Constant Width Italic
`
`Used for replaceable items in code and text
`
Italic

Used for filenames, pathnames, newsgroup names, Internet addresses (URLs), email addresses,
variable names (except in examples), terms being introduced, program names, subroutine names,
CGI script names, hostnames, usernames, and group names

Icons
`
`Text marked with this icon applies to the Unix version of Apache.
`
`Text marked with this icon applies to the Win32 version of Apache.
`
`The owl symbol designates a note relating to the surrounding text.
`
`The turkey symbol designates a warning related to the surrounding text.
`
`
`
`
`
`
`
`
`
`
`Pathnames
`
We use the text convention .../ to indicate your path to the demonstration sites, which may well be different
from ours. For instance, on our Apache machine, we kept all the demonstration sites in the directory
/usr/www. So, for example, our path would be /usr/www/site.simple. You might want to keep the sites
somewhere other than /usr/www, so we refer to the path as .../site.simple.
`
`Don't type .../ into your computer. The attempt will upset it!
`
`
`Directives
`
`Apache is controlled through roughly 150 directives. For each directive, a formal explanation is given in the
`following format:
`
`Directive
`
`Syntax
`Where used
`
`An explanation of the directive is located here.
`
`So, for instance, we have the following directive:
`
`ServerAdmin
`
`ServerAdmin email address
`Server config, virtual host
`
ServerAdmin gives the email address for correspondence. Apache includes this address in the error messages it
generates automatically, so the user has someone to write to in case of problems.
`
`The "where used" line explains the appropriate environment for the directive. This will become clearer later.
`
`
`
`Organization of This Book
`
`The chapters that follow and their contents are listed here:
`
`Chapter 1
`
`Covers web servers, how Apache works, TCP/IP, HTTP, hostnames, what a client does, what happens
`at the server end, choosing a Unix version, and compiling and installing Apache under both Unix and
`Win32.
`
`Chapter 2
`
`Discusses getting Apache to run, creating Apache users, runtime flags, permissions, and site.simple.
`
`Chapter 3
`
`Introduces a demonstration business, Butterthlies, Inc.; some HTML; default indexing of web pages;
`server housekeeping; and block directives.
`
`Chapter 4
`
`Demonstrates aliases, logs, HTML forms, shell script, a CGI in C, environment variables, and adapting
`to the client's browser.
`
`Chapter 5
`
`Explains controlling access, collecting information about clients, cookies, DBM control, digest
`authentication, and anonymous access.
`
`Chapter 6
`
`Covers content and language arbitration, type maps, and expiration of information.
`
`Chapter 7
`
`Discusses better indexes, index options, your own indexes, and imagemaps.
`
`Chapter 8
`
`Describes Alias, ScriptAlias, and the amazing Rewrite module.
`
`Chapter 9
`
`Covers remote proxies and proxy caching.
`
`Chapter 10
`
`Explains runtime commands in your HTML and XSSI - a more secure server-side include.
`
`Chapter 11
`
`Covers server status, logging the action, and configuring the log files.
`
`Chapter 12
`
`Discusses authentication, blocking, counters, faster CGI, languages, server-side scripting, and URL
`rewriting.
`
`Chapter 13
`
`Discusses Apache's security precautions, validating users, binary signatures, virtual cash, certificates,
`firewalls, packet filtering, secure sockets layer (SSL), legal issues, patent rights, national security, and
`Apache-SSL directives.
`
`Chapter 14
`
`Describes pools; per-server, per-directory, and per-request information; functions; warnings; and
`parsing.
`
`Chapter 15
`
`Covers status codes; module structure; the command table; the initializer, translate name, check
`access, check user ID, check authorization and check type routines; prerun fixups; handlers; the
`logger; and a complete example.
`
`Appendix A
`
`Provides a list of commercial service and/or consultation providers.
`
`Appendix B
`
`Provides a listing of echo.c.
`
`Appendix C
`
`Contains Apache Group internal mail discussing NCSA/Apache compatibility issues.
`
`Appendix D
`
`Provides the SSL specification.
`
`Appendix E
`
`Contains a listing of the full log file referenced in Chapter 11.
`
`In addition, the Apache Quick Reference Card provides an outline of the Apache 1.3.4 syntax.
`
`
`Acknowledgments
`
`First, thanks to Robert S. Thau, who gave the world the Apache API and the code that implements it, and to
`the Apache Group, who worked on it before and have worked on it since. Thanks to Eric Young and Tim Hudson
`for giving SSLeay to the Web.
`
`Thanks to Bryan Blank, Aram Mirzadeh, Chuck Murcko, and Randy Terbush, who read early drafts of the first
`edition text and made many useful suggestions; and to John Ackermann, Geoff Meek, and Shane Owenby, who
`did the same for the second edition. Thanks to Paul C. Kocher for allowing us to reproduce SSL Protocol,
`Version 3.0, in Appendix D, and to Netscape Corporation for allowing us to reproduce echo.c in Appendix B.
`
`We would also like to offer special thanks to Andrew Ford for giving us permission to reprint his Apache Quick
`Reference Card.
`
`Many thanks to Robert Denn, our editor at O'Reilly, who patiently turned our text into a book - again. The two
`layers of blunders that remain are our own contribution.
`
`And finally, thanks to Camilla von Massenbach and Barbara Laurie, who have continued to put up with us while
`we rewrote this book.
`
`
`Chapter 1. Getting Started
`
`When you connect to the URL of someone's home page - say the notional http://www.butterthlies.com/ we
`shall meet later on - you send a message across the Internet to the machine at that address. That machine,
`you hope, is up and running, its Internet connection is working, and it is ready to receive and act on your
`message.
`
URL stands for Uniform Resource Locator. A URL such as http://www.butterthlies.com/ comes in three parts:

<method>://<host>/<absolute path URL (apURL)>

So, in our example, <method> is http, meaning that the browser should use HTTP (Hypertext Transfer
Protocol); <host> is www.butterthlies.com; and <apURL> is "/", meaning the top directory of the host. Using
HTTP/1.1, your browser might send the following request:
`
`GET / HTTP/1.1
`Host: www.butterthlies.com
`
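As a rough illustration of the three-part split described above, here is a minimal Python sketch. It is not anything Apache itself does: the helper names split_url and build_request are our own, and the request it builds is only the two-line minimum shown above, not everything a real browser sends.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into the three parts named in the text: method, host, path."""
    parts = urlsplit(url)
    # An empty path means the top directory of the host, that is, "/".
    return parts.scheme, parts.hostname, parts.path or "/"


def build_request(url):
    """Build the minimal HTTP/1.1 request a browser might send for this URL."""
    _method, host, path = split_url(url)
    # Each HTTP line ends in CRLF, and a blank line closes the header block.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


if __name__ == "__main__":
    print(split_url("http://www.butterthlies.com/"))
    print(build_request("http://www.butterthlies.com/"), end="")
```

Note that the URL method (http) decides which protocol to speak, while the word GET in the request is a separate, HTTP-level method.
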
The request arrives at port 80 (the default HTTP port) on the host www.butterthlies.com. The message is again
in three parts: a method (an HTTP method, not a URL method), which in this case is GET but could equally be
PUT, POST, DELETE, or CONNECT; the Uniform Resource Identifier (URI) "/"; and the version of the protocol we are
using. It is then up to the web server running on that host to make something of this message.
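
To make the three parts of the request line concrete, the following Python sketch splits such a line into method, URI, and protocol version. It is only an illustration under simple assumptions (exactly three whitespace-separated fields); the function name parse_request_line is ours, and a real server checks a good deal more.

```python
def parse_request_line(line):
    """Split an HTTP request line into (method, uri, version)."""
    # split() with no argument also discards the trailing CRLF.
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, uri, version = parts
    if not version.startswith("HTTP/"):
        raise ValueError(f"unknown protocol version: {version!r}")
    return method, uri, version


if __name__ == "__main__":
    print(parse_request_line("GET / HTTP/1.1\r\n"))
```
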
`
`It is worth saying here - and we will say it again - that the whole business of a web server is to translate a URL
`either into a filename, and then send that file back over the Internet, or into a program name, and then run
`that program and send its output back. That is the meat of what it does: all the rest is trimming.
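
The translation step can be sketched in a few lines of Python. This is a toy model, not Apache's code: the directory names reuse the /usr/www/site.simple example path from the Preface, while the /cgi-bin/ URL prefix and the function name translate are assumptions made for the illustration.

```python
from pathlib import PurePosixPath

# Assumed locations, following the book's example path /usr/www/site.simple.
DOCUMENT_ROOT = PurePosixPath("/usr/www/site.simple/htdocs")
CGI_ROOT = PurePosixPath("/usr/www/site.simple/cgi-bin")
CGI_PREFIX = "/cgi-bin/"  # assumed URL prefix that selects a program


def translate(url_path):
    """Return ("program", path) or ("file", path) for the path part of a URL."""
    # Refuse any attempt to climb out of the site directory.
    if ".." in PurePosixPath(url_path).parts:
        raise ValueError("path escapes the site directory")
    if url_path.startswith(CGI_PREFIX):
        # A program name: the server would run it and send back its output.
        return "program", CGI_ROOT / url_path[len(CGI_PREFIX):].lstrip("/")
    # A filename: the server would send the file itself back.
    return "file", DOCUMENT_ROOT / url_path.lstrip("/")


if __name__ == "__main__":
    print(translate("/index.html"))
    print(translate("/cgi-bin/mycgi"))
```

In a real Config file this mapping is set up by directives rather than hard-coded constants.
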
`
`The host machine may be a whole cluster of hypercomputers costing an oil sheik's ransom, or a humble PC. In
`either case, it had better be running a web server, a program that listens to the network and accepts and acts
`on this sort of message.
`
`What do we want a web server to do? It should:
`
`• Run fast, so it can cope with a lot of inquiries using a minimum of hardware.
`• Be multitasking, so it can deal with more than one inquiry at once.
`• Be multitasking, so that the person running it can maintain the data it hands out without having to
`shut the service down. Multitasking is hard to arrange within a program: the only way to do it
`properly is to run the server on a multitasking operating system. In Apache's case, this is some
`flavor of Unix (or Unix-like system), Win32, or OS/2.
`• Authenticate inquirers: some may be entitled to more services than others. When we come to
`virtual cash, this feature (see Chapter 13) becomes essential.
• Respond to errors in the messages it gets with answers that make sense in the context of what is
going on. For instance, if a client requests a page that the server cannot find, the server should
respond with a "404" error, which the HTTP specification defines as "Not Found."
`• Negotiate a style and language of response with the inquirer. For instance, it should - if the people
`running the server can rise to the challenge - be able to respond in the language of the inquirer's
`choice. This ability, of course, can open up your site to a lot more action. And there are parts of the
`world where a response in the wrong language can be a bad thing. If you were operating in Canada,
`where the English/French divide arouses bitter feelings, or in Belgium, where the French/Flemish
`split is as bad, this feature could make or break your business.
• Offer different formats. On a more technical level, a user might want JPEG image files rather than
GIF, or TIFF rather than either of the former. He or she might want text in DVI format rather than
PostScript.
• Run as a proxy server. A proxy server accepts requests for clients, forwards them to the real
servers, and then sends the real servers' responses back to the clients. There are two reasons why
you might want a proxy server:
    o The proxy might be running on the far side of a firewall (see Chapter 13), giving its users
    access to the Internet.
    o The proxy might cache popular pages to save reaccessing them.
• Be secure. The Internet world is like the real world, peopled by a lot of lambs and a few
wolves.[2] The wolves like to get into the lambs' folds (of which your computer is one) and,
when there, raven and tear in the usual wolfish way. The aim of a good server is to
prevent this happening. The subject of security is so important that we will come back to it
several times before we are through.
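
The language negotiation mentioned in the list above can be sketched as follows. A browser may state its preferences in an Accept-Language request header, with optional q values giving relative weight. The Python below is a simplified illustration and not Apache's algorithm: the function names and the default of "en" are our own assumptions, and it matches language tags exactly rather than by prefix.

```python
def parse_accept_language(header):
    """Return [(language, q)] from an Accept-Language value, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        lang, _, params = item.partition(";")
        q = 1.0  # a language with no q value gets full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat an unreadable weight as "not wanted"
        prefs.append((lang.strip().lower(), q))
    # sorted() is stable, so languages with equal q keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_language(header, available, default="en"):
    """Pick the inquirer's most preferred language that the site can supply."""
    for lang, q in parse_accept_language(header):
        if q > 0 and lang in available:
            return lang
    return default


if __name__ == "__main__":
    print(choose_language("fr-CA, fr;q=0.8, en;q=0.5", {"en", "fr"}))
```
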
`
`These are services that the developers of Apache think a server should offer. There are people who have other
`ideas, and, as with all software development, there are lots of features that might be nice - features someone
`might use one day, or that might, if put into the code, actually make it work better instead of fouling up
`something else that has, until then, worked fine. Unless developers are careful, good software attracts so many
`improvements that it eventually rolls over and sinks like a ship caught in an Arctic ice storm.
`
`Some ideas are in progress: in particular, various proposals for Apache 2.0 are being kicked around. The main
`features Apache 2.0 is supposed to have are multithreading (on platforms that support it), layered I/O, and a
`rationalized API.
`
`If you have bugs to report or more ideas for development, look at http://www.apache.org/bug_report.html.
`You can also try news:comp.infosystems.www.servers.unix, where some of the Apache team lurk, along with
`many other knowledgeable people, and news:comp.infosystems.www.servers.ms-windows.
`
`
`
`1.1 How Does Apache Work?
`
Apache is a program that runs under a suitable multitasking operating system. In the examples in this book,
the operating systems are Unix and Windows 95/98/NT, which we call Win32. The binary is called httpd under
Unix and apache.exe under Win32[3] and normally runs in the background. Each copy of httpd/apache that is
started has its attention directed at a web site, which is, for practical purposes, a directory. For an example,
look at site.toddle on the demonstration CD-ROM. Regardless of operating system, a site directory typically
contains four subdirectories:

conf

Contains the configuration file(s), of which httpd.conf is the most important. It is referred to
throughout this book as the Config file.

htdocs

Contains the HTML scripts to be served up to the site's clients. This directory and those below it, the
web space, are accessible to anyone on the Web and therefore pose a severe security risk if used for
anything other than public data.

logs

Contains the log data, both of accesses and errors.

cgi-bin

Contains the CGI scripts. These are programs or shell scripts written by or for the webmaster that can
be executed by Apache on behalf of its clients. It is most important, for security reasons, that this
directory not be in the web space.
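
For readers who like to see the layout spelled out, here is a small Python sketch that creates such a site directory and checks that a given path lies outside the web space. It is purely illustrative: the helper names make_site and in_web_space and the site name site.example are our own inventions, and Apache itself supplies no such tool.

```python
import tempfile
from pathlib import Path

SUBDIRECTORIES = ("conf", "htdocs", "logs", "cgi-bin")


def make_site(root):
    """Create the four subdirectories and an empty Config file under root."""
    root = Path(root)
    for name in SUBDIRECTORIES:
        (root / name).mkdir(parents=True, exist_ok=True)
    (root / "conf" / "httpd.conf").touch()
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def in_web_space(root, path):
    """Return True if path is htdocs itself or anything below it."""
    htdocs = (Path(root) / "htdocs").resolve()
    target = Path(path).resolve()
    return target == htdocs or htdocs in target.parents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp) / "site.example"  # hypothetical site name
        print(make_site(site))
        # The CGI directory should not be inside the web space.
        print(in_web_space(site, site / "cgi-bin"))
```
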
`
`In its idling state, Apache does nothing but listen to the IP addresses and TCP port or ports specified in its
`Config file. When a request appears on a valid port, Apache receives the HTTP request and analyzes the
`headers. It then applies the rules it finds in the Config file and takes the appropriate action.
`
`
[2] We generally follow the convention of calling these people the Bad Guys. This avoids debate about "hackers," which, to many people,
simply refers to good programmers, but to some means Bad Guys. We discover from the French edition of this book that in France they
are Sales Types - dirty fellows.
[3] This double name is rather annoying, but it seems that life has progressed too far for anything to be done about it. We will, rather
clumsily, refer to httpd/apache and hope that the reader can pick the right one.
`
`
`The webmaster's main control over Apache is through the Config file. The webmaster has some 150 directives
`at his or her disposal; most of this book is an account of what these directives do and how to use them to
`reasonable advantage. The webmaster also has half a dozen flags he or she can use when Apache starts up.
Apache is freeware: the intending user downloads the source code and compiles it (under Unix) or downloads
`the executable (for Windows) from www.apache.org or a suitable mirror site. You can also load the source code
`from the demonstration CD-ROM included with this book, although it is not the most recent. Although it sounds
`like a difficult business to download the source code and configure and compile it, it only takes about 20
`minutes and is well worth the trouble.
`
`Under Unix, the webmaster also controls which modules are compiled into Apache. Each module provides
`the code to execute a number of directives. If there is a group of directives that aren't needed, the appropriate
modules can be left out of the binary by commenting their names out in the configuration file[4] that controls the
`compilation of the Apache sources. Discarding unwanted modules reduces the size of the binary and may
`improve performance.
`
`Under Windows, Apache is normally precompiled as an executable. The core modules are compiled in,
`and others are loaded, if needed, as dynamic link libraries (DLLs) at runtime, so control of the executable's size
`is less urgent. The DLLs supplied in the .../apache/modules subdirectory are as follows:
`
`APACHE~1 DLL 5,120 19/07/98 11:47 ApacheModuleAuthAnon.dll
`APACHE~2 DLL 5,632 19/07/98 11:48 ApacheModuleCERNMeta.dll
`APACHE~3 DLL 6,656 19/07/98 11:47 ApacheModuleDigest.dll
`APACHE~4 DLL 6,144 19/07/98 11:48 ApacheModuleExpires.dll
`APACHE~5 DLL 5,120 19/07/98 11:48 ApacheModuleHeaders.dll
`APACHE~6 DLL 46,080 19/07/98 11:48 ApacheModuleProxy.dll
`APACHE~7 DLL 35,328 19/07/98 11:48 ApacheModuleRewrite.dll
`APACHE~8 DLL 6,656 19/07/98 11:48 ApacheModuleSpeling.dll
`APACHE~9 DLL 10,752 19/07/98 11:47 ApacheModuleStatus.dll
`APACH~10 DLL 6,144 19/07/98 11:48 ApacheModuleUserTrack.dll
`
`What these are and what they do will become more apparent as we proceed. You can add other DLLs
`from outside suppliers; more will doubtless become available.
`
`It is also possible to download the source code and compile it for Win32 using Microsoft Visual C++
`v5.0. We describe this in Section 1.9, later in this chapter. You might do this if you wanted to write your own
`module (see Chapter 15).
`
`
`
`1.2 What to Know About TCP/IP
`
`To understand the substance of this book, you need a modest knowledge of what TCP/IP is and what it does.
You'll find more than enough information in Craig Hunt and Robert Bruce Thompson's books on TCP/IP,[5] but
`what follows is, we think, what is necessary to know for our book's purposes.
`
`TCP/IP (Transmission Control Protocol/Internet Protocol) is a set of protocols enabling computers to talk to
`each other over networks. The two protocols that give the suite its name are among the most important, but
`there are many others, and we shall meet some of them later. These protocols are embodied in programs on
`your computer written by someone or other; it doesn't much matter who. TCP/IP seems unusual among
`computer standards in that the programs that implement it actually work, and their authors have not tried too
`much to improve on the original conceptions.
`
`TCP/IP only applies where there is a network. Each computer on a network that wants to use TCP/IP has an IP
`address, for example, 192.168.123.1.
`
`There are four parts in the address, separated by periods. Each part corresponds to a byte, so the whole
address is four bytes long. You will, in consequence, seldom see any of the parts outside the range 0-255.
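
The relationship between the dotted form and the four bytes can be checked with Python's standard ipaddress module. This is a small sketch of our own (the function name address_bytes is not from any Apache tool); it also shows that a part too large to fit in one byte is rejected.

```python
import ipaddress


def address_bytes(dotted):
    """Return the four byte values behind a dotted IP address."""
    # .packed is the address as exactly four raw bytes.
    return list(ipaddress.IPv4Address(dotted).packed)


if __name__ == "__main__":
    print(address_bytes("192.168.123.1"))
    try:
        address_bytes("192.168.123.256")
    except ValueError as err:
        # 256 does not fit in a single byte, so the address is invalid.
        print("rejected:", err)
```
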
`
`Although not required by protocol, by convention there is a dividing line somewhere inside this number: to the
`left is the network number and to the right, the host number. Two machines on the same physical network
`(usually a local area network) normally have the same network number and communicate using TCP/IP.
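
As an illustration of the dividing line, the sketch below splits an address into its network and host numbers. Where the line falls is an assumption the caller must supply: here it is given as a count of leading bits (24 in the example, meaning the first three parts form the network number). The function names are ours.

```python
import ipaddress


def split_address(dotted, network_bits):
    """Split an address into (network number, host number)."""
    iface = ipaddress.IPv4Interface(f"{dotted}/{network_bits}")
    network = str(iface.network.network_address)
    # The host number is whatever remains to the right of the dividing line.
    host = int(iface.ip) & int(iface.network.hostmask)
    return network, host


def same_network(a, b, network_bits):
    """Return True if both addresses carry the same network number."""
    net_a = ipaddress.IPv4Interface(f"{a}/{network_bits}").network
    net_b = ipaddress.IPv4Interface(f"{b}/{network_bits}").network
    return net_a == net_b


if __name__ == "__main__":
    print(split_address("192.168.123.1", 24))
    print(same_network("192.168.123.1", "192.168.123.2", 24))
```
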
`
`