ch05: Web Servers

0. Guide

  • Survey the many different types of software and hardware web servers.

  • Describe how to write a simple diagnostic web server in Perl.

  • Explain how web servers process HTTP transactions, step by step.

1. What Real Web Servers Do

  1. Set up connection: accept a client connection, or close if the client is unwanted.

  2. Receive request: read an HTTP request message from the network.

  3. Process request: interpret the request message and take action.

  4. Access resource: access the resource specified in the message.

  5. Construct response: create the HTTP response message with the right headers.

  6. Send response: send the response back to the client.

  7. Log transaction: place notes about the completed transaction in a log file.
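The seven steps above can be sketched as a very small server. This is only an illustration under stated assumptions: `RESOURCES`, `handle_connection`, and `serve` are hypothetical names, content lives in an in-memory dictionary instead of files, only GET is handled, and the whole request is assumed to arrive in one read.

```python
import socket

# Hypothetical in-memory content store standing in for real files.
RESOURCES = {"/index.html": b"<h1>Hello</h1>"}


def handle_connection(conn, log):
    """Run steps 2 through 7 for one connection that was already accepted."""
    # Step 2: receive request (sketch: assumes it arrives in a single recv).
    request = conn.recv(65536).decode("iso-8859-1")
    # Step 3: process request by splitting the request line.
    request_line = request.split("\r\n", 1)[0]
    method, target, _version = request_line.split(" ")
    # Step 4: access resource (only GET is supported in this sketch).
    body = RESOURCES.get(target) if method == "GET" else None
    # Step 5: construct response.
    if method != "GET":
        status, body = "501 Not Implemented", b"<h1>Not Implemented</h1>"
    elif body is None:
        status, body = "404 Not Found", b"<h1>Not Found</h1>"
    else:
        status = "200 OK"
    head = (f"HTTP/1.0 {status}\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n")
    # Step 6: send response.
    conn.sendall(head.encode("ascii") + body)
    conn.close()
    # Step 7: log transaction.
    log.append(f"{request_line} -> {status}")


def serve(host="127.0.0.1", port=8080):
    """Accept connections forever, handling them one at a time."""
    log = []
    with socket.create_server((host, port)) as listener:
        while True:
            # Step 1: set up connection.
            conn, _addr = listener.accept()
            handle_connection(conn, log)
```

Handling one connection at a time keeps the sketch short; real servers overlap many connections.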

2. Step 1: Accepting Client Connections

2.1 Handling New Connections

When a client establishes a connection, the server adds the new connection to its list of existing web server connections and prepares to watch for data on the connection.

2.2 Client Hostname Identification

Most web servers can be configured to convert client IP addresses into client hostnames, using "reverse DNS". Be aware that hostname lookups can take a very long time, which slows down web transactions.

Web servers can use the client hostname for detailed access control and logging.

In Apache, use the HostnameLookups configuration directive:

HostnameLookups off
<Files ~ "\.(html|htm|cgi)$">
        HostnameLookups on
</Files>
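For illustration, a reverse DNS lookup can be sketched in Python with the standard `socket.gethostbyaddr` call. The function name `client_hostname` is hypothetical, and the cache is an assumption added here because lookups are slow; this is not how Apache implements the directive.

```python
import functools
import socket


@functools.lru_cache(maxsize=1024)
def client_hostname(ip_address):
    """Return the hostname for ip_address, or the address itself if lookup fails."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip_address)
        return hostname
    except OSError:
        # No reverse DNS entry (or the resolver failed): fall back to the IP.
        return ip_address
```

Caching repeated lookups avoids paying the DNS delay again for a client that was already resolved.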

2.3 Determining the Client User Through ident

The ident protocol lets servers find out what username initiated an HTTP connection. This information is particularly useful for web server logging.

If a client supports the ident protocol, the client listens on TCP port 113 for ident requests.

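ident can work inside organizations, but it does not work well across the public Internet: many client machines do not run ident daemons, firewalls often block incoming ident traffic, the extra lookup delays HTTP transactions, and the returned username is easy to fabricate.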
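As a rough sketch of the exchange, the server sends the two port numbers of the HTTP connection to the client's port 113 and reads back one reply line. The helper names `ident_query` and `parse_ident_reply` are hypothetical, the message layout follows RFC 1413 as I understand it, and stripping whitespace around the username is a simplification.

```python
def ident_query(client_port, server_port):
    """Format the query a web server sends to the client's ident daemon.

    The first number is the port on the machine running the ident daemon
    (the HTTP client), the second is the port on the querying web server.
    """
    return f"{client_port}, {server_port}\r\n"


def parse_ident_reply(line):
    """Return the username from a USERID reply, or None for an ERROR reply."""
    _ports, reply_type, rest = (part.strip() for part in line.strip().split(":", 2))
    if reply_type != "USERID":
        return None
    _opsys, user = rest.split(":", 1)
    # Simplification: surrounding whitespace is dropped from the username.
    return user.strip()
```

For example, `parse_ident_reply("6193, 80 : USERID : UNIX : stjohns")` yields the username `stjohns`, which the server could then write to its log.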

3. Step 2: Receiving Request Messages

The web server reads data from the network connection and parses out the pieces of the request message: the request line (method, URI, and version), the headers, and the body, if any.
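A minimal parsing sketch, assuming the complete message is already in memory as bytes. `parse_request` is a hypothetical helper; a real server reads incrementally from the connection and handles repeated headers, which this version simply overwrites.

```python
def parse_request(raw):
    """Split raw request bytes into (method, target, version, headers, body)."""
    # A blank line (CRLF CRLF) separates the head from the body.
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The request line holds the method, the URI, and the HTTP version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header names are case-insensitive, so store them lowercased.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body
```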

3.1 Connection Input/Output Processing Architectures

Different web server architectures service requests in different ways:

Multiplexed I/O servers

  All the connections are simultaneously watched for activity. When a connection changes state, a small amount of processing is performed on the connection; when that processing is complete, the connection is returned to the open connection list for the next change in state. Work is done on a connection only when there is something to be done; threads and processes are not tied up waiting on idle connections.
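The multiplexed model can be sketched with Python's standard `selectors` module. This is an illustration only: the "processing" is a plain echo standing in for HTTP handling, and `accept`, `echo`, `run_once`, and `serve` are hypothetical names.

```python
import selectors
import socket

selector = selectors.DefaultSelector()


def accept(listener):
    """A new client arrived: start watching its connection for data."""
    conn, _addr = listener.accept()
    conn.setblocking(False)
    selector.register(conn, selectors.EVENT_READ, echo)


def echo(conn):
    """Do a small amount of work on one ready connection, then return."""
    data = conn.recv(4096)
    if data:
        # Sketch: a real server would buffer output and wait for EVENT_WRITE.
        conn.sendall(data)
    else:
        # The peer closed the connection, so stop watching it.
        selector.unregister(conn)
        conn.close()


def run_once(timeout=1.0):
    """Service every connection that changed state, then return."""
    for key, _events in selector.select(timeout):
        key.data(key.fileobj)  # call the callback stored at registration


def serve(host="127.0.0.1", port=8080):
    listener = socket.create_server((host, port))
    listener.setblocking(False)
    selector.register(listener, selectors.EVENT_READ, accept)
    while True:
        run_once()
```

One thread services all connections here; idle connections cost nothing beyond their entry in the selector.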

4. Step 3: Processing Requests

5. Step 4: Mapping and Accessing Resources

The server maps the URI from the request message to the proper content or content generator on the web server.

5.1 Docroots

Typically, a special folder in the web server filesystem is reserved for web content. This folder is called the document root, or docroot.

Configuration for Apache:

DocumentRoot /usr/local/httpd/files

Servers are careful not to let relative URLs back up out of docroot and expose other parts of the filesystem.
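One way to enforce this is to normalize the mapped path and check that it still lies under the docroot. The sketch below is a hypothetical helper, `resolve_under_docroot`, and not Apache's actual logic.

```python
import os


def resolve_under_docroot(docroot, url_path):
    """Map a URL path to a filesystem path, or return None if it escapes docroot."""
    root = os.path.realpath(docroot)
    # realpath collapses ".." segments and follows symbolic links.
    candidate = os.path.realpath(os.path.join(root, url_path.lstrip("/")))
    if os.path.commonpath([root, candidate]) != root:
        return None  # the path backed up out of the docroot
    return candidate
```

With the docroot `/usr/local/httpd/files`, a request for `/../../etc/passwd` resolves outside the docroot and is rejected, while `/specials/saw-blade.gif` maps to a file beneath it.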

5.1.1 Virtually hosted docroots

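Virtually hosted web servers host multiple web sites on the same web server, giving each site its own distinct document root on the server. The server identifies the correct site from the IP address or hostname in the URI or the Host header.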
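A small sketch of hostname-based selection follows. The site names, the directories in `VIRTUAL_HOSTS`, and the `docroot_for` helper are all hypothetical, and IPv6 literals in the Host header are not handled.

```python
# Hypothetical sites and directories, for illustration only.
VIRTUAL_HOSTS = {
    "www.joes-hardware.com": "/docs/joe",
    "www.marys-antiques.com": "/docs/mary",
}
DEFAULT_DOCROOT = "/usr/local/httpd/files"


def docroot_for(host_header):
    """Pick the document root for the site named in the Host header."""
    # Drop an optional ":port" suffix; hostnames are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_DOCROOT)
```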

5.1.2 User home directory docroots

5.2 Directory Listings

A web server can receive requests for directory URLs, where the path resolves to a directory, not a file.

Most web servers look for a file named index.html or index.htm inside a directory to represent that directory.

Configure the set of filenames that will be interpreted as default directory files using DirectoryIndex:

DirectoryIndex index.html index.htm home.html home.htm index.cgi
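The lookup order implied by that directive can be sketched as follows: try each name in turn and serve the first one that exists. `find_index_file` is a hypothetical helper, not Apache's code.

```python
import os

# Same order as the DirectoryIndex line above.
DIRECTORY_INDEX = ["index.html", "index.htm", "home.html", "home.htm", "index.cgi"]


def find_index_file(directory, candidates=DIRECTORY_INDEX):
    """Return the first default file found in directory, or None."""
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

When this returns None, the server has to decide what to do instead, for example returning an error or generating a listing of the directory.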

5.3 Dynamic Content Resource Mapping

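Apache lets you map URI pathname components into executable program directories. When a request arrives for a URI with such a pathname component, the server executes the program in the corresponding server directory. CGI programs are the typical example.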
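The mapping itself amounts to a prefix rewrite, roughly as below. The `/cgi-bin/` prefix, the target directory, and the `map_to_program` helper are assumptions for illustration; the sketch only computes the program path and neither validates nor executes it.

```python
# Hypothetical mapping from a URL prefix to a directory of executable programs.
SCRIPT_ALIASES = {"/cgi-bin/": "/usr/local/etc/httpd/cgi-programs/"}


def map_to_program(url_path):
    """Return the program path for a dynamic URL, or None for static content."""
    for prefix, directory in SCRIPT_ALIASES.items():
        if url_path.startswith(prefix):
            return directory + url_path[len(prefix):]
    return None
```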

6. Step 5: Building Responses

6.1 Response Entities

If the transaction generated a response body, the content is sent back with the response message.

6.2 MIME Typing

The web server is responsible for determining the MIME type of the response body:

mime.types

  The server can use the extension of the filename to determine the MIME type.

Magic typing

  The Apache web server can scan the contents of each resource and pattern-match the content against a table of known patterns to determine the MIME type for each file.

Explicit typing

  Servers can be configured to force particular files or directory contents to have a MIME type, regardless of the file extension or contents.

Type negotiation

  Some web servers can be configured to store a resource in multiple document formats. In this case, the web server can be configured to determine the "best" format to use by a negotiation process with the user.
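The first three approaches can be combined in a short sketch. Everything here is an assumption for illustration: the `EXPLICIT_TYPES` entry, the small `MAGIC_SIGNATURES` table, the helper name `mime_type_for`, and the order in which the methods are tried. Extension lookup uses Python's standard `mimetypes` module.

```python
import mimetypes

# Hypothetical forced mapping (explicit typing).
EXPLICIT_TYPES = {"/downloads/report": "application/pdf"}

# A few well-known leading byte patterns (magic typing).
MAGIC_SIGNATURES = [
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def mime_type_for(path, first_bytes=b"", default="application/octet-stream"):
    """Pick a MIME type: explicit rule, then extension, then content magic."""
    if path in EXPLICIT_TYPES:
        return EXPLICIT_TYPES[path]
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed:
        return guessed
    for signature, mime_type in MAGIC_SIGNATURES:
        if first_bytes.startswith(signature):
            return mime_type
    return default
```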

6.3 Redirection

Web servers sometimes return redirection responses instead of success messages. A web server can redirect the browser to go elsewhere to perform the request. The Location response header contains a URI for the new or preferred location of the content. Redirects are useful for:

Permanently moved resources

  A resource might have been moved to a new location, or otherwise renamed, giving it a new URL. The server responds with status code 301 Moved Permanently.

Temporarily moved resources

  A resource is temporarily moved or renamed. The server responds with status code 303 See Other or 307 Temporary Redirect.
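A redirect response is just a status line plus a Location header, which can be sketched as below. `redirect_response` is a hypothetical helper, and the empty body is an assumption; servers often include a short body with a link.

```python
from http import HTTPStatus


def redirect_response(status, location):
    """Build a bodyless redirect response for the given status code and URI."""
    reason = HTTPStatus(status).phrase  # e.g. 301 -> "Moved Permanently"
    return (f"HTTP/1.1 {status} {reason}\r\n"
            f"Location: {location}\r\n"
            "Content-Length: 0\r\n"
            "\r\n").encode("ascii")
```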

7. Step 6: Sending Responses

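The server must compute the Content-Length header correctly, especially on persistent connections, where the client relies on it to know where the response ends.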
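A common mistake is to count characters instead of bytes. The sketch below, with a hypothetical `build_response` helper, encodes the body first and measures the encoded length.

```python
def build_response(body_text, content_type="text/html; charset=utf-8"):
    """Build a 200 response whose Content-Length counts bytes, not characters."""
    body = body_text.encode("utf-8")
    head = ("HTTP/1.1 200 OK\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n")
    return head.encode("ascii") + body
```

For the body `café`, the string has four characters but its UTF-8 encoding has five bytes, so the header must say 5.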

8. Step 7: Logging
