ch05: Web Servers
Survey the many different types of software and hardware web servers.
Describe how to write a simple diagnostic web server in Perl.
Explain how web servers process HTTP transactions, step by step.
Set up connection: accept a client connection, or close it if the client is unwanted.
Receive request: read an HTTP request message from the network.
Process request: interpret the request message and take action.
Access resource: access the resource specified in the message.
Construct response: create the HTTP response message with the right headers.
Send response: send the response back to the client.
Log transaction: place notes about the completed transaction in a log file.
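The seven steps above can be sketched as a single connection handler. This is a minimal Python illustration, not the chapter's Perl server. It assumes step 1 (accepting the connection) happens in the caller. The RESOURCES table is a hypothetical in-memory stand-in for files under a docroot, only GET is supported, and the log line format is made up for the example.

```python
import socket

# Hypothetical in-memory resources standing in for files under a docroot.
RESOURCES = {"/index.html": (b"<h1>hi</h1>\n", "text/html")}


def handle_connection(conn: socket.socket, log: list) -> None:
    """Serve one HTTP/1.0 transaction on an already-accepted connection (step 1)."""
    # Step 2: receive request. Read until the blank line that ends the headers.
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:  # client went away before sending a full request
            conn.close()
            return
        data += chunk

    # Step 3: process request. Split the request line into its three parts.
    request_line = data.split(b"\r\n", 1)[0].decode("iso-8859-1")
    parts = request_line.split(" ")
    method, uri = (parts[0], parts[1]) if len(parts) == 3 else ("-", "-")

    # Step 4: access resource. Non-GET methods fall through to 404 in this sketch.
    resource = RESOURCES.get(uri) if method == "GET" else None

    # Step 5: construct response with Content-Type and Content-Length headers.
    if len(parts) != 3:
        status, body, ctype = "400 Bad Request", b"bad request\n", "text/plain"
    elif resource is None:
        status, body, ctype = "404 Not Found", b"not found\n", "text/plain"
    else:
        status = "200 OK"
        body, ctype = resource
    head = (
        f"HTTP/1.0 {status}\r\n"
        f"Content-Type: {ctype}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")

    # Step 6: send response, then close (no persistent connections here).
    conn.sendall(head + body)
    conn.close()

    # Step 7: log transaction (hypothetical "METHOD URI STATUS BYTES" format).
    log.append(f"{method} {uri} {status.split()[0]} {len(body)}")
```

A real server would loop on `listener.accept()` and hand each new connection to a handler like this one.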
Establishing a connection: the server accepts the new client connection, adds it to its list of existing connections, and prepares to watch for data on it.
Most web servers can be configured to convert client IP addresses into client hostnames using "reverse DNS", but hostname lookups can take a very long time and slow down web transactions.
Web servers can use the client hostname for detailed access control and logging.
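A reverse DNS lookup can be sketched with Python's standard `socket` module. This is an illustration, not how any particular server implements it. The function name is hypothetical, it falls back to the raw IP address when the lookup fails, and the optional `double_check` flag confirms the name by resolving it forward again (IPv4 only, because `gethostbyname_ex` does not return IPv6 addresses).

```python
import socket


def client_hostname(ip: str, double_check: bool = False) -> str:
    """Return the hostname for a client IP, or the IP itself if lookup fails."""
    try:
        # Reverse lookup: IP address -> (hostname, aliases, addresses).
        # This call can block for a long time on slow DNS.
        hostname, _aliases, _addrs = socket.gethostbyaddr(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip
    if double_check:
        # Forward-confirm: the name must resolve back to the same IPv4 address,
        # otherwise whoever controls the reverse zone could claim any hostname.
        try:
            _name, _aliases, forward_addrs = socket.gethostbyname_ex(hostname)
        except OSError:
            return ip
        if ip not in forward_addrs:
            return ip
    return hostname
```

Because the call blocks, a server would typically perform it only when access control or logging actually needs the name.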
In Apache, reverse DNS lookups are controlled by the HostnameLookups configuration directive.
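For example, an Apache configuration might turn hostname lookups off globally to avoid the delay, and enable them only for HTML and text files (the file pattern is illustrative):

```apache
# Skip reverse DNS by default: lookups can be slow.
HostnameLookups off

# Resolve client hostnames only for requests for HTML and text files.
<Files ~ "\.(html|htm|txt)$">
    HostnameLookups on
</Files>
```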
The ident protocol lets servers find out what username initiated an HTTP connection. This information is particularly useful for web server logging.
If a client supports the ident protocol, the client listens on TCP port 113 for ident requests.
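ident can work inside organizations, but it does not work well across the public Internet: many client PCs do not run an ident server, ident lookups delay HTTP transactions, firewalls often block incoming ident traffic, and the protocol is insecure and easy to spoof.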
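The ident exchange itself is a single text line in each direction. The sketch below assumes the query and reply formats of RFC 1413. It builds the query a web server would send to the client's TCP port 113 and parses the reply. The function names are hypothetical, and the network I/O is omitted so that only the message formats are shown.

```python
from typing import Optional


def build_ident_query(client_port: int, server_port: int) -> str:
    """Query sent to the client's ident server on TCP port 113.

    RFC 1413 orders the ports from the ident server's point of view:
    first its own port (the HTTP client's port), then the querying
    host's port (the web server's port, e.g. 80).
    """
    return f"{client_port}, {server_port}\r\n"


def parse_ident_response(line: str) -> Optional[str]:
    """Return the user ID from an ident reply, or None for an ERROR reply."""
    # Reply form: "<ports> : <resp-type> : <add-info>", where add-info for
    # USERID is "<opsys> : <user-id>". Split at most three times so a user ID
    # containing ':' is kept whole.
    parts = [p.strip() for p in line.rstrip("\r\n").split(":", 3)]
    if len(parts) == 4 and parts[1].upper() == "USERID":
        return parts[3]
    return None
```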
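The web server reads the data from the network connection and parses out the pieces of the request message: the request line (method, request URI, and HTTP version), the message headers (each ending in CRLF), the blank line that marks the end of the headers, and the request body, if any, whose length is typically given by the Content-Length header.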
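A minimal parser for those pieces might look like the sketch below. It assumes the complete message is already buffered in memory, decodes the head as ISO-8859-1, lower-cases header names, ignores chunked transfer encoding, and keeps only the last value of a repeated header. The function name is hypothetical.

```python
def parse_request(raw: bytes):
    """Split a buffered HTTP request into (method, uri, version, headers, body)."""
    # The headers end at the first empty line (CRLF CRLF).
    head, sep, rest = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("request headers are incomplete")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Request line: method, request URI, and HTTP version separated by spaces
    # (raises ValueError if there are not exactly three parts).
    method, uri, version = lines[0].split(" ")

    # Header lines: "Name: value"; names are case-insensitive, so lower-case them.
    headers = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()

    # Body: exactly Content-Length bytes follow the blank line (0 if absent).
    length = int(headers.get("content-length", "0"))
    if len(rest) < length:
        raise ValueError("request body is incomplete")
    return method, uri, version, headers, rest[:length]
```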
Different web server architectures service requests in different ways:
Multiplexed I/O servers
All the connections are simultaneously watched for activity. When a connection changes state, a small amount of processing is performed on the connection; when that processing is complete, the connection is returned to the open connection list for the next change in state. Work is done on a connection only when there is something to be done; threads and processes are not tied up waiting on idle connections.
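Python's standard `selectors` module is one way to sketch this design: a single thread waits in `select()` and runs a small callback only for connections that are ready. The callbacks, the fixed RESPONSE, and the assumption that each request arrives in a single `recv()` are simplifications for illustration, not how a production multiplexed server behaves.

```python
import selectors
import socket

# Fixed reply used by this sketch instead of real request processing.
RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 3\r\n"
    b"\r\n"
    b"ok\n"
)


def on_readable(sel: selectors.BaseSelector, conn: socket.socket) -> None:
    """Called only when conn has data (or EOF), so recv() will not block."""
    data = conn.recv(4096)
    if data:
        # Simplification: assume the whole request arrived in one recv() and
        # that the small reply fits in the socket's send buffer.
        conn.sendall(RESPONSE)
    sel.unregister(conn)
    conn.close()


def on_accept(sel: selectors.BaseSelector, listener: socket.socket) -> None:
    """Accept a new client and start watching it for readable data."""
    conn, _addr = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)


def run(sel: selectors.BaseSelector, max_events=None, timeout=None) -> int:
    """Dispatch ready connections to their callbacks; return how many ran.

    Stops after max_events callbacks (if given) or when a select() with a
    timeout returns nothing ready.
    """
    handled = 0
    while max_events is None or handled < max_events:
        events = sel.select(timeout)
        if not events:  # only happens when a timeout was given
            break
        for key, _mask in events:
            callback = key.data  # the function registered for this socket
            callback(sel, key.fileobj)
            handled += 1
    return handled
```

To serve real clients, a server would create a non-blocking listening socket, register it with `sel.register(listener, selectors.EVENT_READ, on_accept)`, and call `run(sel)`.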
Accessing a resource starts with mapping the URI from the request message to the proper content or content generator on the web server.
Typically, a special folder in the web server filesystem is reserved for web content. This folder is called the document root, or docroot.
In Apache, the docroot is set with the DocumentRoot configuration directive.
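For example (the path is illustrative):

```apache
# Serve content from this directory: a request for /specials/saw-blade.gif
# maps to /usr/local/httpd/files/specials/saw-blade.gif
DocumentRoot /usr/local/httpd/files
```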
Servers are careful not to let relative URLs back up out of docroot and expose other parts of the filesystem.
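One way to map a request URI onto the docroot while refusing paths that climb out of it is to resolve the combined path and check that it still lies under the docroot. This is a minimal sketch that assumes a POSIX-style filesystem, and the function name is hypothetical. Resolving symlinks this way also rejects links that point outside the docroot; a real server might instead let configuration decide that.

```python
from pathlib import Path
from urllib.parse import unquote, urlsplit


def map_uri_to_path(docroot: str, uri: str) -> Path:
    """Map a request URI to a filesystem path inside docroot, or raise."""
    root = Path(docroot).resolve()
    # Drop any query string, then undo %-escapes (so "%2e%2e" becomes "..").
    url_path = unquote(urlsplit(uri).path)
    # Treat the URL path as relative to the docroot and normalize "..", ".",
    # and symlinks.
    candidate = (root / url_path.lstrip("/")).resolve()
    # Reject anything that escaped the docroot.
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{uri!r} escapes the document root")
    return candidate
```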
5.1.1 Virtually hosted docroots
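Virtually hosted web servers host multiple web sites on the same web server, giving each site its own distinct document root on the server. The server identifies which docroot to use from the IP address or hostname in the URI or the Host header.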
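Selecting the docroot from the Host header can be sketched as a table lookup. The hostnames and directories below are hypothetical. The port is stripped before the lookup, and IPv6 literal hosts such as `[::1]:8080` are not handled.

```python
# Hypothetical virtual hosts and their document roots.
VHOST_DOCROOTS = {
    "www.joes-hardware.com": "/docs/joe",
    "www.marys-antiques.com": "/docs/mary",
}
DEFAULT_DOCROOT = "/docs/default"  # used when Host is missing or unknown


def docroot_for(headers: dict) -> str:
    """Choose a document root from (lower-cased) request headers."""
    host = headers.get("host", "")
    # Hostnames are case-insensitive; drop an optional ":port" suffix.
    hostname = host.split(":", 1)[0].strip().lower()
    return VHOST_DOCROOTS.get(hostname, DEFAULT_DOCROOT)
```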
5.1.2 User home directory docroots
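A common convention for user docroots, used here as an assumption, maps URIs that begin with `/~username` to a `public_html` folder in that user's home directory. The sketch below shows only the mapping. `HOME_BASE` and the allowed username characters are hypothetical, and `..` segments after the username are not normalized, so a real server would still apply a docroot containment check.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

HOME_BASE = "/home"  # hypothetical parent directory of user home directories
# "/~" + username, optionally followed by a path. Dots are excluded from the
# username so "/~.." cannot name a parent directory.
USER_URI = re.compile(r"^/~([A-Za-z0-9_-]+)(/.*)?$")


def user_docroot_path(uri_path: str) -> Optional[PurePosixPath]:
    """Map "/~user/rest" to HOME_BASE/user/public_html/rest, else None."""
    match = USER_URI.match(uri_path)
    if match is None:
        return None  # not a user URI: use the normal docroot instead
    user, rest = match.group(1), match.group(2) or "/"
    return PurePosixPath(HOME_BASE, user, "public_html", rest.lstrip("/"))
```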
Directory URLs: a request can name a URL whose path resolves to a directory rather than a file.
Most web servers look for a file named index.html or index.htm inside a directory to represent that directory.
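Looking up a directory's index file can be sketched as trying each candidate name in order. The function name is hypothetical, and the default candidate list mirrors the index.html/index.htm convention. What to do when no index file exists (for example, return a listing or an error) is left to the caller.

```python
from pathlib import Path
from typing import Optional, Sequence

DEFAULT_INDEX_FILES = ("index.html", "index.htm")


def resolve_directory(
    path: Path, index_files: Sequence[str] = DEFAULT_INDEX_FILES
) -> Optional[Path]:
    """Return the file to serve for path, or None if a directory has no index."""
    if not path.is_dir():
        return path  # not a directory URL: serve the path as-is
    for name in index_files:  # first match wins, so order expresses preference
        candidate = path / name
        if candidate.is_file():
            return candidate
    return None
```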
In Apache, the DirectoryIndex directive configures the set of filenames that are interpreted as default directory files, in order of preference.
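For example (the filenames are illustrative):

```apache
# Try these files, in order, when a URL names a directory.
DirectoryIndex index.html index.htm home.html home.htm index.cgi
```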
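Web servers can also map URIs to dynamic content generators. Apache lets you map URI pathname components to executable program directories: when a request path falls under one of them, the server executes the program in the corresponding server directory instead of returning the file. This is how CGI programs are typically served.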
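For example, Apache's ScriptAlias directive maps a URL path prefix to a directory of executable programs (the paths are illustrative):

```apache
# Requests under /cgi-bin/ run programs from this directory
# instead of returning the files' contents.
ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-programs/
```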
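If the transaction generated a response body, the content is sent back with the response message, which then usually carries a Content-Type header (the MIME type of the body) and a Content-Length header (the size of the body).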
The web server is responsible for determining the MIME type of the response body:
mime.types
The server can use the filename extension to determine the MIME type, looking the extension up in a mime.types file.
Magic typing
The Apache web server can scan the contents of each resource and pattern-match the content against a table of known patterns to determine the MIME type for each file.
Explicit typing
Servers can be configured to force particular files or directory contents to have a MIME type, regardless of the file extension or contents.
Type negotiation
Some web servers can be configured to store a resource in multiple document formats. In this case, the web server can be configured to determine the "best" format to use by a negotiation process with the user.
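The typing strategies above can be combined into one lookup. The sketch below is illustrative, not Apache's algorithm. It checks a hypothetical explicit-type table first, then the filename extension via Python's `mimetypes` module (which reads system mime.types files where available), then a tiny hard-coded table of magic byte signatures, and finally falls back to `application/octet-stream`. Type negotiation is not shown.

```python
import mimetypes

# Hypothetical explicit typing table: these paths get a fixed type regardless
# of extension or contents.
EXPLICIT_TYPES = {"/docs/data/report": "application/pdf"}

# A tiny, illustrative magic-number table (real tables are much larger).
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
]


def determine_mime_type(path: str, content: bytes = b"") -> str:
    """Pick a MIME type: explicit table, then extension, then magic bytes."""
    # 1. Explicit typing overrides everything else.
    if path in EXPLICIT_TYPES:
        return EXPLICIT_TYPES[path]
    # 2. Extension lookup (mimetypes consults system mime.types files if present).
    guessed, _encoding = mimetypes.guess_type(path)
    if guessed is not None:
        return guessed
    # 3. Magic typing: match the leading bytes against known signatures.
    for signature, mime_type in MAGIC_SIGNATURES:
        if content.startswith(signature):
            return mime_type
    # 4. Unknown: a generic binary type.
    return "application/octet-stream"
```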
Web servers sometimes return redirection responses instead of success messages. A web server can redirect the browser to go elsewhere to perform the request. The Location response header contains a URI for the new or preferred location of the content. Redirects are useful for:
Permanently moved resources
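A resource might have been moved to a new location, or otherwise renamed, giving it a new URL. The server can send a 301 Moved Permanently redirect so the client uses the new URL and can update any bookmarks.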
Temporarily moved resources
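A resource is temporarily moved or renamed. The server can redirect the client with 303 See Other or 307 Temporary Redirect; because the move is temporary, the client should keep using the old URL for future requests.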
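A redirect response is an ordinary response whose status code says what kind of move happened and whose Location header gives the new URI. The sketch below builds one. The function name and the short reason-phrase table are assumptions for illustration, and an empty body is sent for brevity even though servers often include a short HTML note.

```python
REASONS = {
    301: "Moved Permanently",
    303: "See Other",
    307: "Temporary Redirect",
}


def redirect_response(status: int, location: str) -> bytes:
    """Build an HTTP/1.1 redirect response pointing the client at location."""
    if status not in REASONS:
        raise ValueError(f"unsupported redirect status: {status}")
    # location must already be an ASCII URI (non-ASCII must be %-encoded).
    return (
        f"HTTP/1.1 {status} {REASONS[status]}\r\n"
        f"Location: {location}\r\n"
        "Content-Length: 0\r\n"  # empty body, so the client knows where it ends
        "\r\n"
    ).encode("ascii")
```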
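Sending responses: on a persistent connection, the connection may stay open after the response, so the server must compute the Content-Length header correctly; otherwise the client has no way of knowing where the response ends.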
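The key detail is that Content-Length counts bytes of the encoded body, not characters. A minimal sketch, assuming a UTF-8 text body and a hypothetical `build_response` helper:

```python
def build_response(status: str, body_text: str, content_type: str = "text/plain") -> bytes:
    """Build a response whose Content-Length matches the encoded body exactly."""
    body = body_text.encode("utf-8")
    head = (
        f"HTTP/1.1 {status}\r\n"
        f"Content-Type: {content_type}; charset=utf-8\r\n"
        # len(body) counts bytes; len(body_text) would undercount non-ASCII text.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + body
```

On a persistent connection, the client reads exactly that many bytes after the blank line and treats whatever follows as the start of the next response.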