ch01: Overview of HTTP
How web clients and servers communicate
Where resources (web content) come from
How web transactions work
The format of the messages used for HTTP communication
The underlying TCP network transport
The different variations of the HTTP protocol
Some of the many HTTP architectural components installed around the Internet
Together, HTTP clients and HTTP servers make up the basic components of the World Wide Web.
A resource is any kind of content.
Because the Internet hosts many thousands of different data types, HTTP carefully tags each object being transported through the Web with a data format label called a MIME type.
A MIME type is a textual label, represented as a primary object type and a specific subtype, separated by a slash. For example:
An HTML-formatted text document would be labeled with type text/html.
A plain ASCII text document would be labeled with type text/plain.
A JPEG version of an image would be image/jpeg.
A GIF-format image would be image/gif.
An Apple QuickTime movie would be video/quicktime.
A Microsoft PowerPoint presentation would be application/vnd.ms-powerpoint.
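The type/subtype structure described above can be illustrated with a short Python sketch. The helper name `split_mime_type` is hypothetical and exists only for this example; it splits a label at the slash and normalizes case, since MIME type names are case-insensitive.

```python
def split_mime_type(mime_type: str) -> tuple[str, str]:
    """Split a MIME type such as 'text/html' into (primary type, subtype)."""
    primary, sep, subtype = mime_type.strip().partition("/")
    if not sep or not primary or not subtype:
        raise ValueError(f"not a type/subtype label: {mime_type!r}")
    # MIME type names are case-insensitive, so normalize to lowercase.
    return primary.lower(), subtype.lower()


# The labels below are the examples listed in the notes.
for label in ("text/html", "image/jpeg", "application/vnd.ms-powerpoint"):
    print(label, "->", split_mime_type(label))
```

This sketch ignores optional parameters such as a character set, which a real parser would also have to handle.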
Each web server resource has a name, so clients can point out what resources they are interested in.
The server resource name is called a uniform resource identifier, or URI.
URIs come in two flavors, called URLs and URNs.
The URL (uniform resource locator) is the most common form of resource identifier.
URLs describe the specific location of a resource on a particular server.
Standardized format:
The first part of the URL is called the scheme, and it describes the protocol used to access the resource. This is usually the HTTP protocol (http://).
The second part gives the server Internet address (e.g., www.joe-hardware.com).
The rest names a resource on the web server (e.g., /specials/saw-blade.gif).
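As a rough illustration of the three URL parts above, Python's standard `urllib.parse.urlsplit` can take a URL apart. The full URL is assembled here from the example pieces in the notes; the variable names are only for illustration.

```python
from urllib.parse import urlsplit

# URL assembled from the example scheme, host, and path in the notes.
url = "http://www.joe-hardware.com/specials/saw-blade.gif"
parts = urlsplit(url)

scheme = parts.scheme    # protocol used to access the resource
host = parts.hostname    # Internet address of the server
path = parts.path        # names the resource on that server

print(scheme, host, path)
```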
Today, almost every URI is a URL.
A URN (uniform resource name) serves as a unique name for a particular piece of content, independent of where the resource currently resides.
URNs are still experimental and not yet widely adopted.
An HTTP transaction consists of a request command (sent from client to server) and a response result (sent from the server back to the client).
This communication happens with formatted blocks of data called HTTP messages.
Every HTTP request message has a method. The method tells the server what action to perform:
| HTTP method | Description |
| --- | --- |
| GET | Send named resource from the server to client. |
| PUT | Store data from client into a named server resource. |
| DELETE | Delete the named resource from a server. |
| POST | Send client data into a server gateway application. |
| HEAD | Send just the HTTP headers from the response for the named resource. |
Every HTTP response message comes back with a status code.
HTTP also sends an explanatory textual "reason phrase" with each numeric status code.
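To make the pairing of numeric code and reason phrase concrete, here is a small sketch using Python's standard `http.HTTPStatus` enum. The helper name `status_line` is hypothetical, and the phrases come from Python's built-in table, which a real server is free to word differently.

```python
from http import HTTPStatus


def status_line(code: int) -> str:
    """Return 'code reason-phrase' for a known HTTP status code."""
    status = HTTPStatus(code)  # raises ValueError for codes Python does not know
    return f"{status.value} {status.phrase}"


for code in (200, 302, 404):
    print(status_line(code))
```

The reason phrase is meant for human readers; software is expected to act on the numeric code.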
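An application often issues multiple HTTP transactions to accomplish a task. For example, a browser fetching a web page typically issues one transaction for the HTML and further transactions for each embedded image.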
HTTP messages consist of three parts:
Start line
The first line of the message is the start line, indicating what to do for a request or what happened for a response.
Header fields
Zero or more header fields follow the start line. Each header field consists of a name and a value, separated by a colon (:) for easy parsing. The headers end with a blank line. Adding a header field is as easy as adding another line.
Body
After the blank line is an optional message body containing any kind of data. Request bodies carry data to the web server; response bodies carry data back to the client. Unlike the start lines and headers, which are textual and structured, the body can contain arbitrary binary data. Of course, the body can also contain text.
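The three-part layout can be sketched as a small parser. This is a minimal illustration, not a complete HTTP parser: the function name `parse_http_message` and the sample response bytes are made up for this example, and it assumes CRLF line endings.

```python
def parse_http_message(raw: bytes):
    """Split a raw HTTP message into (start line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the textual head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header field is "name: value"; split at the first colon.
        name, _, value = line.partition(":")
        # A plain dict keeps only the last value of a repeated header name.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body  # body stays as bytes: it may be binary


# Made-up sample response for illustration.
raw = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 19\r\n"
    b"\r\n"
    b"Hi! I'm a message!\n"
)
start_line, headers, body = parse_http_message(raw)
print(start_line)
print(headers)
print(body)
```

Header names are lowercased here because HTTP header names are case-insensitive. The body is left undecoded because, unlike the start line and headers, it may hold arbitrary binary data.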
Steps a browser takes to fetch a resource over HTTP:
The browser extracts the server's hostname from the URL.
The browser converts the server's hostname into the server's IP address.
The browser extracts the port number (if any) from the URL.
The browser establishes a TCP connection with the web server.
The browser sends an HTTP request message to the server.
The server sends an HTTP response back to the browser.
The connection is closed, and the browser displays the document.
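The steps above can be sketched with Python's standard `socket` module. This is a simplified illustration under stated assumptions: the function names `plan_request` and `fetch` are hypothetical, port 80 is assumed when the URL names no port, and HTTP/1.0 is used so that the server closes the connection once the response has been sent.

```python
import socket
from urllib.parse import urlsplit


def plan_request(url: str):
    """Extract the hostname and port from the URL and build the request message."""
    parts = urlsplit(url)
    host = parts.hostname
    port = parts.port or 80            # assume port 80 when the URL names none
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "\r\n"                         # blank line ends the headers
    ).encode("ascii")
    return host, port, request


def fetch(url: str) -> bytes:
    """Run the full transaction and return the raw response bytes."""
    host, port, request = plan_request(url)
    ip_address = socket.gethostbyname(host)             # hostname -> IP address
    # Open the TCP connection; the with-block closes it when we are done.
    with socket.create_connection((ip_address, port), timeout=10) as conn:
        conn.sendall(request)                           # send the HTTP request
        chunks = []
        while True:
            data = conn.recv(4096)                      # read the HTTP response
            if not data:                                # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch` with an `http://` URL needs network access and returns the raw response, start line and headers included. `plan_request` can be tried offline to see the request message a client would send.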
There are many other web applications that you interact with on the Internet.
Proxies: HTTP intermediaries that sit between clients and servers
Caches: HTTP storehouses that keep copies of popular web pages close to clients
Gateways: Special web servers that connect to other applications
Tunnels: Special proxies that blindly forward HTTP communications
Agents: Semi-intelligent web clients that make automated HTTP requests
Proxy servers are important building blocks for web security, application integration, and performance optimization.
A web cache, or caching proxy, is a special type of HTTP proxy server that keeps copies of popular documents that pass through the proxy.
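The idea behind a caching proxy can be sketched in a few lines. This is a toy model, not how production caches work: the class `CachingProxy` and the stand-in `fake_origin` function are hypothetical, and the sketch ignores expiration and revalidation entirely.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy cache: keeps a copy of every document that passes through."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin  # hypothetical origin fetcher
        self._copies: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._copies:
            self.hits += 1                           # served from the local copy
            return self._copies[url]
        self.misses += 1                             # must ask the origin server
        document = self._fetch_from_origin(url)
        self._copies[url] = document                 # keep a copy for next time
        return document


def fake_origin(url: str) -> bytes:
    """Stand-in for a real origin server, used only for this example."""
    return f"document for {url}".encode("utf-8")


proxy = CachingProxy(fake_origin)
first = proxy.get("http://www.joe-hardware.com/specials/saw-blade.gif")
second = proxy.get("http://www.joe-hardware.com/specials/saw-blade.gif")
print(proxy.hits, proxy.misses)
```

The second request never reaches the origin, which is where the performance benefit comes from.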
Gateways are special servers that act as intermediaries for other servers. They are often used to convert HTTP traffic to another protocol.
Tunnels are HTTP applications that, after setup, blindly relay raw data between two connections.
Agents are client programs that make HTTP requests on the user's behalf. Any application that issues web requests is an HTTP agent.
Web servers host web resources.
Web servers attach a MIME type to all HTTP object data.
One popular use of HTTP tunnels is to carry encrypted Secure Sockets Layer (SSL) traffic through an HTTP connection.