ch01: Overview of HTTP

0. Guide:

  • How web clients and servers communicate

  • Where resources (web content) come from

  • How web transactions work

  • The format of the messages used for HTTP communication

  • The underlying TCP network transport

  • The different variations of the HTTP protocol

  • Some of the many HTTP architectural components installed around the Internet

1. Web Clients and Servers

Together, HTTP clients and HTTP servers make up the basic components of the World Wide Web.

2. Resources

A resource is any kind of content.

2.1 Media Types

Because the Internet hosts many thousands of different data types, HTTP carefully tags each object being transported through the Web with a data format label called a MIME type.

A MIME type is a textual label, represented as a primary object type and a specific subtype, separated by a slash. For example:

  • An HTML-formatted text document would be labeled with type text/html.

  • A plain ASCII text document would be labeled with type text/plain.

  • A JPEG version of an image would be image/jpeg.

  • A GIF-format image would be image/gif.

  • An Apple QuickTime movie would be video/quicktime.

  • A Microsoft PowerPoint presentation would be application/vnd.ms-powerpoint.
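
The type/subtype structure described above can be illustrated with a minimal Python sketch. The helper name split_mime_type is hypothetical (it is not part of any library); mimetypes.guess_type is a standard-library call that guesses a label from a file extension.

```python
import mimetypes
from typing import Tuple


def split_mime_type(mime_type: str) -> Tuple[str, str]:
    """Split a MIME label such as 'text/html' into (primary type, subtype).

    Hypothetical helper for illustration only.
    """
    primary, slash, subtype = mime_type.strip().partition("/")
    if not slash or not primary or not subtype:
        raise ValueError(f"not a type/subtype label: {mime_type!r}")
    return primary.lower(), subtype.lower()


for label in ("text/html", "image/jpeg", "application/vnd.ms-powerpoint"):
    print(label, "->", split_mime_type(label))

# The standard library can also guess a MIME type from a file name.
guessed_type, _encoding = mimetypes.guess_type("saw-blade.gif")
print("saw-blade.gif ->", guessed_type)
```

Guessing from a file extension is only a heuristic; what a client actually sees is the label the server attaches to the response.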

2.2 URIs

Each web server resource has a name, so clients can point out what resources they are interested in.

The server resource name is called a uniform resource identifier, or URI.

URIs come in two flavors, called URLs and URNs.

2.3 URLs

The uniform resource locator (URL) is the most common form of resource identifier.

URLs describe the specific location of a resource on a particular server.

Most URLs follow a standardized format of three main parts:

  • The first part of the URL is called the scheme, and it describes the protocol used to access the resource. This is usually the HTTP protocol (http://).

  • The second part gives the server's Internet address (e.g., www.joe-hardware.com).

  • The rest names a resource on the web server (e.g., /specials/saw-blade.gif).
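
As a small illustration of the three parts, the sketch below splits a URL with Python's standard urllib.parse module. The full URL is an assumption, assembled from the example hostname and path used in the list above.

```python
from urllib.parse import urlsplit

# Assumed example URL, built from the host and path mentioned above.
url = "http://www.joe-hardware.com/specials/saw-blade.gif"
parts = urlsplit(url)

print("scheme:  ", parts.scheme)    # protocol used to access the resource
print("server:  ", parts.hostname)  # server Internet address
print("resource:", parts.path)      # resource name on that server
```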

Today, almost every URI is a URL.

2.4 URNs

A uniform resource name (URN) serves as a unique name for a particular piece of content, independent of where the resource currently resides.

URNs are still experimental and not yet widely adopted.

3. Transactions

An HTTP transaction consists of a request command (sent from client to server) and a response result (sent from the server back to the client).

This communication happens with formatted blocks of data called HTTP messages.

3.1 Methods

Every HTTP request message has a method. The method tells the server what action to perform:

  • GET: Send the named resource from the server to the client.

  • PUT: Store data from the client into a named server resource.

  • DELETE: Delete the named resource from a server.

  • POST: Send client data into a server gateway application.

  • HEAD: Send just the HTTP headers from the response for the named resource.

3.2 Status Codes

Every HTTP response message comes back with a status code.

HTTP also sends an explanatory textual "reason phrase" with each numeric status code.
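
To make the pairing of numeric code and reason phrase concrete, here is a short sketch using Python's standard http.HTTPStatus enumeration. The three codes shown are picked only as examples.

```python
from http import HTTPStatus

# Look up the standard reason phrase for a few example status codes.
for code in (200, 302, 404):
    status = HTTPStatus(code)
    print(status.value, status.phrase)
```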

3.3 Web Pages Can Consist of Multiple Objects

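An application often issues multiple HTTP transactions to accomplish a task. For example, a browser typically fetches a page's HTML first, then issues further transactions for each embedded image or other object.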

4. Messages

HTTP messages consist of three parts:

  • Start line: The first line of the message is the start line, indicating what to do for a request or what happened for a response.

  • Header fields: Zero or more header fields follow the start line. Each header field consists of a name and a value, separated by a colon (:) for easy parsing. The headers end with a blank line. Adding a header field is as easy as adding another line.

  • Body: After the blank line is an optional message body containing any kind of data. Request bodies carry data to the web server; response bodies carry data back to the client. Unlike the start line and headers, which are textual and structured, the body can contain arbitrary binary data. Of course, the body can also contain text.
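
The three parts can be seen by splitting a raw message by hand. The sketch below is a simplified illustration, not a full HTTP parser: the function name parse_message and the sample response are made up for this example, and repeated header names simply overwrite one another.

```python
from typing import Dict, Tuple


def parse_message(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a raw HTTP message into (start line, headers, body).

    Hypothetical helper for illustration; it ignores many real-world
    details such as repeated header fields.
    """
    # The blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        # Each header field is a name and a value separated by a colon.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Made-up sample response used only to exercise the function.
raw_response = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 19\r\n"
    b"\r\n"
    b"Hi! I'm a message!\n"
)

start_line, headers, body = parse_message(raw_response)
print(start_line)
print(headers)
print(body)
```

The body is kept as bytes rather than decoded, since it may hold arbitrary binary data.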

5. Connections

5.1 Connections, IP Addresses, and Port Numbers

A browser connects to a web server and retrieves a document in the following steps:

  1. The browser extracts the server's hostname from the URL.

  2. The browser converts the server's hostname into the server's IP address.

  3. The browser extracts the port number (if any) from the URL.

  4. The browser establishes a TCP connection with the web server.

  5. The browser sends an HTTP request message to the server.

  6. The server sends an HTTP response back to the browser.

  7. The connection is closed, and the browser displays the document.
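
The steps above can be sketched with Python's standard socket module. This is a rough illustration under stated assumptions, not how any particular browser works: the function names host_and_port and fetch are hypothetical, the request uses HTTP/1.0 so that the server closes the connection when it is done, and any query string in the URL is ignored.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def host_and_port(url: str) -> Tuple[str, int]:
    """Steps 1 and 3: extract the hostname and the port (if any) from the URL."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no hostname in URL: {url!r}")
    # When the URL gives no port, fall back to 80, the default for http.
    return parts.hostname, parts.port or 80


def fetch(url: str) -> bytes:
    """Steps 2 and 4 to 7: connect, send a request, read the response."""
    host, port = host_and_port(url)
    path = urlsplit(url).path or "/"
    # create_connection resolves the hostname to an IP address (step 2)
    # and opens the TCP connection (step 4).
    with socket.create_connection((host, port), timeout=10) as conn:
        # Step 5: send the HTTP request message.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        conn.sendall(request.encode("ascii"))
        # Step 6: read the response until the server stops sending.
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    # Step 7: leaving the with-block closes the connection.
    return b"".join(chunks)
```

Calling fetch on an http:// URL would return the raw response bytes, start line and headers included; displaying the document is left to the caller.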

6. Architectural Components of the Web

Besides web browsers and web servers, there are many other web applications that you interact with on the Internet:

  • Proxies: HTTP intermediaries that sit between clients and servers

  • Caches: HTTP storehouses that keep copies of popular web pages close to clients

  • Gateways: Special web servers that connect to other applications

  • Tunnels: Special proxies that blindly forward HTTP communications

  • Agents: Semi-intelligent web clients that make automated HTTP requests

6.1 Proxies

Proxy servers are important building blocks for web security, application integration, and performance optimization.

6.2 Caches

A web cache, or caching proxy, is a special type of HTTP proxy server that keeps copies of popular documents that pass through the proxy. The next client requesting the same document can be served from the cache's copy.
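
The idea of keeping copies can be sketched in a few lines of Python. This is a toy illustration only: the class TinyCache and the function fake_origin_fetch are hypothetical names, the origin server is simulated, and the sketch ignores freshness and expiry, which a real cache would have to handle.

```python
from typing import Callable, Dict, List


class TinyCache:
    """Toy cache: keeps a copy of every document fetched through it."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        # Only contact the origin server when no copy is stored yet.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]


# Simulated origin server that records every request it receives.
origin_requests: List[str] = []


def fake_origin_fetch(url: str) -> bytes:
    origin_requests.append(url)
    return b"document for " + url.encode("ascii")


cache = TinyCache(fake_origin_fetch)
first = cache.get("http://www.joe-hardware.com/tools.html")
second = cache.get("http://www.joe-hardware.com/tools.html")
print(len(origin_requests))  # number of requests that reached the origin
```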

6.3 Gateways

Gateways are special servers that act as intermediaries for other servers. They are often used to convert HTTP traffic to another protocol.

6.4 Tunnels

Tunnels are HTTP applications that, after setup, blindly relay raw data between two connections.

6.5 Agents

Agents are client programs that make HTTP requests on the user's behalf. Any application that issues web requests is an HTTP agent.
