
ch02: URLs and Resources


Last updated 4 years ago


2. URLs and Resources

0. Guide

  • URL syntax and what the various URL components mean and do

  • URL shortcuts that many web clients support, including relative URLs and expandomatic URLs

  • URL encoding and character rules

  • Common URL schemes that support a variety of Internet information systems

  • The future of URLs, including uniform resource names (URNs), a framework to support objects that move from place to place while retaining stable names

1. Navigating the Internet's Resources

URL: identifies a resource by describing where it is located.

URN: identifies a resource by name, regardless of where it currently resides.

Take the URL http://www.joes-hardware.com/seasonal/index-fall.html as an example:

  • The first part of the URL (http) is the URL scheme. The scheme tells a web client how to access the resource. In this case, the URL says to use the HTTP protocol.

  • The second part of the URL (www.joes-hardware.com) is the server location. This tells the web client where the resource is hosted.

  • The third part of the URL (/seasonal/index-fall.html) is the resource path. The path tells what particular local resource on the server is being requested.
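
The three parts described above can be pulled apart with Python's standard `urllib.parse` module. This is only a minimal sketch; the variable names `url` and `parts` are illustrative, and the URL is the joes-hardware example used in these notes.

```python
from urllib.parse import urlsplit

url = "http://www.joes-hardware.com/seasonal/index-fall.html"
parts = urlsplit(url)

# scheme: how to access the resource
print("scheme:", parts.scheme)
# server location: where the resource is hosted
print("host:  ", parts.netloc)
# resource path: which local resource on the server is requested
print("path:  ", parts.path)
```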

2. URL Syntax

URL format:

<scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>

General URL components:

| Component | Description | Default Value |
| --- | --- | --- |
| scheme | Which protocol to use when accessing a server to get a resource. | None |
| user | The username some schemes require to access a resource. | anonymous |
| password | The password that may be included after the username, separated by a colon (:). | <Email address> |
| host | The hostname or dotted IP address of the server hosting the resource. | None |
| port | The port number on which the server hosting the resource is listening. Many schemes have default port numbers. | Scheme-specific |
| path | The local name for the resource on the server, separated from the previous URL components by a slash (/). The syntax of the path component is server- and scheme-specific. | None |
| params | Used by some schemes to specify input parameters. Params are name/value pairs. A URL can contain multiple params fields, separated from each other and from the rest of the path by semicolons (;). | None |
| query | Used by some schemes to pass parameters to active applications (such as databases, bulletin boards, search engines, and other Internet gateways). There is no common format for the contents of the query component. It is separated from the rest of the URL by the "?" character. | None |
| frag | A name for a piece or part of the resource. The frag field is not passed to the server when referencing the object; it is used internally by the client. It is separated from the rest of the URL by the "#" character. | None |
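
As a rough illustration of the general format, the sketch below feeds one URL containing every component to `urllib.parse.urlparse`. The URL itself is hypothetical (the port, path, query, and fragment values are made up for the example). Note that `urlparse` only splits off the params of the last path segment.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises every component of the general format.
url = ("http://joe:joespasswd@www.joes-hardware.com:8080"
       "/tools/hammers;sale=false?item=12731#specs")
p = urlparse(url)

components = {
    "scheme": p.scheme,
    "user": p.username,
    "password": p.password,
    "host": p.hostname,
    "port": p.port,
    "path": p.path,
    "params": p.params,    # params of the last path segment only
    "query": p.query,
    "frag": p.fragment,    # used by the client, never sent to the server
}

for name, value in components.items():
    print(f"{name:<8} {value}")
```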

2.1 Usernames and Passwords

Many servers require a username and password before you can access data through them. FTP servers are a common example of this. Examples:

ftp://ftp.prep.ai.mit.edu/pub/gnu
ftp://anonymous@ftp.prep.ai.mit.edu/pub/gnu
ftp://anonymous:my_passwd@ftp.prep.ai.mit.edu/pub/gnu
http://joe:joespasswd@www.joes-hardware.com/sales_info.txt
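
A small sketch of how a client might read the credentials out of these URLs, falling back to the default username `anonymous` when none is given. The helper `credentials` and its `default_password` parameter are assumptions made for illustration; real clients choose their own default password.

```python
from urllib.parse import urlsplit

def credentials(url, default_user="anonymous", default_password=None):
    """Return (user, password) from a URL, applying defaults when absent.

    Hypothetical helper: default_password is left as None here because
    the value a real client sends is client-specific.
    """
    parts = urlsplit(url)
    user = parts.username or default_user
    password = parts.password or default_password
    return user, password

for example in (
    "ftp://ftp.prep.ai.mit.edu/pub/gnu",
    "ftp://anonymous@ftp.prep.ai.mit.edu/pub/gnu",
    "ftp://anonymous:my_passwd@ftp.prep.ai.mit.edu/pub/gnu",
    "http://joe:joespasswd@www.joes-hardware.com/sales_info.txt",
):
    print(example, "->", credentials(example))
```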

2.2 Parameters

Many protocols require more information to work.

Applications interpreting URLs need these protocol parameters to access the resource.

To give applications the input parameters they need in order to talk to the server correctly, URLs have a params component. This component is just a list of name/value pairs in the URL, separated from the rest of the URL by ";" characters. For example: ftp://prep.ai.mit.edu/pub/gnu;type=d

Each segment can have its own params:

http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true
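
Python's `urlparse` only extracts params from the last path segment, so per-segment params have to be split by hand. The helper below, `segment_params`, is a hypothetical sketch of one way to do that; it assumes each param is a simple `name=value` pair.

```python
from urllib.parse import urlsplit

def segment_params(url):
    """Return a list of (segment, params_dict) for each path segment."""
    path = urlsplit(url).path
    result = []
    for segment in path.strip("/").split("/"):
        # Everything before the first ";" is the segment name,
        # the rest are "name=value" params for that segment.
        name, *raw_params = segment.split(";")
        params = dict(p.partition("=")[::2] for p in raw_params if p)
        result.append((name, params))
    return result

url = "http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true"
for name, params in segment_params(url):
    print(name, params)
```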

3. URL Shortcuts

3.1 Relative URLs

If you use relative URLs, you can move a set of documents around and still have their links work, because they will be interpreted relative to the new base. This allows for things like mirroring content on other servers.

Base URLs

The base URL can come from a few places:

  • Explicitly provided in the resource: an HTML document, for example, may include a <BASE> HTML tag defining the base URL by which to convert all relative URLs in that HTML document.

  • Base URL of the encapsulating resource: use the URL of the resource in which the relative URL is embedded as the base.

  • No base URL: in some instances, there is no base URL. This often means that you have an absolute URL; however, sometimes you may just have an incomplete or broken URL.

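Resolving a relative URL against a base can be tried out with `urllib.parse.urljoin`. This is just a sketch: the page names (`tools.html`, `hammers.html`) and the mirror host `mirror.example.com` are hypothetical values chosen for the example.

```python
from urllib.parse import urljoin

relative = "./hammers.html"

# Base URL of the document in which the relative link is embedded.
original_base = "http://www.joes-hardware.com/tools.html"
# The same document after being mirrored on another (hypothetical) server.
mirror_base = "http://mirror.example.com/joes/tools.html"

original_link = urljoin(original_base, relative)
mirrored_link = urljoin(mirror_base, relative)

print(original_link)
print(mirrored_link)
```

The relative link stays unchanged in the document; only the base differs, which is why the link keeps working after the move.
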
4. Shady Characters

URLs were designed to be portable, readable, and complete.

4.1 The URL Character Set

Escape sequences allow the encoding of arbitrary character values or data using a restricted subset of the US-ASCII character set, yielding portability and completeness.

4.2 Encoding Mechanisms

An encoding scheme was devised to represent characters in a URL that are not safe. The encoding represents the unsafe character by an "escape" notation, consisting of a percent sign (%) followed by two hexadecimal digits that represent the ASCII code of the character.

Examples:

| Character | ASCII code |
| --- | --- |
| ~ | 126 (0x7E) |
| SPACE | 32 (0x20) |
| % | 37 (0x25) |
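
The escape notation can be reproduced in a few lines. The `escape` helper below is a hypothetical illustration of the rule (percent sign plus two hex digits of the ASCII code), not a library function; `urllib.parse.unquote` is the standard-library call that reverses it.

```python
from urllib.parse import unquote

def escape(char):
    """Return the %XX escape for a single ASCII character."""
    return f"%{ord(char):02X}"

for char in ("~", " ", "%"):
    encoded = escape(char)
    # Show the character, its ASCII code, its escape, and the decoded result.
    print(repr(char), ord(char), encoded, repr(unquote(encoded)))
```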

4.3 Character Restrictions

Several characters have been reserved to have special meaning inside of a URL:

| Character | Reservation/Restriction |
| --- | --- |
| % | Reserved as escape token for encoded characters |
| / | Reserved for delimiting path segments in the path component |
| . | Reserved in the path component |
| .. | Reserved in the path component |
| # | Reserved as the fragment delimiter |
| ? | Reserved as the query-string delimiter |
| ; | Reserved as the params delimiter |
| : | Reserved to delimit the scheme, user/password, and host/port components |
| $ , + | Reserved |
| @ & = | Reserved because they have special meaning in the context of some schemes |
| { } \| \ ^ ~ [ ] ' | Restricted because of unsafe handling by various transport agents, such as gateways |
| < > " | Unsafe; should be encoded because these characters often have meaning outside the scope of the URL, such as delimiting the URL itself in a document |
| 0x00-0x1F, 0x7F | Restricted; characters within these hex ranges fall within the nonprintable section of the US-ASCII character set |
| >0x7F | Restricted; characters whose hex values fall within this range do not fall within the 7-bit range of the US-ASCII character set |
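
In practice, reserved and restricted characters are escaped before being placed in a URL component. The sketch below uses `urllib.parse.quote`; the file name `nuts & bolts/50%.txt` is a hypothetical value chosen because it contains several reserved characters. Python encodes characters above 0x7F as their UTF-8 bytes, which is an implementation choice of that library rather than something stated in these notes.

```python
from urllib.parse import quote

name = "nuts & bolts/50%.txt"  # hypothetical name with reserved characters

# safe="" escapes "/" too, so the name stays a single path segment.
as_segment = quote(name, safe="")
# The default safe="/" leaves "/" alone, treating it as a path delimiter.
as_path = quote(name)

# Nonprintable and non-ASCII characters are escaped as well.
control = quote("\x7f")
non_ascii = quote("é")

print(as_segment)
print(as_path)
print(control, non_ascii)
```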

5. The Future

URLs are really addresses, not true names. This means that a URL tells you where something is located for the moment, not where it will be if it later moves.

Example:

http://www.joes-hardware.com/seasonal/index-fall.html