URLs for HTTP Servers

8.1 URLs for HTTP Servers

Because most HTML is served from HTTP (HyperText Transfer Protocol) servers, this is the type of URL you are most likely to see. Consider the following examples:

http://www.w3.org/hypertext/Addressing/
http://www.java.utoronto.ca:3232/home.html
http://www.utoronto.ca/ian/books/html4ed/outline.html

What do these strings mean? The first part, http:, means that the documents are served by an HTTP server. The double slash (//) means that the next part is the name of the server. This can have two parts: the Internet address of the server (required) and the port number the server listens on (optional). In the first example, www.w3.org, no port number is specified, so the browser assumes the default for HTTP servers (port 80). In the second example, the URL tells the browser that the HTTP server is listening on port 3232. The port is specified after the server name, separated from it by a colon.

The final part specifies the file or resource being requested; it is separated from the address and port number pair by a slash (/). The resource is specified by a path relative to the root directory of the server. Thus, in the third example, the document outline.html at www.utoronto.ca is found in the subdirectory .../ian/books/html4ed/ with respect to the HTTP server root.
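The breakdown described above can be sketched with Python's standard urllib.parse module. This is only an illustration, not part of any server or browser; the function name split_http_url and the constant DEFAULT_HTTP_PORT are names chosen here for the example.

```python
from urllib.parse import urlsplit

# Default port for HTTP servers, as stated in the text above.
DEFAULT_HTTP_PORT = 80


def split_http_url(url):
    """Return (scheme, host, port, path) for an http URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # parts.port is None when the URL does not name a port explicitly,
    # in which case we fall back to the default.
    port = parts.port if parts.port is not None else DEFAULT_HTTP_PORT
    return parts.scheme, parts.hostname, port, parts.path


if __name__ == "__main__":
    for example in (
        "http://www.w3.org/hypertext/Addressing/",
        "http://www.java.utoronto.ca:3232/home.html",
        "http://www.utoronto.ca/ian/books/html4ed/outline.html",
    ):
        print(split_http_url(example))
```

Applied to the three example URLs, the sketch reports port 80 for the first and third (no port given) and port 3232 for the second.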

Special HTTP URL Paths

A file or resource specification beginning with /cgi-bin/ is usually special: in the case of many servers, the cgi-bin string indicates a special reference to programs or scripts that can be executed by the server. This is discussed in more detail in section 8.1.1.

HTTP Directory Listings

If the file name is left out, the server tries to send you a default directory file. Usually this is a file named "index.html", but the default name can be changed (or the feature turned off) in the server configuration files. You should always include the trailing slash when referencing a directory, for example /directory/, as otherwise the server will think you are requesting a file named directory as opposed to information about the directory.
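As a rough sketch of the rule above, the following Python function shows how a server might choose between a named file and the default directory file. Real servers differ in their details; the function name resolve_request_path is hypothetical, and "index.html" is only the usual default mentioned in the text.

```python
# Usual default directory file; servers may be configured to use another
# name or to disable the feature entirely.
DEFAULT_INDEX = "index.html"


def resolve_request_path(path, default_index=DEFAULT_INDEX):
    """Map a URL path to the resource a server might send.

    Hypothetical helper: a path ending in a slash is treated as a
    directory and gets the default file appended; anything else is
    treated as a request for a file of that name.
    """
    if path.endswith("/"):
        return path + default_index
    return path


if __name__ == "__main__":
    print(resolve_request_path("/directory/"))  # treated as a directory
    print(resolve_request_path("/directory"))   # treated as a file name
```

The second call illustrates why the trailing slash matters: without it, the path is taken to name a file called directory.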

8.1.1 Passing Parameters to the Server

HTTP supports passing arguments to the server. The general format is to append the arguments to the URL, separated from it by a question mark (?). The reason for this notation is simple: most requests of this type are requests to search a database, and the passed arguments are the search parameters.

The general form is as follows:

http://some.site.edu/cgi-bin/foo?arg1+arg2+arg3

What does this mean? There are two things to note:

cgi-bin: The cgi-bin directory is a special location known to the server, containing executable programs or scripts. The reason is straightforward: arguments must be passed to something that can act on them, which implies a program or script. The programs and scripts in the cgi-bin directory interface with the WWW: a URL can access them and pass them arguments, and they can in turn act on the arguments and return information, documents, etc. to the browser.
passed arguments: Arguments are appended to the URL, separated from it by a question mark (?). You can also send more than one argument, separated by plus signs (+). Thus, in the example above, the program/script foo is sent three arguments: arg1, arg2 and arg3.
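The argument format described above can be illustrated with a short Python sketch that separates the program path from its plus-separated arguments. The function name split_query_arguments is hypothetical, and the sketch covers only the simple arg1+arg2+arg3 form shown here.

```python
from urllib.parse import urlsplit


def split_query_arguments(url):
    """Return (path, [arguments]) for a URL of the form path?arg1+arg2.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Everything after the question mark is the query; an empty query
    # means no arguments were passed.
    arguments = parts.query.split("+") if parts.query else []
    return parts.path, arguments


if __name__ == "__main__":
    print(split_query_arguments("http://some.site.edu/cgi-bin/foo?arg1+arg2+arg3"))
```

For the example URL, the path identifies the program foo in cgi-bin and the list holds the three arguments in order.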

For more information see the W3C documentation on addressing.

8.1.2 Personal HTML directories

On many Web servers, users can keep HTML documents in their own home directories, distinct from the special area reserved for administrative Web pages. The procedure for doing this depends to some degree on the server. In general, you need to create a special file, placed in your home directory, that specifies where your personal 'root' HTML directory is. You then access files in this personal 'root' area by using a special URL path of the form: ~your_login_name/path/file, where the tilde (~) indicates that this is a 'personal' Web area. Again, this is a server-specific feature, and not all servers support it or have it turned on. Ask your server manager for details about your local implementation.
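To make the shape of such a path concrete, here is a minimal Python sketch that separates the login name from the rest of a personal URL path. It shows only the path format; how a server then locates the personal 'root' directory is server-specific and is not modelled. The function name split_personal_path is hypothetical.

```python
def split_personal_path(path):
    """Split '/~login/rest' into (login, '/rest').

    Hypothetical helper: returns None when the path does not start
    with the tilde that marks a personal Web area.
    """
    if not path.startswith("/~"):
        return None
    # Everything between '/~' and the next slash is the login name.
    login, _, rest = path[2:].partition("/")
    return login, "/" + rest


if __name__ == "__main__":
    print(split_personal_path("/~your_login_name/path/file"))
    print(split_personal_path("/index.html"))
```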