Introduction to HTML
Last Update: 5 January 1998

2.3 Naming Scheme for HTML Documents

When your HTML browser (Netscape Navigator, Internet Explorer, Opera, lynx, etc.) retrieves a file, it must know what type of data it has received in order to know what to do with it. Hypertext (that is, HTTP) servers explicitly tell the browser the type of the data being sent. In other cases, such as when the browser uses FTP to access a remote file, or reads a file from your local disk (for example, when you are editing pages prior to publishing them to a Web server), the browser "guesses" the data type from the filename extension, that is, the part after the last dot in the filename. For example, HTML files are identified by names such as name.html, where the .html extension indicates an HTML document.

Four-letter extensions are common. This is not a problem on UNIX computers or Macintoshes, since these machines do not restrict the length of the extension. DOS and Windows 3.1 machines, unfortunately, are restricted to three-letter extensions, so on these systems the extension is generally truncated to three letters (for example, .html becomes .htm).

Here are some of the standard extensions, and their meanings:

.html (also .htm)
HTML document, containing text and HTML mark-up instructions.
.txt
A plain text file. The browser presents the file as a block of text and does not process it for mark-up instructions. Browsers generally treat unknown types of data as a text file.
.gif
A GIF format image file.
.xbm
An X-Bitmap (black-and-white) image file.
.xpm
An X-Pixmap (colour) image file.
.jpeg (also .jpg)
A JPEG-encoded image file.
.mpeg (also .mpg or .mpe)
An MPEG-encoded video file.
.qt
A (Macintosh) QuickTime-format video file.
.avi
A (Microsoft) AVI-format video file.
.au
A Sun/NeXT-format (µ-law) audio (sound) file.
.Z
A compressed file, compressed using adaptive Lempel-Ziv coding. The compress and uncompress programs that handle this format are commonly found on UNIX computers.
.gz
A compressed file - compressed using the GNU gzip program. This program is common on UNIX computers and is available on PCs and Macintoshes.
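
The extension lookup described above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser is implemented: the table EXTENSION_TYPES and the function describe_extension are hypothetical names, and the table covers only the extensions listed in this section.

```python
import os

# Hypothetical lookup table built from the list above
# (an illustration, not a real browser's table).
EXTENSION_TYPES = {
    ".html": "HTML document",
    ".htm": "HTML document",
    ".txt": "plain text file",
    ".gif": "GIF image",
    ".xbm": "X-Bitmap image",
    ".xpm": "X-Pixmap image",
    ".jpeg": "JPEG image",
    ".jpg": "JPEG image",
    ".mpeg": "MPEG video",
    ".mpg": "MPEG video",
    ".mpe": "MPEG video",
    ".qt": "QuickTime video",
    ".avi": "AVI video",
    ".au": "audio (sound) file",
    ".Z": "compress-format compressed file",
    ".gz": "gzip compressed file",
}


def describe_extension(filename):
    """Return a description of the file type, based only on the extension."""
    # splitext returns the part after the last dot, including the dot.
    _, ext = os.path.splitext(filename)
    # ".Z" is conventionally upper case, so try the exact extension first.
    if ext in EXTENSION_TYPES:
        return EXTENSION_TYPES[ext]
    # Then try a lower-case match (e.g. "PAGE.HTM" from a DOS machine).
    # Unknown extensions are treated as plain text, as noted in the list above.
    return EXTENSION_TYPES.get(ext.lower(), "plain text file")
```

For example, describe_extension("name.html") and describe_extension("PAGE.HTM") both report an HTML document, while a name with no recognized extension falls back to plain text.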

2.3.1 MIME Types and File Data Formats

The World Wide Web actually uses MIME (Multipurpose Internet Mail Extensions) types to define the type of a particular piece of information being sent from a Web server to a browser. A browser in turn determines, from the MIME type, how the data should be treated. Each browser has a configuration (menu or file) that maps data types to particular functions. A browser can handle many types of data itself (e.g. HTML documents, GIF images), while other types are passed to auxiliary programs, such as image viewers, movie or sound players, plugins, and so on.
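
A browser's type-to-function mapping might be modelled roughly as follows. This is a minimal sketch under stated assumptions: the HANDLERS table, the choose_handler function, and the helper program names ("movie-player", "sound-player") are all made up for illustration, and real browsers store this configuration in their own formats.

```python
# Hypothetical browser configuration: MIME type -> how to handle the data.
# "internal" means the browser displays the data itself; any other value
# names an auxiliary (helper) program. The program names are invented.
HANDLERS = {
    "text/html": "internal",
    "text/plain": "internal",
    "image/gif": "internal",
    "video/mpeg": "movie-player",
    "audio/basic": "sound-player",
}


def choose_handler(mime_type):
    """Return (kind, program) for a MIME type.

    kind is "internal", "helper", or "unknown"; program is the helper's
    name when kind is "helper", otherwise None.
    """
    # Drop parameters such as "; charset=..." and ignore letter case.
    base_type = mime_type.split(";")[0].strip().lower()
    handler = HANDLERS.get(base_type)
    if handler is None:
        # No entry in the configuration: the caller decides what to do.
        return ("unknown", None)
    if handler == "internal":
        return ("internal", None)
    return ("helper", handler)
```

With this table, choose_handler("text/html") selects the browser's own display code, while choose_handler("video/mpeg") names an external program to launch.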

HTTP servers send a MIME content-type header ahead of every file they deliver to a browser. This header explicitly tells the browser what type of data is being sent. Thus a server must have a way of determining the type of data it is sending. Usually the server has a configuration file that relates filename extensions to the appropriate MIME type. For example, the MIME type for HTML documents is text/html. Thus, if a browser requests that a server send the file blobs.html, the server first looks up the MIME type corresponding to the .html extension. The server then sends a message to the browser saying that data of content-type text/html is being sent, after which the server sends the actual data.
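
The server-side sequence just described (look up the type from the extension, send the content-type header, then send the data) could look something like the sketch below. The names SERVER_MIME_TYPES and build_response are hypothetical, the table is a small sample, and the fallback to text/plain for unknown extensions is an assumption made for this example rather than a documented server default.

```python
import os

# Hypothetical server configuration: filename extension -> MIME type.
SERVER_MIME_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpeg": "image/jpeg",
    ".jpg": "image/jpeg",
}


def build_response(filename, data):
    """Build a minimal HTTP/1.0 response: headers first, then the data."""
    _, ext = os.path.splitext(filename)
    # Assumption for this sketch: unknown extensions are sent as text/plain.
    content_type = SERVER_MIME_TYPES.get(ext.lower(), "text/plain")
    headers = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(data)}\r\n"
        "\r\n"  # a blank line separates the headers from the data
    )
    return headers.encode("ascii") + data
```

Calling build_response("blobs.html", b"<html></html>") yields bytes that begin with the status line and the Content-Type: text/html header, followed by a blank line and the document itself.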

Other servers, such as FTP servers, do not send this MIME type information. In this case, the browser "guesses" the MIME type, based on the filename extension. Thus each browser must be configured with a list that relates typical extensions to the "most likely" type of data. This is also how a browser determines the type of files accessed locally on the computer.
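
The two cases can be combined into one small decision, sketched here in Python. The function resolve_type is a hypothetical name; Python's standard mimetypes module stands in for the browser's built-in extension list, and the text/plain fallback is an assumption based on the earlier remark that browsers generally treat unknown data as text.

```python
import mimetypes


def resolve_type(filename, declared_type=None):
    """Pick the MIME type a browser might use for a retrieved file.

    declared_type is the type sent by an HTTP server, or None when the
    file came from FTP or the local disk and no type was supplied.
    """
    if declared_type:
        # An HTTP server stated the type explicitly, so use it as given.
        return declared_type
    # No declared type: guess from the filename extension.
    guessed, _encoding = mimetypes.guess_type(filename)
    # Assumption: fall back to plain text when the extension is unknown.
    return guessed or "text/plain"
```

Here resolve_type("picture.gif") guesses the type from the extension alone, whereas resolve_type("download", declared_type="image/gif") trusts the server's header even though the name has no extension.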

2.3.2 More Information about MIME Types

For more information on MIME types see the Internet RFC document defining the MIME format: RFC 1341 (since superseded by RFCs 2045-2049). I have also assembled a non-authoritative list of MIME types, which is useful for understanding the different types that are in current use. This document is accessible at: http://www.utoronto.ca/ian/books/html4ed/appb/mimetype.html, and is part of the online supporting material for my book, The HTML 4.0 Sourcebook.


© 1994-1998 by Ian Graham