Introduction to HTML Last Update: 5 January 1998 |
Certain characters, such as the left angle bracket (<) and the ampersand (&), are reserved by HTML for special purposes, such as marking the start of HTML elements or of references to graphic characters.
In addition there are many ISO-Latin 1 characters that you may wish
to include in a document, but which are not trivially available on
a standard keyboard.
HTML allows special referencing to represent these special characters. These are indicated by either character references or entity references.
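Character references are composed of three parts: a leading ampersand and number sign (&#), the decimal number of the character's position in the character set, and a terminating semicolon (;).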
For example, the character reference for the less than symbol (<) is &#60;.
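The three-part structure of a character reference can be sketched in a few lines of Python. This is only an illustration, not part of HTML itself: the helper name char_reference is hypothetical, and the sketch assumes that Python's ord() gives the same position as the document character set, which holds for ISO Latin-1 characters.

```python
import html

def char_reference(ch: str) -> str:
    """Build a decimal character reference: '&#', the number, then ';'."""
    return f"&#{ord(ch)};"

less_than = char_reference("<")
print(less_than)                 # &#60;
print(html.unescape(less_than))  # decodes the reference back to <
```

The standard-library function html.unescape performs the reverse step, turning the reference back into the character it stands for.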
Note that this number depends on the character set being used -- for example, in some character sets, the 60th character may not be the less than symbol. Thus it is more convenient (and universal) to have a symbolic reference for a character, as opposed to an absolute numeric reference. In HTML (and SGML) such references are called entity references.
Entity references are similar, but use symbolic names to represent the characters. Entity references also have three parts: a leading ampersand (&), the name of the entity, and a terminating semicolon (;).
Thus the entity reference for the less than symbol (<) is &lt;.
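As a rough illustration of how reserved characters are replaced by entity references in practice, the following Python sketch uses the standard html module. The sample string is made up for the example, and quote=False is chosen here so that only the ampersand and the angle brackets are converted.

```python
import html

text = "if a < b & b > c"

# Replace the reserved characters with their entity references.
encoded = html.escape(text, quote=False)
print(encoded)  # if a &lt; b &amp; b &gt; c

# Decoding the entity references restores the original text.
print(html.unescape(encoded) == text)  # True
```

The encoded form is what would be typed into an HTML document so that a browser displays the original characters rather than interpreting them as markup.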
Note that not all valid characters have corresponding entity references. In these cases you need to use the numerical character references directly. Furthermore, some of the newer entity references defined in HTML 4 are not understood by all browsers.
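The rule of falling back to a numerical reference can be sketched as follows. This is a minimal sketch, assuming Python's html.entities.codepoint2name table (which covers the HTML 4 entity names) as the list of defined entities; the function name reference_for is hypothetical.

```python
from html.entities import codepoint2name

def reference_for(ch: str) -> str:
    """Prefer the symbolic entity reference; otherwise use the number."""
    code = ord(ch)
    name = codepoint2name.get(code)
    if name is not None:
        return f"&{name};"   # entity reference, e.g. &lt;
    return f"&#{code};"      # numerical character reference

print(reference_for("<"))     # &lt;
print(reference_for("\xe9"))  # &eacute; (e with acute accent)
print(reference_for("~"))     # &#126; (no entity name in this table)
```

A browser that does not know a newer entity name would still need the numerical form, so a stricter variant could restrict the lookup to the ISO Latin-1 names only.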
The ISO data table document lists all the ISO Latin-1 characters, alongside their numerical positions in the character set (both decimal -- used by HTML character references, and hexadecimal -- used by URL character encodings) and the corresponding entity reference, if defined. A second test document gives a list of all the defined entity references, and includes these entities in the text. You can use this document to test your browser's support for the full range of ISO Latin-1 entity references.
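The difference between the decimal and hexadecimal positions can be illustrated with a single character. The sketch below assumes the character is e with acute accent (position 233 in ISO Latin-1) and uses Python's urllib.parse.quote with a latin-1 encoding to stand in for a URL character encoding; the variable names are illustrative only.

```python
from urllib.parse import quote

ch = "\xe9"          # e with acute accent
position = ord(ch)   # position in ISO Latin-1

html_reference = f"&#{position};"             # decimal form, used in HTML
url_encoding = quote(ch, encoding="latin-1")  # hexadecimal form, used in URLs

print(position, hex(position))  # 233 0xe9
print(html_reference)           # &#233;
print(url_encoding)             # %E9
```

Both forms identify the same position in the character set; only the number base and the surrounding syntax differ.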
Another document describing entity references is found at http://www.natural-innovations.com/boo/doc-charset.html. This document, due to Walter Ian Kaye, lists all the ISO Latin-1 characters, complete with text descriptions. It also lists some characters and entity references that are not part of ISO Latin-1. These extra characters are not yet widely implemented in Web applications.
Introduction to HTML © 1994-1998 by Ian Graham Last Update: 5 January 1998 |