An Introduction to SGML

In SGML, text is not described in terms of its visual appearance (e.g. by a special code meaning ``switch to a 15pt font''). Instead, its semantic structure is specified (e.g. ``this is a heading''), and it is left to the rendering process to lay out (or render) the text, e.g. by displaying headings in a 15pt font.

SGML is a kind of meta-standard that gives you considerable freedom in defining what your markup looks like. Consequently, a number of different, complex, and mutually incompatible SGML markups have been proposed for use within particular knowledge domains. Because HTF is designed to be independent of any particular discipline, HTF would have to be a superset of all of them in order to transport and store them all using Hyper-G.

However, the design of HTF takes a different approach. Rather than being a superset of all semantic SGML markups, HTF is a markup language that is primarily targeted at the renderer and is NOT much concerned with the semantics of the document. For example, HTF has a few markup elements that let you specify that a phrase should be emphasized. HTF is not interested in the semantic meaning of the phrase, i.e. in the many possible reasons why it should be emphasized. This makes HTF a very small markup language that can be authored and processed easily. A large SGML DTD can be translated to the small HTF DTD by applying a sort of ``pre-renderer'' that maps the semantic elements to simple HTF elements.

SGML purists will argue that this is against the SGML philosophy and makes authoring HTF difficult because it is not possible to reconstruct the semantics of the document. However, Hyper-G is not designed as a document preparation system, but as a document dissemination system, where it is more important to support a wide range of users and user interfaces, even starting with simple terminals.


Every HTF element begins with a start tag, and every non-empty element ends with an end tag. Start tags are delimited by < and >; end tags are delimited by </ and >.

The element name immediately follows the tag open delimiter. Names consist of a letter followed by up to 33 letters, digits, periods, or hyphens. Names are not case sensitive.

In a start tag, attributes and whitespace are allowed between the element name and the closing delimiter. An attribute consists of a name, and optionally an equal sign and a value. Whitespace is allowed around the equal sign. The attribute value is specified in a string surrounded by single or double quotes.
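
The name and attribute rules above can be sketched in Python as follows. This is only an illustration of the syntax as described here, not the parser used by Hyper-G; the function name parse_start_tag and the element and attribute names in the usage note are hypothetical, and folding names to upper case is just one way to honour case insensitivity.

```python
import re

# A name: a letter followed by up to 33 letters, digits, periods, or hyphens.
NAME = r"[A-Za-z][A-Za-z0-9.-]{0,33}"
NOT_NAME_CHAR = r"(?![A-Za-z0-9.-])"  # rejects names longer than 34 characters

# A value: a string surrounded by double or single quotes.
QUOTED = r"""(?:"([^"]*)"|'([^']*)')"""

TAG_OPEN = re.compile(r"<(" + NAME + r")" + NOT_NAME_CHAR)
# An attribute: a name, optionally followed by "=" and a quoted value,
# with whitespace allowed around the equals sign.
ATTRIBUTE = re.compile(
    r"\s+(" + NAME + r")" + NOT_NAME_CHAR + r"(?:\s*=\s*" + QUOTED + r")?"
)
TAG_CLOSE = re.compile(r"\s*>")


def parse_start_tag(text):
    """Return (element_name, attributes) for one start tag, or None if malformed."""
    opened = TAG_OPEN.match(text)
    if opened is None:
        return None
    name = opened.group(1).upper()  # names are not case sensitive
    attributes = {}
    pos = opened.end()
    while True:
        attr = ATTRIBUTE.match(text, pos)
        if attr is None:
            break
        # Group 2 holds a double-quoted value, group 3 a single-quoted one.
        value = attr.group(2) if attr.group(2) is not None else attr.group(3)
        attributes[attr.group(1).upper()] = value  # None when no value is given
        pos = attr.end()
    closed = TAG_CLOSE.match(text, pos)
    if closed is None or closed.end() != len(text):
        return None
    return name, attributes
```

For instance, parse_start_tag("<item label = 'first' compact>") yields the name ITEM with LABEL set to "first" and COMPACT set to None, while an unquoted value such as <item label=first> is rejected.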

Entity References

Using SGML entity references, the HTF DTD lets you represent 8-bit characters of the ISO Latin 1 character set with only 7-bit characters, and represent characters that would otherwise be recognized as markup (the characters < and &). The character & signals the beginning of an entity reference when it is followed by a letter or a digit. The delimiter is followed by the entity name and a semicolon. For example:

"&Ouml;sterreich" is the German word for "Austria".

The character & itself is represented by the entity reference &amp;, while &lt; denotes the < character. These characters should always be represented in this way when they appear in data.
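
As a rough illustration, the conversion between 8-bit text and its 7-bit form with entity references might look like this in Python. The sketch assumes that the HTF entity names match the ISO Latin 1 names in Python's html.entities tables, which holds for the three names mentioned here (Ouml, amp, lt) but is not confirmed for the full HTF DTD; the function names are made up for this example.

```python
import re
from html.entities import codepoint2name, name2codepoint

# "&" followed by letters or digits and a closing semicolon.
ENTITY = re.compile(r"&([A-Za-z0-9]+);")


def encode_7bit(text):
    """Replace &, < and ISO Latin 1 characters above 127 with entity references."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "&<" or (127 < code <= 255 and code in codepoint2name):
            out.append("&" + codepoint2name[code] + ";")
        else:
            out.append(ch)  # plain 7-bit data, or a character with no Latin 1 entity
    return "".join(out)


def decode_entities(text):
    """Replace known Latin 1 entity references with the characters they stand for."""
    def replace(match):
        code = name2codepoint.get(match.group(1))
        if code is None or code > 255:
            return match.group(0)  # unknown name: leave the reference untouched
        return chr(code)
    return ENTITY.sub(replace, text)
```

With these definitions, encode_7bit("Österreich") gives "&Ouml;sterreich", and decode_entities reverses the step.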


SGML provides for comments in a document, which are ignored by the parser. A comment starts with <!--, ends with -->, and may appear anywhere in the text. Some comments are understood and eliminated by Hyper-G on document insertion; they let you specify the position of the document (parent collection, sort key), its language, its expiration time, etc. For example:

<TITLE>Staff Meeting Nov. 23, 4pm
<!--TimeExpire=93/11/23 16:00:00-->
This document will expire at the specified time.
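
A minimal Python sketch of how such comments could be picked out and removed on insertion is shown below. It assumes that every special comment has the form Name=Value, which is inferred from the single TimeExpire example above; the actual set of names and the real Hyper-G behaviour are not specified here, and the function name is hypothetical.

```python
import re

COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
# Assumed form of a special comment: Name=Value (inferred from TimeExpire).
ASSIGNMENT = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*)\s*=(.*)", re.DOTALL)


def extract_comment_attributes(source):
    """Return (attributes, remaining_source).

    Comments of the form Name=Value are collected and removed from the text;
    all other comments are left in place for the parser to ignore.
    """
    attributes = {}

    def handle(match):
        assignment = ASSIGNMENT.fullmatch(match.group(1))
        if assignment is None:
            return match.group(0)  # ordinary comment: keep it
        attributes[assignment.group(1)] = assignment.group(2).strip()
        return ""  # special comment: eliminate it

    remaining = COMMENT.sub(handle, source)
    return attributes, remaining
```

Applied to the example above, this collects TimeExpire with the value "93/11/23 16:00:00" and returns the document text without that comment.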

Line breaks, spaces, tabs

A line break character is considered markup (and ignored) if it is the first or last piece of content of an element. This allows you to write either

<XMP>some example text</XMP>

or

<XMP>
some example text
</XMP>

and they will be processed identically. Also, a line that is not empty but contains no content (for example, one that holds only a comment) will be ignored altogether. For example, the element

first line
<!-- comment --> 
second line

fourth line

contains only the strings:

first line
second line

fourth line
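
The rule for lines without content can be sketched in Python as follows. The sketch handles only comments that start and end on the same line, and it treats whitespace left over next to a comment as "no content"; both are simplifying assumptions, and content_lines is a made-up name.

```python
import re

COMMENT = re.compile(r"<!--.*?-->")  # single-line comments only (assumption)


def content_lines(element_content):
    """Return the lines of an element's content with comment-only lines dropped."""
    kept = []
    for line in element_content.split("\n"):
        without_comments = COMMENT.sub("", line)
        # A line that held a comment and nothing else has no content: skip it.
        if COMMENT.search(line) and without_comments.strip() == "":
            continue
        kept.append(without_comments)
    return kept


example = "first line\n<!-- comment --> \nsecond line\n\nfourth line"
print(content_lines(example))
```

The genuinely empty third line is kept, which matters because an empty line acts as a paragraph break.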

As a special case, however, HTF treats an empty line (two line breaks following each other) identically to a paragraph break, except within an <XMP> or <PLAIN> element.

Space characters are rendered as horizontal white space. The rendering of a horizontal tab (HT) character is not defined, and HT should therefore not be used, except within an <XMP> or <PLAIN> element.

Neither spaces nor tabs should be used to make SGML source layout more attractive or easier to read!

Author: mgais
created: 94/03/11 11:27:45
modified: 95/07/12 13:56:00

