SGML Syntax Summary Introduction

Copyright © 1996 Harvey Bingham

1 Scope

The scope is limited to the syntax of the ISO 8879-1986(E) SGML standard as corrected in Amendment 1. This SGML Syntax Summary adds extensive hyperlinking from the syntactic variables used in productions to where they are defined, and from the defined syntactic variables to where they are used. It also collects each kind of token used in productions into its own document. Locators provide {clause, page:line} references, to facilitate use with the ISO 8879 SGML standard and The SGML Handbook.

This summary omits the text found therein that supports understanding, other than the clause and sub-clause headings, and the suggestions on semantics inherent in the names of the syntactic productions and the tokens used in their definitions.

2 Field of Application

The purpose of SGML syntax is to describe:

markup languages that will permit proper SGML parsing and validation
different applications designed to such descriptions
application document markup consonant with such a design

3 References

The SGML Handbook includes the ISO 8879-1986(E) SGML standard, and provides many significant extensions. These help with understanding the many fine points of that standard: historic, syntactic, semantic, and pragmatic.


International Standard ISO 8879 Information Processing - Text and Office Systems -
Standardized Generalized Markup Language (SGML), First Edition - 1986-10-15

UDC 681.3.06 Ref. No. ISO 8879-1986(E)
Copyright © 1986, 1988 International Organization for Standardization

Included are the revisions from

ISO 8879-1986(E) Amendment 1 (Final Text with Ballot Comments Resolved), undated, late fall 1987
Further corrections to that Amendment 1, also undated, were received May 1988 and incorporated 1 July 1988.

ISO 8879-1986(E) SGML Syntax Summary

The earlier printed version of this Syntax Summary contains much of the material in this document. I distributed it widely, and provided it to the Graphic Communications Association, as a service to the SGML community. It is now out of print, so I have created this electronic version from it, augmented with direct hyperlinks.

ISO 8879-1986(E) SGML Syntax Summary
Harvey Bingham, Interleaf, Inc. June 8, 1988; Corrected September 22, 1992

The SGML Handbook

The SGML Handbook contains extensive explanation and is the authoritative reference. It includes the Standard material, with additional tutorial and commentary by the author. The editor provided an extensive cross-index, which nonetheless exposes the limits of a paper index for indirect referencing: references to a single concept are hard to tell apart, and "where-used" references are omitted. I believe that the hyperlinks herein are better focused and hence more useful for the more limited purpose of studying the syntax of SGML.

The SGML Handbook, © 1990 Charles F. Goldfarb,
edited by Yuri Rubinsky,
ISBN 0-19-853737-9

Oxford University Press,
Walton Street, Oxford OX2 6DP
New York, New York

In the production listings, the "reference" {clause, page:line} locator provides cross-referencing into The SGML Handbook. The list of production-name[] pairs following the "used in:" includes all the other productions that use this production-name as a syntactic variable in their definitions.
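The "used in:" lists amount to an inverted index over the right-hand sides of the productions. The following is a minimal Python sketch of that idea, not the method used to build this summary. The `productions` mapping and the names and numbers in it are hypothetical placeholders, not entries from ISO 8879.

```python
from collections import defaultdict

# Hypothetical productions: (number, name) -> syntactic variables on the right-hand side.
# Names and numbers are placeholders for illustration, not taken from ISO 8879.
productions = {
    ("1", "alpha"): ["beta", "gamma"],
    ("2", "beta"): ["gamma"],
    ("3", "gamma"): [],
}


def build_used_in(productions):
    """Return name -> sorted list of (number, name) productions that use it."""
    used_in = defaultdict(set)
    for (number, name), rhs in productions.items():
        for variable in rhs:
            used_in[variable].add((number, name))
    return {variable: sorted(users) for variable, users in used_in.items()}


used_in = build_used_in(productions)
for variable, users in sorted(used_in.items()):
    # Print each user as a production-name[number] pair.
    print(variable, "used in:", ", ".join(f"{name}[{number}]" for number, name in users))
```

A production that no other production uses, such as the placeholder `alpha`, does not appear as a key in the index.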

ISO/IEC 10179:1996 DSSSL

DSSSL is the standard for specifying transformations on SGML document instances and associating style information.

ISO/IEC 10179:1996, April 1, 1996, © ISO/IEC 1996
Information technology -- Text and office systems --
Document Style Semantics and
Specification Language

See the DSSSL Syntax Summary for the similar set of documents, also by Harvey Bingham, that augment information extracted from the DSSSL standard.

ISO standards are available from national standards bodies.

4 Definitions

The 432 definitions provide concept clarification. Many of the productions in the syntax gain meaning from the definitions contained in this clause of the standard.

5 Notation

The syntactic productions are shown in order of production numbers. The form used for each syntactic variable is:

[prod. no.] defined syntactic variable name =
definition, one line per syntactic token possibly with metacharacters
reference {clause, page:line}, used in
list of syntactic variables, each linked to the production where this production is used

[prod. no.] refers to one of the 211 production numbers. The values range from [1] to [204]. In the Amendment, nine with decimal parts were added: [5.1], [5.2], [35.1], [87.1], [149.1], [149.2], [163.1], [166.1], and [168.1].
Two productions were explicitly deleted without renumbering the originals: [98] and [99].
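The stated total can be checked with a little arithmetic: 204 original numbers, plus nine decimal additions, minus two deletions. The Python sketch below does the count and sorts the numbers into production order. It assumes the nine added numbers read as [5.1], [5.2], [35.1], [87.1], [149.1], [149.2], [163.1], [166.1], and [168.1]. The helper `sort_key` is a hypothetical name introduced for illustration.

```python
# Base numbering from the original standard: [1] through [204].
base = [str(n) for n in range(1, 205)]

# Decimal-numbered productions added by the Amendment (assumed reading of the list above).
added = ["5.1", "5.2", "35.1", "87.1", "149.1", "149.2", "163.1", "166.1", "168.1"]

# Productions deleted without renumbering the originals.
deleted = {"98", "99"}

numbers = [n for n in base + added if n not in deleted]


def sort_key(number):
    """Order '5' < '5.1' < '5.2' < '6' by comparing the dotted parts as integers."""
    return tuple(int(part) for part in number.split("."))


numbers.sort(key=sort_key)
print(len(numbers))   # 204 + 9 - 2 = 211
print(numbers[4:8])   # ['5', '5.1', '5.2', '6']
```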

The first line defines the syntactic term, the "left hand side" of the production. Its defined syntactic variable name suggests its use. It is often a phrase, usually in lower case. The "=" separates the name from its defining expression.

The definition, the "right hand side" of the production, contains a separate line for each syntactic token. The tokens are in the same order as in the standard, where they run on and usually appear on one line. Here they are spread vertically, the form first used in the SGML Syntax Summary and later adopted in The SGML Handbook.

The metacharacter symbols before and/or after the syntactic tokens indicate their precedence "( )", selection "? * +", and ordering ", | &".

The definition contains one or more lines, each with one of the five forms of syntactic tokens, possibly with metacharacter symbols before and/or after the token.
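The layout described above can be modeled as a small record with a rendering method. This is only a sketch of the entry format. The `Production` class and its field names are hypothetical, and the sample values are placeholders rather than a production from the standard.

```python
from dataclasses import dataclass, field


@dataclass
class Production:
    number: str      # production number, e.g. "12" or "5.1"
    name: str        # defined syntactic variable name (left-hand side)
    tokens: list     # one string per syntactic token, metacharacters included
    locator: str     # "{clause, page:line}" reference
    used_in: list = field(default_factory=list)  # (name, number) pairs

    def render(self):
        """Return the entry: heading line, one token per line, then the reference line."""
        lines = [f"[{self.number}] {self.name} ="]
        lines += [f"    {token}" for token in self.tokens]
        users = ", ".join(f"{name}[{number}]" for name, number in self.used_in)
        lines.append(f"reference {self.locator}, used in: {users}")
        return "\n".join(lines)


# Placeholder content for illustration only; not a production from ISO 8879.
example = Production(
    number="0",
    name="example variable",
    tokens=["first token,", "( second token |", "third token )*"],
    locator="{0.0, 0:0}",
    used_in=[("another variable", "0.1")],
)
print(example.render())
```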

5.1 Syntactic Tokens

Token Kind
What it represents

syntactic variable
Left-hand side of a production. It defines a non-terminal token, the name of a production. It is preceded by [number of that production] and followed by "=". See SGML Syntactic Variables for an alphabetized list, linked to the productions where defined herein.

syntactic variable
Right-hand side, in a production definition.
reference: {locator}, used in: list of productions in which the syntactic variable appears at least once on the right-hand side as part of a definition.
In both cases, the suffix [production-number] is a hyperlink to where that production is defined.

syntactic literal
The literal string itself. If upper-case translation of general names is specified by the concrete syntax, then the corresponding lower-case letters can be used interchangeably with upper case. No such substitutions are needed or used in the descriptions. See SGML Keyword Syntactic Literals for a full list, with links to the productions where used herein.
In use, the surrounding quotation marks are omitted.
The reserved name indicator rni # precedes some syntactic literals.

delimiter role
A delimiter string. The descriptions give the reference delimiter (from ISO 8879-1986, Figures 3 and 4 Character Classes, pages 31 and 33, and {9.6.1, 360:0}). See SGML Reference Delimiter Roles for links to the syntactic variables where used herein.

Terminal Variable
One of 14 defined character classes whose character contents may differ across SGML documents. The explanation gives the symbol (or symbols) assigned by the concrete syntax (from ISO-8879, Figure 2 - Character Classes: Concrete Syntax, page 30, and {9.2.1 345:0}). See SGML Terminal Variables for their meanings and links to productions in which they occur herein.

Terminal Constant
Either an entity end signal Ee or a character class common to all SGML documents:
LC Letter a-z
UC Letter A-Z
Digit 0-9
Special ' ( ) + , - . / : = ?
(from ISO-8879, Figure 1 - Character Classes: Abstract Syntax, page 29, and {9.2.1 345:0}). See SGML Terminal Constants for their meanings and links to productions where they occur herein.
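The four terminal-constant character classes listed above are fixed sets, so classifying a character is a simple membership test. The Python sketch below illustrates this. The names `TERMINAL_CONSTANTS` and `classify` are hypothetical. The entity end signal Ee is left out because it is a signal rather than a character.

```python
import string

# Character classes common to all SGML documents, as listed above.
TERMINAL_CONSTANTS = {
    "LC Letter": set(string.ascii_lowercase),
    "UC Letter": set(string.ascii_uppercase),
    "Digit": set(string.digits),
    "Special": set("'()+,-./:=?"),
}


def classify(char):
    """Return the class name for a single character, or None if it is in none of them."""
    for name, members in TERMINAL_CONSTANTS.items():
        if char in members:
            return name
    return None


for ch in "aZ7?#":
    print(repr(ch), classify(ch))
```

A character such as `#` falls outside these four classes and yields `None` here. In SGML its class is assigned by the concrete syntax, not fixed for all documents.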

5.2 Ordering and Selection Symbols

The metacharacter symbols used as the metalanguage in the productions provide precedence, occurrence, and ordering. These are used in the same way as in a document type declaration.

Precedence
(...) Parentheses before Occurrence before Ordering

Occurrence
Default is 1 time, unless one of the following suffixes appears
? Optional (0 or 1 time)
+ Required and repeatable (1 or more times)
* Optional and repeatable (0 or more times)

Ordering
, All must occur in the order shown
& All must occur in any order (allowed in SGML document type declarations, unused in any production)
| One and only one must occur
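The occurrence suffixes and ordering connectors above can be read as simple checks on counts and sequences. The Python sketch below illustrates them. It assumes a flat group whose members each occur exactly once, with no nesting and no suffixes inside the group. The names `OCCURRENCE`, `occurrence_ok`, and `ordering_ok` are hypothetical.

```python
# Occurrence suffix -> (minimum, maximum) repetitions; None means unbounded.
OCCURRENCE = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def occurrence_ok(suffix, count):
    """True if `count` repetitions satisfy the occurrence suffix."""
    low, high = OCCURRENCE[suffix]
    return count >= low and (high is None or count <= high)


def ordering_ok(connector, members, observed):
    """Check a flat group (no nesting) against one ordering connector."""
    if connector == ",":   # all must occur in the order shown
        return observed == members
    if connector == "&":   # all must occur in any order
        return sorted(observed) == sorted(members)
    if connector == "|":   # one and only one must occur
        return len(observed) == 1 and observed[0] in members
    raise ValueError(f"unknown connector: {connector!r}")


print(occurrence_ok("?", 2))                           # False
print(occurrence_ok("*", 0))                           # True
print(ordering_ok(",", ["a", "b"], ["b", "a"]))        # False
print(ordering_ok("&", ["a", "b"], ["b", "a"]))        # True
print(ordering_ok("|", ["a", "b"], ["a"]))             # True
```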

Continue with the SGML Syntax Summary.

Harvey Bingham's home page

SGML Syntax Summary original 8 June 1988
Corrected 10 January 1992
Expanded and converted to HTML 26 Mar 1996
Updated 28 May 1996
Changed return mail 8 Nov 1996

Copyright restrictions:
This material may be used freely for the purposes of studying SGML and promoting its application. This copyright notice shall be included in any subsequent copies. The author reserves the right to update this material and to determine the primary server on which it is available.