[Archive copy mirrored from: http://www.sgmlsource.com/8879rev/n1925.htm]

ISO/IEC JTC 1/SC 18/WG8 N1925


Document Processing and Related Communication --

Document Description and Processing Languages

TITLE: Report of the SGML Rapporteur Group (Barcelona Meeting)
PROJECT EDITOR: Charles F. Goldfarb
STATUS: Approved recommendations
ACTION: For information
DATE: 9 May 1997
DISTRIBUTION: WG8 and Liaisons
REPLY TO: Dr. James David Mason
(ISO/IEC JTC1/SC18/WG8 Convenor)
Lockheed Martin Energy Systems
Information Management Services
1060 Commerce Park, M.S. 6480
Oak Ridge, TN 37831-6480 U.S.A.
Telephone: +1 423 574-6973
Facsimile: +1 423 574-0004
Network: masonjd@ornl.gov

Report of the SGML Rapporteur Group

Barcelona, Spain
9 May 1997

The SGML Rapporteur Group met from May 5 through May 9, 1997, with 24 members present from seven National Bodies (Australia, France, Japan, Norway, Sweden, U.K., and U.S.) and two liaison organizations (CERN and International SGML Users Group).

We had intended to continue the clause-by-clause review of 8879 begun at our Boston meeting (see WG8 N1893) but were prevented from doing so by the urgent need to develop the WebSGML Technical Corrigendum (see WG8 N1928).

We developed the WebSGML Technical Corrigendum (WG8 N1929), which received unanimous approval of the members present. It will be balloted over the next six months and final approval and publication are expected to be authorized at our meeting in December in Washington, D.C.

We began development of the specification for multiple name spaces for element types, considering the proposals for element type modularity and typed subdocuments (see WG8 N1873 for one proposal).

We agreed that the following are additional general topics for consideration in the revision:

1. The creation of distinguishing syntax in the document instance for privileged semantics, such as elements whose types were declared empty, elements that are empty because of a specified CONREF attribute, id attributes, et al. (At present, the instance syntax contains only the constructs needed to allow the grove to be constructed.)

2. The responsibilities of parsers with respect to the SGML declarations of earlier versions, if they differ from those of the current version.

A number of items were considered in the course of developing the WebSGML TC that were deemed better left for consideration later in the development of the revision. They are attached below. Please note that these items were not approved for inclusion in the revision; they are merely items for future discussion.

For Future Discussion

Optional syntax simplifications

Each option is independently specifiable. They are implemented either by failing to assign a string to a delimiter role (that is, specifying it as #UNUSED), or by keywords to be added to the SGML declaration, or by other designated means.

Comment options

SIMPLCOM A comment declaration consists of exactly one delimited comment and comments are not allowed in ps.

EMPTYCOM Empty comment declarations are permitted.

[Committee: Is second COM recognized if not followed by MDC?]

Reference end is restricted to REFC

A REFC is required in order to terminate a reference.

A marked section declaration is restricted to a single status keyword

The status keyword must immediately follow the first DSO and must be immediately followed by the second DSO. In the prolog, the keyword must be either INCLUDE or IGNORE, which can be specified via a parameter entity reference. In the instance, it must be CDATA and cannot be specified via a parameter entity reference. TEMP cannot be specified in either place.
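The keyword rules above can be sketched as a small check. This is only an illustration in Python, not part of the proposal: the function name, its parameters, and the message strings are hypothetical, and the keyword is assumed to have been case-normalized already.

```python
# Hypothetical helper; names and messages are illustrative only.
PROLOG_KEYWORDS = {"INCLUDE", "IGNORE"}


def check_status_keyword(keyword, in_prolog, via_parameter_entity=False):
    """Return None if the single status keyword is acceptable, else a reason.

    Assumes the keyword has already been normalized to upper case.
    """
    if keyword == "TEMP":
        return "TEMP cannot be specified in either place"
    if in_prolog:
        # INCLUDE or IGNORE, optionally supplied by a parameter entity reference.
        if keyword not in PROLOG_KEYWORDS:
            return "in the prolog the keyword must be INCLUDE or IGNORE"
        return None
    # In the instance only a literal CDATA keyword is allowed.
    if via_parameter_entity:
        return "in the instance the keyword cannot come from a parameter entity reference"
    if keyword != "CDATA":
        return "in the instance the keyword must be CDATA"
    return None
```

A caller would treat any non-None result as a reportable error for the marked section declaration being checked.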

Ignore some contextual constraints

A delimiter that occurs in a mode where data can occur (e.g., CON and LIT modes) with a contextual constraint that it be followed by a name start character, a digit, or a hexadecimal digit is recognized even if this contextual constraint is not satisfied.

Note: When a delimiter is recognized and the contextual constraint is not satisfied, it is a reportable markup error. This means that delimiters such as STAGO and ERO will be recognized even when not followed by a name start character, and an error will be reported. Therefore, it is safest to escape STAGO and ERO whenever they occur as data because text like "R & D costs <100 000 EUROs" is invalid when SIMPLE markup is used with the reference concrete syntax.
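As a rough illustration of the note above, the following Python sketch finds the places in a data string where STAGO or ERO would be recognized under this option without its contextual constraint being satisfied. It assumes the reference concrete syntax ("<" for STAGO, "&" for ERO, unaccented letters as name start characters) and deliberately leaves "</", "<!", "<?", and "&#" to the other delimiter roles; the function name is hypothetical.

```python
import string

# Assumption: reference concrete syntax, where name start characters are A-Z and a-z.
NAME_START = set(string.ascii_letters)

# Characters that would make the match a different delimiter (ETAGO, MDO, PIO, CRO);
# those roles are outside the scope of this sketch.
OTHER_DELIMITER_FOLLOWERS = {"<": {"/", "!", "?"}, "&": {"#"}}
ROLES = {"<": "STAGO", "&": "ERO"}


def unsatisfied_delimiters(text):
    """Return (offset, role) for each STAGO or ERO not followed by a name start character."""
    problems = []
    for i, ch in enumerate(text):
        if ch not in ROLES:
            continue
        nxt = text[i + 1:i + 2]  # empty string at end of text
        if nxt in OTHER_DELIMITER_FOLLOWERS[ch]:
            continue
        if nxt not in NAME_START:
            problems.append((i, ROLES[ch]))
    return problems


if __name__ == "__main__":
    print(unsatisfied_delimiters("R & D costs <100 000 EUROs"))
```

Each reported offset is a place where, under this option, the author would need to escape the character to avoid a markup error.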

Restrict parameter entity references

Parameter entity references are permitted only as references to specific individual parameters. [The standard will provide a list of parameters for which references can be used. Syntactic variable names will be used to identify the parameters. The SGML declaration will specify which are usable for a given document or profile.]

Prohibit specific declaration parameters

Specific parameters of markup declarations can be prohibited by an SGML declaration. [The standard will provide a list of parameters that can be prohibited. Syntactic variable names will be used to identify the parameters. The SGML declaration will specify which are usable for a given document or profile.]

Require redundant parameter separators

A ps is required even when omitting it would not cause an ambiguity.

Attribute definitions


"#ALLFIXED" can be specified as the associated element type name (or notation name) in an attribute definition list declaration. If so, the definitions are associated with all element type names (or notation names). Definitions associated with #ALLFIXED cannot be overridden.

Undefining attributes

An attribute defined for #ALL can be undefined for a specified element type or notation (including the implicitly defined element types identified as #IMPELEM) by using a definition consisting of the attribute name and the keyword UNDEFINE.
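One way to picture the intended lookup behaviour is the small Python model below. It is a sketch under assumptions, not a description of any parser: the class, its method names, and the definition strings used with it are hypothetical, and it covers element types only.

```python
class AttributeTable:
    """Toy model of #ALL definitions, per-element UNDEFINE, and later redefinition."""

    def __init__(self):
        self.global_defs = {}   # attribute name -> definition declared for #ALL
        self.local_defs = {}    # (element type, attribute name) -> definition
        self.undefined = set()  # (element type, attribute name) marked UNDEFINE

    def define_all(self, att, definition):
        self.global_defs[att] = definition

    def undefine(self, element, att):
        # Only an attribute defined for #ALL can be undefined for one element type.
        if att not in self.global_defs:
            raise ValueError(f"{att} is not defined for #ALL")
        self.local_defs.pop((element, att), None)
        self.undefined.add((element, att))

    def define(self, element, att, definition):
        # Redefining after UNDEFINE supplies a fresh element-specific definition.
        self.undefined.discard((element, att))
        self.local_defs[(element, att)] = definition

    def lookup(self, element, att):
        if (element, att) in self.local_defs:
            return self.local_defs[(element, att)]
        if (element, att) in self.undefined:
            return None
        return self.global_defs.get(att)
```

In this model, undefining att1 for elem1 leaves the #ALL definition in force for every other element type, and a later element-specific definition for elem1 takes effect in place of the global one.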

Note: For example, to undefine the attribute att1 for elem1:

In global subset:


The attribute can then be redefined:

Extensions to FORMAL: processing instructions, data marked sections


(adapted from Rick Jelliffe)

Formal processing instructions

If formal processing instructions are supported, a PI must start with a notation name:

The notation name cannot be SGML, which is reserved for ISO 8879 use, or IS*n, which is reserved for other International Standards or Internationally Standardized Profiles.

Formal data marked sections

If formal data marked sections are supported, the simplified marked section syntax is used and a data marked section keyword can be a notation name (which cannot be CDATA, RCDATA, TEMP, IGNORE, or INCLUDE).

Note: The simplified marked section syntax allows only a single status keyword.

Data marked sections are not parsed, except to locate the marked-section close, and cannot nest. The CDATA and RCDATA status keywords identify data marked sections whose notations are intrinsically defined by ISO 8879. Data marked sections cannot occur in the prolog, even if empty.

The rules for handling white space in content do not apply to formal data marked sections.

Note: A notation could be defined that applied the parser's WSCON rules as the first step in interpreting the text, but the SGML parser would not do so when constructing the SGML grove. An implementation could offer an API that would perform this service for a notation handler.

[Should we have an SGML declaration switch for whether omitted system identifiers are allowed? If they are, should omitted entity and notation declarations be legal, and interpreted as omitted system identifiers?]

Global subsets

Function character identification

The function character identification parameter provides a means for names to be associated with characters in the syntax-reference character set. A syntax character name can be used in a named character reference and is equivalent to a numeric character reference to the character's number in the document character set.

Note: Like the rest of the concrete syntax definition, syntax character names are independent of the document character set. They can therefore be used in the global subset.

A new function class, "syntax character treated as data" (SYNDATA), is added. A character number can be assigned to this class alone, or in addition to one other class. The names can be specified in parameter literals and are valid names in the prolog concrete syntax. They are used in the same way that named character references are used (and therefore cannot duplicate those names), but, unlike named character references, the replacement characters are always treated as data characters. The character number of the replacement character is the number in the document character set that corresponds to the specified syntax-reference character set number.

Productions [187] and [188] are replaced with:

[187] added function = name | parameter literal

[188] function class = "SYNDATA" | "FUNCHAR" | "MSICHAR" | "MSOCHAR" | "MSSCHAR" | "SEPCHAR"

where SYNDATA means a syntax character treated as data.

For example:

FUNCTION RE 13 RS 10 SPACE 32 SPC SYNDATA 32 "TAB" SEPCHAR 9 "amp" SYNDATA 38 "lt" SYNDATA 60 "gt" SYNDATA 62 "quot" SYNDATA 34 "apos" SYNDATA 39
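To make the structure of the example easier to see, the Python sketch below splits it into the three fixed roles (RE, RS, SPACE) followed by triples of added function, function class, and character number, using the classes listed in revised production [188]. The parser is illustrative only: its function names are hypothetical, it treats a name and a parameter literal alike by dropping the quotation marks, and it does not map numbers from the syntax-reference character set to the document character set.

```python
import re

FUNCTION_CLASSES = {"SYNDATA", "FUNCHAR", "MSICHAR", "MSOCHAR", "MSSCHAR", "SEPCHAR"}

EXAMPLE = (
    'FUNCTION RE 13 RS 10 SPACE 32 SPC SYNDATA 32 "TAB" SEPCHAR 9 '
    '"amp" SYNDATA 38 "lt" SYNDATA 60 "gt" SYNDATA 62 '
    '"quot" SYNDATA 34 "apos" SYNDATA 39'
)


def parse_function_parameter(text):
    """Return (fixed_roles, added_functions) for a FUNCTION parameter string."""
    tokens = re.findall(r'"[^"]*"|\S+', text)
    if not tokens or tokens[0] != "FUNCTION":
        raise ValueError("expected FUNCTION keyword")
    pos = 1
    fixed = {}
    for role in ("RE", "RS", "SPACE"):
        if tokens[pos:pos + 1] != [role]:
            raise ValueError(f"expected {role}")
        fixed[role] = int(tokens[pos + 1])
        pos += 2
    added = []
    while pos < len(tokens):
        triple = tokens[pos:pos + 3]
        if len(triple) != 3:
            raise ValueError("incomplete added function")
        name, function_class, number = triple
        if function_class not in FUNCTION_CLASSES:
            raise ValueError(f"unknown function class {function_class}")
        added.append((name.strip('"'), function_class, int(number)))
        pos += 3
    return fixed, added


def syndata_names(added):
    """Map each SYNDATA name to its syntax-reference character number."""
    return {name: number for name, cls, number in added if cls == "SYNDATA"}
```

Calling parse_function_parameter(EXAMPLE) and passing the added functions to syndata_names gives the table of names whose replacement characters would always be treated as data.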

Global subsets

An optional "global subset declaration" is permitted immediately after the SGML declaration. It has an internal declaration subset containing a set of markup declarations. Markup declarations in the subset are in the prolog concrete syntax. They are treated as though occurring before the internal subset of all DOCTYPE declarations. For example:


Note: In practice, global subsets will typically appear in profile definition documents and will be referenced implicitly by the profile header.


A new sub-parameter of the naming rules parameter, the profile reserved name indicator (PRNI), identifies one or more NAMES in the prolog syntax, each of which acts as a reserved prefix.

For example:


A name of an element type, entity, attribute, or notation beginning with a PRNI cannot be declared in the first instance anywhere but in the global declaration subset.

Note that an attribute declared for #ALL element types can be redeclared in a DTD in the usual manner.

In the global declaration subset only, a definition for an attribute name beginning with a PRNI can be declared with an associated element type (or notation) of "#FIXED". This declaration does not associate the attribute definition with an element type (or notation), but permits such associations to be made in a DTD, using exactly the same reserved attribute name and definition.

Note: Unlike #ALL, #FIXED allows an attribute name and definition to be reserved without forcing every element type (or notation) to provide the attribute.
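The first-declaration rule for reserved names can be sketched as a single predicate. The Python below is only an illustration: the function name is hypothetical, the prefix "prof-" used in the usage lines is invented for the example, and names are compared exactly, with no case folding.

```python
def may_first_declare(name, prni_prefixes, in_global_subset):
    """Return True if a first declaration of this name is allowed at this location.

    A name beginning with any PRNI prefix may be declared in the first instance
    only in the global declaration subset; other names are unrestricted here.
    """
    reserved = any(name.startswith(prefix) for prefix in prni_prefixes)
    return in_global_subset or not reserved


if __name__ == "__main__":
    prefixes = ["prof-"]  # hypothetical reserved prefix
    print(may_first_declare("prof-id", prefixes, in_global_subset=True))
    print(may_first_declare("prof-id", prefixes, in_global_subset=False))
    print(may_first_declare("title", prefixes, in_global_subset=False))
```

The predicate covers first declarations only; the redeclaration of an #ALL attribute in a DTD and the #FIXED reservation described above are separate cases that it does not model.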

Profile header

The profile header occurs at the start of the SGML document entity.

The external identifier identifies the profile definition document, which can include an SGML declaration and/or a global subset declaration. If no external identifier is specified, "SYSTEM" is assumed.

[Note to committee: Should we simply allow all external identifiers to be empty, in which case they are treated as omitted system identifiers?]

The header is stripped and processed by the entity manager and is not seen or parsed by the parser.

Note: An entity manager is a component of an SGML system that interfaces between the parser and the real storage managers, such as file systems, data bases, and networks. It receives real storage objects and prepares and presents them to the SGML parser as entities. An entity manager that supports this feature must be constructed to know inherently how to process system headers for its system and for any corresponding systems that it recognizes.

Multiple headers

The SGML declaration provides a new feature, "multiple headers" (MULTIHDR), with the values YES or NO, which is the default. If YES is specified, a system header is permitted at the start of all storage objects in which SGML entities are stored. For example: