[Mirrored from: http://www.sgmlsource.com/8879rev/n1035.htm]
ISO/IEC standards are reviewed at least once every five years to determine whether they are still applicable or whether they should be withdrawn. Such reviews frequently result in the publication of a revised edition of the standard.
ISO 8879 was published October 15, 1986. It is the expectation of its developers (ISO/IEC JTC1/SC18/WG8) that a review will result in republication with editorial changes and possibly some new technical enhancements. The purpose of this document is to record in one place all such changes that have been agreed to by the developers. Accordingly, this document incorporates and replaces WG8 N931 and WG8 N1013.
This document should be read carefully and taken at face value. In particular, it cannot be stated with certainty that a revision of ISO 8879 will ever be published, or, if one is published, that any of these accepted items will find their way unmodified into the final draft.
Items are listed in order by clause number. General comments precede those relating to specific clauses. Each item is preceded by a two-letter code indicating the status of the item and the type of change involved. If the source of the item is a WG8 document, the document number and item number within that document, if any, are given in parentheses. (The attachments to WG8 N680 are N680A, N680B, and N680C.)
The status codes are:
The type-of-change codes are:
Items coded E and R reflect the developers' understanding of SGML as defined by the existing text of ISO 8879. Items coded T represent modifications to the SGML language that will not come into effect unless and until a revision of ISO 8879 is published.
Annex B: Tutorial on basic SGML concepts
Annex C: Tutorial on additional SGML concepts
Annex D.3: Variant concrete syntaxes, including multicode concrete syntaxes
Annex D (except D.3): Public entity sets
Annex E.1: Example of document type definition
Annex E.2: Computer graphics metafile
Annex E.3: Device-independent techniques for code extension
ISO 8879-1986 should be ISO 8879:1986.
element, or instance of an element type when emphasis on the type is desired.
The footnotes identifying certain references as being at present at the stage of draft should be deleted, as those standards are now available in final form.
The full text of the revised definitions is given, rather than change instructions. Although this approach adds to the size of this report, it makes it easier to see the effect of the changes. All definitions are coded AE. (N680B 20 23-24 26-28, N680C 1 7 11)
Markup that is a set of one or more attribute specifications.
Attribute specification lists occur in start-tags, entity declarations, and link sets.
An ordered collection of bits, interpretable as a binary number.
A bit combination should not be confused with a byte, which is a name given to a particular size of bit string, typically seven or eight bits. A single bit combination could contain several bytes.
A number that represents the base-10 integer equivalent of the coded representation of a character.
A set of characters that are used together. Meanings are defined for each character, and can also be defined for control sequences of multiple characters.
When characters occur in a control sequence, the meaning of the sequence supersedes the meanings of the individual characters.
A mapping of a character repertoire onto a code set such that each character in the repertoire is represented by a bit combination in the code set.
Techniques for including in documents the coded representations of characters that are not in the document character set.
When multiple national languages occur in a document, graphic repertoire code extension may be useful.
A set of bit combinations of equal size, ordered by their numeric values, which must be consecutive.
For example, a code set whose bit combinations have 8 bits (an 8-bit code) could consist of as many as 256 bit combinations, ranging in value from 00000000 through 11111111 (0 through 255 in the decimal number base), or it could consist of any contiguous subset of those bit combinations.
The location of a bit combination in a code set; it corresponds to the numeric value of the bit combination.
The representation of a character as a single bit combination in a code set.
A coded representation is always a single bit combination, even though the bit combination may be several 8-bit bytes in size.
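The relationship between bit combinations, code set positions, and code sets in the definitions above can be illustrated with a short Python sketch. The function names are hypothetical and are not part of ISO 8879; bit combinations are modelled as strings of 0 and 1 characters purely for readability.

```python
def code_set_position(bit_combination: str) -> int:
    """Return the numeric value of a bit combination (its code set position)."""
    if not bit_combination or set(bit_combination) - {"0", "1"}:
        raise ValueError("a bit combination consists only of 0 and 1 bits")
    return int(bit_combination, 2)


def is_code_set(bit_combinations: list) -> bool:
    """Check the two stated properties: equal size and consecutive values."""
    if not bit_combinations:
        return False
    size = len(bit_combinations[0])
    if any(len(b) != size for b in bit_combinations):
        return False
    values = sorted(code_set_position(b) for b in bit_combinations)
    # Consecutive means each value is exactly one more than its predecessor.
    return all(later - earlier == 1 for earlier, later in zip(values, values[1:]))


# The full 8-bit code from the example: 00000000 through 11111111.
full_8_bit = [format(n, "08b") for n in range(256)]
```

Under these assumptions, `is_code_set(full_8_bit)` holds, as does any contiguous slice of it, while a list with a gap or with mixed sizes is rejected.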
An SGML document that complies with all provisions of this International Standard.
The provisions allow for choices in the use of optional features and variant concrete syntaxes.
An element that is not a contextually optional element and
whose generic^identifier is the document^type^name; or
whose currently applicable content^token is a contextually required token.
An element could be neither contextually required nor contextually optional; for example, an element whose currently applicable content^token is in an or group that has no inherently optional tokens.
The rank^suffix that, when appended to a rank stem in a tag, will derive the element's generic identifier. For a start-tag it is the rank^suffix of the most recent element with the identical rank^stem, or a rank^stem in the same ranked^group. For an end-tag it is the rank^suffix of the most recent open element with the identical rank^stem.
An entity that was declared to be data and therefore is not parsed when referenced.
A content^token that associates a data tag pattern with a target element type.
Within an instance of a target element, the data content and that of any subelements is scanned for a string that conforms to the pattern (a data tag).
Markup that describes the structure and other attributes of a document in a non-system-specific manner, independently of any processing that may be performed on it. In particular, SGML descriptive markup uses tags to express the element structure.
A markup declaration that formally specifies a portion of a document type definition.
A document type declaration does not specify all of a document type definition because part of the definition, such as the semantics of elements and attributes, cannot be expressed in SGML. In addition, the application designer might choose not to use SGML in every possible instance -- for example, by using a data content notation to delineate the structure of an element in preference to defining subelements.
Rules, determined by an application, that apply SGML to the markup of documents of a particular type.
Part of a document type definition can be specified by an SGML document type declaration. Other parts, such as the semantics of elements and attributes, or any application conventions, cannot be expressed formally in SGML. Comments can be used, however, to express them informally.
A portion of a tag that identifies the document instances within which the tag will be processed.
A name^group performs the same function in an entity reference.
A set of element, attribute definition list, and notation declarations that are used together.
An element set can be public text.
A link set that contains no link rules.
A collection of characters that can be referenced as a unit.
A link process definition in which the result element types and their attributes and link attribute values can be specified for multiple source element types.
An entity whose replacement text is not incorporated in an entity declaration; its system identifier and/or public identifier is specified instead.
A parameter that identifies an external entity or data content notation.
There are two kinds: system identifier and public identifier.
A document type or link type declaration can include the identifier of an external entity containing all or part of the declaration subset; the external identifier serves simultaneously as a declaration of that entity and as a reference to it.
An error in the construction or use of a formal public identifier, other than an error that would prevent it being a valid minimum literal.
A formal public identifier error can occur only if FORMAL YES is specified on the SGML declaration. A failure of a public identifier to be a minimum literal, however, is always an error.
A delimiter role other than short reference.
A character that is not a control character.
For example, a letter, digit, or punctuation. It normally has a visual representation that is displayed when a document is presented.
The portion of a parameter that is bounded by a balanced pair of grpo and grpc delimiters or dtgo and dtgc delimiters.
There are five kinds: name group, name token group, model group, data tag group, and data tag template group. A name, name token, or data tag template group cannot contain a nested group, but a model group can contain a nested model group or data tag group, and a data tag group can contain a nested data tag template group.
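The nesting rules in the note above can be written down as a small lookup table. This is only a sketch: the table encodes exactly the nestings the note states and nothing more, and the names `ALLOWED_NESTING` and `may_nest` are hypothetical.

```python
# Outer group kind -> kinds of group that may be nested directly inside it.
ALLOWED_NESTING = {
    "name group": set(),
    "name token group": set(),
    "model group": {"model group", "data tag group"},
    "data tag group": {"data tag template group"},
    "data tag template group": set(),
}


def may_nest(outer: str, inner: str) -> bool:
    """Return True if the note permits `inner` to be nested inside `outer`."""
    return inner in ALLOWED_NESTING[outer]
```

For example, `may_nest("model group", "data tag group")` is true, while `may_nest("name group", "name group")` is false.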
A link process definition in which the result element types and their attributes are all implied by the application, but link attribute values can be specified for multiple source element types.
An entity whose replacement text is incorporated in an entity declaration.
A parameter that is a reserved name.
In parameters where either a keyword or a name defined by an application could be specified, the keyword is always preceded by the reserved name indicator. An application is therefore able to define names without regard to whether those names are also used by the concrete syntax.
A member of a link set; that is, for an implicit link, a source^element^specification, and for an explicit link, an explicit^link^rule.
A named set of rules, declared in a link^set^declaration, by which elements of the source document type are linked to elements of the result document type.
The entity sets, link attribute sets, and link set declarations, that occur within the declaration subset of a link type declaration.
The external entity referenced from the link type declaration is considered part of the declaration subset.
Character class consisting of each lower-case name character assigned by the concrete syntax.
Character class consisting of each lower-case name start character assigned by the concrete syntax.
Markup that controls how other markup of a document is to be interpreted.
There are 13 kinds: SGML, entity, element, attribute definition list, notation, document type, link type, link set, link set use, marked section, short reference mapping, short reference use, and comment.
An entity reference consisting of a delimited name of a general entity or parameter entity (possibly qualified by a name group) that was declared by an entity declaration.
A general entity reference can have an undeclared name if a default entity was declared.
A data entity in which a non-SGML character could occur.
The portion of a markup declaration that is bounded by ps separators (whether required or optional). A parameter can contain other parameters.
A subelement that is permitted by its containing element's model.
An included subelement is not a proper subelement.
A name from which a generic identifier can be derived by appending a rank^suffix.
A failure of a document to conform to this International Standard when it is parsed with respect to the active document and link types, other than a semantic error (such as a generic identifier that does not accurately connote the element type) or:
an ambiguous content model;
an exclusion that could change a token's required or optional status in a model;
exceeding a capacity limit;
an error in the SGML declaration;
an otherwise allowable omission of a tag that creates an ambiguity;
the occurrence of a non-SGML character; or
a formal public identifier error.
A character string that separates markup components from one another.
There are four kinds: s, ds, ps, and ts.
A character class composed of function characters other than RE, RS, and SPACE, that are allowed in separators and that will be replaced by SPACE in those contexts in which RE is replaced by SPACE.
A program (or portion of a program or a combination of programs) that recognizes markup in SGML documents.
If an analogy were to be drawn to programming language processors, an SGML parser would be said to perform the functions of both a lexical analyzer and a parser with respect to SGML documents.
Short reference string.
A link process definition in which the result element types and their attributes are all implied by the application, and link attribute values can be specified only for the source document element.
An entity whose text is treated as system data when referenced. The text is dependent on a specific system, device, or application process.
A specific character data entity would normally be redefined for different applications, systems, or output devices.
A declaration, included in the documentation for a conforming SGML system, that specifies the features, capacity set, concrete syntaxes, and character set that the system supports, and any validation services that it can perform.
An element whose generic^identifier is specified in a data^tag^group.
The portion of a group, including a complete nested group (but not a connector), that is, or could be, bounded by ts separators.
Clarify governing principle that the parsing of a document instance shall not be affected by the concurrent parsing of other document instances. For example, the replacement text of an entity reference could differ from one active concurrent instance to another. Also, a record end could be ignored in one instance and not in another.
Caution the user that short references in the base document instance are treated as data in other concurrent instances.
Replace
For example, in
For example, in the following three records:
In first sentence, change "a coded" to: "one and only one coded"
Replace last paragraph with:
The public^identifier should be a formal^public^identifier with a public^text^class of CHARSET.
Replace last paragraph with:
The
Replace last paragraph with:
The public^identifier should be a formal^public^identifier with a public^text^class of SYNTAX.
Change "a coded" to: "one and only one coded"
Clarify that a parameter literal in the SGML declaration is interpreted as though its character set were the syntax-reference character set. Therefore, a character can be entered directly in a parameter literal only if it has the same character number in the document character set as in the syntax-reference character set. If not, it must be entered as a character reference.
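The rule above reduces to a comparison of character numbers in the two character sets. A minimal Python sketch follows, assuming each character set is modelled as a mapping from a character to its character number; that representation, the function name, and the sample values are all hypothetical.

```python
def can_enter_directly(character, document_charset, syntax_reference_charset):
    """True if the character has the same character number in both sets."""
    doc_number = document_charset.get(character)
    ref_number = syntax_reference_charset.get(character)
    return doc_number is not None and doc_number == ref_number


# Hypothetical sample values, chosen only to show one match and one mismatch.
syntax_reference = {"A": 65, "<": 60}
document = {"A": 65, "<": 76}
```

Here "A" could be entered directly in a parameter literal, whereas "<" has different character numbers in the two sets and would have to be entered as a character reference.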
Resolve conflict between intent of text and syntax production rule, which restricts the declared concrete syntax, by treating production 189 as though each occurrence of ps+, parameter^literal were replaced by (ps+, parameter^literal)+, and by replacing each occurrence of the word literal in the text with literals.
Add new paragraph:
The length of a delimiter string in the delimiter set cannot exceed the NAMELEN quantity of the quantity set.
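A validator could apply this constraint with a simple length comparison. The sketch below assumes the delimiter set is available as a mapping from delimiter role name to delimiter string; the function name and the overlong sample string are hypothetical.

```python
def overlong_delimiters(delimiters: dict, namelen: int) -> list:
    """Return the roles whose delimiter string is longer than NAMELEN."""
    return sorted(role for role, string in delimiters.items() if len(string) > namelen)


# Three delimiter roles with their reference concrete syntax strings,
# plus one deliberately overlong string for illustration.
sample = {"STAGO": "<", "ETAGO": "</", "MDO": "<!", "PIO": "<?????????"}
```

With a NAMELEN of 8, `overlong_delimiters(sample, 8)` reports only the ten-character PIO string.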
In production 193, replace second name with parameter^literal and replace first paragraph with:
The name is a reference reserved name that is replaced in the declared concrete syntax by the interpreted parameter^literal, which must be a valid name in the declared concrete syntax.
Add new note before the existing first note:
The list of reference reserved names that can be replaced in a declared concrete syntax is:
In last sentence of first paragraph, change the period to: , which must exceed the reference value. The resulting quantity set must be rational.
For example, TAGLEN must be greater than LITLEN because literals occur in start-tags. Similarly, LITLEN must exceed NAMELEN because names occur in literals.
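The two conditions in this change (each declared value exceeds the reference value, and the resulting quantity set is rational) can be checked mechanically. The Python sketch below covers only the three quantities named in the example, using their values from the reference quantity set; the function name and the choice to return a list of messages are assumptions made for illustration.

```python
# Reference quantity set values for the three quantities in the example.
REFERENCE = {"NAMELEN": 8, "LITLEN": 240, "TAGLEN": 960}


def quantity_set_problems(changes: dict) -> list:
    """Return a list of problems found in the declared quantity changes."""
    problems = []
    for name, value in changes.items():
        if name in REFERENCE and value <= REFERENCE[name]:
            problems.append(f"{name} must exceed reference value {REFERENCE[name]}")
    # Quantities that are not changed keep their reference values.
    effective = {**REFERENCE, **changes}
    if effective["TAGLEN"] <= effective["LITLEN"]:
        problems.append("TAGLEN must be greater than LITLEN")
    if effective["LITLEN"] <= effective["NAMELEN"]:
        problems.append("LITLEN must exceed NAMELEN")
    return problems
```

Raising NAMELEN to 32 passes both checks, while raising LITLEN to 1000 without also raising TAGLEN leaves the quantity set irrational.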
In second sentence, change "a coded" to: "one and only one coded"
Before SRLEN, insert:
For ATTSPLEN, replace description with: Normalized length of
an
Replace
In line 1, jobitem should be delimited by LIT.
Replace last two lines:
Delete last example and its discussion, which begins at the last paragraph of p.105 and continues through I wonder whether Mrs. G will read this on p.106.
Replace example with:
There are two kinds of SGML application (and therefore two kinds of conforming SGML application):
A structure-controlled SGML application operates only on the element structure that is described by SGML markup, never on the markup itself.
A markup-sensitive SGML application can act on the actual SGML markup and can act on element structure information as well. Examples include SGML-sensitive editors and markup validators.
The set of information that is acted upon by implementations of structure-controlled applications is called the element structure information set (ESIS). ESIS is implicit in ISO 8879, but is not defined there explicitly. The purpose of this paper is to provide that explicit definition.
ESIS is particularly significant for SGML conformance testing because two SGML documents are equivalent documents if, when they are parsed with respect to identical DTDs and LPDs, their ESIS is identical. All structure-controlled applications must therefore produce identical results for all equivalent SGML documents. In contrast, not all markup-sensitive applications will produce identical results from equivalent documents. (Examples include a program that prints comment declarations, or one that counts the number of omitted end-tags.)
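The equivalence test described above can be sketched in Python as a comparison of two ESIS streams. The event representation used here (tuples of an event kind, a generic identifier or data string, and an attribute mapping) is a hypothetical simplification, not a format defined by ISO 8879.

```python
def esis_equivalent(events_a, events_b) -> bool:
    """Two documents are equivalent if their ESIS streams are identical."""
    # Attribute information is held in dicts, which compare equal regardless
    # of the order in which the attributes were specified.
    return list(events_a) == list(events_b)


# Document 1: attributes given as id then lang, end-tag present in the source.
doc1 = [("start", "p", {"id": "x", "lang": "en"}), ("data", "Hi"), ("end", "p")]
# Document 2: attributes given as lang then id, end-tag omitted in the source.
# The omission is markup detail, so it leaves no trace in the event stream.
doc2 = [("start", "p", {"lang": "en", "id": "x"}), ("data", "Hi"), ("end", "p")]
# Document 3: different data content.
doc3 = [("start", "p", {"id": "x", "lang": "en"}), ("data", "Ho"), ("end", "p")]
```

A conformance test built this way would treat `doc1` and `doc2` as equivalent and `doc3` as different, whereas a markup-sensitive tool counting omitted end-tags would distinguish the first two.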
ESIS information is exchanged between an SGML parser and the rest of an SGML system that implements a structure-controlled application. Although an implementation may choose to wire in some of ESIS, such as the names of attributes, a structure-controlled application need have no other knowledge of the prolog than what ESIS provides.
A system implementing a structure-controlled application is required to act only on ESIS information and on the APPINFO parameter of the SGML declaration.
This requirement does not prohibit a parser from providing the same interface to both structure-controlled and markup-sensitive applications, which could include non-ESIS information (e.g., the date), and/or information that could be derived from ESIS information (e.g., the list of open elements).
The documentation of a conforming SGML system that supports user-developed structure-controlled applications should make application developers aware of this requirement. Such a system should facilitate conformance to this requirement by distinguishing ESIS information from non-ESIS in its interface to applications. Note 1 in
In the following description of ESIS, information is identified as being available at a particular point in the parsed document. This identification should not be interpreted as a requirement that the information actually be exchanged at that point — all or part of it could have been exchanged at some other point. Similarly, there is no constraint on the manner (e.g., number of function calls) or format in which the exchanges take place.
The ESIS description includes the information associated with all of the SGML optional features. When a given feature is not in use, corresponding information is not present in the document. ESIS information is transmitted from the parser to the application unless otherwise indicated.
ESIS information applies to a single parsed document instance. Therefore, if concurrent instances are being parsed, the applicable document type name must be identified. This requirement also applies when parsing intermediate instances in a chain of active links.
ESIS information consists of the identification of the following occurrences, and the passing of the indicated information for each:
Initialization
The application must inform the SGML parser of the active document types, the active link types, or that parsing is to occur only with respect to the base document type.
Start of document instance set
For each active LPD, the link type name and link set information (see
Start of document element only
For each active simple link, the link type name and attribute
information (see
Start of any element
Generic identifier
Attribute information for the start-tag.
For each link rule for which this element is an associated element type, attribute information for the link attributes.
The application must inform the SGML parser which applicable link rule it chose.
For the chosen link rule, the result GI and attribute information for the result element.
If the element has an associated link set, the link set information.
End of any element, including elements declared to be empty
Generic identifier
If the element was empty, ESIS does not indicate why it was empty; that is, whether it was declared to be empty, or whether an explicit content reference occurred, or whether it just happened to contain no data characters.
End of document instance set
Processing instructions could occur between the end of the document element and the end of the document instance set.
Processing instruction
System data
Data
Includes no ignored characters (e.g., record starts).
Includes only significant record ends, with no indication of how significance was determined. Characters entered via character references are not distinguished in any way. Implementation-specific means can be used to represent bit combinations that the application cannot accept directly.
Such bit combinations may be those of non-SGML characters entered via character references, but no significance is attached to this coincidence.
Bit combinations of non-SGML characters that occurred directly in the source text would have been flagged as errors, and would therefore never be treated as data.
Attribute information
All attribute values must be reported and associated with their attribute names.
For example, a parser could supply the attribute names with each value, or supply the values in an order that corresponds to a previously-supplied list of names.
The order of the tokens in a tokenized attribute value shall be preserved as originally specified.
Each unspecified impliable attribute must be identified.
For example, a parser could identify such attributes explicitly, or it could allow the application to determine them by comparing the identified specified attribute values to a previously-supplied list of attribute names.
There shall be no indication of whether an attribute value was the default value.
The order in which attributes are specified in the attribute specification list is not part of the ESIS.
General entity name attribute values include the entity name and entity text. The entities themselves are not treated as having been referenced.
An application can use system services to parse the entities, but such parsing is outside the context of the current document.
For notation attributes, the attribute value includes the notation name and notation identifier.
For CDATA attributes, references to SDATA entities in attribute value literals are resolved. The replacement text is distinguished from the surrounding text and identified as an individual SDATA entity.
For CDATA attributes, references to CDATA entities in attribute value literals are resolved. The replacement text is not distinguished from the surrounding text.
References to internal entities
The information passed to the application depends on the entity type:
replacement text, identified as an individual SDATA entity.
replacement text, identified as a processing instruction but not as an entity.
For other references, nothing is passed to the application.
The replacement text is parsed in the context in which the reference occurred, which can result in other ESIS information being passed.
References to external entities
The information passed to the application depends on the entity type:
For data entities, the entity name and entity text are passed. If a notation is named, the notation name, notation identifier, and attribute information for the data attributes are also passed.
For SGML text entities, nothing is passed to the application.
The replacement text is parsed in the context in which the reference occurred, which can result in other ESIS information being passed.
For SUBDOC entities, the entity name and entity text are passed. The application can require that the subdocument entity be parsed at the point at which the reference occurred.
Parsing of the subdocument entity can result in other ESIS information being passed. The occurrence of the end of the document instance set of the subdocument entity will indicate that subsequent ESIS information applies to the element from which the subdocument entity was referenced.
Link set information
All link rules whose source element specification is implied.
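The rules given under "Attribute information" above can be illustrated with a small Python sketch of what a parser might hand to an application. Everything here is an assumption made for illustration: the `IMPLIED` sentinel, the function name, and the representation of declared attributes as a mapping from name to default value.

```python
# Sentinel marking an impliable attribute that was not specified.
IMPLIED = object()


def attribute_information(declared: dict, specified: list) -> dict:
    """Build the attribute information reported for one start-tag.

    declared  maps each attribute name to its default value, or to IMPLIED.
    specified is the list of (name, value) pairs in source order; a tokenized
              value is a tuple, so the order of its tokens is preserved.
    """
    info = dict(declared)  # every declared attribute is reported
    for name, value in specified:
        if name not in declared:
            raise KeyError(f"undeclared attribute: {name}")
        info[name] = value
    # The result does not record whether a value came from the default, and
    # dict equality ignores the order in which attributes were specified.
    return info


declared = {"id": IMPLIED, "align": "left", "refs": IMPLIED}
first = attribute_information(declared, [("refs", ("a", "b")), ("align", "left")])
second = attribute_information(declared, [("align", "left"), ("refs", ("a", "b"))])
defaulted = attribute_information(declared, [("refs", ("a", "b"))])
```

Under this model `first`, `second`, and `defaulted` are indistinguishable, the unspecified `id` is identified as `IMPLIED`, and the token order of `refs` is kept as originally specified.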
Thanks to Rick Jelliffe for converting this document to HTML.