[Mirrored from: http://www.ornl.gov/sgml/wg8/document/1893.htm]

WG8 N1893


Document Processing and Related Communication—

Document Description and Processing Languages

TITLE: Fourth Interim Report on the Project Editor's Review of ISO 8879
PROJECT: JTC1.18.15.1
PROJECT EDITOR: Dr. Charles F. Goldfarb
STATUS: Approved report
ACTION: For information
DATE: 23 December 1996
DISTRIBUTION: WG8 and Liaisons
REPLY TO: Dr. James D. Mason
(ISO/IEC JTC1/SC18/WG8 Convenor)
Oak Ridge National Laboratory
Information Management Services
Bldg. 2506, M.S. 6302, P.O. Box 2008
Oak Ridge, TN 37831-6302 U.S.A.
Telephone: +1 423 574-6973
Facsimile: +1 423 574-6983
Network: masonjd@ornl.gov

At its Boston meeting, WG8 began its systematic clause-by-clause review of ISO 8879. The systematic review was completed through clause 11.3.4; other clauses were discussed along the way and are reflected, incompletely, in this report. The clauses after 11.3.4 will be covered systematically at the next meeting.

N1855, which contains the cumulative decisions taken prior to this meeting, was not formally considered during this work, although some topics were not discussed because they were known to have been covered in N1855. A consolidated report will be developed in the future; for now, this report should be considered a supplement to N1855. In particular, the procedure for conducting the review, and the policy requiring that existing conforming documents remain conforming, apply as described in that report.

For each clause we agreed on the topics to be considered for revision; these are listed below by clause number. In some cases, we agreed on the resolution of an issue; such agreements are indicated by "(A)" at the start of the topic description. In other cases we rejected a proposal; these are indicated similarly, but by "(R)". These dispositions indicate the current thinking of WG8, but should not be considered final. In some cases they contradicted the agreements in N1855; the conflicts will be resolved when the two documents are reconciled.

WG8 will establish a closed mail group, restricted to delegation members appointed by their Heads of Delegation, in which these topics will be discussed. Although the mail group is private, WG8 will continue to produce interim reports like this one to keep the SGML user community aware of the revision activity.


a. (A) There shall be a defined simple-as-possible-but-useful conformance level of SGML, tentatively called "simple SGML". It will have a simplified declaration syntax and it will be possible to parse the element structure without reference to the DTD.

b. (A) There shall be a defined form of an SGML instance (SGML-D) that is optimized for self-contained delivery over a network for read-only applications, and whose "ESIS" grove plan can be constructed without reference to its DTD.

c. (A) Each conforming SGML system shall have a "system definition document" that will contain declarations for certain defaulted constructs (such as the default PI notation when formal PIs are in use) and the information currently in the SGML system declaration.

6.2 SGML Entities

a. Discuss production [2]

b. The boundary between the prolog and the document instance set needs to be unambiguous.

6.2.3 Implied SGML Declaration

a. Clarify that the SGML declaration is syntactically optional but that there is a defaulting mechanism. For a subdocument, the default SGML declaration could be that of the document from which it is referenced. For a document, it could be defined in the system definition document.

6.3 Data Entities

a. [6] How do NDATA entities relate to the character/entity model?

7.1 Prolog

a. [7] Would allowing linktype declarations to precede the base document type declaration help to solve problems with entity handling in linktype declarations?

b. A substantial rewrite is required, and the design must be tied to groves. In particular, the terms "active" and "chain" need clarification.

7.3 Element

a. Should we allow empty elements to have end-tags? If so, are they affected by the omittag feature? Can any data or "other content" occur between the start-tag and end-tag?

b. Should there be separate delimiter roles for opening and closing start-tags, end-tags, and/or the start-tags of required-to-be-empty elements (i.e., those declared empty or with a specified conref attribute)? Should such a delimiter also be usable for elements with #pcdata content models that just happen to be empty (that is, to avoid entering an end-tag when omittag is not in use)? Note that this markup would blur the distinction between an empty element and one with zero-length content.

Start-tag Omission

a. The start-tag omission rules may be broken. This will be discussed.

7.3.2 Data Tag Minimization

a. Additional explanation is needed.

7.6 Content

a. Should there be a means of specifying that CDATA and RCDATA elements are terminated only by their own end-tags (or by the ETAGO, GI, SPACE sequence, if TAGLEN is not in use)? Such a change might (or might not) make these constructs useful, whereas today they are not. The current rules, however, simplify the implementation of an SGML-aware text editor, because there is no need for lengthy trial changes of parsing state. This change could not be made mandatory without affecting existing documents that use OMITTAG.

7.6.1 Record Boundaries

a. Shall RE handling be an optional feature?

b. Shall there be options of "ignore all REs" and "include all REs"?

c. REs that are ignored shall not affect conformance to the content model.

d. Shall deferring REs to the next data be optional (and with an associated quantity to avoid unbounded lookahead)?

e. Shall RE "ignoring" or "preserving" be an optional property of an element type, independently of whether it is an included or proper subelement?

f. Shall RE handling have a menu of options to support "source document formatting", such as: after start-tags, before end-tags, after PIs and declarations, etc.? If so, shall the menu be global, element-type specific, or both?

g. Should any of the RE handling be extended to other white space?

Other questions and discussion can be found in N1875.

Omitted Attribute Name

a. (A) Name omission is possible only when the value token is unique in the attribute definition list.

b. Should it be possible to omit the name when there is only one attribute?

8. Processing Instruction

a. (A) There shall be an optional feature which, if selected, will require the first token of a PI to be a notation name, concatenated to the PIO. The default notation name is SYSPROC. A separate "system definition document" will declare the default notation.

9.1 Replaceable character data

a. (A) Clarify that references to SGML documents or data entities with notations are prohibited, whether external or internal.

9.4 Entity References

a. (A) Clarify that (syntactic) entity references to external notationless CDATA entities cause the replacement text to be treated like an SGML text entity.

b. (A) Clarify that SGML text entities accessed by (syntactic) entity references are considered "constants", in that they are treated like the data of the entities that refer to them and are part of the same grove.

Applicable Entity Declaration

a. (A) This will need rewriting, depending on what is done with link and concur.

9.4.5 Reference End

a. (A) There shall be one or more optional features specifying whether the reference end can be empty and whether it can be an RE.

Equivalent Reference String

a. Consider whether to delete this or move it to an informative annex, with corrections as needed.

9.5 Character Reference

a. (A) Allow numeric character references that are meaningful in languages not restricted to the use of Arabic numerals.

See also N1875.
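The report does not fix a syntax for such references. Purely as an illustration, the following Python sketch assumes a reference of the form `&#` digits `;` in which the digits may come from any single Unicode decimal-digit script (for example Arabic-Indic or Devanagari digits). The function names are hypothetical; only `re` and `unicodedata.decimal` are real standard-library facilities.

```python
import re
import unicodedata

# Hypothetical reference form: "&#" + decimal digits (any script) + ";".
# In Python str patterns, \d matches any Unicode decimal digit (category Nd).
CHAR_REF = re.compile(r"&#(\d+);")


def digit_value(ch: str) -> int:
    """Return the decimal value (0-9) of one digit character from any script."""
    return unicodedata.decimal(ch)


def resolve_char_ref(ref: str) -> int:
    """Return the character number denoted by a (hypothetical) numeric reference.

    All digits must come from the same script, so a mixed-script
    reference such as '&#6\u0665;' is rejected as ambiguous.
    """
    m = CHAR_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    digits = m.group(1)
    # Unicode decimal digits occur in contiguous runs of ten, so the code
    # point of each digit's zero identifies the script it belongs to.
    zeros = {ord(d) - digit_value(d) for d in digits}
    if len(zeros) != 1:
        raise ValueError(f"digits from more than one script: {ref!r}")
    value = 0
    for d in digits:
        value = value * 10 + digit_value(d)
    return value
```

Under these assumptions, `&#65;`, `&#\u0666\u0665;` (Arabic-Indic) and `&#\u096c\u096b;` (Devanagari) all denote character number 65. Rejecting mixed-script digit strings is a design choice of this sketch, not something the report decides.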

9.6 Delimiter Recognition

Figure 3: Reference Delimiter Set: General

a. (A) Correct errors in this figure and related text.

b. PIC should be two characters or more, to minimize conflicts.

c. There shall be LITC and LITAC (literal close) delimiter roles.

9.6.5 Short References with Blank Sequences

a. Provide a means of allowing the letter "B" to occur in a short reference delimiter string.

9.8 Capacity

a. (A) This clause will be deleted.

10.1.3 Group

a. Should ts be the same as ps, that is, should it allow comments? If so, this would have to be an option, because it would break documents in which the com delimiter begins with a character that is also a name start character.

b. Shall there be a reserved element type name (e.g., "#noelem")? It could serve as a default value for parameter entities that are referenced as group members, keeping the group intact where an empty entity would break it.

c. Should the restriction that a ts containing an EE must follow a token be converted to a recommendation?

10.1.6 External identifier

a. Should the character repertoire for public identifiers be other than that of a minimum literal? Should there be multiple categories of public identifier, based on the repertoire each uses? For example, the repertoire could be specified in the SGML declaration, or we could legislate the use of ISO/IEC 10646. If so, should we advise that using repertoires other than the minimum literal could reduce portability?

b. Some storage managers may have the effect of constraining the character repertoire of system identifiers; it is uncertain whether this would affect production [75].

c. (A) Clarify that there is always a system identifier, which always identifies the same object, although it never needs to be the real access key used by the storage manager. The ability to omit the system identifier shall be an optional feature, and implies the availability of an external mapping mechanism.

10.1.7 Minimum Literal

a. Shall the set of minimum literal characters be expanded?

b. Shall there be a case-folded variety of minimum literal?

c. Shall SEPCHAR be replaced by SPACE for normalization of minimum literals?

10.2 Formal public identifier

a. FPI syntax is in effect a data content notation. The presentation of the FPI syntax must be defined rigorously. (There may be other situations like this in the standard.)

Public Text Class

a. (A) The list of classes shall be a recommendation, not a requirement.

Public Text Language

a. Shall there be an optional country and locale code attached to the language code?

Public Text Designating Sequence

a. Further explanation is needed.

Public Text Display Version

a. The role of this component should be reconsidered in the light of the altreps data attribute and the entuse storage manager. Additional requirements to be addressed include proprietary representations of SGML entities (e.g., compiled DTDs) and interpreters of data content notations (in contrast to the specifications defining data content notations).

b. Consider using this component, renamed "public text version", to implement a useful revision-level control system. (A different part of SGML could be used instead.) This differs from simple alternative versions in allowing GT/LT comparisons of the revision-level, instead of just equal/not-equal comparisons of version names.
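To illustrate the difference between the two kinds of comparison, the following Python sketch assumes, purely hypothetically, that a "public text version" is a dotted sequence of non-negative integers such as `2.10.1`. The report defines no such format, and all names below are invented for this example.

```python
from typing import Tuple


def parse_revision(text: str) -> Tuple[int, ...]:
    """Parse a hypothetical dotted revision level like '2.10.1' into integers."""
    parts = text.split(".")
    if any(not p.isdecimal() for p in parts):
        raise ValueError(f"malformed revision level: {text!r}")
    return tuple(int(p) for p in parts)


def same_version_name(a: str, b: str) -> bool:
    """Display-version style: names can only be compared for equality."""
    return a == b


def at_least(available: str, required: str) -> bool:
    """Revision-level style: greater-than/less-than comparison is possible.

    Tuples compare element by element, so (2, 10) > (2, 9), whereas the
    plain strings '2.10' and '2.9' would compare the other way round.
    """
    return parse_revision(available) >= parse_revision(required)
```

With these definitions, `at_least("2.10", "2.9")` is true even though the string `"2.10"` sorts before `"2.9"`. A revision-level scheme would therefore need its own defined ordering rather than reusing display-version names as opaque strings.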

10.3 Comment Declaration

a. (A) The ability to have more or fewer than exactly one comment in a comment declaration shall be an optional feature. When exactly one comment is permitted, the com delimiters shall be concatenated with the adjacent declaration delimiters.

b. There shall be a COMC (comment close) delimiter role.

c. There shall be COMA/COMCA delimiter roles.

10.4 Marked Section Declaration

a. (A) Marked sections shall be an optional feature. It shall be possible to allow or disallow R/CDATA and IGNORE/INCLUDE independently, in either the prolog or instance set, or both, as follows:

1. (Exclusive) No MS at all.
2. (Exclusive) MS as in SGML86.
3. R/CDATA as "tags"; no parameter entity references or white space.
4. INCLUDE/IGNORE in prolog.
5. INCLUDE/IGNORE in instance set.

b. (A) R/CDATA marked sections are not permitted in the prolog.
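One possible reading of the option menu in item a can be sketched in Python. The sketch assumes that an option marked "(Exclusive)" may only be selected on its own, that options 3 to 5 may be combined freely, and that at least one option must be chosen; the enumeration names and the `validate` function are hypothetical.

```python
from enum import Enum
from typing import FrozenSet


class MSOption(Enum):
    """Hypothetical names for the five marked-section options listed above."""
    NONE = 1                     # (Exclusive) no marked sections at all
    SGML86 = 2                   # (Exclusive) marked sections as in SGML86
    RCDATA_AS_TAGS = 3           # R/CDATA as "tags"; no PE references or white space
    INCLUDE_IGNORE_PROLOG = 4    # INCLUDE/IGNORE in the prolog
    INCLUDE_IGNORE_INSTANCE = 5  # INCLUDE/IGNORE in the instance set


EXCLUSIVE = {MSOption.NONE, MSOption.SGML86}


def validate(selection: FrozenSet[MSOption]) -> None:
    """Raise ValueError unless the selection is a permitted combination.

    Assumption: an exclusive option must be selected alone, while
    options 3 to 5 may be selected in any combination.
    """
    if not selection:
        raise ValueError("select at least one option")
    if selection & EXCLUSIVE and len(selection) > 1:
        raise ValueError("an exclusive option cannot be combined with others")
```

Under this reading, selecting options 3 and 4 together is accepted, while combining option 2 with option 5 is rejected. The sketch does not model item b, which further restricts where R/CDATA marked sections may occur.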

10.5 Entity Declaration

a. Shall it be possible, for each reference to a "target" entity, to define different replacement text for particular entities referenced within the target entity? This effect could be achieved by declaring one or more entities that are mapped to the target entity, each of which would have a different internal subset (a new addition) in which the particular entities are redeclared. This would apply to both general and parameter entities.

10.5.1 Entity name

a. Shall there be separate name spaces of parameter entities such that DTD developers will be able, for example, to decide which parameter entities can be preempted in the internal subset without having to allow all of them to be preemptable as is the case today? This is a general issue of scoping parameter entity names, so that modules of a DTD can be developed independently. (I.e.,

b. Shall the parameter entity name be expressible as a reserved name space name rather than the open delimiter for entity references in that name space (e.g., #param rather than pero)?

c. Shall it be possible to declare a general entity as non-preemptable?

d. (R) Shall it be possible to declare general entities within a document instance? If so, shall it be restricted to the start of the entity (like a distributed internal subset, more or less)?

10.5.3 Data Text

a. (A) CDATA and SDATA entities can have an optional notation.

10.5.5 External entity specification

a. (A) Notation shall be optional for CDATA entities, meaning that the data is interpreted as individual characters (same as for data content with no notation specified). The same is true for SDATA as well, although the rendition of the "characters" may be an arbitrary graphic.

11.1 Document Type Declaration

a. Shall the internal subset be structured so as to help enforce user policies regarding access to it?

11.2 Element Type Declaration

a. Should it be possible to define modules for element types, such that the subelement type names occupy their own name space? See N1873 for a proposal.

b. Should we provide a means of defining where subdocs can occur in the element structure? This would allow them to be accessed safely by an entity reference.

11.2.3 Declared Content

a. Should RCDATA, CDATA, and/or EMPTY be optional features?

11.2.4 Content Model

a. Should the content model language be enriched to allow control over specific branches of the document instance?

11.3.3 Declared Value

a. (A) Delete the restriction that a token may appear only once.

b. Should there be a means of declaring that an entity must have certain properties (e.g., notation, internal/external, name from a list of names)? We need to consider how to distinguish semantic properties from addressing mechanics, and the possibility that the addressing mechanics can be handled through HyTime location addressing.

c. Should there be a means of controlling the use of SDATA entity references in attribute values?

11.3.4 Default Value

a. (A) It shall be possible to declare a default value for #CURRENT.

b. (A) Default values shall be resolved at the end of the prolog.

NOTE: The systematic review paused here.

13. SGML Declaration

a. (A) [171] Capacity set is optional.

13.2 Capacity Set

a. (A) [180] This parameter is supported for compatibility only. No capacity checking will occur.

13.4 Concrete Syntax

a. (A) It shall not be necessary for the abstract alphabetic and digit character classes to be used in all concrete syntaxes.

13.4.5 Naming Rules

a. Shall we support other forms of case-folding than upper-case?

b. Shall there be more granularity in the application of case-folding than just entity and general?

Short Reference Delimiters

a. It will be possible to specify that the short reference delimiters are only those defined in short reference mapping declarations. This will be a shortref minimization feature.

13.4.8 Quantity Set

a. (A) There shall be a single quantity set, fixed in the standard. The quantities shall be minimums that every conforming SGML system shall support, but there shall be no checking to see whether quantities are exceeded. This support requirement applies to individual items; it is possible that a combination of items could exceed memory capacity or other resources, but this would not make a system non-conforming.

13.5.1 Minimization Features

a. (A) Features shall be reorganized to allow the maximum technically feasible modularity. For example, shorttag shall be unbundled and attribute defaulting shall be an independently specifiable option.

b. A "profile" mechanism shall be introduced to allow easy setting of options. Profiles could be named by external identifiers.

15. Conformance

a. (R) The idea of including test cases for each clause of the standard was rejected as impractical given the time and resource constraints. It could be a suitable subject for an additional standard.

15.5.1 Standard Identification

a. Shall standard identification in English be required in addition to the national language?

SGML Extended Facilities

SGML Property Set

a. Should ESIS distinguish must-be-empty elements from happens-to-be-empty elements?

b. Examine the relationship of CGR to RAST and see whether a single syntax could be used for arbitrary grove plans. Consider how to preserve RAST's support for human readability, such as line length limits, printable characters, and termination of all lines with a visible character.