[Archive copy mirrored from: http://www.sgmlsource.com/8879rev/n1929.htm]
TITLE: Proposed TC for WebSGML Adaptations for SGML
PROJECT EDITOR: Charles F. Goldfarb
SUMMARY OF MAJOR POINTS: This Technical Corrigendum adds a normative annex K and an informative annex L to ISO 8879 to meet an urgent need for adaptations of SGML for use on the World Wide Web and intranets. It incorporates by reference the Extended Naming Rules TC.
This TC does not affect existing SGML documents or products. It affects only those SGML documents and products that choose to support the WebSGML Adaptations option.
DATE: 1 June 1997
DISTRIBUTION: WG8 and Liaisons
REFER TO: ISO 8879
REPLY TO: Dr. James D. Mason
(ISO/IEC JTC1/SC18/WG8 Convenor)
Oak Ridge National Laboratory
Information Management Services
Bldg. 2506, M.S. 6302, P.O. Box 2008
Oak Ridge, TN 37831-6302 U.S.A.
Telephone: +1 423 574-6973
Facsimile: +1 423 574-6983
Add the following normative annex K and informative annex L to ISO 8879.
This annex describes an optional extension of SGML known as the "WebSGML Adaptations". The extension incorporates by reference the Extended Naming Rules TC, such that a system that supports these WebSGML Adaptations also supports the Extended Naming Rules. An SGML system need not support these WebSGML Adaptations in order to be a conforming SGML system.
To distinguish SGML declarations that use the facilities of this TC from those that do not, the minimum literal in productions  and  of ISO 8879:1986 (the "version literal") must be modified to read "ISO 8879:1986 (WWW+ENR)". To accomplish this add the following sentence to the paragraph immediately following production  and to the second paragraph following production :
However, when WebSGML Adaptations are used, the minimum data must be "ISO 8879:1986 (WWW+ENR)".
An SGML parser that supports this TC shall also be able to parse documents whose version literal indicates that they do not use the facilities of this TC. The parsing of such documents must produce the same grove as would an SGML parser that does not support this TC. Validation of such documents, however, is with respect to the standard as modified by this TC and, by reference, the ENR TC.
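As an illustration of how a parser might branch on the version literal, the following minimal Python sketch extracts the first minimum literal of an SGML declaration and compares it with the WebSGML value. The function names are hypothetical. The sketch assumes the reference concrete syntax delimiters and a declaration with no leading comments, so it is not a full SGML declaration parser.

```python
import re

WEBSGML_LITERAL = "ISO 8879:1986 (WWW+ENR)"


def version_literal(sgml_declaration):
    """Return the minimum literal that follows '<!SGML', or None if absent.

    Assumption: reference concrete syntax delimiters, nothing before '<!SGML'
    except white space.
    """
    match = re.match(r"""\s*<!SGML\s+(["'])(.*?)\1""", sgml_declaration, re.DOTALL)
    return match.group(2) if match else None


def uses_websgml(sgml_declaration):
    """True if the version literal selects the WebSGML Adaptations."""
    literal = version_literal(sgml_declaration)
    if literal is None:
        return False
    # Collapse runs of white space before comparing.
    return " ".join(literal.split()) == WEBSGML_LITERAL
```

A declaration whose literal is plain "ISO 8879:1986" yields False here, which corresponds to the case where the parser must behave as if this TC were not supported.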
This annex is phrased in terms of revisions to be made to the body of this International Standard. However, these revisions are applicable only when the WebSGML Adaptations are in use.
This annex makes reference to groves and property sets, which are defined in the SGML Extended Facilities of the 2d Edition of HyTime (ISO/IEC 10744) and also in DSSSL (ISO/IEC 10179).
Note: The SGML Extended Facilities are expected to be incorporated in the forthcoming revision of this International Standard.
The WebSGML Adaptations are as follows.
The following definitions are used in this Annex.
DTD declarations: Markup declarations that specify the part of a document type definition (DTD) that is expressible in SGML. They occur in the global subset and in the external and internal subsets of document type declarations.
External subset: The portion of a document type declaration subset referenced by the external identifier parameter of a document type declaration.
Internal subset: The portion of a document type declaration subset that occurs between the dso and dsc of a document type declaration.
Note: If there are no DTD declarations within the document type declaration in the SGML document entity, then, whether or not the dso and dsc are omitted, the internal subset is considered empty (rather than non-existent).
Conforming SGML document: An SGML document that complies with all provisions of this International Standard.
Note: The provisions allow for choices in the use of options, features, and variant concrete syntaxes.
A conforming SGML document must be either a type-valid SGML document, a tag-valid SGML document, or both.
Note: A user may wish to enforce additional constraints on a document, such as whether a document instance is integrally-stored or free of entity references.
Type-valid SGML document: An SGML document in which, for each document instance, there is an associated document type declaration to whose DTD that instance conforms.
Tag-valid SGML document: An SGML document, all of whose document instances are fully-tagged. There need not be a document type declaration associated with any of the instances.
Note: If there is a document type declaration, the instance can be parsed with or without reference to it.
Fully-tagged document instance: A document instance in which a start-tag with a generic identifier, and an end-tag, are present for every element, and the attribute name is present in every attribute specification in the start-tag.
Note: Processing without reference to DTD declarations is possible only for a fully-tagged document instance.
An SGML system that supports unconstrained SGML documents must be able to parse DTD declarations and resolve both internal and external entity references. If it continues parsing (as a form of error-recovery) after failing to access a referenced entity, the results will be unpredictable. Observing one or more of the entity constraints defined in this International Standard may cause a document to be more amenable to processing by a simpler SGML system, or in an environment (such as a network) where entity access may be slow or unreliable.
Integrally-stored document instance: A document instance in which every element and marked section ends in the entity in which it begins.
Note: This constraint makes it possible, as a form of error-recovery, for parsing to continue in a fully-tagged document instance after a failure to access a referenced entity. The resulting grove will be the same for the parsed text, except for the tree addresses of younger siblings of the nodes in the inaccessible entity.
Reference-free SGML document: An SGML document that has no entity references other than delimiter entity references, although it could have attribute values that refer to entities.
Note: A reference-free document can be parsed by systems that cannot resolve entity-references. If it is tag-valid, it can also be parsed by systems that cannot parse DTD declarations.
External-reference-free SGML document: An SGML document that has no external entity references, although it could have attribute values that refer to external entities.
Note: External-reference-free documents can be parsed by systems that cannot resolve external entity-references.
Entity-declaration-free SGML document: An SGML document that contains no entity declarations.
Note: An entity-declaration-free document that contains entity references other than delimiter entity references requires IMPLYDEF ENTITY. If the document is tag-valid, it can be parsed by systems that cannot parse DTD declarations.
Functionally equivalent groves: Groves that, except for certain permitted differences, are identical with respect to the portion of their grove plans that is included within the SGML default grove plan. The allowable differences apply to constructs that are parsed without respect to declarations. They are:
Note: The SGML default grove plan does not include any markup properties.
Predefined data character entity: A general entity, associated with a character number in the syntax-reference character set, that is used to reference significant SGML characters as data.
Note: In order to allow delimiter escaping when parsing without reference to DTD declarations, data character entities should be predefined for the first character of each delimiter string that can be recognized in a mode where data can occur.
SGML declaration body: The parameters of an SGML declaration.
White space: The characters assigned to the SEPCHAR, SPACE, RE, and RS delimiter roles.
An alternative form of the SGML declaration is permitted, known as an SGML declaration reference. It references an SGML declaration body. SD is added as a new public text class identifying an SGML declaration body.
In order for SGML documents to be self-identifying, it is strongly recommended that all conforming SGML documents contain one of the forms of SGML declaration.
Note: For example:
<!SGML PUBLIC "IDN//W3C.ORG//SD HTML Version 3.2//EN">
The SGML declaration provides a new feature, "white space in content rule" (WSCON), with the values KEEPALL or SGML1986 (the current rule), which is the default. If KEEPALL is specified, all white space in mixed content is included in the grove as datachar nodes, and all white space known to be in element content is included in the grove as ssep nodes.
Note: If KEEPALL is specified, the RS and RE ignoring rules in 7.6.1 do not apply.
Note: This feature does not affect delimited strings, such as attribute value literals, which have their own rules for normalizing white space (and which, in any case, do not occur in content).
The capacity set parameter of the SGML declaration can be specified as CAPACITY NOLIMITS, indicating that there are no capacity limits.
The quantity set parameter of the SGML declaration can be specified as QUANTITY NOLIMITS, indicating that there are no quantity limits except BSEQLEN, which has the quantity specified for it in the reference quantity set. Specifying QUANTITY NOLIMITS does not require a system to support quantities greater than those specified in its system declaration.
A new hexadecimal character reference open (HCRO) delimiter is used when a numeric character reference is represented as a hexadecimal string. It is recognized when followed by a hex digit. The hex digit begins a hex digit string no longer than NAMELEN characters, which is terminated in the same way as other numeric character references. A concrete syntax need not assign a string to this delimiter. HCRO is recognized in the same modes as CRO.
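The following Python sketch shows one way hexadecimal character references could be resolved. It rests on three assumptions: HCRO is assigned the string "&#x", REFC is ";", and NAMELEN is 8 as in the reference quantity set. Only references terminated by REFC are handled, and the function name is hypothetical.

```python
import re

HCRO = "&#x"  # assumed delimiter string; a concrete syntax need not assign one
REFC = ";"    # assumed reference close delimiter
NAMELEN = 8   # assumed; the value in the reference quantity set

# HCRO is recognized only when followed by at least one hex digit.
_HEX_REF = re.compile(re.escape(HCRO) + r"([0-9A-Fa-f]+)" + re.escape(REFC))


def resolve_hex_references(text, namelen=NAMELEN):
    """Replace each hexadecimal character reference with its character."""
    def replace(match):
        digits = match.group(1)
        if len(digits) > namelen:
            raise ValueError("hex digit string exceeds NAMELEN: " + digits)
        return chr(int(digits, 16))

    return _HEX_REF.sub(replace, text)
```

Text such as "&#xZZ;" is left untouched by this sketch, because the delimiter is not followed by a hex digit and is therefore not recognized.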
The "check validity assertions" (VALCHECK) parameter allows a document to assert whether it is type-valid, tag-valid or both. It is a reportable error if an assertion is untrue.
Note: For example, if an otherwise conforming type-valid document incorrectly asserts that it is tag-valid, the document is non-conforming. Had the assertion not been made, the document would have been conforming.
VALCHECK, (TAGVALID?, TYPVALID?)
The "check entity constraint assertions" (ENTCHECK) parameter allows a document to assert whether it satisfies specified constraints. It is a reportable markup error if an assertion is untrue.
ENTCHECK, (INTEGRAL?, (NOREF | NOEXTREF | NOENTDEC)?)
INTEGRAL The document instance is integrally-stored.
NOREF The document is reference-free.
NOEXTREF The document is external-reference-free.
NOENTDEC The document is entity-declaration-free.
Note: For example, if an otherwise conforming SGML document incorrectly asserts that it is integrally-stored, the document is non-conforming. Had the assertion not been made, the document would have been conforming.
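The ENTCHECK grammar above allows an optional INTEGRAL followed by at most one of the three reference constraints. A minimal Python sketch of a checker for that keyword sequence follows. The function name and the tuple it returns are illustrative choices, not part of the TC.

```python
REFERENCE_CONSTRAINTS = {"NOREF", "NOEXTREF", "NOENTDEC"}


def parse_entcheck(keywords):
    """Validate the keywords that follow ENTCHECK.

    Returns (integral, reference_constraint), where reference_constraint is
    one of NOREF, NOEXTREF, NOENTDEC, or None.
    """
    remaining = list(keywords)
    integral = False
    if remaining and remaining[0] == "INTEGRAL":
        integral = True
        remaining.pop(0)
    constraint = None
    if remaining and remaining[0] in REFERENCE_CONSTRAINTS:
        constraint = remaining.pop(0)
    if remaining:
        # Anything left is out of order, repeated, or unknown.
        raise ValueError("unexpected ENTCHECK keyword(s): " + " ".join(remaining))
    return integral, constraint
```

For example, the sequence INTEGRAL NOREF is accepted, while NOREF INTEGRAL and NOREF NOENTDEC are rejected.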
A new parameter is declarable in the SGML Declaration, the "Application Requirements" parameter:
SEEALSO, s ((public_identifier s)+ | NONE)
The public identifiers identify notations that specify application-specific requirements for the document, including requirements unrelated to the SGML language. These requirements are in addition to, and must not contradict, the requirements of this International Standard. Failure to satisfy application requirements is not a reportable markup error, except to the extent that such requirements are also expressed in other parameters of the SGML declaration.
Note: For example, this parameter could be used by an SGML system to signal the existence of requirements for specific entity constraint assertions, formatting conventions for specified element types, or data restrictions, such as that the number of cells in a table row does not exceed the number specified in some attribute. It is not a reportable markup error if the application's required entity assertions are not present in the SGML declaration, although if they are present, it is a reportable markup error if the document fails to satisfy them.
It is not an error if the system is unable to access the object named by the public identifier.
Note: See Annex L "Application Requirements for XML" for an example.
A new delimiter role NESTC (NET-enabling start-tag close) is defined. If it is not assigned, the NET string is used. It must be used to close a start-tag if NET is to be used as the end-tag.
Note: For example, if NESTC is "/" and NET is ">", an empty "img" element with a null end-tag would look like:
<img/>
If SHORTTAG NO is specified, any of the following can be specified to enable specific short tag minimization features:
means that empty start-tags can be used.
(ENDTAG, (EMPTYTAG | (NETANY | NETEMPTY) | BOTH), REQUIRED?)?
means that empty, null, or both forms of short end-tag can be used. Specifying NETEMPTY means that NESTC and NET are restricted to elements with empty content (not necessarily declared empty). If REQUIRED is specified, the indicated form(s) must be used for the applicable elements and unminimized end-tags cannot be used.
Note: If empty elements are prohibited from having end-tags, NETEMPTY effectively prohibits the use of NET.
(ATTS, (DEFAULT?, NONAME?, NOQUOTE?))?
means that any or all of the three forms of attribute minimization are permitted:
DEFAULT enables attribute value defaulting.
NONAME allows some attribute names to be omitted.
NOQUOTE allows some attribute values to be specified directly.
The syntax for ATTLIST is revised to provide the functionality of the syntax shown here:
<!ATTLIST "#NOTATION"? (name | name group | #IMPLICIT | #ALL ) attribute definition* >
where:
"name" is either an element type name or a notation name, depending on whether #NOTATION is specified.
"name group" is one or more names in parentheses.
#IMPLICIT refers to all implicitly defined element types (or notation names). It is the equivalent of a name group.
"#ALL" is all element type names or notation names.
The same element type name (or notation name) may be the associated element type (or notation) in multiple ATTLIST declarations. An attempt to redefine an attribute that was previously defined for specified element types or notations is not an error; the earliest definition prevails (just as for entity declarations).
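The "earliest definition prevails" rule can be sketched in Python as follows. The sketch also applies the precedence this annex gives to declarations for specific element types over declarations for #ALL. The data model is an assumption made for illustration: each ATTLIST declaration is a pair of targets and definitions, name groups and #IMPLICIT are assumed to have been expanded into a set of names by the caller, and the function name is hypothetical.

```python
ALL = "#ALL"


def effective_attributes(element_type, attlists):
    """Compute the attribute definitions that apply to one element type.

    attlists is a list of (targets, definitions) pairs in document order.
    targets is either the string "#ALL" or a set of element type names.
    definitions maps an attribute name to its definition text.
    """
    specific = {}
    for_all = {}
    for targets, definitions in attlists:
        if targets == ALL:
            bucket = for_all
        elif element_type in targets:
            bucket = specific
        else:
            continue
        for name, definition in definitions.items():
            # A later redefinition is not an error; the earliest one is kept.
            bucket.setdefault(name, definition)
    # Definitions for specific element types override those for #ALL.
    return {**for_all, **specific}
```

The same approach would apply to notation names when #NOTATION is specified.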
An ATTLIST declaration can have an empty list of attribute definitions.
"#ALL" can be specified as the associated element type name (or notation name) in an attribute definition list declaration. If so, the definitions are associated with all element type names (or notation names). Definitions associated with #ALL can be overridden by attribute declarations for specific element types or notations, including definitions specified with #IMPLICIT (all implicitly-defined element types or notations). An attempt to redeclare for all element types (or notations) an attribute that was previously declared for all element types (or notations) is not an error; the earliest declaration prevails (just as for entity declarations).
The current restriction on duplicate name token ("enum") values for attributes in the same set of attribute definitions is removed. Minimization by omitting the attribute name is possible only for values that occur for only one attribute in the set of attribute definitions.
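A small Python sketch of the resulting lookup: given the enumerated values of each attribute in one set of attribute definitions, a bare value can be attributed to a name only if exactly one attribute lists it. The function name is hypothetical, and name tokens are assumed to have already been case-normalized by the caller.

```python
def attribute_for_bare_value(value, enumerated):
    """Return the attribute that a name-omitted value belongs to, or None.

    enumerated maps each attribute name to the set of name tokens it allows.
    None means the value is unknown or shared by several attributes, so the
    attribute name cannot be omitted for it.
    """
    owners = [name for name, tokens in enumerated.items() if value in tokens]
    return owners[0] if len(owners) == 1 else None
```

With attributes "align" (left, right, center) and "valign" (top, bottom, center), the value "left" resolves to "align", while "center" is ambiguous and requires the attribute name.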
The keyword "#IMPLIED" is allowed as an alternative to the document type name in a DOCTYPE declaration or the source document type name in a LINKTYPE declaration. When the keyword is used, the document type name is the element type name of the document element. For example:
<!DOCTYPE #IMPLIED SYSTEM "some.dtd">
<docelem>
It is a reportable markup error if #IMPLIED is specified and the document instance does not begin with the start-tag of the document element.
This facility is allowed only when there is a single document instance and this is the only document type declaration.
Note: This facility is therefore incompatible with explicit link and concur.
When IMPLYDEF DOCTYPE is specified and there is only one document instance and no document type declarations, the document type declaration associated with the instance is assumed to be:
<!DOCTYPE #IMPLIED SYSTEM>
Note: This facility is used to imply the applicable DTD. When parsing without respect to DTD declarations, there is no need to imply an applicable DTD.
Note: This facility is incompatible with explicit link and concur.
When IMPLYDEF is specified with the keywords ELEMENT, ENTITY, %ENTITY, NOTATION, and/or ATTLIST, the corresponding constructs can be used without an explicit declaration. The implied definitions are as follows:
ELEMENT: "- - ANY"
ENTITY, %ENTITY, NOTATION: "SYSTEM"
ATTLIST: "CDATA #IMPLIED" for each attribute definition.
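To make the implied definitions concrete, the Python sketch below writes out the declaration that each one would correspond to if it were stated explicitly. Rendering them as declaration text in the reference concrete syntax is an illustrative choice, and the function name is hypothetical. The TC itself only states the implied definitions.

```python
def implied_declaration(kind, name, attributes=()):
    """Return declaration text equivalent to an IMPLYDEF implied definition.

    kind is one of ELEMENT, ENTITY, %ENTITY, NOTATION, ATTLIST.
    attributes lists the attribute names used, and applies to ATTLIST only.
    """
    if kind == "ELEMENT":
        return "<!ELEMENT " + name + " - - ANY>"
    if kind == "ENTITY":
        return "<!ENTITY " + name + " SYSTEM>"
    if kind == "%ENTITY":
        return "<!ENTITY % " + name + " SYSTEM>"
    if kind == "NOTATION":
        return "<!NOTATION " + name + " SYSTEM>"
    if kind == "ATTLIST":
        parts = ["<!ATTLIST", name]
        # Every attribute that appears is treated as CDATA #IMPLIED.
        parts.extend(attr + " CDATA #IMPLIED" for attr in attributes)
        return " ".join(parts) + ">"
    raise ValueError("unknown IMPLYDEF keyword: " + kind)
```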
When IMPLYDEF ENTITY is specified, a default entity declaration is not permitted.
The EMPTYETG feature indicates whether elements with empty content have end-tags. The possible values are:
NO They are not permitted. (This is the case without this TC.)
YES They are permitted and are subject to markup minimization in the same way as any other end-tag.
When processing with reference to DTD declarations, it is a reportable markup error if an element that is required to be empty contains any text, including white space, other markup, or included subelements.
Internet IP domain names that contain only minimum data can be used as public text owner identifiers. To do so, the formal public identifier must begin with "IDN//domain.name".
Note: Because of different name-spelling rules, not all internet domain names can be used in this way.
Note: When constructing a public text owner identifier, users may wish to consider its potential lifespan and that of the objects to be identified by it.
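A minimal Python sketch of building such an identifier is shown below. It checks the domain name against the letters, digits, and special characters that ISO 8879 allows as minimum data. White space, although also minimum data, is rejected here on the assumption that it has no place in a domain name. The function names and the way the remaining fields are assembled are illustrative only.

```python
import string

# Letters, digits, and the ISO 8879 "Special" characters.
_ALLOWED = set(string.ascii_letters + string.digits + "'()+,-./:=?")


def idn_owner_identifier(domain_name):
    """Return 'IDN//domain.name' or raise if the name is not usable."""
    if not domain_name:
        raise ValueError("empty domain name")
    if any(ch not in _ALLOWED for ch in domain_name):
        raise ValueError("domain name contains non-minimum-data characters")
    if "//" in domain_name:
        raise ValueError("'//' would be read as a field separator")
    return "IDN//" + domain_name


def formal_public_identifier(domain_name, text_class, description, language="EN"):
    """Assemble owner//class description//language (illustrative only)."""
    owner = idn_owner_identifier(domain_name)
    return owner + "//" + text_class + " " + description + "//" + language
```

Calling formal_public_identifier("W3C.ORG", "SD", "HTML Version 3.2") reproduces the identifier used in the SGML declaration reference example earlier in this annex.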
The term "element declaration" in the definitions clause 4.111 is changed to "element type declaration".
All other occurrences of "element declaration" in this International Standard are changed to "element type declaration".
In production 113, optional attribute definition list declarations and notation declarations are permitted. Formally:
entity set = ( entity declaration | attribute definition list declaration | notation declaration | ds )*
where the keyword "#NOTATION" must be specified in the attribute definition list declaration.
A subdocument can contain an SGML declaration.
A new EOR ("end of declarations that are required for all purposes") indicator can be placed anywhere among the DTD declarations. Declarations occurring after the first EOR indicator can be ignored during any processing of a fully-tagged document instance other than type validation.
Indicator syntax: mdo, "EOR", mdc
It is a reportable markup error if the groves produced by parsing with and without respect to the ignorable declarations are not functionally equivalent groves.
A validating parser that fails to parse required declarations must report the possibility that the grove it produces may not be functionally equivalent.
Note: Non-validating parsers should also report this situation.
Note: The following document type declaration indicates that the external subset is not required for all purposes. The same effect could be achieved by an EOR indicator at the start of the external subset, but the form in the example makes it unnecessary to access the external subset.
<!DOCTYPE sometype SYSTEM "some.dtd" [<!EOR>]>
Note: Although the EOR indicator uses the MDO and MDC delimiters, it is not a markup declaration.
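The effect of the indicator on a processor that is not type-validating can be sketched in Python as a split of the DTD declarations at the first EOR indicator. The sketch assumes the reference concrete syntax, so the indicator is the literal string "<!EOR>", and it assumes that string does not occur inside a literal or comment. The function name is hypothetical.

```python
EOR_INDICATOR = "<!EOR>"  # assumes reference concrete syntax MDO and MDC


def split_at_eor(dtd_declarations):
    """Split declaration text at the first EOR indicator.

    Returns (required, ignorable). Without an indicator, everything is
    required and the ignorable part is empty.
    """
    required, _found, ignorable = dtd_declarations.partition(EOR_INDICATOR)
    return required, ignorable
```

Only the first indicator is significant. Any later "<!EOR>" stays inside the ignorable part unchanged.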
When a document is parsed without reference to some DTD declarations, their essential grove properties are inferred from the document instance as follows:
Note: These working definitions for the grove do not imply how element types and attribute definitions are actually declared (if DTD declarations exist) or should be declared (if declarations are to be created later).
When the predefined data character entities feature is used, a general entity name can be associated with a character number in the syntax-reference character set. When referenced, the replacement text is a numeric character reference to the corresponding character. Predefined data character entities are treated as though defined at the start of the internal subset of all documents in which the concrete syntax is used.
Note: For example:
ENTITIES "amp" 38 "lt" 60 "gt" 62 "quot" 34 "apos" 39
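The mapping from an ENTITIES parameter to replacement text can be sketched in Python as below. The sketch assumes the parameter is written exactly as in the example above (quoted names followed by decimal character numbers) and that the reference concrete syntax delimiters "&#" and ";" are used for the numeric character reference. The function name is hypothetical.

```python
import re


def predefined_entities(parameter):
    """Map each predefined entity name to its replacement text.

    The replacement text is a numeric character reference to the
    character number given for the name.
    """
    body = parameter.strip()
    if not body.startswith("ENTITIES"):
        raise ValueError("expected an ENTITIES parameter")
    pairs = re.findall(r'"([^"]+)"\s+(\d+)', body[len("ENTITIES"):])
    return {name: "&#" + number + ";" for name, number in pairs}
```

Because the replacement text is a character reference rather than the character itself, a reference such as "&lt;" yields data and is not rescanned as a delimiter.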
The following sub-parameter is added to the validation services parameter of the SYSTEM declaration to indicate whether the parser can validate whether parsing a document with and without respect to declarations occurring after the <!EOR> indicator produces functionally equivalent groves.
FUNGROVE (NO | YES)
The following sub-parameter is added to the validation services parameter of the SYSTEM declaration to indicate whether the parser can validate for type-validity, tag-validity, or both:
VALCHECK (NO | TYPVALID | TAGVALID | BOTH)
A new parameter, "entity constraint checking services" (ENTCHECK) is added to the SYSTEM declaration to indicate the kinds of entity constraint checking that a parser can perform. The keywords for the constraints are:
INTEGRAL The document instance is integrally-stored.
NOREF The document is reference-free.
NOEXTREF The document is external-reference-free.
NOENTDEC The document is entity-declaration-free.
The SGML declaration parameters described in this Annex are defined as follows.
The system declaration parameters described in this Annex are defined as follows.
These application requirements have the formal public identifier "ISO 8879//NOTATION Application Requirements for XML//EN".
This is version 1.0 of this annex.
Note: Pointers to revised versions of this annex, and other SGML-related information that may change over time, can be found at the International SGML Users' Group web site at URL "www.isug.org". The SGML Users' Group is an international non-profit membership organization, chartered as an educational charity in the United Kingdom, and is a liaison member of the ISO/IEC subcommittee that developed SGML. Nevertheless, the documents that it distributes have not been subject to ISO/IEC review procedures, have no official status, and are not endorsed by the ISO or IEC or any of their national member bodies or affiliates.
The Extensible Markup Language (XML) is an application profile of SGML, developed for exchanging SGML documents over the World Wide Web. It is defined in URL "http://www.w3.org/WWW/TR/WD-xml-lang-970331.html".
The XML specification covers the following aspects of an SGML application and system:
XML distinguishes two classes of conforming documents. Its "valid" documents are type-valid conforming SGML documents. When the WebSGML Adaptations are in use, XML "well-formed" documents are tag-valid conforming SGML documents.
A full-SGML validating parser cannot presently validate SGML documents for conformance to XML unless it is specially modified to support XML. That is because some of XML's language restrictions cannot be expressed in the SGML declaration, even when the WebSGML Adaptations are in use.
The remainder of this annex describes only the XML restrictions on the SGML language. The XML specification should be consulted for a description of the other application requirements.
The following list describes language requirements of XML beyond those of full SGML with WebSGML Adaptations support (that is, ISO 8879 as modified by Annexes J and K). The list is believed to be complete as of publication of this annex.
XML requires a specific SGML declaration. It defines the XML character set, concrete syntax, and the SGML features and options that are supported or expressly prohibited.
Explicit SGML declaration not permitted.
Document instances must be integrally-stored.
Integral storage does not require a marked section to start and end in the same entity.
In well-formed XML documents, attribute values cannot contain external entity references.
An entity must use a single character encoding throughout.
SUBDOC, LINK, CONCUR, and markup minimization are not supported, except for attribute value defaulting.
Does not use short references.
Requires a particular document character set.
Requires a particular concrete syntax.
For elements with empty content, whether or not declared empty, the start-tag must be closed with the NESTC delimiter and the end-tag, which is required, must be a NET delimiter.
A comment declaration consists of exactly one delimited comment.
Reference end is restricted to REFC. It is required in order to terminate a reference.
Named character references are not supported.
A marked section declaration is restricted to a single status keyword: CDATA. It must immediately follow the first DSO and must immediately be followed by the second DSO. It cannot be specified via a parameter entity reference.
Entity references can refer only to SGML text entities (not data or subdocument entities).
Comments are not allowed in parameter separators of DTD declarations.
Parameter entity references permitted only in restricted locations with restricted replacement text.
No public identifiers in external identifiers (affects
A marked section declaration is restricted to a single status keyword, either INCLUDE or IGNORE. It must immediately follow the first DSO and must immediately be followed by the second DSO. It can be specified via a parameter entity reference.
No name groups for declaring multiple element types.
No exclusions or inclusions in content models.
No minimization parameters.
Mixed content models must be optional-repeatable.
No name groups for making a single
Attribute default values must be quoted.
No data attributes for
No attribute value specifications on
An entity must be stored completely within a single storage object.
A parser must support external entity reference resolution "on demand", except when validating.