[Mirrored from: http://www.ornl.gov/sgml/wg8/document/1875.htm]
TITLE: | U.S. Contribution on SGML Review |
SOURCE: | X3V1 |
PROJECT: | JTC1.18.15.1 |
PROJECT EDITOR: | Charles F. Goldfarb |
STATUS: | National Body Contribution |
ACTION: | For information |
DATE: | 11 November 1996 |
DISTRIBUTION: | WG8 and Liaisons |
REFER TO: | |
REPLY TO: | Dr. James David Mason (ISO/IEC JTC1/SC18/WG8 Convenor)
Lockheed Martin Energy Systems, Information Management Services, 1060 Commerce Park, M.S. 6480, Oak Ridge, TN 37831-6480, U.S.A.
Telephone: +1 423 574-6973; Facsimile: +1 423 574-0004
Network: masonjd@ornl.gov; http://www.ornl.gov/sgml/wg8/wg8home.htm; ftp://ftp.ornl.gov/pub/sgml/wg8/ |
At last month's meeting, the US national body drafted the following submission. One issue for further discussion is whether some of the new controls it proposes should be declared in the SGML declaration, in the DTD, or in both. Another is whether, if the proposal in 3c below is accepted, the cro delimiter should be recognized (but erroneous) within parameter literals in the concrete syntax parameter of the SGML declaration.
The US national body recommends that the following be considered during the ongoing review of ISO 8879:
Fragments such as

    <!ELEMENT a - - (b, #PCDATA)>
    ...
    <a>
    <b>

are erroneous because the RE after the <a>, even though it would be ignored if legal, is data that appears in a context where data is not permitted. The recommended change is to determine whether an RE is ignored before matching content against the relevant model group. An ignored RE is then never matched against the content model.
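To make the difference in ordering concrete, here is a minimal Python sketch; it is an illustration, not part of the proposal. It assumes a hypothetical token list for an element's content ("RE", "#data", or a one-letter generic identifier standing for a proper subelement), omits RSs, approximates the model (b, #PCDATA) by the regular expression `bD*` (each data token becomes `D`), and applies only the clause 7.6.1 rules that ignore the first RE in an element when nothing precedes it and the last RE when nothing follows it. The names `ignored_res`, `to_symbols`, and `conforms` are hypothetical.

```python
import re

# Hypothetical content tokens for one element:
#   "RE"     a record end
#   "#data"  a run of data characters
#   other    the one-letter generic identifier of a proper subelement
# RSs are omitted, so "nothing preceded it" means "it is the first token".


def ignored_res(tokens):
    """Indices of REs ignored by the first-RE and last-RE rules of 7.6.1.

    The rule for REs between subelements is not modelled in this sketch.
    """
    ignored = set()
    if tokens and tokens[0] == "RE":   # first RE, no data or subelement before it
        ignored.add(0)
    if tokens and tokens[-1] == "RE":  # last RE, no data or subelement after it
        ignored.add(len(tokens) - 1)
    return ignored


def to_symbols(tokens, skip=frozenset()):
    """Map each kept token to one symbol: data and REs become "D"."""
    return "".join(
        "D" if tok in ("RE", "#data") else tok
        for i, tok in enumerate(tokens)
        if i not in skip
    )


def conforms(tokens, model, ignore_before_matching):
    """Check content against a model given as a compiled regular expression.

    False mimics the current order described above (the RE is matched as data);
    True decides which REs are ignored first, as the change recommends.
    """
    skip = ignored_res(tokens) if ignore_before_matching else frozenset()
    return model.fullmatch(to_symbols(tokens, skip)) is not None


# (b, #PCDATA), approximating #PCDATA as zero or more data tokens.
SEQ_B_PCDATA = re.compile(r"bD*")

if __name__ == "__main__":
    content = ["RE", "b", "#data"]  # <a>, RE, <b>...</b>, data, </a>
    print(conforms(content, SEQ_B_PCDATA, ignore_before_matching=False))  # False
    print(conforms(content, SEQ_B_PCDATA, ignore_before_matching=True))   # True
```

The only difference between the two calls is when the ignore rules run; the content model itself is unchanged.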
The fragment

    <!ELEMENT a - - (b, #PCDATA)>
    ...
    <a> <b>...

(which is nonconforming because of the space between the two start-tags) would become conforming, with that space ignored. However, the similar fragment

    <!ELEMENT a - - (b|#PCDATA)>
    ...
    <a> <b>...

would remain in error, because the space would determine that the #PCDATA branch of the or-group was taken.
The proposed preserve RE declaration has the following syntax:

    preserve RE = mdo, "REPRES", ps+, (element type | (rni, ("ELEMENT" | "PI" | "COMMENT"))), (ps+, ("INCLUDE" | "PROPER"))?, (ps+, ("YES" | "NO"))?, ps*, mdc

In clause 7.6.1, "An RE that does not immediately follow an RS or RE is ignored if no data or proper subelement intervened" should be changed to "An RE that does not immediately follow an RS or RE is ignored if no data or RE-preserving subelement intervened." Also, "An RE is deemed to occur ... after any ... included subelement" should be changed to "An RE is deemed to occur ... after any ... non-RE-preserving subelement."

A generic identifier appears in the element type to indicate that the declaration applies to elements of the specified type. #ELEMENT indicates that the declaration applies to all elements for which no explicit declaration is present. #PI means it applies to processing instructions, and #COMMENT to comments. If specified, INCLUDE means the declaration applies only to inclusions of one of the specified types; PROPER means it applies only to proper subelements. INCLUDE and PROPER cannot be specified for #PI or #COMMENT. YES means the element, processing instruction, or comment is RE-preserving; NO means it is not.
The declarations

    <!REPRES #ELEMENT INCLUDE NO>
    <!REPRES #ELEMENT PROPER YES>
    <!REPRES #PI NO>
    <!REPRES #COMMENT NO>

produce the same behavior as the current standard.
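The following Python sketch, which is not part of the proposal, shows one way a processor might decide whether an element, processing instruction, or comment is RE-preserving under a set of REPRES declarations. It assumes that the element type is a single generic identifier (no name groups), that declarations naming a generic identifier are consulted before #ELEMENT and are skipped when their INCLUDE/PROPER qualifier does not fit, and that an omitted YES/NO, or no applicable declaration, leaves the answer undetermined (`None`); the proposal does not settle these points. The function names and the element names `p` and `fn` are hypothetical.

```python
import re

# Simplified reading of the proposed production. Assumptions: the element type
# is a single name, and parameter separators are plain whitespace.
REPRES = re.compile(
    r"<!REPRES\s+(#ELEMENT|#PI|#COMMENT|[A-Za-z][\w.-]*)"
    r"(?:\s+(INCLUDE|PROPER))?"
    r"(?:\s+(YES|NO))?\s*>"
)


def parse_repres(decl):
    """Return (target, scope, flag); flag is True, False, or None if omitted."""
    m = REPRES.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"not a REPRES declaration: {decl!r}")
    target, scope, flag = m.groups()
    if target in ("#PI", "#COMMENT") and scope is not None:
        raise ValueError("INCLUDE and PROPER cannot be specified for #PI or #COMMENT")
    return target, scope, None if flag is None else flag == "YES"


def is_re_preserving(decls, kind, gi=None, inclusion=False):
    """Resolve RE preservation for an "element", "pi", or "comment".

    Returns the flag of the first applicable declaration, or None if none applies.
    """
    parsed = [parse_repres(d) for d in decls]
    if kind == "pi":
        targets = ["#PI"]
    elif kind == "comment":
        targets = ["#COMMENT"]
    else:
        # Assumption: declarations naming the GI are consulted before #ELEMENT.
        targets = [gi, "#ELEMENT"]
    wanted = "INCLUDE" if inclusion else "PROPER"
    for target in targets:
        for t, scope, flag in parsed:
            if t == target and scope in (None, wanted):
                return flag
    return None


if __name__ == "__main__":
    current = [
        "<!REPRES #ELEMENT INCLUDE NO>",
        "<!REPRES #ELEMENT PROPER YES>",
        "<!REPRES #PI NO>",
        "<!REPRES #COMMENT NO>",
    ]
    print(is_re_preserving(current, "element", "p"))                  # True
    print(is_re_preserving(current, "element", "p", inclusion=True))  # False
    print(is_re_preserving(current, "pi"))                            # False
```

With the four declarations that reproduce current behavior, proper subelements come out RE-preserving and inclusions, processing instructions, and comments do not, which matches the two 7.6.1 rewordings.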
The proposed record break declaration has the following syntax:

    record break = mdo, "RECORDBK", ps+, (element type | (rni, "ELEMENT")), ps+, ("PROCESS" | "IGNORE"), ps+, ("PROCESS" | "IGNORE"), ps+, ("PROCESS" | "IGNORE"), ps+, ("PROCESS" | "IGNORE"), ps*, mdc

The declaration pertains to all elements whose generic identifier is listed in the element type; if #ELEMENT is specified, it applies to all elements. The four PROCESS or IGNORE keywords pertain, in order, to the last RE preceding an element, the first RE in the element, the last RE in the element, and the first RE following the element, provided no data or RE-preserving element intervened. Current behavior is achieved with:

    <!RECORDBK #ELEMENT PROCESS IGNORE IGNORE PROCESS>
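A minimal Python sketch of how a processor might read a RECORDBK declaration and apply its second and third keywords to an element's own content. It is not from the proposal: the token representation ("RE", or any other string standing for data or an RE-preserving element) and the function names are hypothetical, RSs are left out, and the first and fourth keywords (REs outside the element) are parsed but not applied here.

```python
import re

# Keyword order given in the proposal.
POSITIONS = ("before", "first", "last", "after")

RECORDBK = re.compile(
    r"<!RECORDBK\s+(#ELEMENT|[A-Za-z][\w.-]*)"
    r"((?:\s+(?:PROCESS|IGNORE)){4})\s*>"
)


def parse_recordbk(decl):
    """Return (target, policy), where policy maps each position to its keyword."""
    m = RECORDBK.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"not a RECORDBK declaration: {decl!r}")
    return m.group(1), dict(zip(POSITIONS, m.group(2).split()))


def apply_inside(tokens, policy):
    """Drop the first and/or last RE of an element's content if policy says IGNORE.

    Every token other than "RE" counts as data or an RE-preserving element, so
    the first RE qualifies only as the first token and the last only as the last.
    """
    drop = set()
    if tokens and tokens[0] == "RE" and policy["first"] == "IGNORE":
        drop.add(0)
    if tokens and tokens[-1] == "RE" and policy["last"] == "IGNORE":
        drop.add(len(tokens) - 1)
    return [tok for i, tok in enumerate(tokens) if i not in drop]


if __name__ == "__main__":
    _, current = parse_recordbk("<!RECORDBK #ELEMENT PROCESS IGNORE IGNORE PROCESS>")
    print(apply_inside(["RE", "text", "RE"], current))  # ['text']
```

An element type declared with PROCESS in the second and third positions would keep both REs, which is the kind of control the declaration is meant to add.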
Character numbers would take forms such as

    D#412   95/612   O#777   H#abcd/ffff

Here, D, O, and H indicate that the following number is decimal, octal, or hexadecimal, respectively. The / separates bytes; bytes are assumed to be 8 bits, but a new parameter in the SGML declaration can specify the byte size. The # and / characters are new delimiters (cns and cnd, for character number start and character number delimiter) recognized only within character numbers. A numeral within a character reference is terminated by a character not allowed within numerals of the appropriate base: 8 or 9 ends an octal numeral, and any letter not permitted in a hexadecimal numeral terminates such a numeral.
    &#D#412;   &#95/612;   &#O#777;   &#H#abcd/ffff;
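The lexical rules above can be sketched in Python as follows. This is an illustration, not part of the proposal: the function names are hypothetical, a character number with no base indicator is taken to be decimal (as in 95/612), only lowercase hexadecimal digits are accepted (matching the examples), and each /-separated numeral is returned as a separate value, because combining them into one character depends on the byte size.

```python
# Hypothetical parser for the proposed character-number syntax:
#   [base indicator, cns "#"] numeral (cnd "/" numeral)*
BASES = {"D": 10, "O": 8, "H": 16}
DIGITS = {10: "0123456789", 8: "01234567", 16: "0123456789abcdef"}


def parse_char_number(text, pos=0):
    """Return (base, values, end) for the character number starting at text[pos]."""
    base = 10  # assumption: no base indicator means decimal, as in 95/612
    if text[pos:pos + 1] in BASES and text[pos + 1:pos + 2] == "#":
        base = BASES[text[pos]]
        pos += 2
    values = []
    while True:
        start = pos
        # A numeral ends at the first character not allowed in its base,
        # e.g. 8 or 9 ends an octal numeral.
        while pos < len(text) and text[pos] in DIGITS[base]:
            pos += 1
        if pos == start:
            raise ValueError(f"expected a base-{base} numeral at offset {start}")
        values.append(int(text[start:pos], base))
        if text[pos:pos + 1] != "/":  # no cnd: the character number is complete
            return base, values, pos
        pos += 1  # skip the cnd and read the next byte


def parse_char_ref(ref):
    """Parse a whole reference such as '&#H#abcd/ffff;' into (base, values)."""
    if not ref.startswith("&#"):
        raise ValueError("missing cro")
    base, values, end = parse_char_number(ref, 2)
    if ref[end:] not in ("", ";"):
        raise ValueError(f"unexpected text after character number: {ref[end:]!r}")
    return base, values


if __name__ == "__main__":
    for ref in ("&#D#412;", "&#95/612;", "&#O#777;", "&#H#abcd/ffff;"):
        print(ref, parse_char_ref(ref))
```

The termination rule is visible in `parse_char_number("O#789")`, which stops at the 8 and returns `(8, [7], 3)`.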
An auxiliary character set declaration

    auxiliary character set = mdo, "CHARSET", ps+, name, ps+, external identifier, ps+, number, (ps+, number)?, ps*, mdc

can appear in a document type declaration. This declaration indicates that the specified name identifies a character set to be used in character references. The first number is the width of this additional character set, and the second number, if present, is the byte size. This declaration enables character references such as
    &#/cs/365;   &#/cs/charname;   &#/cs/H#abcd/8b7d;

where / is the cnd defined above, "cs" is the name of a declared auxiliary character set, and "charname" is a name defined in that character set. The parser passes the external identifier of the character set and the name or number to the application; it is up to the application to determine whether the name is meaningful.
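A minimal Python sketch of the auxiliary character set mechanism, under stated assumptions: it accepts only `SYSTEM "..."` or `PUBLIC "..."` external identifiers, keeps declared sets in a plain dictionary, and treats the item after the second / as a number when it starts with a digit or with D#, O#, or H#, and otherwise as a name. The set name cs, the system identifier "cs-table.txt", and the function names are hypothetical. As the proposal describes, the sketch only hands back the external identifier and the name or number; interpreting them is left to the application.

```python
import re

# Simplified CHARSET declaration: name, SYSTEM/PUBLIC literal, width, optional byte size.
CHARSET = re.compile(
    r'<!CHARSET\s+([A-Za-z][\w.-]*)\s+((?:SYSTEM|PUBLIC)\s+"[^"]*")'
    r"\s+(\d+)(?:\s+(\d+))?\s*>"
)
# cro, cnd, set name, cnd, name or character number, optional refc.
AUX_REF = re.compile(r"&#/([A-Za-z][\w.-]*)/([^;]+);?")


def declare(registry, decl):
    """Record an auxiliary character set: name -> (external id, width, byte size)."""
    m = CHARSET.fullmatch(decl.strip())
    if m is None:
        raise ValueError(f"not a CHARSET declaration: {decl!r}")
    name, ext_id, width, byte_size = m.groups()
    registry[name] = (ext_id, int(width), int(byte_size) if byte_size else None)


def resolve(registry, ref):
    """Return (external identifier, "number" or "name", item text) for a reference."""
    m = AUX_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not an auxiliary character reference: {ref!r}")
    set_name, item = m.groups()
    if set_name not in registry:
        raise KeyError(f"undeclared character set: {set_name!r}")
    kind = "number" if re.match(r"\d|[DOH]#", item) else "name"
    return registry[set_name][0], kind, item


if __name__ == "__main__":
    sets = {}
    declare(sets, '<!CHARSET cs SYSTEM "cs-table.txt" 16 8>')
    for ref in ("&#/cs/365;", "&#/cs/charname;", "&#/cs/H#abcd/8b7d;"):
        print(resolve(sets, ref))
```

Keeping the item as unparsed text mirrors the division of labor in the proposal: the parser recognizes the reference, and the application decides what the name or number means.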
Lynne A. Price
Text Structure Consulting
lprice@ix.netcom.com