SGML: SDATA Entities in CDATA attribute value literals


Subject: Re: SDATA Entities in CDATA attribute value literals (was: SGML Property Set makes the distinction)
Date: 14 Dec 1996 06:52:39 -0500
From: David Megginson <dmeggins@uottawa.ca>
Newsgroup: comp.text.sgml
kward@corel.com (Kerry Ward) writes:

> So, my next question is, aren't sgmls and nsgmls invalid in their
> distinction of the attribute value. Is this truly a bug? Has it
> been recognized that this is a serious problem with the standard and
> the existing parsers have conveniently and purposefully bypassed the
> standard on this one? Shouldn't SGML parsers be expanding the
> attribute value literal to the attribute value and then passing the
> attribute value to the application? I recognize the implications of
> character set size if the SDATA entities were to be unrecognizable
> in CDATA attribute values. I am not suggesting that SDATA entities
> not be used in CDATA attributes. What I think is happening is that
> the standard in this instance is impractical, and the practical
> solution (interpreting CDATA attributes as RCDATA) is accepted, even
> though it violates the standard.

No, nsgmls and sgmls are correct to recognise entity references in
attribute literals. Here is the production for the attribute value
literal (with the accompanying clarification):

  [34] attribute value literal =
         ( LIT, replaceable character data*, LIT ) |
         ( LITA, replaceable character data*, LITA )

       An attribute value literal is interpreted as an attribute
       value by replacing references within it, ignoring Ee and RS,
       and replacing an RE or SEPCHAR with a SPACE.

and here is the production for replaceable character data:

  [46] replaceable character data =
         ( data character | character reference |
           general entity reference | Ee )*

In other words, the application is _required_ to perform certain
transformations on an attribute value literal, no matter what the
declared attribute type -- not only will it expand entity and
character references, but it will normalise whitespace. The exact
method of distinguishing SDATA is up to the application -- the ESIS
output of (n)sgmls does it with an escape sequence, while the generic
interface to SP does it with an array of Attribute::CdataChunk
structures.
There is no bug in the standard here, and James Clark has not
implemented any work-arounds; I do agree, though, that supporting this
feature is particularly challenging for programmers.


All the best,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
University of Ottawa            dmeggins@uottawa.ca
        http://www.uottawa.ca/~dmeggins

-------------------------------------------------------------------------

Subject: Re: SDATA Entities in CDATA attribute value literals (was: SGML Property Set makes the distinction)
Date: Tue, 17 Dec 1996 12:27:47 GMT
From: Charles@SGMLsource.com (Charles F. Goldfarb)
Newsgroup: comp.text.sgml
On 14 Dec 1996 06:52:39 -0500, David Megginson <dmeggins@uottawa.ca> wrote:

>In other words, the application is _required_ to perform certain
>transformations on an attribute value literal, no matter what the
>declared attribute type -- not only will it expand entity and
>character references, but it will normalise whitespace.

David is exactly right on this point, an admittedly confusing one. The
delimited attribute value literal that you specify in the tag is *not*
necessarily the same as the undelimited attribute value, even when
there is no way to enter the attribute value directly.

Every so often someone points out an "error" in the standard -- that
the productions for token list attribute values only allow a single
space between tokens. The attribute value literal that you specify in a
tag can have many spaces between tokens; they are normalized to a
single space in the attribute value.

Perhaps the declared value should have been PCDATA instead of CDATA to
indicate (as in data content) that the attribute value is parsed
character data.

--
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
           13075 Paramount Drive * Saratoga CA 95070 * USA
  International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
 Prentice-Hall Series Editor * CFG Series on Open Information Management
--
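
The token list point can be shown with a short Python sketch (an illustration added here, not part of the original posting). It assumes the literal has already had its references replaced and its RE and SEPCHAR characters turned into spaces, so only SPACE characters remain as separators; the function name normalize_tokens is invented for the example.

```python
def normalize_tokens(value):
    """Collapse runs of SPACE to one SPACE and drop leading/trailing SPACE.

    Applies to attributes with a tokenized declared value (for example
    NAMES or NMTOKENS); a CDATA attribute value keeps its spaces as they are.
    """
    return " ".join(token for token in value.split(" ") if token)

print(repr(normalize_tokens("  alpha   beta gamma ")))
# 'alpha beta gamma'
```

The literal typed in the tag may therefore contain any number of spaces between tokens, while the resulting attribute value matches the single-space production.
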