SGML: SDATA Entities in CDATA attribute value literals
Subject: Re: SDATA Entities in CDATA attribute value literals (was: SGML Property Set makes the distinction)
Date: 14 Dec 1996 06:52:39 -0500
From: David Megginson <dmeggins@uottawa.ca>
Newsgroup: comp.text.sgml
kward@corel.com (Kerry Ward) writes:
> So, my next question is, aren't sgmls and nsgmls invalid in their
> distinction of the attribute value. Is this truly a bug? Has it
> been recognized that this is a serious problem with the standard and
> the existing parsers have conveniently and purposefully bypassed the
> standard on this one? Shouldn't SGML parsers be expanding the
> attribute value literal to the attribute value and then passing the
> attribute value to the application? I recognize the implications of
> character set size if the SDATA entities were to be unrecognizable
> in CDATA attribute values. I am not suggesting that SDATA entities
> not be used in CDATA attributes. What I think is happening is that
> the standard in this instance is impractical, and the practical
> solution (interpreting CDATA attributes as RCDATA) is accepted, even
> though it violates the standard.
No, nsgmls and sgmls are correct to recognise entity references in
attribute literals. Here is the production for the attribute value
literal (with the accompanying clarification):
[34] attribute value literal =
       ( LIT,
         replaceable character data*,
         LIT ) |
       ( LITA,
         replaceable character data*,
         LITA )
An attribute value literal is interpreted as an attribute value by
replacing references within it, ignoring Ee and RS, and replacing an
RE or SEPCHAR with a SPACE.
and here is the production for replaceable character data:
[46] replaceable character data =
       ( data character |
         character reference |
         general entity reference |
         Ee )*
In other words, the application is _required_ to perform certain
transformations on an attribute value literal, no matter what the
declared attribute type -- not only will it expand entity and
character references, but it will normalise whitespace.
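The interpretation rule quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how sgmls or SP implement it: it assumes the reference concrete syntax character assignments (RS = LF, RE = CR, SEPCHAR = TAB), requires every reference to end with ";", and takes replacement text from a hypothetical `entities` dictionary. The name `interpret_literal` is invented for the example.

```python
import re

# Assumed function characters (reference concrete syntax).
RS, RE, TAB, SPACE = "\n", "\r", "\t", " "

# Simplification: only references closed with ";" are recognised.
_REF = re.compile(r"&#(\d+);|&([A-Za-z][A-Za-z0-9.-]*);")


def interpret_literal(literal, entities):
    """Turn the text between the LIT delimiters into an attribute value."""

    def clean(text):
        # Ignore RS, then replace RE and SEPCHAR with SPACE.
        text = text.replace(RS, "")
        return text.replace(RE, SPACE).replace(TAB, SPACE)

    out = []
    pos = 0
    for match in _REF.finditer(literal):
        out.append(clean(literal[pos:match.start()]))
        if match.group(1) is not None:
            # Numeric character reference, e.g. &#65;
            out.append(chr(int(match.group(1))))
        else:
            # General entity reference; replacement text is used as-is
            # (a simplification made for this sketch).
            out.append(entities[match.group(2)])
        pos = match.end()
    out.append(clean(literal[pos:]))
    return "".join(out)
```

For example, `interpret_literal("a&amp;b\r\n\tc&#65;", {"amp": "&"})` yields `"a&b  cA"`: both references are replaced, the RS is dropped, and the RE and the TAB each become a SPACE.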
The exact method of distinguishing SDATA is up to the application --
the ESIS output of (n)sgmls does it with an escape sequence, while the
generic interface to SP does it with an array of Attribute::CdataChunk
structures. There is no bug in the standard here, and James Clark has
not implemented any work-arounds; I do agree, though, that supporting
this feature is particularly challenging for programmers.
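To make the two approaches concrete, here is a rough Python sketch that converts an escape-bracketed string into a list of chunks, each flagged as SDATA or ordinary data. It assumes that SDATA text is bracketed by a `\|` marker, which is my understanding of the (n)sgmls output format; the `Chunk` type and the `split_sdata` function are hypothetical names for this example and only loosely mirror the idea behind Attribute::CdataChunk. The sketch does not handle an escaped backslash immediately before a "|", and it drops empty chunks.

```python
from typing import List, NamedTuple


class Chunk(NamedTuple):
    is_sdata: bool  # True if the text came from an SDATA entity
    text: str


def split_sdata(value: str, marker: str = "\\|") -> List[Chunk]:
    """Split an escape-bracketed attribute value into data/SDATA chunks."""
    chunks = []
    # Pieces at odd positions lie between an opening and a closing marker.
    for index, part in enumerate(value.split(marker)):
        if part:
            chunks.append(Chunk(is_sdata=(index % 2 == 1), text=part))
    return chunks
```

An application can then walk the list and map each SDATA chunk to whatever its output system needs, while passing the ordinary chunks through unchanged.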
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
University of Ottawa dmeggins@uottawa.ca
http://www.uottawa.ca/~dmeggins
-------------------------------------------------------------------------
Subject: Re: SDATA Entities in CDATA attribute value literals (was: SGML Property Set makes the distinction)
Date: Tue, 17 Dec 1996 12:27:47 GMT
From: Charles@SGMLsource.com (Charles F. Goldfarb)
Newsgroup: comp.text.sgml
On 14 Dec 1996 06:52:39 -0500, David Megginson <dmeggins@uottawa.ca> wrote:
>In other words, the application is _required_ to perform certain
>transformations on an attribute value literal, no matter what the
>declared attribute type -- not only will it expand entity and
>character references, but it will normalise whitespace.
David is exactly right on this point, an admittedly confusing one. The delimited
attribute value literal that you specify in the tag is *not* necessarily the
same as the undelimited attribute value, even when there is no way to enter the
attribute value directly.
Every so often someone points out an "error" in the standard -- that the
productions for token list attribute values only allow a single space between
tokens. The attribute value literal that you specify in a tag can have many
spaces between tokens; they are normalized to a single space in the attribute
value.
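As a minimal illustration of that normalization, the following Python sketch reduces a token list literal to single-space-separated tokens. It assumes RS is a line feed and that all remaining whitespace characters act as separators; `normalise_tokens` is a name made up for this example, and reference replacement is left out.

```python
def normalise_tokens(literal: str) -> str:
    """Collapse separators in a token list literal to single spaces."""
    # Assumption: RS is LF and is ignored rather than treated as a separator.
    without_rs = literal.replace("\n", "")
    # split() with no argument drops leading/trailing whitespace and
    # treats any run of whitespace as one separator.
    return " ".join(without_rs.split())
```

With this, the literal `"  one   two\r\n three "` becomes the attribute value `"one two three"`, which is the form the token list productions describe.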
Perhaps the declared value should have been PCDATA instead of CDATA to indicate
(as in data content) that the attribute value is parsed character data.
--
Charles F. Goldfarb * Information Management Consulting * +1(408)867-5553
13075 Paramount Drive * Saratoga CA 95070 * USA
International Standards Editor * ISO 8879 SGML * ISO/IEC 10744 HyTime
Prentice-Hall Series Editor * CFG Series on Open Information Management