[Mirrored from: http://www.uic.edu/~cmsmcq/misc/attributes.and.entities.html]

Sic et Non: on Entity References in Attribute Values

(with apologies to Thomas Aquinas)


C. M. Sperberg-McQueen

11 June 1992

1 Question: whether entity references are recognized in attribute values?

1.1 It would seem the answer is NO, because

1 Either the attribute has a declared value of CDATA, or it has a declared value of ENTITY, ENTITIES, ID, IDREF, IDREFS, NAME, NAMES, NMTOKEN, NMTOKENS, NUMBER, NUMBERS, NUTOKEN, NUTOKENS, NOTATION, or a name group.

2 If the attribute's declared value is not CDATA, it would appear entity references cannot be recognized, because:

21 The tokens in these types of values are defined as having the lexical type NAME.

22 Clause 9.3, production 55 defines NAME as a series of name start characters and name characters, and does not define them as containing entity references.

23 Therefore, 2 is correct: entity references cannot be recognized in values of attributes with declared value other than CDATA.

3 If the attribute's declared value is CDATA, it would appear entity references cannot be recognized, because:

31 Clause 11.3.3 says that if the declared value of an attribute is CDATA, "the attribute value is character data".

32 Entity references appear not to be recognized in character data.

321 Markup is not recognized in character data.

3211 Clause 9.2 defines character data as a sequence of data characters. It thereby contrasts character data with replaceable character data, which 9.1 defines as containing data characters, character references, general entity references, and entity end signals.

3212 Goldfarb, in his commentary on 9.2 (p. 344 of the Handbook), says "no markup will be recognized in character data other than the delimiters that would terminate the character data."

3213 Clause 4.33 defines character data as "Zero or more characters that occur in a context in which no markup is recognized, other than the delimiters that end the character data. Such characters are classified as data characters because they were declared to be so."

3214 Therefore, 321 is correct: Markup is not recognized in character data.

322 Entity references are markup.

3221 The note to clause 4.183 identifies references as being one kind of markup.

3222 Clause 4.144 defines a general entity reference as a named entity reference to a general entity.

3223 Clause 4.205 defines a named entity reference as an entity reference.

3224 Clause 4.124 defines an entity reference as a reference.

3225 Clause 4.256 defines a reference as "Markup that is replaced by other text ..."

3226 Therefore, 322 is correct: entity references are markup.

323 Therefore, 32 is correct: entity references are not recognized in character data.

33 Therefore, 3 is correct: entity references cannot be recognized in values of attributes with declared value of CDATA.

4 No matter what the declared value of the attribute, entity references cannot be recognized within its value, because:

41 Clause 7.9.3 (note after production 34) says "Interpretation of an attribute value literal occurs as though the attribute were character data, regardless of its actual declared value."

42 Entity references are not recognized within character data (see statement 32 above).

1.2 It would seem the answer is YES, because

1 Clause 7.9.3 defines attribute value specification as either attribute value or attribute value literal.

2 If the attribute value is specified as an attribute value literal, entity references are recognized, because:

21 Clause 7.9.3 says "An attribute value literal is interpreted as an attribute value by replacing references within it, ignoring Ee and RS, and replacing an RE or SEPCHAR with a SPACE."

22 Clause 4.17 defines attribute value literal as "A delimited character string that is interpreted as an attribute value by replacing references and ignoring or translating function characters."
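
The interpretation described in statements 21 and 22 can be sketched in Python. This is only an illustration under stated assumptions, not a conforming parser: it assumes the reference concrete syntax (RS is line feed, RE is carriage return, TAB is the only SEPCHAR), requires every reference to end in ";", handles only numeric character references, and assumes that entity replacement text contains no further references or function characters. The function name interpret_literal and the entity "pub" are hypothetical. Ee is an entity end signal rather than a character in the input, so it needs no explicit handling here.

```python
import re

RS = "\n"   # record start: ignored inside the literal
RE = "\r"   # record end: replaced with SPACE
TAB = "\t"  # the assumed SEPCHAR: replaced with SPACE

# "&#38;" style numeric character references, or "&name;" general entity references.
REFERENCE = re.compile(r"&#(\d+);|&([A-Za-z][A-Za-z0-9.\-]*);")


def interpret_literal(literal, entities):
    """Turn the text between the literal delimiters into an attribute value."""
    # Ignore RS; replace RE and SEPCHAR with SPACE.
    text = literal.translate({ord(RS): None, ord(RE): " ", ord(TAB): " "})

    def replace(match):
        number, name = match.groups()
        if number is not None:
            return chr(int(number))   # character reference
        return entities[name]         # KeyError if the entity is undeclared

    # Replace references with the text they stand for.
    return REFERENCE.sub(replace, text)


value = interpret_literal("by &pub;\r\nin&#32;1992", {"pub": "ACME Press"})
print(value)
```

The point of the sketch is that the input is an attribute value literal and the output is an attribute value; reference replacement happens on the way from the one to the other.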

1.3 About the arguments against the proposition, we may say:

1 All statements made are correct, but apply only to the attribute value supplied or derived for the attribute.

2 Clause 7.9.3 specifies that attribute values may be specified either directly, or as attribute value literals. A value specified as an attribute value literal is processed by the parser into a determinate attribute value.

3 The specification that attribute values are treated as character data (rather than replaceable character data) therefore applies only to the end product of the processing specified in 7.9.3, not to the attribute value literal possibly provided in the document instance.
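
The distinction drawn in statements 1 to 3 can be illustrated with a small Python sketch. It assumes the reference concrete syntax for names (a letter followed by letters, digits, "." or "-"), ignores quantity limits such as NAMELEN, and uses a hypothetical entity "target"; the helper names is_name and literal_to_value are likewise invented for the illustration. The lexical test for NAME fails when applied to the literal as written in the document instance, and succeeds when applied to the attribute value derived from it.

```python
import re

# NAME under the assumed reference concrete syntax (length limits ignored).
NAME = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.\-]*);")


def is_name(text):
    """Lexical check: is the whole string a NAME?"""
    return NAME.fullmatch(text) is not None


def literal_to_value(literal, entities):
    """Stage 1: replace entity references in the attribute value literal."""
    return ENTITY_REF.sub(lambda m: entities[m.group(1)], literal)


entities = {"target": "chap1"}   # hypothetical entity declaration
literal = "&target;"             # what appears in the document instance
value = literal_to_value(literal, entities)

# Stage 2: the declared-value constraint applies to the value, not the literal.
print(is_name(literal))
print(is_name(value))
```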

2 Conclusions

1 The current formulations of ISO 8879 do not in fact require entity references to be unrecognized in attribute value specifications.

2 They do, however, require a Talmudic or Jesuitical process of unraveling in order to establish that fact.

3 The revision of ISO 8879 should eliminate the misleading use of the term attribute value, either by reformulating all the sections on declared value specifications and attribute value specifications, or by introducing a suitably unambiguous term such as internal or processed attribute value.