Newsgroups: comp.text.sgml
Date: 28 Apr 1992 14:15:55 UT
From: C. M. Sperberg-McQueen <U35395@uicvm.uic.edu>
Organization: University of Illinois at Chicago
Message-ID: <92119.091555U35395@uicvm.uic.edu>
References: <9204271901.AA06192@ucbvax.Berkeley.EDU>
Subject: Re: Tags vs Attributes
[. . .]
Ah, the old attribute-versus-tag question. The TEI's metalanguage committee went round and round on this for a long time, and neither those who wanted to ban all attributes (except for IDs and IDREFs) nor those who wanted to allow their unrestricted use were ever able to persuade the other side. (We just put it to a vote and stopped talking about it.) Also, no one was ever able to provide a really clear universal rule for deciding when to use an attribute and when to use a tag (usually an embedded tag).
My personal rules of thumb, though, include these:
use an embedded element when the information you are recording is a constituent part of the parent element
use an attribute when the information is inherent to the parent but not a constituent part (one's head and one's height are both inherent to a human being, i.e. you can't be a conventionally structured human being without having a head, and having a height, but one's head is a constituent part and one's height isn't -- you can cut off my head, but not my height)
use attributes to stress the one-to-one relationship among pieces of information, i.e., to stress that the element represents a tuple of information (dangerous rule, though: leads to the extreme formulation that a <chapter> element can have a TITLE= attribute, and then to the conclusion that it really ought to have a CONTENT= attribute too, and then you find yourself writing the entire document as an empty element with one hell of a long attribute value ...)
use attributes for simple datatype validation (obviously)
use embedded elements for complex structure validation (equally obvious)
(very tentative) use attributes for things which will not produce ink on a page in a conventional printout of the text (such as the language identifier to show what language an element's content is in) -- but do not necessarily avoid using attributes just because they might produce ink on the page (such as the target of a cross reference). I think this is another version of Eliot Kimber's instinct to use attributes for information about the element, since such information usually doesn't produce ink on the page ...
These are not failsafe, and I'm sure we can think of cases where they conflict, which is why I call them rules of thumb. Long disputes over whether attributes actually provided a richer language than would be available without them, however, eventually persuaded me that any markup language with attributes could be translated pretty much mechanically into one with only ID and IDREF attributes, and so I personally decided it wasn't such a fateful decision after all, but one you could back out of later if you really needed to. (But since going from attributes to embedded elements is clearly pretty simple and going the other way is much less simple, in the general case, the upshot is that in case of doubt I personally tend to lean toward attributes rather than elements.)
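To make the "pretty much mechanical" translation concrete, here is a minimal sketch of the easy direction (attributes to embedded elements). It is only an illustration under stated assumptions: it uses XML and Python's standard xml.etree.ElementTree as a stand-in for an SGML parser, the function name attributes_to_elements and the "attr." prefix for the generated element names are hypothetical, and ID/IDREF attributes are recognized simply by the names "id" and "idref" rather than by their declared types in a DTD.

```python
import xml.etree.ElementTree as ET

# Attributes that stay attributes. Assumption: they are recognized by name,
# not by their declared type as a real SGML/DTD-aware tool would do.
KEEP = {"id", "idref"}


def attributes_to_elements(elem):
    """Return a copy of elem in which every attribute except ID/IDREF
    is re-expressed as a leading embedded element."""
    out = ET.Element(elem.tag)
    out.tail = elem.tail

    last_converted = None
    for name, value in elem.attrib.items():
        if name.lower() in KEEP:
            out.set(name, value)
        else:
            # Hypothetical naming scheme: attribute "lang" -> <attr.lang>
            last_converted = ET.SubElement(out, "attr." + name)
            last_converted.text = value

    # Keep the element's own leading text after the generated elements.
    if last_converted is None:
        out.text = elem.text
    else:
        last_converted.tail = elem.text

    # Recurse into the original children, preserving their order.
    for child in elem:
        out.append(attributes_to_elements(child))
    return out


if __name__ == "__main__":
    src = ('<chapter id="c1" lang="en" n="3">'
           '<title>Tags vs Attributes</title><p>Text.</p></chapter>')
    converted = attributes_to_elements(ET.fromstring(src))
    print(ET.tostring(converted, encoding="unicode"))
```

The reverse direction is not sketched here because, as noted above, it is much less simple in the general case: an embedded element may itself contain markup, which has no direct counterpart in an attribute value.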
C. M. Sperberg-McQueen
ACH / ACL / ALLC Text Encoding Initiative
University of Illinois at Chicago