A collection of postings on the topic of "SGML Inclusion Exceptions", from December 1994 on CTS (Usenet News COMP.TEXT.SGML).
Subject: Reply: SGMLS bug with inline elements

* Submitted to: COMP.TEXT.SGML
* Submitted by: Tim Bray (a07893@giant.rsoft.bc.ca)
* Date Of Submission: 21 Dec 1994 02:35:51 UT
* Lines: 26
_________________________________________________________________

Organization: MIND LINK! Communications Corp., Langley, BC, Canada
Newsgroups: comp.text.sgml
Reference: Bernhard Weichel, Erik Naggum
CTS archive link: here
_________________________________________________________________

[Erik Naggum]
| don't use inclusion exceptions.  if you think you have a reason to use
| inclusion exceptions, you don't.  if you still think ...

But there's this problem.  SGML is supposed to be used to model
documents.  SGML is *good* at this; better than the competition,
anyhow.  One of the things I find in documents, and I'd like to model,
are index-point entries.  On the face of it, it seems like inclusions
are exactly the tool needed to model this correctly with little
constraint on the authors who are trying to put in the HUGE value-add
conferred by a good index.

And the "index point" concept also subsumes lots of other things like
glossary entries and, uh, hypertext links to anywhere; what one might
call, generically, in-line metadata.

OK, I can work around things and kludge this kind of stuff into an
exclusion-free model, but then I get something that's hard to explain
to people.  Which 9 times out of 10 means it's wrong.

Erik, if you're going to make progress on this one I think you need
some better examples of why inclusions are hideously evil, cause
cancer, and imperil our quest for truth, justice, and the vendor-
neutral way...

Cheers, Tim Bray, Open Text (tbray@opentext.com)
_________________________________________________________________

Back To Complete Subject Listing on remote site.
Archive last updated dd. 02/04/96
Suggestions to the compiler.
Subject: Reply: SGMLS bug with inline elements

* Submitted to: COMP.TEXT.SGML
* Submitted by: Robert Streich (streich@austin.sar.slb.com)
* Date Of Submission: 21 Dec 1994 07:22:48 UT
* Lines: 56
_________________________________________________________________

Organization: Schlumberger Laboratory for Computer Science
Newsgroups: comp.text.sgml
Reference: Bernhard Weichel, Erik Naggum, Tim Bray
_________________________________________________________________

[Erik Naggum]
| don't use inclusion exceptions.  if you think you have a reason to use
| inclusion exceptions, you don't.  if you still think ...

[Tim Bray]
| But there's this problem.  SGML is supposed to be used to model
| documents.  SGML is *good* at this; better than the competition,
| anyhow.  One of the things I find in documents, and I'd like to model,
| are index-point entries.  On the face of it, it seems like inclusions
| are exactly the tool needed to model this correctly with little
| constraint on the authors who are trying to put in the HUGE value-add
| conferred by a good index.

I'm not sure I understand this, Tim.  How does an inclusion make for a
good model?  And why would this have any impact on the authors?
Inclusions are nothing more than a convenience for the DTD designer as
far as I can tell.  Authors (usually) won't know, probably don't care,
and may not even understand the difference between a proper subelement
and an included one.

| OK, I can work around things and kludge this kind of stuff into an
| exclusion-free model, but then I get something that's hard to explain
| to people.  Which 9 times out of 10 means it's wrong.

I see this the other way around: inclusions are a kludge.  I also see
them as terribly problematic when it comes to maintaining a complex DTD
over time.  It's one of those things that I just feel in my guts will
come back to haunt you -- they'll show up where they're least expected
and entirely unwanted.
There is only one place where I've used an inclusion as a very
temporary kludge: as a container into which unmapped markup could be
placed during a conversion.  This is the *one* case I can think of that
they are useful, only because their position in the data stream can be
important.  In this case, these inclusions only remain long enough for
someone to go in and clean them up, making any corrections necessary
for the markup that wasn't mapped.

I agree wholeheartedly with the emphasis that you place on an index,
but I'm not convinced that the elements that comprise the indices need
to be inline.  The clutter they create and the problems they cause in
dynamic documents can be a nasty problem.  But this is a separate
issue....

| Erik, if you're going to make progress on this one I think you need
| some better examples of why inclusions are hideously evil, cause
| cancer, and imperil our quest for truth, justice, and the vendor-
| neutral way...

We've seen the discussions of record end handling with respect to
inclusions before, Tim.  How about you post a situation in which they
are a necessity?

Robert Streich
Schlumberger Austin Research
_________________________________________________________________

Subject: Reply: SGMLS bug with inline elements

* Submitted to: COMP.TEXT.SGML
* Submitted by: Erik Naggum (erik@naggum.no)
* Date Of Submission: 21 Dec 1994 09:43:29 UT
* Lines: 100
_________________________________________________________________

Organization: Naggum Software; +47 2295 0313
Newsgroups: comp.text.sgml
Reference: Bernhard Weichel, Erik Naggum, Tim Bray
_________________________________________________________________

[Tim Bray]
| But there's this problem.  SGML is supposed to be used to model
| documents.  SGML is *good* at this; better than the competition,
| anyhow.
| One of the things I find in documents, and I'd like to model,
| are index-point entries.  On the face of it, it seems like inclusions
| are exactly the tool needed to model this correctly with little
| constraint on the authors who are trying to put in the HUGE value-add
| conferred by a good index.

yes, that's the standard argument for inclusion.  there are two fairly
standard counter-arguments:

1. index entries aren't really part of the data, but are a separate
   document that ties into a source document.  we note that index
   entries have a form that is different from the actual word or words
   in context.  we note that indexes are not in general points in the
   source document, but areas or ranges that frequently span larger
   elements, frequently resulting in index entries spanning pages.  we
   also note that index entries have a structure to themselves.
   finally, we note that indexes are often not created by the author,
   who may not want to see the source document changed in the process
   of creating an index.  an inclusion is inappropriate for these
   concerns, and a mechanism to point into a document should be used
   instead.  HyTime provides some ideas for how this might be
   accomplished (modulo the HyTime syntax).

2. index entries aren't really allowed everywhere if they are in the
   document at all.  i.e., index entries occur in data content, but not
   in element content.  this means that an application will have to do
   much checking to ensure that index entries occur in reasonable
   places.  this means that we have to weigh the cost of application
   checking and application code against a (possibly) more complex
   content model.  a misplaced included subelement will not be detected
   by the parser, and is often overlooked by the application writer who
   only considers the "usual cases", or has been told that he will
   receive validated data from the SGML parser.  inclusion exceptions
   defeat this purpose.
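[Editorial annotation, not part of the original posting: the second
counter-argument can be made concrete with a small sketch.  The DTD and
all element names below are hypothetical and chosen only for
illustration.  Because the inclusion is declared on DOC, a validating
parser should accept INDEX both in the data content of P, where it was
presumably intended, and in the element content of CHAPTER, where it
probably was not.]

```sgml
<!DOCTYPE doc [
<!-- Hypothetical DTD: INDEX is an inclusion exception on DOC, so it
     is admitted anywhere inside DOC and its descendants. -->
<!ELEMENT doc     - - (chapter+) +(index)>
<!ELEMENT chapter - - (title, p+)>
<!ELEMENT title   - - (#PCDATA)>
<!ELEMENT p       - - (#PCDATA)>
<!-- The exclusion keeps INDEX from nesting inside itself. -->
<!ELEMENT index   - - (#PCDATA) -(index)>
]>
<doc>
<chapter>
<index>accepted in the element content of CHAPTER</index>
<title>Record ends</title>
<p>Some text<index>accepted in the data content of P</index> goes on.</p>
</chapter>
</doc>
```

Under these assumptions, rejecting the first INDEX is left to
application code, which is the cost being weighed in point 2.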
| And the "index point" concept also subsumes lots of other things like
| glossary entries and, uh, hypertext links to anywhere; what one might
| call, generically, in-line metadata.

this is the other problem that urgently needs solving, and which, when
solved, will obviate the need for inclusions.  for all its failings,
HyTime does provide ways and means to address into other documents.

in a world of interconnected information, it doesn't make much sense to
want to stuff all the information into one document.  (hmmm, too
vague.)  of all the possible things that we would like to include in a
document, or say about some piece of information, how do we decide
which things are more important?  or do we make sure we enable locating
information by any number of means?  I believe this was what prompted
HyTime's intricate and powerful location addressing.

| OK, I can work around things and kludge this kind of stuff into an
| exclusion-free model, but then I get something that's hard to explain
| to people.  Which 9 times out of 10 means it's wrong.

if so, then an included element will cause kludgy solutions that 9
times out of 10 will be wrong in the application code that implements
the semantics of the DTD, or have I missed something?

seriously, I think the extra work that included elements cause, either
in specifying kludgy content models or hand-waving over kludgy
application implementation, shows that alternative solutions should be
sought.  I think pointing into the information from other documents is
the way to go.  then it is also possible to point to any level in the
structure, not only to the leaves.  as with tables, we should not
primarily be concerned with how things will finally appear in print.
after all, a database index lives a separate life from the data,
although intimately connected to it.
| Erik, if you're going to make progress on this one I think you need
| some better examples of why inclusions are hideously evil, cause
| cancer, and imperil our quest for truth, justice, and the vendor-
| neutral way...

I have not yet determined whether they cause cancer, but the rest is
right on the mark.  hey, wait!  they _do_ cause cancer: in the
application code, in the documentation and finally it spreads to the
users who won't understand why their line breaks don't work.

in conclusion: there are obvious and not-so-obvious reasons why
inclusions are bad.  when discussing why inclusions are bad, the
not-so-obvious reasons are invariably challenged.  I think we should
make a first approximation to rooting out this devil by first taking
care of the obvious reasons: you should not use inclusions for data
that is obviously part of the contents (like emphasized text), you
should not use inclusions for "floating" elements (figures) that will
have to be moved, anyway, and you should not use inclusions to "fix"
what appears to be broken, but instead take it back to the shop and get
a proper repair.

finally, the inclusion exception is a cheap way to do something complex
in the document that doesn't look like it.  the programmer that has to
make your complex stuff work the way you intend (which will be harder
to express if you use inclusions) will not love you.  that means broken
code, or delays, and you will get applications that do things that your
users will come to rely on even if it's actually broken.

(pretty good to pin all this down on inclusions, don't you think?)

#<Erik>
--
requiescat in pace: Erik Jarve (1944-1994)
_________________________________________________________________
Subject: Reply: SGMLS bug with inline elements

* Submitted to: COMP.TEXT.SGML
* Submitted by: David Megginson (dmeggins@aix1.uottawa.ca)
* Date Of Submission: 21 Dec 1994 12:59:11 UT
* Lines: 32
_________________________________________________________________

Organization: Department of English, University of Ottawa
Newsgroups: comp.text.sgml
Reference: Bernhard Weichel, Erik Naggum, Tim Bray
_________________________________________________________________

I would like to agree whole-heartedly with Erik about inclusion
exceptions.  They were a mistake in the standard, and are never
necessary for DTD design.  A well-designed DTD should use parameter
entities as data classes, allowing for inheritance when necessary, as
in this grossly oversimplified example:

  <!-- Data classes -->
  <!ENTITY % indexing "index | link">
  <!ENTITY % emphasis "em | foreign">
  <!ENTITY % citation "title">
  <!ENTITY % phrasal "%indexing | %emphasis | %citation | #PCDATA">

  <!-- Elements -->
  [...]
  <!ELEMENT foo - - (%phrasal)+>

In other words, "#PCDATA" should _always_ appear in a parameter entity,
and should never be hard-coded into an element definition.  Using this
system, changes to element content are incredibly simple, the DTD is
more self-explaining (i.e. "this is an element which contains phrasal
data" instead of "this is an element which contains PCDATA and anything
else which happens to be included at this point in the document"), and
inclusion exceptions are absolutely useless.

David

--
David Megginson                Department of English, University of Ottawa,
dmeggins@aix1.uottawa.ca       Ottawa, Ontario, CANADA  K1N 6N5
dmeggins@acadvm1.uottawa.ca    Phone: +1 613 564 6850 (Office)
ak117@freenet.carleton.ca             +1 613 564 9175 (FAX)
_________________________________________________________________
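[Editorial annotation, not part of the original posting: a slightly
fuller sketch of the data-class approach described above.  The element
declarations are hypothetical (the posting elides its own with
"[...]"), as is the %heading class, and the SGML declaration is assumed
to set NAMELEN high enough for the names used.  The narrower %heading
class reuses the same building blocks as %phrasal, which is one
possible reading of the "inheritance" the posting mentions.]

```sgml
<!-- Data classes, following the posting -->
<!ENTITY % indexing "index | link">
<!ENTITY % emphasis "em | foreign">
<!ENTITY % citation "title">
<!ENTITY % phrasal  "%indexing; | %emphasis; | %citation; | #PCDATA">

<!-- Hypothetical narrower class: headings take no citations -->
<!ENTITY % heading  "%indexing; | %emphasis; | #PCDATA">

<!-- Hypothetical elements: every mixed content model names its class -->
<!ELEMENT section - - (head, para+)>
<!ELEMENT head    - - (%heading;)*>
<!ELEMENT para    - - (%phrasal;)*>
<!ELEMENT (em | foreign | title) - - (%phrasal;)*>
<!ELEMENT (index | link)         - - (#PCDATA)>
```

The inclusion-based alternative would instead put +(index | link) on
SECTION, which would also admit INDEX and LINK in the element content
between HEAD and PARA.  In the sketch they are admitted only where a
content model names them.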
Subject: Reply: SGMLS bug with inline elements

* Submitted to: COMP.TEXT.SGML
* Submitted by: Jeffrey McArthur (j_mcarthur@BIX.com)
* Date Of Submission: 23 Dec 1994 00:22:51 UT
* Lines: 21
_________________________________________________________________

Organization: ATLIS Publishing
Newsgroups: comp.text.sgml
Reference: Erik Naggum
_________________________________________________________________

[Erik Naggum]
| I think you mean "inclusion exception".  if so, it's not a bug with
| SGMLS, but in your DTD.  don't use inclusion exceptions.  if you think
| you have a reason to use inclusion exceptions, you don't.

I love it.  Inclusion exceptions have personally caused me a lot of
grief.  But I thought the problem was unique to me.

Hmm, is it possible to get inclusion exceptions removed from the next
version of SGML?

--
Jeffrey M\kern-.05em\raise.5ex\hbox{\b c}\kern-.05emArthur
a.k.a. Jeffrey McArthur          email: j_mcarthur@bix.com
work: +1 301 306 5188            home: +1 410 290 6935

The opinions expressed are mine.  They do not reflect the opinions of
my employer.  My access to the Internet is not paid for by my employer.
_________________________________________________________________
Subject: Inclusion exceptions (was: SGMLS bug (wasn't) with inline elements)

* Submitted to: COMP.TEXT.SGML
* Submitted by: Joe English (jenglish@crl.com)
* Date Of Submission: 27 Dec 1994 22:02:26 UT
* Lines: 55
_________________________________________________________________

Organization: Helpless people on subway trains
Newsgroups: comp.text.sgml
Reference: Bernhard Weichel, Erik Naggum
_________________________________________________________________

[Erik Naggum]
| now, the reason this [inclusion exceptions] hits you so hard is that
| the rules for when record ends ("line breaks") are significant or
| ignored are such that included subelements are not counted when
| determining whether to ignore the last record end before an end-tag.
| but to make this work would require that you pass over all the included
| subelements before you could decide.  therefore, the rule was
| instigated that record ends occur after any following included
| subelements.

Aren't there cases where you would want this behavior though?  I can
think of a few: suppose you wanted to insert elements marking revised
sections of a document with something like:

  <!ELEMENT CHANGE - O EMPTY>
  <!ATTLIST CHANGE
      -- (id|idref) --
          ID      ID      #IMPLIED
          REFID   IDREF   #IMPLIED
      -- reftype NAMES #FIXED "refid change" --
      >

(<CHANGE id=xyzzy001> would mark the beginning of a change, and
<CHANGE refid=xyzzy001> would mark the end.)

Revisions can appear almost anywhere, and might span just about
anything, so it makes sense to make CHANGE an inclusion exception on
some high-level document element.

Or suppose you wanted to insert editorial comments in a document with
something like

  <!ELEMENT EDNOTE - - (#PCDATA)>
  ...
  <p>Blah blah blah <ednote>too many "blah"s here</ednote> blah blah .
<CHANGE> might be better handled with processing instructions or HyTime
ilinks, but <EDNOTE> has more compelling reasons to be an included
element, rather than a comment (so structure-controlled applications
can see them) or an ilink (so authors can see them).

Since <CHANGE> and <EDNOTE> elements aren't part of the document
proper, the weird rules for where the record-ends go might be
appropriate.  There must be *some* rational explanation for them...

Personally, I think Goldfarb's first law of text processing [*] has
become a self-fulfilling prophecy.

--Joe English

  jenglish@crl.com

[*] That if a text processor has bugs, at least one of them has to do
    with input line endings, cited as the rationale for clause 7.6.1,
    "Record Boundaries".  [rcc annotation: see below]
_________________________________________________________________

--------------------------------
Goldfarb's first law - "inexorable operation of Goldfarb's first law of text processing, which states that if a text processor has bugs, at least one of them will have to do with the handling of input line endings" — see The SGML Handbook, page 321, note #2, ad Clause 7.6.1 Record Boundaries.
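
[Editorial annotation, not part of the original postings: Joe English's
posting describes CHANGE and EDNOTE as candidates for an inclusion
exception "on some high-level document element" without showing that
declaration.  A minimal sketch of what it might look like follows.
The BOOK, FRONT, BODY and P declarations are hypothetical, and the
attribute list is simplified from the one in the posting.]

```sgml
<!-- Hypothetical top-level element carrying the inclusions -->
<!ELEMENT book   - - (front, body) +(change | ednote)>
<!ELEMENT front  - - (p+)>
<!ELEMENT body   - - (p+)>
<!ELEMENT p      - - (#PCDATA)>

<!-- Empty marker: ID opens a change, REFID closes it -->
<!ELEMENT change - O EMPTY>
<!ATTLIST change
          id     ID     #IMPLIED
          refid  IDREF  #IMPLIED>

<!-- Editorial note; the exclusion keeps notes from nesting -->
<!ELEMENT ednote - - (#PCDATA) -(ednote)>
```

Under this sketch CHANGE and EDNOTE are admitted anywhere inside BOOK,
including between paragraphs, which is the situation in which the
record-end rules discussed in the postings come into play.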