SGML: Inclusion Exceptions

A collection of postings on the topic of "SGML Inclusion Exceptions", from December 1994 on CTS (Usenet News COMP.TEXT.SGML).
Tim Bray
Robert Streich
Erik Naggum
David Megginson
Jeffrey McArthur
Joe English

Subject: Reply: SGMLS bug with inline elements

     * Submitted to: COMP.TEXT.SGML
     * Submitted by: Tim Bray (a07893@giant.rsoft.bc.ca )
     * Date Of Submission: 21 Dec 1994 02:35:51 UT
     * Lines: 26
       
       _________________________________________________________________
   
   Organization: MIND LINK! Communications Corp., Langley, BC, Canada
   Newsgroups: comp.text.sgml
   Reference: Bernhard Weichel, Erik Naggum
   
   CTS archive link: here
     _________________________________________________________________
   


[Erik Naggum]

|   don't use inclusion exceptions.  if you think you have a reason to use
|   inclusion exceptions, you don't.  if you still think ...

But there's this problem.  SGML is supposed to be used to model documents.
SGML is *good* at this; better than the competition, anyhow.  One of the
things I find in documents, and I'd like to model, are index-point entries.
On the face of it, it seems like inclusions are exactly the tool needed to
model this correctly with little constraint on the authors who are trying
to put in the HUGE value-add conferred by a good index.

And the "index point" concept also subsumes lots of other things like
glossary entries and, uh, hypertext links to anywhere; what one might call,
generically, in-line metadata.

OK, I can work around things and kludge this kind of stuff into an
exclusion-free model, but then I get something that's hard to explain to
people.  Which 9 times out of 10 means it's wrong.

Erik, if you're going to make progress on this one I think you need some
better examples of why inclusions are hideously evil, cause cancer, and
imperil our quest for truth, justice, and the vendor-neutral way...

Cheers, Tim Bray, Open Text (tbray@opentext.com)
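
[Annotation, not part of the original posting: the thread never shows the
construct under discussion.  As a minimal sketch, an inclusion exception
for index points might be declared as below.  The element names BOOK,
CHAPTER, TITLE, PARA and INDEXPT are hypothetical and are not taken from
any poster's DTD.

    <!-- Hypothetical DTD fragment, for illustration only -->
    <!ELEMENT book    - - (title, chapter+) +(indexpt)>
    <!ELEMENT chapter - - (title, para+)>
    <!ELEMENT title   - - (#PCDATA)>
    <!ELEMENT para    - - (#PCDATA)>
    <!ELEMENT indexpt - O EMPTY>
    <!ATTLIST indexpt term CDATA #REQUIRED>

With +(indexpt) on BOOK, an INDEXPT start-tag is accepted anywhere inside
BOOK, including inside TITLE and PARA, although no content model names it.]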

   
     _________________________________________________________________
   
   Back To Complete Subject Listing on remote site.
   
   Archive last updated dd. 02/04/96
   
   Suggestions to the compiler.


Subject: Reply: SGMLS bug with inline elements

     * Submitted to: COMP.TEXT.SGML
     * Submitted by: Robert Streich (streich@austin.sar.slb.com )
     * Date Of Submission: 21 Dec 1994 07:22:48 UT
     * Lines: 56
     
   
     _________________________________________________________________
   
   Organization: Schlumberger Laboratory for Computer Science
   Newsgroups: comp.text.sgml
   Reference: Bernhard Weichel, Erik Naggum, Tim Bray
   
     _________________________________________________________________
   


[Erik Naggum]

|   don't use inclusion exceptions.  if you think you have a reason to use
|   inclusion exceptions, you don't.  if you still think ...

[Tim Bray]

|   But there's this problem.  SGML is supposed to be used to model
|   documents.  SGML is *good* at this; better than the competition,
|   anyhow.  One of the things I find in documents, and I'd like to model,
|   are index-point entries.  On the face of it, it seems like inclusions
|   are exactly the tool needed to model this correctly with little
|   constraint on the authors who are trying to put in the HUGE value-add
|   conferred by a good index.

I'm not sure I understand this, Tim.  How does an inclusion make for a good
model?  And why would this have any impact on the authors?  Inclusions are
nothing more than a convenience for the DTD designer as far as I can tell.
Authors (usually) won't know, probably don't care, and may not even
understand the difference between a proper subelement and an included one.

|   OK, I can work around things and kludge this kind of stuff into an
|   exclusion-free model, but then I get something that's hard to explain
|   to people.  Which 9 times out of 10 means it's wrong.

I see this the other way around: inclusions are a kludge.  I also see them
as terribly problematic when it comes to maintaining a complex DTD over
time.  It's one of those things that I just feel in my guts will come back
to haunt you -- they'll show up where they're least expected and entirely
unwanted.

There is only one place where I've used an inclusion as a very temporary
kludge: as a container into which unmapped markup could be placed during a
conversion.  This is the *one* case I can think of in which they are useful,
only because their position in the data stream can be important.  In this
case, these inclusions only remain long enough for someone to go in and
clean them up, making any corrections necessary for the markup that wasn't
mapped.

I agree wholeheartedly with the emphasis that you place on an index, but
I'm not convinced that the elements that comprise the indices need to be
inline.  The clutter they create and the problems they cause in dynamic
documents can be a nasty problem.  But this is a separate issue....

|   Erik, if you're going to make progress on this one I think you need
|   some better examples of why inclusions are hideously evil, cause
|   cancer, and imperil our quest for truth, justice, and the vendor-
|   neutral way...

We've seen the discussions of record end handling with respect to
inclusions before, Tim.  How about you post a situation in which they are a
necessity?

Robert Streich
Schlumberger Austin Research

   


Subject: Reply: SGMLS bug with inline elements

     * Submitted to: COMP.TEXT.SGML
     * Submitted by: Erik Naggum (erik@naggum.no )
     * Date Of Submission: 21 Dec 1994 09:43:29 UT
     * Lines: 100
       
   
     _________________________________________________________________
   
   Organization: Naggum Software; +47 2295 0313
   Newsgroups: comp.text.sgml
   Reference: Bernhard Weichel, Erik Naggum, Tim Bray
   
     _________________________________________________________________
   


[Tim Bray]

|   But there's this problem.  SGML is supposed to be used to model
|   documents.  SGML is *good* at this; better than the competition,
|   anyhow.  One of the things I find in documents, and I'd like to model,
|   are index-point entries.  On the face of it, it seems like inclusions
|   are exactly the tool needed to model this correctly with little
|   constraint on the authors who are trying to put in the HUGE value-add
|   conferred by a good index.

yes, that's the standard argument for inclusion.  there are two fairly
standard counter-arguments:

1.  index entries aren't really part of the data, but form a separate
    document that ties into a source document.  we note that index entries
    have a form that is different from the actual word or words in context.
    we note that indexes are not in general points in the source document,
    but areas or ranges that frequently span larger elements, frequently
    resulting in index entries spanning pages.  we also note that index
    entries have a structure to themselves.  finally, we note that indexes
    are often not created by the author, who may not want to see the source
    document changed in the process of creating an index.  an inclusion is
    inappropriate for these concerns, and a mechanism to point into a
    document should be used instead.  HyTime provides some ideas for how
    this might be accomplished (modulo the HyTime syntax).

2.  index entries aren't really allowed everywhere if they are in the
    document at all.  i.e., index entries occur in data content, but not in
    element content.  this means that an application will have to do much
    checking to ensure that index entries occur in reasonable places.  this
    means that we have to weigh the cost of application checking and
    application code against a (possibly) more complex content model.  a
    misplaced included subelement will not be detected by the parser, and
    is often overlooked by the application writer who only considers the
    "usual cases", or has been told that he will receive validated data
    from the SGML parser.  inclusion exceptions defeat this purpose.

|   And the "index point" concept also subsumes lots of other things like
|   glossary entries and, uh, hypertext links to anywhere; what one might
|   call, generically, in-line metadata.

this is the other problem that urgently needs solving, and which, when
solved, will obviate the need for inclusions.  for all its failings, HyTime
does provide ways and means to address into other documents.  in a world of
interconnected information, it doesn't make much sense to want to stuff all
the information into one document.  (hmmm, too vague.)  of all the possible
things that we would like to include in a document, or say about some piece
of information, how do we decide which things are more important?  or do we
make sure we enable locating information by any number of means?  I believe
this was what prompted HyTime's intricate and powerful location addressing.

|   OK, I can work around things and kludge this kind of stuff into an
|   exclusion-free model, but then I get something that's hard to explain
|   to people.  Which 9 times out of 10 means it's wrong.

if so, then an included element will cause kludgy solutions that 9 times out
of 10 will be wrong in the application code that implements the semantics
of the DTD, or have I missed something?  seriously, I think the extra work
that included elements cause, either in specifying kludgy content models or
hand-waving over kludgy application implementation, shows that alternative
solutions should be sought.  I think pointing into the information from
other documents is the way to go.  then it is also possible to point to any
level in the structure, not only to the leaves.

as with tables, we should not primarily be concerned with how things will
finally appear in print.  after all, a database index lives a separate life
from the data, although intimately connected to it.

|   Erik, if you're going to make progress on this one I think you need
|   some better examples of why inclusions are hideously evil, cause
|   cancer, and imperil our quest for truth, justice, and the vendor-
|   neutral way...

I have not yet determined whether they cause cancer, but the rest is right
on the mark.  hey, wait!  they _do_ cause cancer: in the application code,
in the documentation and finally it spreads to the users who won't
understand why their line breaks don't work.

in conclusion: there are obvious and not-so-obvious reasons why inclusions
are bad.  when discussing why inclusions are bad, the not-so-obvious
reasons are invariably challenged.  I think we should make a first
approximation to rooting out this devil by first taking care of the obvious
reasons: you should not use inclusions for data that is obviously part of
the contents (like emphasized text), you should not use inclusions for
"floating" elements (figures) that will have to be moved, anyway, and you
should not use inclusions to "fix" what appears to be broken, but instead
take it back to the shop and get a proper repair.

finally, the inclusion exception is a cheap way to do something complex in
the document that doesn't look like it.  the programmer that has to make
your complex stuff work the way you intend (which will be harder to express
if you use inclusions) will not love you.  that means broken code, or
delays, and you will get applications that do things that your users will
come to rely on even if it's actually broken.  (pretty good to pin all this
down on inclusions, don't you think?)

#<Erik>
--
requiescat in pace: Erik Jarve (1944-1994)
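
[Annotation, not part of the original posting: a sketch of the second
counter-argument above, assuming a hypothetical DTD in which an index-point
element INDEXPT is included on the document element.  All element names
here are assumptions made for illustration.

    <!-- Hypothetical DTD fragment -->
    <!ELEMENT book    - - (chapter+) +(indexpt)>
    <!ELEMENT chapter - - (title, para+)>
    <!ELEMENT title   - - (#PCDATA)>
    <!ELEMENT para    - - (#PCDATA)>
    <!ELEMENT indexpt - O EMPTY>

    <!-- Hypothetical instance: both INDEXPT placements are valid -->
    <book>
    <chapter><indexpt><title>Record Ends</title>
    <para>Text <indexpt>with an index point.</para>
    </chapter>
    </book>

The first INDEXPT falls in the element content of CHAPTER, where it marks
no text.  A validating parser reports no error for it, so catching such a
placement is left to the application.]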

   


Subject: Reply: SGMLS bug with inline elements

     * Submitted to: COMP.TEXT.SGML
     * Submitted by: David Megginson (dmeggins@aix1.uottawa.ca )
     * Date Of Submission: 21 Dec 1994 12:59:11 UT
     * Lines: 32
       
   
     _________________________________________________________________
   
   Organization: Department of English, University of Ottawa
   Newsgroups: comp.text.sgml
   Reference: Bernhard Weichel, Erik Naggum, Tim Bray
   
     _________________________________________________________________
   


I would like to agree whole-heartedly with Erik about inclusion exceptions.
They were a mistake in the standard, and are never necessary for DTD
design.  A well-designed DTD should use parameter entities as data classes,
allowing for inheritance when necessary, as in this grossly oversimplified
example:

  <!-- Data classes -->
  <!ENTITY % indexing "index | link">
  <!ENTITY % emphasis "em | foreign">
  <!ENTITY % citation "title">
  <!ENTITY % phrasal "%indexing | %emphasis | %citation | #PCDATA">

  <!-- Elements -->
  [...]

  <!ELEMENT foo - - (%phrasal)+>

In other words, "#PCDATA" should _always_ appear in a parameter entity, and
should never be hard-coded into an element definition.  Using this system,
changes to element content are incredibly simple, the DTD is more
self-explaining (i.e. "this is an element which contains phrasal data"
instead of "this is an element which contains PCDATA and anything else
which happens to be included at this point in the document"), and inclusion
exceptions are absolutely useless.

David
--
David Megginson                Department of English, University of Ottawa,
dmeggins@aix1.uottawa.ca       Ottawa, Ontario, CANADA  K1N 6N5
dmeggins@acadvm1.uottawa.ca    Phone: +1 613 564 6850 (Office)
ak117@freenet.carleton.ca             +1 613 564 9175 (FAX)
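
[Annotation, not part of the original posting: for comparison, a sketch of
one hypothetical element FOO declared both ways.  The inclusion form is an
assumption added here; only the entity form follows the post above.  The
two declarations are alternatives and would not appear in the same DTD.

    <!-- Alternative 1, with an inclusion exception: index and link are
         allowed in FOO and in everything FOO contains, at any depth -->
    <!ELEMENT foo - - (#PCDATA | em | foreign | title)* +(index | link)>

    <!-- Alternative 2, with a data class: index and link are allowed
         only where a content model names %phrasal; -->
    <!ENTITY % phrasal "#PCDATA | index | link | em | foreign | title">
    <!ELEMENT foo - - (%phrasal;)*>

In the second form, every place where an index point may occur is visible
in some content model, which is the self-documenting property the post
argues for.]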

   


Subject: Reply: SGMLS bug with inline elements

     * Submitted to: COMP.TEXT.SGML
     * Submitted by: Jeffrey McArthur (j_mcarthur@BIX.com )
     * Date Of Submission: 23 Dec 1994 00:22:51 UT
     * Lines: 21
       
   
     _________________________________________________________________
   
   Organization: ATLIS Publishing
   Newsgroups: comp.text.sgml
   Reference: Erik Naggum
   
     _________________________________________________________________
   


[Erik Naggum]

|   I think you mean "inclusion exception".  if so, it's not a bug with
|   SGMLS, but in your DTD.  don't use inclusion exceptions.  if you think
|   you have a reason to use inclusion exceptions, you don't.

I love it.  Inclusion exceptions have personally caused me a lot of grief.
But I thought the problem was unique to me.

Hmm, is it possible to get inclusion exceptions removed from the next
version of SGML?

--
    Jeffrey M\kern-.05em\raise.5ex\hbox{\b c}\kern-.05emArthur
    a.k.a. Jeffrey McArthur          email: j_mcarthur@bix.com
    work:  +1 301 306 5188
    home:  +1 410 290 6935

The opinions expressed are mine.  They do not reflect the opinions of my
employer.  My access to the Internet is not paid for by my employer.

   


Subject: Inclusion exceptions (was: SGMLS bug (wasn't) with inline elements)

     * Submitted to: COMP.TEXT.SGML
     * Submitted by: Joe English (jenglish@crl.com )
     * Date Of Submission: 27 Dec 1994 22:02:26 UT
     * Lines: 55
       
   
     _________________________________________________________________
   
   Organization: Helpless people on subway trains
   Newsgroups: comp.text.sgml
   Reference: Bernhard Weichel, Erik Naggum
   
     _________________________________________________________________
   


[Erik Naggum]

|   now, the reason this [inclusion exceptions] hits you so hard is that
|   the rules for when record ends ("line breaks") are significant or
|   ignored are such that included subelements are not counted when
|   determining whether to ignore the last record end before an end-tag.
|   but to make this work would require that you pass over all the included
|   subelements before you could decide.  therefore, the rule was
|   instigated that record ends occur after any following included
|   subelements.

Aren't there cases where you would want this behavior though?  I can think
of a few: suppose you wanted to insert elements marking revised sections of
a document with something like:

    <!ELEMENT CHANGE - O EMPTY>
    <!ATTLIST CHANGE -- (id|idref) --
        ID      ID      #IMPLIED
        REFID   IDREF   #IMPLIED
        -- reftype NAMES #FIXED "refid change" --
    >

(<CHANGE id=xyzzy001> would mark the beginning of a change, and <CHANGE
refid=xyzzy001> would mark the end.)

Revisions can appear almost anywhere, and might span just about anything,
so it makes sense to make CHANGE an inclusion exception on some high-level
document element.

Or suppose you wanted to insert editorial comments in a document with
something like:

    <!ELEMENT EDNOTE - - (#PCDATA)>
    ...
    <p>Blah blah blah <ednote>too many "blah"s here</ednote> blah blah

<CHANGE> might be better handled with processing instructions or HyTime
ilinks, but <EDNOTE> has more compelling reasons to be an included element,
rather than a comment (so structure-controlled applications can see them)
or an ilink (so authors can see them).

Since <CHANGE> and <EDNOTE> elements aren't part of the document proper,
the weird rules for where the record-ends go might be appropriate.  There
must be *some* rational explanation for them...  Personally, I think
Goldfarb's first law of text processing [*] has become a self-fulfilling
prophecy.

--Joe English

  jenglish@crl.com

[*] That if a text processor has bugs, at least one of them
has to do with input line endings, cited as the rationale
for clause 7.6.1, "Record Boundaries". [rcc annotation: see below]
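
[Annotation, not part of the original posting: a sketch of the record-end
rule quoted at the top of this post, on one reading of clause 7.6.1.  The
DOC and P declarations are assumptions made for illustration; EDNOTE is
declared as in the post.

    <!-- Hypothetical DTD fragment -->
    <!ELEMENT doc    - - (p+) +(ednote)>
    <!ELEMENT p      - - (#PCDATA)>
    <!ELEMENT ednote - - (#PCDATA)>

    <!-- Hypothetical instance fragment inside DOC -->
    <p>first line
    <ednote>check this</ednote>second line</p>

On that reading, the record end after "first line" is data, but it is
deemed to occur immediately before "second line".  An application would
therefore see it after the included EDNOTE element rather than before it.]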

   

--------------------------------
Goldfarb's first law: "inexorable operation of Goldfarb's first law of text processing, which states that if a text processor has bugs, at least one of them will have to do with the handling of input line endings"; see The SGML Handbook, page 321, note 2, on Clause 7.6.1, "Record Boundaries".