SGML: Inclusion exceptions: Eliot Kimber
Subject: Re: Difference between (p|q)+ and (p) +(q) in content model ?
Date: Sat, 20 Apr 1996 15:19:19 -0900
From: "W. Eliot Kimber" <kimber@passage.com>
References: <4l7l6h$qg3@tuegate.tue.nl> <4l86e7$12f8@piglet.cc.uic.edu>
---------------------------------------------
C M Sperberg-McQueen wrote:
...
> Or, viewed a different way:
>
> - INC requires exactly one P, while ALT allows zero or more
> - Q is legal within P if the P occurs within an INC, but not if the P
> occurs within an ALT
>
> None of these differences seem to be affected by whether P is
> defined as #PCDATA or has some more elaborate definition.
>
> Of course, defining INC as <!ELEMENT inc - - (p*) +(q) > and
> ALT as <!ELEMENT alt - - (p | q)* > removes all the differences
> but the one I suspect really counts, namely the last one.
There are two other differences:
- Q is legal within Q if Q occurs within Inc but not if Q occurs
within Alt (Q is not excluded from itself when also defined
as an inclusion).
- A record end following a Q end tag is not taken as a data
record end in Inc, but is in Alt. Record ends following
the end tags of included elements are never taken as data record ends.
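The reach of an inclusion (Q legal inside P and inside Q itself under Inc, but not under Alt) can be sketched with a small model. This is only an illustration, not a parser: the tables CONTENT and INCLUSIONS and the function allowed_children are hypothetical names, P and Q are assumed to contain only #PCDATA, and exclusions are not modelled.

```python
# Hypothetical tables mirroring the two declarations discussed in the thread:
#   <!ELEMENT inc - - (p*) +(q) >    and    <!ELEMENT alt - - (p | q)* >
# Assumption: P and Q themselves contain only #PCDATA (no element children).
CONTENT = {"inc": {"p"}, "alt": {"p", "q"}, "p": set(), "q": set()}
INCLUSIONS = {"inc": {"q"}}


def allowed_children(open_elements):
    """Return the element types allowed inside the innermost open element.

    open_elements lists the open elements from outermost to innermost.
    An inclusion applies inside the declaring element and all of its
    descendants. Exclusions are deliberately left out of this sketch.
    """
    allowed = set(CONTENT[open_elements[-1]])
    for name in open_elements:
        allowed |= INCLUSIONS.get(name, set())
    return allowed


q_in_p_under_inc = "q" in allowed_children(["inc", "p"])
q_in_p_under_alt = "q" in allowed_children(["alt", "p"])
q_in_q_under_inc = "q" in allowed_children(["inc", "q"])
q_in_q_under_alt = "q" in allowed_children(["alt", "q"])
print(q_in_p_under_inc, q_in_p_under_alt, q_in_q_under_inc, q_in_q_under_alt)
```

In this model the inclusion on Inc follows every descendant, which is why Q turns up inside P and inside Q; under Alt, Q is allowed only where the content model names it.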
Inclusions should *only* be used for element types that are semantically
annotative and should not "disturb" the data of the elements they occur
in (by potentially adding extraneous data record ends). The way that
record ends are treated for inclusions is essential to making this
distinction. Otherwise, all content models that use inclusions could be
recast using direct mention in the base content model. (See the notes
under section 11.2.5.1 in ISO 8879, pg. 419 in The SGML Handbook.)
Typical element types that are annotative are index entries and
footnotes. The textbook example is the use of index entries in literal
examples. Consider this document fragment:
<example>
line one of example
<indexentry>line one</indexentry>
second line of example
third line of example
</example>
If the content model of Example is (#PCDATA | IndexEntry)* then
there will be two data record ends between line one and line two (the
one following line one and the one following the IndexEntry end tag). If
the content model of Example is (#PCDATA) +(IndexEntry), then there will
be only one data record end between line one and line two (the one
following line one, as the one following the IndexEntry end tag is
ignored by the rules of inclusions).
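The count above can be reproduced with a rough sketch. It is a simplified model, not an SGML parser, and it rests on these assumptions: the record end right after the start tag and the last one before the end tag are ignored, and a record end is otherwise ignored only when its record holds nothing but included elements (a record that also carries data keeps its record end here). The function name count_data_record_ends and the EXAMPLE list are hypothetical.

```python
import re

# The records between <example> and </example> in the fragment above.
EXAMPLE = [
    "line one of example",
    "<indexentry>line one</indexentry>",
    "second line of example",
    "third line of example",
]


def count_data_record_ends(records, inclusions):
    """Count record ends treated as data under the simplified model.

    records: the lines of content between the start tag and the end tag.
    inclusions: names of element types declared as inclusions.
    """
    count = 0
    for index, record in enumerate(records):
        if index == len(records) - 1:
            continue  # assumption: the last record end in the element is ignored
        remainder = record
        for name in inclusions:
            tag = re.escape(name)
            pattern = "<" + tag + ">.*?</" + tag + ">"
            remainder = re.sub(pattern, "", remainder, flags=re.IGNORECASE)
        # A non-empty record that is empty once included elements are removed
        # contributes no data record end.
        only_inclusions = record != "" and remainder == ""
        if not only_inclusions:
            count += 1
    return count


# (#PCDATA | IndexEntry)*: IndexEntry is a proper subelement.
alt_count = count_data_record_ends(EXAMPLE, inclusions=set())
# (#PCDATA) +(IndexEntry): IndexEntry is an inclusion.
inc_count = count_data_record_ends(EXAMPLE, inclusions={"indexentry"})
print(alt_count, inc_count)  # prints: 3 2
```

The totals cover the whole element, so they include the record end after the second line; the difference of one is the record end following the IndexEntry end tag.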
Clearly, the second content model is the correct one, because index
entries are inherently annotative and their presence or absence should
not disturb the data around them.
--
<Address HyTime=bibloc
homepage="http://www.squirrel.com/squirrel/drmacro">
W. Eliot Kimber, kimber@passage.com
Senior SGML Consultant and HyTime Specialist
Passage Systems, Inc., 10596 N. Tantau Ave., Cupertino, CA 95014-3535
(408) 366-0300 (Cupertino), (512) 339-1400 (Austin),
http://www.passage.com </Address>
"If I never had existed, would you still remember me?..."
--Austin Lounge Lizards, "1984 Blues"
(http://www.webcom.com/~yeolde/all/lllhome.html)