SGML: Inclusion exceptions: Eliot Kimber


Subject: Re: Difference between (p|q)+  and (p) +(q) in content model ?

Date: Sat, 20 Apr 1996 15:19:19 -0900

From: "W. Eliot Kimber" <kimber@passage.com>

References: <4l7l6h$qg3@tuegate.tue.nl> <4l86e7$12f8@piglet.cc.uic.edu>


---------------------------------------------



C M Sperberg-McQueen wrote:
 ...
> Or, viewed a different way:
> 
>  - INC requires exactly one P, while ALT allows zero or more
>  - Q is legal within P if the P occurs within an INC, but not if the P
>    occurs within an ALT
> 
> None of these differences seem to be affected by whether P is
> defined as #PCDATA or has some more elaborate definition.
> 
> Of course, defining INC as <!ELEMENT inc - - (p*) +(q) > and
> ALT as <!ELEMENT alt - - (p | q)* > removes all the differences
> but the one I suspect really counts, namely the last one.

There are two other differences:

- Q is legal within Q if Q occurs within Inc, but not if Q occurs
  within Alt (an inclusion applies to the entire content of the element
  that declares it, including the content of the included element itself).
- Record ends following a Q end tag will not be taken as data record
  ends in Inc, but will be in Alt. Record ends following the end tags
  of included elements are never taken as data record ends.
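The first difference comes from the fact that an inclusion applies to everything inside the element that declares it. A minimal Python sketch of that propagation follows. It assumes a deliberately reduced model in which a content model is just the set of element types it names (sequence and occurrence indicators are ignored). It also assumes that P and Q are both declared as (#PCDATA) and that exclusions take precedence over inclusions. `ElementDecl`, `allowed_children` and `DTD` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ElementDecl:
    """Simplified element declaration: only which element types the model names."""
    model: FrozenSet[str]                     # element types named in the content model
    inclusions: FrozenSet[str] = frozenset()  # +(...) exceptions
    exclusions: FrozenSet[str] = frozenset()  # -(...) exceptions


def allowed_children(open_elements: List[str], dtd: Dict[str, ElementDecl]) -> FrozenSet[str]:
    """Element types that may occur inside the innermost open element.

    Inclusions and exclusions declared on *every* open ancestor apply, which is
    why an included element type stays available inside its own content.
    Sequence and occurrence indicators are deliberately ignored.
    """
    current = dtd[open_elements[-1]]
    included = frozenset().union(*(dtd[name].inclusions for name in open_elements))
    excluded = frozenset().union(*(dtd[name].exclusions for name in open_elements))
    # In this sketch, exclusions take precedence over inclusions.
    return (current.model | included) - excluded


# INC = (p*) +(q) and ALT = (p | q)*, as in the quoted message.
# P and Q are assumed (hypothetically) to be declared as (#PCDATA).
DTD = {
    "inc": ElementDecl(model=frozenset({"p"}), inclusions=frozenset({"q"})),
    "alt": ElementDecl(model=frozenset({"p", "q"})),
    "p": ElementDecl(model=frozenset()),
    "q": ElementDecl(model=frozenset()),
}

for path in (["inc", "p"], ["alt", "p"], ["inc", "q"], ["alt", "q"]):
    print("/".join(path), sorted(allowed_children(path, DTD)))
```

Run as a script, this prints `inc/p ['q']`, `alt/p []`, `inc/q ['q']` and `alt/q []`: Q is available inside P and inside Q only when the surrounding element is Inc.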

Inclusions should *only* be used for element types that are semantically 
annotative and should not "disturb" the data of the elements they occur 
in (by potentially adding extraneous data record ends). The way that 
record ends are treated for inclusions is essential to making this 
distinction. Otherwise, all content models that use inclusions could be 
recast using direct mention in the base content model. (See the notes
under section 11.2.5.1 of ISO 8879, pg. 419 in The SGML Handbook.)

Typical element types that are annotative are index entries and 
footnotes. The textbook example is the use of index entries in literal 
examples. Consider this document fragment:

<example>
line one of example
<indexentry>line one</indexentry>
second line of example
third line of example
</example>

If the content model of Example is (#PCDATA | IndexEntry)*, then
there will be two data record ends between the first and second lines
of the example (the one following "line one of example" and the one
following the IndexEntry end tag). If the content model of Example is
(#PCDATA) +(IndexEntry), then there will be only one data record end
between those lines (the one following "line one of example"), because
the record end following the IndexEntry end tag is ignored by the
rules for inclusions.

Clearly, the second content model is the correct one, because index 
entries are inherently annotative and their presence or absence should 
not disturb the data around them.
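The record-end behavior in the Example fragment can be checked with a small Python model. This is a simplified sketch, not an SGML parser. It keeps a record end (RE) as data only if data or a proper subelement occurs between it and the previous RE (or the start tag), and it also drops the last RE unless data or a proper subelement follows it. It does not model record starts, so it would mishandle blank lines. `Sub`, `data_record_ends`, `render` and `example_content` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Union


class _RecordEnd:
    """Sentinel for a record end (RE) in an element's content."""
    def __repr__(self) -> str:
        return "RE"


RE = _RecordEnd()


@dataclass(frozen=True)
class Sub:
    """A subelement occurrence; proper=False means it is allowed only by an inclusion."""
    name: str
    proper: bool


Token = Union[str, Sub, _RecordEnd]  # a str is character data


def counts(tok: Token) -> bool:
    """Data and proper subelements count for RE handling; included elements do not."""
    if isinstance(tok, Sub):
        return tok.proper
    return isinstance(tok, str)


def data_record_ends(content: List[Token]) -> List[int]:
    """Indices of the REs in `content` that are kept as data (simplified rules)."""
    positions = [i for i, tok in enumerate(content) if tok is RE]
    kept = []
    prev = -1  # index of the previous RE; -1 stands for the element's start tag
    for i in positions:
        # Keep only if data or a proper subelement occurs since the previous RE
        # (or the start tag). In the fragment this drops the RE on the <example>
        # line, and the RE after </indexentry> when IndexEntry is an inclusion.
        keep = any(counts(t) for t in content[prev + 1:i])
        # The last RE is also dropped unless data or a proper subelement follows it.
        if i == positions[-1]:
            keep = keep and any(counts(t) for t in content[i + 1:])
        if keep:
            kept.append(i)
        prev = i
    return kept


def render(content: List[Token]) -> str:
    """The element's own data, with kept REs shown as newlines."""
    kept = set(data_record_ends(content))
    out = []
    for i, tok in enumerate(content):
        if tok is RE:
            if i in kept:
                out.append("\n")
        elif isinstance(tok, str):
            out.append(tok)
        # A subelement's content belongs to the subelement, not to Example.
    return "".join(out)


def example_content(indexentry_is_proper: bool) -> List[Token]:
    """Content of the <example> fragment, one RE per source line."""
    entry = Sub("indexentry", proper=indexentry_is_proper)
    return [RE, "line one of example", RE, entry, RE,
            "second line of example", RE, "third line of example", RE]


print(repr(render(example_content(True))))   # (#PCDATA | IndexEntry)*
print(repr(render(example_content(False))))  # (#PCDATA) +(IndexEntry)
```

With IndexEntry as a proper subelement, the rendered data has two record ends (a blank line) between the first two lines: `'line one of example\n\nsecond line of example\nthird line of example'`. With IndexEntry as an inclusion, only one record end separates them.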

-- 
<Address HyTime=bibloc 
         homepage="http://www.squirrel.com/squirrel/drmacro">
W. Eliot Kimber, kimber@passage.com 
Senior SGML Consultant and HyTime Specialist
Passage Systems, Inc., 10596 N. Tantau Ave., Cupertino, CA 95014-3535 
(408) 366-0300 (Cupertino), (512) 339-1400 (Austin), 
http://www.passage.com </Address>
"If I never had existed, would you still remember me?..." 
--Austin Lounge Lizards, "1984 Blues" 
(http://www.webcom.com/~yeolde/all/lllhome.html)