SGML: Inclusion exceptions: Eliot Kimber


Subject: Re: Difference between (p|q)+  and (p) +(q) in content model ?
Date: Sat, 20 Apr 1996 15:19:19 -0900
From: "W. Eliot Kimber" <kimber@passage.com>
References: <4l7l6h$qg3@tuegate.tue.nl> <4l86e7$12f8@piglet.cc.uic.edu>
---------------------------------------------

C M Sperberg-McQueen wrote:
...
> Or, viewed a different way:
>
>  - INC requires exactly one P, while ALT allows zero or more
>  - Q is legal within P if the P occurs within an INC, but not if the P
>    occurs within an ALT
>
> None of these differences seem to be affected by whether P is
> defined as #PCDATA or has some more elaborate definition.
>
> Of course, defining INC as <!ELEMENT inc - - (p*) +(q) > and
> ALT as <!ELEMENT alt - - (p | q)* > removes all the differences
> but the one I suspect really counts, namely the last one.

There are two other differences:

 - Q is legal within Q if Q occurs within Inc but not if Q occurs
   within Alt (Q is not excluded from itself when also defined as an
   inclusion).

 - Record ends following the Q end tag will not be taken as data
   record ends in Inc, but will in Alt.  Record ends following the end
   tags of included elements are never taken as data record ends.

Inclusions should *only* be used for element types that are
semantically annotative and should not "disturb" the data of the
elements they occur in (by potentially adding extraneous data record
ends).  The way that record ends are treated for inclusions is
essential to making this distinction.  Otherwise, all content models
that use inclusions could be recast using direct mention in the base
content model.  (See the notes under section 11.2.5.1 in ISO 8879
(pg. 419 in The SGML Handbook).)

Typical element types that are annotative are index entries and
footnotes.  The textbook example is the use of index entries in
literal examples.  Consider this document fragment:

 <example>
 line one of example
 <indexentry>line one</indexentry>
 second line of example
 third line of example
 </example>

If the content model of Example is (#PCDATA | IndexEntry)* then there
will be two data record ends between line one and line two (the one
following line one and the one following the IndexEntry end tag).
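
For concreteness, the two candidate content models for Example can be
written out as full element declarations.  This is only a sketch: the
"- -" minimization flags and the IndexEntry declaration are assumptions
added for illustration and do not come from the original post.  A real
DTD would contain just one of the two Example declarations, since an
element type may be declared only once.

```sgml
<!-- Variant 1: IndexEntry is named in the model group, so it is a
     proper subelement of Example. -->
<!ELEMENT example    - - (#PCDATA | indexentry)* >

<!-- Variant 2: IndexEntry is an inclusion exception on Example, so
     it is an included subelement rather than a proper one. -->
<!ELEMENT example    - - (#PCDATA) +(indexentry) >

<!-- Assumed declaration for the annotative element itself. -->
<!ELEMENT indexentry - - (#PCDATA) >
```
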
If the content model of Example is (#PCDATA) +(IndexEntry), then there
will be only one data record end between line one and line two (the
one following line one, as the one following the IndexEntry end tag is
ignored by the rules of inclusions).

Clearly, the second content model is the correct one, because index
entries are inherently annotative and their presence or absence should
not disturb the data round them.

--
<Address HyTime=bibloc
homepage="http://www.squirrel.com/squirrel/drmacro">
W. Eliot Kimber, kimber@passage.com
Senior SGML Consultant and HyTime Specialist
Passage Systems, Inc., 10596 N. Tantau Ave., Cupertino, CA 95014-3535
(408) 366-0300 (Cupertino), (512) 339-1400 (Austin), http://www.passage.com
</Address>
"If I never had existed, would you still remember me?..."
--Austin Lounge Lizards, "1984 Blues"
(http://www.webcom.com/~yeolde/all/lllhome.html)