6 Complicating the Issue: More on Element Declarations

In the simple cases described so far, it has been assumed that one can identify the immediate constituents of every element defined in a textual structure. A poem consists of stanzas, and an anthology consists of poems. Stanzas do not float around unattached to poems or combined into some other unrelated element; a poem cannot contain an anthology. All the elements of a given document type may be arranged into a hierarchic structure, like a family tree with a single ancestor at the top and many children (mostly the elements containing #PCDATA) at the bottom. This gross simplification turns out to be surprisingly effective for a large number of purposes. It is not, however, adequate for the full complexity of real textual structures. In particular, it does not cater for more or less freely floating elements that can appear at almost any hierarchic level in the structure, nor for the case where different elements overlap or several different trees may be identified in the same document. To deal with the first case, SGML provides the exception mechanism; to deal with the second, SGML permits the definition of 'concurrent' document structures.

6.1 Exceptions to the Content Model

In most documents, there will be some elements that can occur at any level of the structure. Annotations, for example, might be attached to the whole of a poem, to a stanza, to a line of a stanza or to a single word within it. In a critical edition, the same might be true of variant readings. In a model as simple as ours, adding an annotation element as an optional component of every content model is not particularly onerous; in a more realistically complex model, perhaps containing some ten or twenty levels, such an approach becomes much more difficult.

To cope with this, SGML allows for any content model to be further modified by means of an exception list. There are two types of exception: inclusions, that is, additional elements that can be included at any point in the model group or any of its constituent elements; and exclusions, that is, elements that cannot be included within the current model.

To extend our declarations further to allow for annotations and variant readings, which we will assume can appear anywhere within the text of a poem, we first need to add declarations for these two elements:

<!ELEMENT (note | variant) - - (#PCDATA)>

The note and variant elements must have both start- and end-tags, since they can appear anywhere. Rather than add them to the content model for each type of poem, we can add them in the form of an inclusion list to the poem element, which now reads:

<!ELEMENT poem - O (title?, (stanza+ | couplet+ | line+) )
                                         +(note | variant) >

The plus sign at the start of the (NOTE | VARIANT) name list indicates that this is an inclusion exception. With this addition, notes or variants can appear at any point in the content of a poem element, even within elements (such as <title>) for which we have defined a content model of #PCDATA. They can thus also appear within notes or variants!
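
As a sketch of what the inclusion permits, the fragment below places a note inside a title and inside a line, and a variant inside a line. The wording of the note and variant contents is invented purely for illustration and is not taken from any edition; the fragment assumes the declarations given so far.

```sgml
<poem><title>The SICK ROSE<note>An invented note on the title.</note>
<stanza>
<line>O Rose thou art sick.<note>An invented note on this line.</note>
<line>The invisible worm,<variant>An invented variant reading.</variant>
</stanza>
</poem>
```

None of the content models for title, stanza or line mentions note or variant; they are admitted only because of the inclusion exception on poem.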

If we wanted for some reason to prevent notes or variants appearing within titles, we could add an exclusion exception to the declaration for <title> above:

<!ELEMENT title  - O  (#PCDATA)  -(note | variant) >

The minus sign at the start of the (NOTE | VARIANT) name list indicates that this is an exclusion exception. With this addition, notes and variants will be prohibited from appearing within titles, notwithstanding their potential inclusion implied by the previous addition to the content model for <poem>.

In the same way, we could prevent notes and variants from nesting within notes and variants by modifying the definition above to read

<!ELEMENT (note | variant) - - (#PCDATA)  -(note | variant) >

The meticulous reader will note that this precludes both variants within notes and notes within variants. Inclusion and exclusion exceptions should be used with care as their ramifications may not be immediately apparent.

6.2 Concurrent Structures

All the structures we have so far discussed have been simply hierarchic: that is, at every level of the tree, each node is entirely contained by a parent node. The figure below represents the structure of a document conforming to the simple DTD we have so far defined as a tree (drawn on its side through exigencies of space). We have already seen how Blake's poem can be divided into a title and two stanzas, each of four lines. In this diagram, we add a second poem, consisting of one stanza and a title, to make up an instance of an anthology:
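                         |-------------------title
                         |
                         |              |----line1
                         |              |----line2
          |----poem1-----|----stanza1---|----line3
          |              |              |----line4
          |              |
          |              |              |----line5
          |              |----stanza2---|----line6
          |                             |----line7
          |                             |----line8
anthology-|
          |              |-------------------title
          |              |
          |              |              |----line1
          |              |              |----line2
          |----poem2-----|----stanza1---|----line3
                                        |----line4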

Clearly, there are many such trees that might be drawn to describe the structure of this or other anthologies. Some of them might be representable as further subdivisions of this tree: for example, we might subdivide the lines into individual words, since no word crosses a line boundary. But equally clearly there are many other trees that might be drawn which do not fit within this tree. We might, for example, be interested in syntactic structures --- which rarely respect the formal boundaries of verse. Or, to take a simpler example, we might want to represent the pagination of different editions of the same text.

One way of doing this would be to group the lines and titles of our current model into pages. A declaration for such an element is simple enough:

<!ELEMENT page - - ((title?, line+)+)   >

That is, a page consists of one or more unnamed groups, each of which contains an optional title followed by a sequence of lines. (Note, incidentally, that this model prohibits a title appearing on its own at the foot of a page.) However, simply inserting the element <page> into the hierarchy already defined is not as easy as it might seem. Some poems are longer than a single page, and other pages contain more than one poem. We cannot therefore insert the element <page> between <anthology> and <poem> in the hierarchy, nor can it go between <poem> and <stanza>, nor yet in both places at once! What is needed is the ability to create a separate hierarchy, with the same elements at the bottom (the stanzas, lines and titles), but combined into a different superstructure. The CONCUR feature of SGML provides this ability.
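
The difficulty can be sketched as follows, assuming for illustration that a page break falls after the second line of a four-line stanza. The fragment is deliberately not valid against any single hierarchy; the end-tags are written out in full only to make the overlap visible.

```sgml
<page>
  <stanza>
    <line>O Rose thou art sick.</line>
    <line>The invisible worm,</line>
</page>            <!-- the page ends while the stanza is still open -->
<page>
    <line>That flies in the night</line>
    <line>In the howling storm:</line>
  </stanza>        <!-- the stanza ends inside a different page -->
</page>
```

Neither element wholly contains the other, so the two cannot stand as parent and child in one tree.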

A separate document type definition must be created for each hierarchic tree into which the text is to be structured. The definition we have so far built up for the anthology looks, in full, like this:

    <!DOCTYPE anthology [
    <!ELEMENT anthology      - -  (poem+)             >
    <!ELEMENT poem           - -  (title?, stanza+)   >
    <!ELEMENT stanza         - O  (line+)             >
    <!ELEMENT (title | line) - O  (#PCDATA)           >
    ]>

As this example shows, the name of a document type must always be the same as the name of the largest element in it, that is, the element at the top of the hierarchy. The syntax used is discussed further below. Let us now add to this declaration a second definition for a concurrent document type, which we will call a paged anthology, or <p.anth> for short:

    <!DOCTYPE p.anth [
    <!ELEMENT p.anth         - -  (page+)               >
    <!ELEMENT page           - -  ((title?, line+)+)    >
    <!ELEMENT (title|line)   - O  (#PCDATA)             >
    ]>

We have now defined two different ways of looking at the same basic text---the PCDATA components grouped by both these document type definitions into lines or titles. In one view, the lines are grouped into stanzas and poems; in the other they are grouped into pages only. Notice that it is exactly the same text which is visible in both views: the two hierarchies simply allow us to arrange it in two different ways.

To mark up the two views, it will be necessary to indicate which hierarchy each element belongs to. This is done by including the name of the document type (the view) within parentheses immediately before the identifier concerned, inside both start- and end-tags. Thus, pages (which are only visible in the <p.anth> document type) must be tagged with a <(p.anth)page> tag at their start and a </(p.anth)page> at their end. In the same way, as poems and stanzas appear only in the <anthology> document type, they must now be tagged using <(anthology)poem> and <(anthology)stanza> tags respectively. For the line and title elements, however, which appear in both hierarchies, no document type specification need be given: any tag containing only a name is assumed to mark an element present in every active document type.

As a simple example, let us assume that Blake's poem appears in some paged anthology, with the page break occurring half way through the first stanza. The poem might then be marked up as follows:

    <(p.anth)page>
    <!--      other titles and lines on this page here -->
         <(anthology)poem><title>The SICK ROSE
         <(anthology)stanza>
              <line>O Rose thou art sick.
              <line>The invisible worm,
    </(p.anth)page>
    <(p.anth)page>
              <line>That flies in the night
              <line>In the howling storm:
         <(anthology)stanza>
              <line>Has found out thy bed
              <line>Of crimson joy:
              <line>And his dark secret love
              <line>Does thy life destroy.
         </(anthology)poem>
    <!--      rest of material on this page here    -->
    </(p.anth)page>

It is now possible to select only the elements concerned with a particular view from the text, even though both are represented in the tagging. A processor concerned only with the pagination will see only those elements whose tags include the P.ANTH specification, or which have no specification at all. A processor concerned only with the ANTHOLOGY view of things will not see the page breaks. And a processor concerned to inter-relate the two views can do so unambiguously.
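
As a sketch, assuming the page break falls after the second line of the first stanza, the two views of the poem might be pictured as follows. The tags are shown without their document type specifications purely to illustrate what each processor would see; this is not the output of any particular SGML system.

```sgml
<!-- View seen by a processor of the p.anth document type -->
<page>
<!-- other titles and lines on this page here -->
<title>The SICK ROSE
<line>O Rose thou art sick.
<line>The invisible worm,
</page>
<page>
<line>That flies in the night
<line>In the howling storm:
<line>Has found out thy bed
<line>Of crimson joy:
<line>And his dark secret love
<line>Does thy life destroy.
<!-- rest of material on this page here -->
</page>

<!-- View seen by a processor of the anthology document type -->
<poem><title>The SICK ROSE
<stanza>
<line>O Rose thou art sick.
<line>The invisible worm,
<line>That flies in the night
<line>In the howling storm:
<stanza>
<line>Has found out thy bed
<line>Of crimson joy:
<line>And his dark secret love
<line>Does thy life destroy.
</poem>
```

The title and line elements, and the text they contain, are identical in both views; only the grouping above them differs.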

A note of caution is appropriate: CONCUR is an optional feature of SGML, and not all available SGML software systems support it, while those which do, do not always do so according to the letter of the standard. For that reason, if for no other, wherever these Guidelines have identified a potential application of CONCUR, they invariably suggest alternative methods as well. For fuller discussion of these issues, see chapter 31: Multiple Hierarchies.

Note also that we cannot introduce a new element, a page number for example, into the <p.anth> document type, since there is no existing data in the <anthology> document type which could be fitted into it. One way of adding that extra information is discussed in the next section.
