0-201-41999-8 - Neil Bradley, The Concise SGML Companion

[This local archive copy is from the official and canonical URL, http://cseng.awl.com/bookdetail.qry?ISBN=0-201-41999-8&ptype=33; please refer to the canonical source document if possible.]

Computer & Engineering Publishing Group (Professional)



Chapter 10. CALS Tables

SGML does not provide a standard model for representation of tabular material. Although an ISO model exists (in technical report ISO/IEC TR 9573), a different de facto standard has arisen from widespread use of software that supports the CALS DTDs, which were defined by the US Department of Defense (DoD) for interchange of documentation between the DoD and its sub-contractors (the acronym currently stands for Continuous Acquisition and Lifecycle Support).
The table model described in this chapter is derived from Appendix A of MIL-M-28001B (26 June 1993), and includes many usage recommendations from the SGML Open Committee technical memorandum TR 9502:1995, of 19 October 1995, which addresses several ambiguities in the standard.

When to use CALS tables

SGML may represent tabular material in any number of ways, so why impose conformity, with all the compromises this implies? CALS is not always the best solution: it does little to meet the SGML ideal of replacing format with meaning.
In one sense, any SGML-aware application can 'understand' any SGML table model. The DTD provides enough information to ensure that the correct elements and attributes are used in the construction of a table. In this sense, a table is no different to any other structure. However, by its nature a table makes heavy use of elements or attributes, particularly if it includes border lines, straddled cells and varying text alignments. In addition, there are obvious difficulties reading text intended to be displayed in a two-dimensional grid when it is presented in a linear fashion. The solution to these problems is to 'hide' the tags, and to display the content in its true tabular form. But to do this, the composition software must be aware of the purpose of relevant elements and attributes, and a recognized standard is essential to justify the effort involved in writing WYSIWYG editing and composition routines.
The CALS model has arisen as the de facto solution due to its use in software developed for US defense applications.
Note: The CALS table model has been used to create all of the tables in this book.
The CALS format should not be used if the table structure is simple, and has meaningful columns or rows which need to be identified for database searching or re-formatting in non-tabular representations. In this case, specific elements should be defined for the structure, and software relied upon to translate these elements into a table layout on demand.
A good example of this is a catalog, containing tables that include an item number column and a price column. For software to update the prices, it must be able to recognize these columns. A sensible structure would be:
<prices>
<item><code>XYZ-15<price>987</item>
<item><code>XYZ-22<price>765</item>
</prices>
However, displaying or printing this information in tabular form will then be more difficult.
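The following minimal Python sketch shows why named elements help. The price "column" is found by element name alone, not by position. It assumes the catalog has first been normalized to well-formed XML with all end tags present, because Python's standard library has no SGML parser. The `update_prices` function and its percentage rule are hypothetical.

```python
import xml.etree.ElementTree as ET

# Fully tagged XML equivalent of the catalog example; xml.etree cannot
# read SGML with omitted end tags.
CATALOG = """<prices>
<item><code>XYZ-15</code><price>987</price></item>
<item><code>XYZ-22</code><price>765</price></item>
</prices>"""


def update_prices(xml_text: str, increase_percent: int) -> dict:
    """Raise every price by a whole percentage (rounded down).

    Returns a mapping of item code to new price.
    """
    root = ET.fromstring(xml_text)
    updated = {}
    for item in root.iter("item"):
        # Element names, not column positions, identify the data.
        code = item.findtext("code")
        price = item.find("price")
        new_price = int(price.text) * (100 + increase_percent) // 100
        price.text = str(new_price)
        updated[code] = new_price
    return updated


print(update_prices(CATALOG, 10))  # {'XYZ-15': 1085, 'XYZ-22': 841}
```

The same update against a generic CALS table would first have to work out which Entry in each Row holds the price.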
The rest of this chapter assumes that the effort of adding a sensible structure to the DTD cannot be justified, in which case the CALS model is ideal.

The Table Structure

The table is divided into logical segments, which in turn form grid-like structures. Column widths are also discussed.
The entire table is enclosed in a Table element. The Table element may contain an Identifier attribute, Id, which serves as the target for cross-reference links (see Chapter 9). An optional Orient attribute specifies a portrait ('port') or landscape ('land') orientation for the entire table. The default setting assumes a portrait orientation. An optional Page Wide attribute (Pgwide) specifies whether the table spans only a single column of a multi-column page (a value of '0') or the entire page or display width (a value of '1'). When the table is displayed in landscape, it has an implied value of '1' (span all columns), simply because there would be no reason to display the table in landscape orientation if the extra width this provides were not required.
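The interaction between Orient and Pgwide can be expressed as a small decision function. This is a hypothetical sketch. Treating an absent Pgwide as '0' is an assumption, because the text does not state a default for it.

```python
from typing import Optional


def spans_page(orient: str = "port", pgwide: Optional[str] = None) -> bool:
    """Decide whether a table spans the full page or display width.

    A landscape table ('land') implies Pgwide='1'. Otherwise '1' means
    span the whole width and '0' means a single column; treating an
    absent Pgwide as '0' is an assumption of this sketch.
    """
    if orient.lower() == "land":
        return True
    return pgwide == "1"


print(spans_page("land"))       # True: landscape implies page width
print(spans_page("port", "1"))  # True
print(spans_page())             # False
```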
The table is composed of one or more Table Group elements (Tgroup), which define the number of columns present using the Cols attribute.

Note that some composing software will only accept a single Table Group element.
The Table Group element may contain an Id attribute which serves as the target for cross-reference links.
Each Table Group usually contains a Column Specification element (Colspec) for each column present. These elements are not required for very simple tables, but if present they have attributes to define column widths, column names, and defaults for border lines and text alignment.
There may also be Span Specification elements (Spanspec), which simplify cell straddling and are described in detail later.
Both the Column Specification and Span Specification elements are empty (they do not surround data or other elements). They exist only as containers for various attributes.
The general structure:
<table id="tab123">
<tgroup cols="4">
<colspec ...>
<spanspec ...>
......
.....
</tgroup>
</table>

Column widths

Unlike most other aspects of table creation, defining column widths can rarely be automated, because the author often needs to make a subjective decision about how to divide the available space. Document authors should therefore understand the issues raised in this section if fundamental design mistakes are to be avoided.
The Column Specification element contains a Colwidth attribute to determine the width of the specific column. For example:
<colspec colnum="1" colwidth="2 CM">
<colspec colnum="2" colwidth="1.5 CM">
<colspec colnum="3" colwidth="4 CM">
<colspec colnum="4" colwidth="1 CM">

The column widths may be defined using a number of notations:

PT (points)
CM (centimeters)
MM (millimeters)
PI (picas)
IN (inches)
If no unit is specified, 'PT' acts as the default, so a value of '12' is the same as '12 PT' (though some applications may insist on an explicit unit). Case is not significant, so 'pt' is the same as 'PT'.
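A composing application must normalize these notations to a single unit before laying out a table. The sketch below converts fixed Colwidth values to points. It assumes the desktop-publishing point (72 per inch), and the function name is hypothetical.

```python
import re

# Points per unit. This assumes the desktop-publishing point of 1/72 inch;
# some typesetting systems use 72.27 points per inch instead.
POINTS_PER_UNIT = {
    "PT": 1.0,
    "PI": 12.0,
    "IN": 72.0,
    "CM": 72.0 / 2.54,
    "MM": 72.0 / 25.4,
}

# A number, optional spaces, then an optional two-letter unit.
FIXED_WIDTH = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]{2})?\s*")


def fixed_width_to_points(colwidth: str) -> float:
    """Convert a fixed Colwidth value such as '2 CM' or '12' to points."""
    match = FIXED_WIDTH.fullmatch(colwidth)
    if match is None:
        raise ValueError(f"not a fixed width: {colwidth!r}")
    number, unit = match.groups()
    unit = (unit or "PT").upper()  # no unit means points; case is ignored
    if unit not in POINTS_PER_UNIT:
        raise ValueError(f"unknown unit: {unit!r}")
    return float(number) * POINTS_PER_UNIT[unit]


for value in ["12", "12 pt", "1 PI", "1 IN", "2 CM"]:
    print(f"{value!r} -> {fixed_width_to_points(value):.2f} pt")
```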
However, fixed widths are not suitable for multimedia publishing, where the available width varies (depending on the current page, column or screen width). A more neutral approach uses proportional widths, where each column's width is expressed only relative to the other columns.
The notation used for proportional widths is an asterisk, '*'. A definition of '1*' (or just '*') specifies the smallest unit available. A width of '3*' specifies three times the smallest unit. The size of a single unit is not pre-defined, but is calculated whenever the table is composed to a specific screen, page or column width. The actual size of the smallest unit is determined by adding together all the unit values, then dividing the available width by this figure. If the available width is 60 millimeters, for example, and the column specification elements declare width values of '1*', '2*' and '3*', then the value of one unit is 60/(1+2+3) = 10 mm. The first column is therefore 1 × 10 = 10 mm wide, the second is 2 × 10 = 20 mm, and the third is 3 × 10 = 30 mm.
If no value is supplied for a particular column, a proportional width of '1*' is assumed.
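The unit calculation above can be sketched in a few lines of Python. The function names are hypothetical. A missing Colwidth is represented as `None` and treated as '1*'.

```python
from typing import Optional


def star_units(colwidth: Optional[str]) -> int:
    """Return the proportional units in a Colwidth such as '3*' or '*'.

    An absent Colwidth is treated as '1*'. Fractional values like '1.5*'
    raise ValueError, mirroring the advice to avoid them.
    """
    value = (colwidth or "1*").strip()
    if not value.endswith("*"):
        raise ValueError(f"not a proportional width: {colwidth!r}")
    return int(value[:-1] or "1")  # a bare '*' means '1*'


def proportional_widths(colwidths: list, available: float) -> list:
    """Divide the available width among columns in proportion to their units."""
    units = [star_units(w) for w in colwidths]
    unit_size = available / sum(units)  # e.g. 60 / (1 + 2 + 3) = 10
    return [n * unit_size for n in units]


print(proportional_widths(["1*", "2*", "3*"], 60))  # [10.0, 20.0, 30.0]
```

Scaling every value by the same factor, as with '2*', '4*' and '6*', produces identical widths.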
It is not a requirement that any column be defined using the smallest unit. The previous example could be defined using the values '2*', '4*' and '6*', for example, though there would be little point in doing so. This freedom is important, however, because fractional values are not advised (some composing software will not recognize fractions). An example above had one column set to '1 CM' and another set to '1.5 CM'. Equivalent proportional values should be scaled up to avoid the decimal point: values of '2*' and '3*' would suffice.
Much larger values can be used when necessary. If a two-column table had one column only slightly wider than the other, the proportional values could be '20*' and '21*'.
Fixed and proportional width values may be mixed, though this is not advised, for the same reasons as stated previously. If they are mixed, the fixed-width columns reserve the required horizontal space, and the proportional-width columns share the remaining space using the scheme outlined above.
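Mixed widths can be resolved in two passes: reserve the fixed space, then share the remainder among the '*' columns. This is a hypothetical sketch that works in points and assumes 72 points per inch. An unknown unit simply raises a KeyError here.

```python
import re

# Points per unit, assuming 72 points per inch.
POINTS_PER_UNIT = {"PT": 1.0, "PI": 12.0, "IN": 72.0,
                   "CM": 72.0 / 2.54, "MM": 72.0 / 25.4}


def allocate(colwidths: list, available_pt: float) -> list:
    """Return column widths in points for a mix of fixed and '*' values."""
    fixed = {}  # column index -> width in points
    stars = {}  # column index -> proportional units
    for i, raw in enumerate(colwidths):
        value = (raw or "1*").strip()  # a missing Colwidth means '1*'
        if value.endswith("*"):
            stars[i] = int(value[:-1] or "1")  # a bare '*' means '1*'
            continue
        match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([A-Za-z]{2})?", value)
        if match is None:
            raise ValueError(f"bad Colwidth: {raw!r}")
        unit = (match.group(2) or "PT").upper()
        fixed[i] = float(match.group(1)) * POINTS_PER_UNIT[unit]
    # Fixed columns reserve their space first; '*' columns share the rest.
    remaining = available_pt - sum(fixed.values())
    if remaining < 0:
        raise ValueError("fixed columns exceed the available width")
    unit_size = remaining / sum(stars.values()) if stars else 0.0
    return [fixed[i] if i in fixed else stars[i] * unit_size
            for i in range(len(colwidths))]


print(allocate(["1 IN", "1*", "2*"], 360))  # [72.0, 96.0, 192.0]
```

On a narrower page the fixed column keeps its 72 points, and only the proportional columns shrink. This is the behavior that makes mixing risky for variable output widths.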

Table segments

Each Table Group is further divided into three logical segments: the Table Header (Thead), the Table Footer (Tfoot) and the Table Body (Tbody). Only the Table Body is required. If a Table Footer is present, however, it is placed before the Table Body in the data stream.

This unusual configuration aids pagination software when dealing with multi-page tables (note that the element structure has no concept of pages: composing software must decide where to split large tables across pages). Because the footer text precedes the body text, composing software can read it in time to insert it at the base of the first page that contains a reference to it.
Table Header rows are identified so that the enclosed text may be repeated at the top of each page (and possibly appear in a different style to the body text).
Note that the Table Header and Table Footer segments may include overriding Colspec elements (but not Spanspec elements), as these segments may have differing column widths to the main body of the table. If used, any missing re-definitions default to a proportional width of '1*'.
The CALS model is 'row oriented,' which means that the table is built row-by-row, with each row containing entries.
The three segments in the table group are all composed of one or more Row elements (even the footer segment), and each Row element contains a number of Entry elements:
<tbody>
<row>
<entry>Red</entry>
<entry>Urgent</entry>
</row>
...
...
</tbody>
The number of rows in a segment (and ultimately in a Table Group) is determined purely by the number of Row elements present.
The number of Entries in a Row is determined by the value of the Cols attribute in the Tgroup element: a Row may not contain more Entry elements than this value allows. Leading empty entries may in theory be omitted by including the Name Start attribute (see the next section) or the Colname attribute in the first real Entry, though some composing software may not support this. Some software is also known to require trailing empty Entry elements to be present, though this restriction is not defined by CALS.
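The following sketch shows how composing software might assign the Entry elements of one Row to columns. It enforces the Cols limit and lets a Colname skip leading empty entries. Entries are modelled as plain dicts with a hypothetical 'colname' key, and `colnums` stands for the Colname-to-number mapping taken from the Colspec elements.

```python
def place_entries(entries: list, cols: int, colnums: dict) -> list:
    """Return the column number occupied by each Entry in one Row.

    Each entry is a dict; an optional 'colname' moves it to a named
    column, which is how leading empty entries may be omitted.
    Horizontal and vertical spans are ignored in this sketch.
    """
    placed = []
    next_col = 1
    for entry in entries:
        if "colname" in entry:
            target = colnums[entry["colname"]]
            if target < next_col:
                raise ValueError(f"column {entry['colname']!r} already used")
            next_col = target
        if next_col > cols:
            raise ValueError(f"more Entry elements than Cols={cols} allows")
        placed.append(next_col)
        next_col += 1
    return placed


COLNUMS = {"color": 1, "priority": 2, "count": 3}
print(place_entries([{}, {}, {}], 3, COLNUMS))            # [1, 2, 3]
print(place_entries([{"colname": "count"}], 3, COLNUMS))  # [3]
```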

Straddled entries

The content of an entry may cross logical cell boundaries, both horizontally and vertically.
An Entry may expand horizontally to the right, and occupy the space normally reserved for other entries.
Various attributes are used to indicate the scope of the straddle. In all cases, attribute values refer to column names, as defined in the Colspec elements. The Column Specification element has a Colname attribute, which attaches a user-defined name to a logical column number. For example, column 1 may be named 'color.' Any later reference to 'color' is actually a reference to column 1:
<colspec colnum="1" colname="color">
<colspec colnum="2" colname="priority">

This indirect method of specifying a column also allows new columns to be inserted without invalidating such references. The column named 'color' may easily be changed to column 2, for example, and all references to 'color' will then refer to column 2.
The Entry element has straddling attributes. The name of the first column in a spanning entry is placed in the Name Start attribute, Namest. The name of the last column is placed in the Name End attribute, Nameend:
<colspec colnum="1" colname="color">
<colspec colnum="2" colname="priority">
.......
<row>
<entry>Red<entry>Urgent
</row>
<row>
<entry namest="color" nameend="priority">
Blue (with no priority indicator)
</row>
Note that the second row contains only one Entry element. Normally, all the Entry elements required to fill a row will be present, even if they contain no text, but in this case composing software is expected to take account of the previous, straddling Entry. A second Entry element in the second row of the example above would be invalid according to the CALS model (though an SGML parser would not detect this error, as the structure still conforms to the DTD rules).
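The rule that a parser cannot check can be enforced by application software. In this minimal sketch, each Entry is a dict with optional hypothetical 'namest' and 'nameend' keys, and `colnums` maps Colspec names to column numbers. The function computes the columns each Entry covers and rejects entries that do not fit.

```python
def row_columns(entries: list, cols: int, colnums: dict) -> list:
    """Return the (first, last) columns covered by each Entry in a Row.

    'namest'/'nameend' make an Entry straddle named columns; any other
    Entry fills the next free column.
    """
    spans = []
    next_col = 1
    for entry in entries:
        if "namest" in entry:
            first = colnums[entry["namest"]]
            last = colnums[entry.get("nameend", entry["namest"])]
        else:
            first = last = next_col
        # This is the CALS rule that an SGML parser cannot check.
        if first < next_col or last < first or last > cols:
            raise ValueError(f"Entry {entry} does not fit in the row")
        spans.append((first, last))
        next_col = last + 1
    return spans


COLNUMS = {"color": 1, "priority": 2}
print(row_columns([{}, {}], 2, COLNUMS))  # [(1, 1), (2, 2)]
straddle = {"namest": "color", "nameend": "priority"}
print(row_columns([straddle], 2, COLNUMS))  # [(1, 2)]
```

Adding a second Entry after the straddling one, as in `row_columns([straddle, {}], 2, COLNUMS)`, raises ValueError. That is the invalid case described above.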
A more convenient mechanism for referring to ranges of columns is provided by the Span Specification element (Spanspec). This element defines a single name for a range of columns. It is of benefit when the same horizontal span is used several times, because repeated reference to the same start and end column names is a laborious task. This element includes the Namest and Nameend attributes to determine the range, and a new attribute called Spanname to label this range. The example below produces identical output to the previous example:
<colspec colnum="1" colname="color">
<colspec colnum="2" colname="priority">
<spanspec namest="color" nameend="priority"
spanname="myspan">
.......
<row>
<entry spanname="myspan">
Blue (with no priority)</entry>
</row>
In the example above, the single name 'myspan' represents a range starting at 'color' and ending at 'priority'. These column names are defined in the Column Specification elements as column 1 and column 2 respectively. Therefore, 'myspan' ultimately indicates a span from column 1 to column 2.
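The two levels of indirection (span name to column names, then column names to column numbers) can be sketched as two dictionary lookups. The dictionaries stand in for hypothetical parsed Spanspec and Colspec elements.

```python
def span_columns(spanname: str, spanspecs: dict, colnums: dict) -> list:
    """Resolve a Spanspec name to the column numbers it covers.

    spanspecs maps Spanname -> (Namest, Nameend);
    colnums maps Colname -> column number.
    """
    namest, nameend = spanspecs[spanname]
    return list(range(colnums[namest], colnums[nameend] + 1))


SPANSPECS = {"myspan": ("color", "priority")}
print(span_columns("myspan", SPANSPECS, {"color": 1, "priority": 2}))  # [1, 2]
# Renumbering the columns changes the result without touching the entries:
print(span_columns("myspan", SPANSPECS, {"color": 2, "priority": 3}))  # [2, 3]
```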
Vertical straddling is achieved using the Entry attribute Morerows. The default value of this attribute is '0' (zero), indicating no vertical span. To span the Entry into lower rows, the value is changed to reflect the number of additional rows.
<row><entry>Red<entry>Urgent<entry>1</row>
<row>
<entry>Blue</entry>
<entry morerows="1">no priority indicator for these colors</entry>
<entry>2</entry>
</row>
<row>
<entry>Brown</entry><entry>3</entry>
</row>
Again, note that there are only two Entry elements in the final row, as the middle column is occupied by the Entry element above this position. The second Entry element is therefore deemed to occupy the third column. As before, it is incorrect to place three Entry elements here, but this rule is beyond the scope of a parser to validate.
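Composing software can track vertical straddles with a grid of occupied cells. Each Entry of a later row then skips any cell already reserved from above. This sketch models the rows of the example as lists of dicts with 'text' and an optional integer 'morerows'. That representation is hypothetical.

```python
def layout(rows: list, cols: int) -> dict:
    """Map (row, column) grid cells to entry text, honouring Morerows.

    Each row is a list of dicts with 'text' and an optional 'morerows'.
    Cells already reserved by a straddling Entry above are skipped.
    """
    grid = {}
    for r, entries in enumerate(rows, start=1):
        col = 1
        for entry in entries:
            while (r, col) in grid:  # occupied from a row above
                col += 1
            if col > cols:
                raise ValueError(f"row {r} has too many Entry elements")
            for extra in range(entry.get("morerows", 0) + 1):
                grid[(r + extra, col)] = entry["text"]
            col += 1
    return grid


ROWS = [
    [{"text": "Red"}, {"text": "Urgent"}, {"text": "1"}],
    [{"text": "Blue"},
     {"text": "no priority indicator for these colors", "morerows": 1},
     {"text": "2"}],
    [{"text": "Brown"}, {"text": "3"}],
]
grid = layout(ROWS, 3)
print([grid[(3, c)] for c in (1, 2, 3)])
# ['Brown', 'no priority indicator for these colors', '3']
```

Giving the final row three Entry elements makes `layout` raise ValueError. That is the same error a parser cannot detect.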

Border lines

Border lines may surround the table, or separate specified columns, rows or entries.
There are two concepts embodied in CALS that relate to border lines. The first concerns lines around the entire table, which may be described as 'external' border lines. The second concerns lines between adjacent columns and adjacent rows, which may be described as 'internal' border lines.
The presence of external border lines is controlled by the Frame attribute in the Table element. Its value may be 'none', 'all', 'topbot', 'top', 'bottom' or 'sides'. The default setting is 'all', indicating the presence of a box around the table.
Internal border lines are defined using the Column Separator attribute (Colsep), and the Row Separator attribute (Rowsep). These attributes appear in various elements. They hold numeric values, where a value of '0' (zero) indicates no border line, and a value of '1' (one) represents the presence of a line. The value '1' is the default in both attributes, and in all affected elements, indicating that every column and every row (and therefore every cell) is surrounded by lines. Any value other than '0' may be used to indicate the presence of a border line, and some pagination systems infer further meaning from the value given (for example, '1' may represent a single line, '2' may represent a double line, '3' may represent a dashed line, and so on, though CALS does not presently make this distinction, and there are no agreed number-to-style relationships).
When the presence of a vertical border line is indicated, its position is assumed to be to the right of the column or entry concerned. When a horizontal border line is indicated, its position is assumed to be below the row or entry concerned.
It may seem that a conflict arises between the column separator setting of the last column and the Frame attribute described previously, as both would seem to specify a vertical line to the right of the table. This conflict is avoided by ignoring any separator setting for the last column. The same principle is applied to the row separator of the last row. In both cases, the Frame attribute must be used to gain the desired effect, and its default setting of 'all' places lines in these positions.
The elements that contain one or both of these attributes are Table, Table Group, Column Specification, Span Specification, Row, Entry Table and Entry. The principle adopted is that large objects indicate the majority case, and small objects override these settings. For example, the Table element may dictate that all rows have lines, but the Row element representing row six may specify that it has no line, so every row except row six will be followed by a horizontal line. This approach encourages efficient use of attributes. The document author is expected to generalize at each level of the table hierarchy, so as to avoid unnecessary work at the lower levels.
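The "smaller objects override larger ones" principle amounts to taking the first value set when the levels are searched from most to least specific. This hypothetical sketch also ignores the separator after the last column or row, as described above. The exact precedence order among the elements (Entry, then Row, Spanspec, Colspec, Tgroup, Table) is an assumption. The caller supplies the levels in whatever order the application adopts.

```python
from typing import Optional


def effective_sep(*levels: Optional[str]) -> bool:
    """Resolve Colsep or Rowsep from the most specific level that sets it.

    Pass values from the most specific element (e.g. Entry) to the least
    specific (e.g. Table); None means 'not set here'. Any value other
    than '0' means a line, and '1' is the default when nothing is set.
    """
    for value in levels:
        if value is not None:
            return value != "0"
    return True


def draws_separator(index: int, count: int, *levels: Optional[str]) -> bool:
    """Line after column/row `index` of `count`. The last one is ignored
    because the Frame attribute controls the table's outer edge."""
    return index < count and effective_sep(*levels)


# Table says rowsep="1"; the Row element for row six says rowsep="0".
row_overrides = {6: "0"}
lines = [draws_separator(n, 8, row_overrides.get(n), "1") for n in range(1, 9)]
print(lines)  # rows 6 and 8 (the last row) have no line below them
```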
Border lines do not dissect straddled entries. This is illustrated in various examples within this chapter.

Text alignment

The position of text or other objects within the boundary of an entry can be set in various ways, both horizontally and vertically.
When the area of an entry is wider or higher than the text it encloses, the text by default appears in the top-left corner of the entry. However, the text may be re-aligned both horizontally and vertically. As with border lines and straddling entries, control of text alignment is provided by attributes, and managed by the judicious use of these attributes within the hierarchy of elements.
Several elements contain an Align attribute, which takes one of the following values: 'left', 'right', 'center', 'justify' or 'char'. This attribute controls horizontal alignment of text within an Entry element, and the first three settings listed above have an obvious effect. Note the American spelling of 'center' - some composing software will not recognize and apply this alignment if the attribute declaration uses the British spelling of 'centre'.
The 'justify' option aligns text both left and right, using extra inter-word spaces where necessary to achieve this effect. The 'char' option aligns text on a nominated character.
When the 'char' option is selected, further information is required. Both the character to be used for alignment, and the position of this character within the entry are needed. The significant character is specified in the Char attribute (typically, its value would be a full-point, '.', for alignment of decimal numbers in columns of figures). By default, this character will appear horizontally centered within the entry, but if this position is inconvenient, due to a combination of lack of space and unbalanced text, it can be changed using the Character Offset attribute, Charoff. The Charoff attribute value must be a numeric token that represents a percentage offset from the left edge of the entry. A value of '25', for example, would place the left edge of the significant character a quarter of the way across the entry from its left edge.
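The Charoff arithmetic can be sketched as follows. The alignment character's left edge sits at `charoff` percent of the entry width, and the text starts earlier by the width of whatever precedes that character. The sketch assumes monospaced characters. It also assumes that text without the alignment character ends at that position. Both are simplifications, not CALS rules.

```python
def char_aligned_start(text: str, char: str, entry_width: float,
                       charoff: float = 50.0, advance: float = 1.0) -> float:
    """Return the x offset at which `text` must start inside its entry.

    The left edge of the first `char` in `text` is placed `charoff`
    percent of the way across the entry. Every character is assumed to
    be `advance` units wide (a monospaced simplification).
    """
    char_x = entry_width * charoff / 100
    index = text.find(char)
    if index == -1:
        # Assumption: with no alignment character, the text ends at char_x.
        index = len(text)
    return char_x - index * advance


# Decimal points line up at 25% of a 100-unit entry:
for figure in ["12.5", "123.45", "7"]:
    print(figure, char_aligned_start(figure, ".", 100, charoff=25))
```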
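In the following example, the first column is aligned on a full-point, with a character offset of 30 percent, and the second column is aligned on a comma, with a character offset of 60 percent:
<colspec colnum="1" align="char" char="." charoff="30">
<colspec colnum="2" align="char" char="," charoff="60">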
Elements that contain the attributes described above are Table Group, Column Specification, Span Specification and Entry. In all cases, the default Align value is 'left'.
Vertical alignment is defined using the Vertical Alignment attribute (Valign), which is allowed in the Table Header, Table Body and Table Footer elements, and in the Row and Entry elements. It may take the following values: 'top', 'bottom' and 'middle'. The default value is 'top'. As before, the Row and Entry elements may override higher-level settings.
Note the Vertical Alignment option of 'middle' rather than 'center'. This is an example of the SGML restriction on name token groups in attribute declarations: as explained previously, two attributes of the same element cannot share a name token value. For a start-tag to be abbreviated, as in the following example, there must be no ambiguity as to which attribute a value belongs to. In this case, it is clear that Vertical Alignment is being defined:
<entry middle>
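The restriction can be checked mechanically, and the same lookup resolves a minimized value to its attribute. This sketch uses only the Align and Valign token groups discussed in this chapter, a simplified subset of the real declarations. The function names are hypothetical.

```python
from collections import defaultdict

# Name token groups for two Entry attributes (a simplified subset of the
# CALS declarations; real DTDs declare many more attributes).
ENTRY_TOKENS = {
    "align": ["left", "right", "center", "justify", "char"],
    "valign": ["top", "middle", "bottom"],
}


def token_conflicts(groups: dict) -> dict:
    """Report name tokens that appear in more than one attribute's group."""
    owners = defaultdict(list)
    for attr, tokens in groups.items():
        for token in tokens:
            owners[token].append(attr)
    return {token: attrs for token, attrs in owners.items() if len(attrs) > 1}


def attribute_for(token: str, groups: dict) -> str:
    """Resolve a minimized value such as <entry middle> to its attribute."""
    matches = [attr for attr, tokens in groups.items() if token in tokens]
    if len(matches) != 1:
        raise ValueError(f"ambiguous or unknown token: {token!r}")
    return matches[0]


print(token_conflicts(ENTRY_TOKENS))          # {}
print(attribute_for("middle", ENTRY_TOKENS))  # valign
# Had Valign used 'center', the declarations would clash:
clash = dict(ENTRY_TOKENS, valign=["top", "center", "bottom"])
print(token_conflicts(clash))                 # {'center': ['align', 'valign']}
```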

Entry content and Inner Tables

The entry element may contain further element structures, or be replaced by an embedded table.
Previous examples demonstrated that an Entry element may contain plain text, but tables may be more complex than this. The CALS definition does not restrict the content of an Entry element to simple text; the DTD may define any required subelement structures. It may, for example, be necessary to allow an Entry to contain multiple Paragraphs.

The Entry element content may be re-defined for specific purposes (no other element in a CALS table should be re-defined, because this would confuse software developed to deal with the standard).
Some applications require the further ability to contain tables within tables. A cell may need to contain a complete embedded table. The CALS standard does not allow a Table element to be contained within an Entry belonging to another Table, but instead defines a new element called Entrytbl, which is used in place of the Entry element.
The Entry Table element has most of the same attributes as the Entry element, but also has an additional Columns attribute, Cols, which works in the same way as the Cols attribute of the Table Group element.
An Entry Table is similar to a Table, in that it may contain Column Specification, Span Specification, Table Header and Table Body elements (though not a Table Footer):
<row><entry>Red</><entry>DANGER</><entry>1</>
<row>
<entry>Amber</>
<entrytbl cols="2">
<tbody>
<row><entry>flashing</>
<entry>URGENT</></row>
<row><entry>steady</>
<entry>IMPORTANT</></row>
</tbody>
</entrytbl>
<entry>2</entry>
</row>
<row><entry>Green</><entry>NORMAL</><entry>3</>

An Entry Table cannot be used in header and footer segments.
Although CALS provides this facility, extreme caution is advised as many applications do not support this feature. It is also possible to produce the same effect artificially (without using Entry Table elements), by defining the extra columns and rows required, and using the spanning feature to hide these columns and rows in surrounding entries. Although cumbersome, this method is guaranteed to be compatible with all CALS-aware applications (and was actually used to produce the table shown above, which therefore has four columns and four rows).