[Mirrored from: http://www.aw.com/devpress/titles/41999.html, December 04, 1996]
Clarity, comprehensive coverage and precision are just a few of the reasons why anyone who needs to get up to speed with SGML will appreciate The Concise SGML Companion. This book is destined to become the essential desktop/briefcase reference for busy SGML practitioners. Neil Bradley's philosophy, 'to get to the point quickly and stick with it until it's been fully explained', has been eagerly received by his reviewers:
"I haven't seen anything this useful on the market yet. Its contents span the range of questions from learning about SGML to using it in anger."
- Charlie Stross, fma Ltd
"It will be one of the best SGML books on the market."
- Professor David Barron, University of Southampton
Features include:
There is also an accompanying World Wide Web site which can be found at http://www.bradley.co.uk
Neil Bradley has worked with SGML since 1986, originally as a programmer working in the data conversion industry, then as a systems integrator, trainer and consultant.
Preface
Chapter 1. Using this book
Chapter 2. Overview
Chapter 3. Electronic markup
Chapter 4. SGML markup
Chapter 5. Document components
Chapter 6. Entities
Chapter 7. DTD
Chapter 9. Cross-references
Chapter 10. CALS tables
Chapter 11. ISO 9573 math
Chapter 12. HTML
Chapter 13. SGMLS and NSGMLS parsers
Chapter 14. Charts and tables
Road Map
Chapter 10. CALS Tables
SGML does not provide a standard model for representation of tabular
material. Although an ISO model exists (in technical report ISO/IEC TR
9573), a different de facto standard has arisen from widespread use of
software that supports the CALS DTDs, which were defined by the US
Department of Defense for interchange of documentation between the DoD and
its sub-contractors (the acronym currently stands for Continuous
Acquisition and Lifecycle Support).
The table model described in this chapter is derived from Appendix A of
MIL-M-28001B (26 June 1993), and includes many usage recommendations from
the SGML Open Committee technical memorandum TM 9502:1995, of 19 October
1995, which addresses several ambiguities in the standard.
When to use CALS tables
SGML may represent tabular material in any number of ways, so why attempt
to apply conformity, and the compromises this implies? CALS is not always
the best solution - it does little to meet the SGML ideal of replacing
format with meaning.
In one sense, any SGML-aware application can 'understand' any SGML table
model. The DTD provides enough information to ensure that the correct
elements and attributes are used in the construction of a table. In this
sense, a table is no different to any other structure. However, by its
nature a table makes heavy use of elements or attributes, particularly if
it includes border lines, straddled cells and varying text alignments. In
addition, there are obvious difficulties reading text intended to be
displayed in a two-dimensional grid when it is presented in a linear
fashion. The solution to these problems is to 'hide' the tags, and to
display the content in its true tabular form. But to do this, the
composition software must be aware of the purpose of relevant elements and
attributes, and a recognized standard is essential to justify the effort
involved in writing WYSIWYG editing and composition routines.
The CALS model has arisen as the de facto solution due to its use in
software developed for US defense applications.
Note: The CALS table model has been used to create all of the tables in
this book.
The CALS format should not be used if the table structure is simple, and
has meaningful columns or rows which need to be identified for database
searching or re-formatting in non-tabular representations. In this case,
specific elements should be defined for the structure, and software relied
upon to translate these elements into a table layout on demand.
A good example of this is a catalog, containing tables that include an item
number column and a price column. For software to update the prices, it
must be able to recognize these columns. A sensible structure would be:
<prices>
<item><code>XYZ-15<price>987</item>
<item><code>XYZ-22<price>765</item>
</prices>
However, more difficulties displaying or printing this information in
tabular format should be expected.
The rest of this chapter assumes that the effort of adding a sensible
structure to the DTD cannot be justified, in which case the CALS model is
ideal.
The Table Structure
The table is divided into logical segments, which in turn form grid-like
structures. Column widths are also discussed.
The entire table is enclosed in a Table element. The Table element may
contain an Identifier attribute, Id, which serves as the target for all
cross-reference links (see Chapter 9). An optional Orient attribute
specifies a portrait ('port') or landscape ('land') orientation for the
entire table; the default is portrait. An optional Page Wide attribute
(Pgwide) specifies whether the table spans only a single column of a
multi-column page (a value of '0') or the entire page or display width
(a value of '1'). When the table is displayed in landscape, Pgwide has an
implied value of '1' (span all columns), simply because it would not be
necessary to display the table in landscape orientation if the extra width
this provides were not required.
The table is composed of one or more Table Group elements (Tgroup), which
define the number of columns present using the Cols attribute.
Note that some composing software will only accept a single Table Group element.
The Table Group element may contain an Id attribute which serves as the
target for cross-reference links.
Each Table Group usually contains a Column Specification element (Colspec)
for each column present. These elements are not required for very simple
tables, but if present they have attributes to define column widths, column
names, and defaults for border lines and text alignment.
There may also be Span Specification elements present (Spanspec), which are
used to aid cell straddling, and are described in detail later.
Both the Column Specification and Span Specification elements are empty
(in the sense that they do not surround data or other elements). They exist
only as containers for various attributes.
The general structure:
<table id="tab123">
<tgroup cols="4">
<colspec ...>
<spanspec ...>
......
.....
</tgroup>
</table>
Column widths
Unlike most other aspects of table creation, definition of column widths
can rarely be automated, as the author often needs to make a subjective
decision on how to divide the available space, so the issues raised in this
section must be appreciated by document authors if fundamental design
mistakes are to be avoided.
The Column Specification element contains a Colwidth attribute to determine
the width of the specific column. For example:
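<colspec colnum="1" colwidth="2 CM">
<colspec colnum="2" colwidth="1.5 CM">
<colspec colnum="3" colwidth="4 CM">
<colspec colnum="4" colwidth="1 CM">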
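The column widths may be defined using a number of fixed-unit notations,
such as points ('PT') or centimeters ('CM'). If no unit is specified, 'PT'
acts as the default, so a value of '12' is the same as '12 PT' (though some
applications may insist on a measurement notation). Case is not
significant, so 'pt' is the same as 'PT'.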
However, such an approach is not suitable for multimedia publishing, where
the available column widths vary (depending on the current page, column or
screen width). A more neutral approach uses proportional widths, where each
column is measured only as a relative comparison with the other columns.
The notation used for proportional widths is an asterisk '*'. A definition
of '1*' (or just '*') specifies the smallest unit available. A width of
'3*' specifies three times the smallest unit. The size of a single unit is
not pre-defined, but is calculated whenever the table is composed to a
specific screen, page or column width. The actual size of the smallest unit
is determined by adding together all the unit values, then dividing the
available width by this figure. If the available width is 60 millimeters,
for example, and the column specification elements declare width values of
'1*', '2*' and '3*', then the value of one unit is 60/(1+2+3) = 10. The
first column is therefore 1 × 10 = 10 mm wide, the second is 2 × 10 = 20
mm, and the third is 3 × 10 = 30 mm.
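The calculation above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of CALS or of any composition system; the function name `proportional_widths` and its arguments are assumptions made for the example.

```python
def proportional_widths(available_width, specs):
    """Return a width for each proportional Colwidth value such as '2*'."""
    units = []
    for spec in specs:
        # '*' alone is shorthand for '1*'; a missing value (None) also counts as '1*'.
        factor = (spec or "*").strip().rstrip("*")
        units.append(int(factor) if factor else 1)
    # One unit is the available width divided by the sum of all unit values.
    unit_size = available_width / sum(units)
    return [u * unit_size for u in units]


# The 60 mm example from the text: columns of '1*', '2*' and '3*'.
print(proportional_widths(60, ["1*", "2*", "3*"]))  # [10.0, 20.0, 30.0]
```

The widths come back in whatever unit `available_width` was given in (millimeters here).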
If no value is supplied for a particular column, a proportional width of
'1*' is assumed.
It is not a requirement that any column be defined using the smallest unit.
The previous example could be defined using the values '2*', '4*' and '6*',
for example, though there would be little point in doing so. However, this
fact is important because fractional values are not advised (some composing
software will not recognize fractions). An example above had one column set
to '1 cm' and another set to '1.5 cm'. Equivalent proportional values
should be larger to avoid the decimal point. Values of '2*' and '3*' would
suffice.
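One way to find whole-number proportional values for a set of fractional widths is to clear the denominators and then reduce by the common factor. The sketch below assumes the widths are supplied as decimal strings in the same unit; the helper name `integer_proportions` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def integer_proportions(widths):
    """Convert width ratios such as ['1', '1.5'] into whole-number '*' values."""
    fractions = [Fraction(w) for w in widths]
    # Multiply by the least common denominator to clear the decimal points.
    lcd = reduce(lambda a, b: a * b // gcd(a, b),
                 (f.denominator for f in fractions), 1)
    scaled = [int(f * lcd) for f in fractions]
    # Divide by the common factor so the values are as small as possible.
    common = reduce(gcd, scaled)
    return [f"{n // common}*" for n in scaled]


print(integer_proportions(["1", "1.5"]))  # ['2*', '3*']
```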
Much larger values can be used when necessary. If a two-column table had
one column only slightly wider than the other, the proportional values
could be '20*' and '21*'.
Fixed and proportional width values may be mixed, though it is not advised
for the same reasons as stated previously. If used, the fixed width columns
reserve the required horizontal space, and the proportional width columns
must share the remaining space using the scheme outlined above.
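A minimal sketch of the mixed case follows, assuming fixed widths are given as plain numbers and proportional widths as 'N*' strings, all in the same unit. The function name and the input convention are illustrative assumptions, not something defined by CALS.

```python
def mixed_widths(available_width, specs):
    """Resolve a mix of fixed widths (numbers) and proportional widths ('N*')."""
    fixed_total = sum(s for s in specs if not isinstance(s, str))
    units = [int(s.rstrip("*") or 1) for s in specs if isinstance(s, str)]
    # Fixed columns reserve their space first; the remainder is shared out
    # among the proportional columns.
    unit_size = (available_width - fixed_total) / sum(units) if units else 0
    return [int(s.rstrip("*") or 1) * unit_size if isinstance(s, str) else s
            for s in specs]


# 100 mm available: one fixed 20 mm column, then '1*' and '3*'.
print(mixed_widths(100, [20, "1*", "3*"]))  # [20, 20.0, 60.0]
```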
Table segments
Each Table Group is further divided into three logical divisions: the Table
Header segment (Thead), the Table Footer segment (Tfoot) and the Table Body
segment (Tbody). The Table Body is the only required element of these three.
If present, however, the Table Footer is placed before the Table Body in
the data stream.
This unusual configuration aids pagination software when dealing with
multi-page tables (note that the element structure has no concept of pages
- composing software must decide where to split large tables across pages).
By placing the footer text before the body text, composing software is able
to access the footer text for insertion at the base of the first page
containing a reference to it.
Table Header rows are identified so that the enclosed text may be repeated
at the top of each page (and possibly appear in a different style to the
body text).
Note that the Table Header and Table Footer segments may include overriding
Colspec elements (but not Spanspec elements), as these segments may have
differing column widths to the main body of the table. If used, any missing
re-definitions default to a proportional width of '1*'.
The CALS model is 'row oriented,' which means that the table is built
row-by-row, with each row containing entries.
The three segments in the table group are all composed of one or more Row
elements (even the footer segment), and each Row element contains a number
of Entry elements:
<tbody>
<row>
<entry>Red</entry>
<entry>Urgent</entry>
</row>
...
...
</tbody>
The number of rows in a segment (and ultimately in a Table Group) is
determined purely by the number of Row elements present.
The number of Entries in a Row is pre-determined by the value of the Cols
attribute in the Tgroup element. There may not be more Entry elements
within a Row element than allowed by this value. Leading empty entries may
in theory be omitted by including the Name Start attribute, Namest (see the
next section), or the Colname attribute in the first real Entry, though
some composing software may not support this. Some software is known to
require trailing empty Entry elements to be present, though this constraint
is not defined by CALS.
Straddled entries
The content of an entry may cross logical cell boundaries, both
horizontally and vertically.
An Entry may expand horizontally to the right, and occupy the space
normally reserved for other entries.
Various attributes are used to indicate the scope of the straddle. In all
cases, attribute values refer to column names, as defined in the Colspec
elements. The Column Specification element has a Colname attribute, which
attaches a user-defined name to a logical column number. For example,
column 1 may be named 'color.' Any later reference to 'color' is actually a
reference to column 1:
<colspec colnum="1" colname="color">
<colspec colnum="2" colname="priority">
This indirect method of specifying a column also allows new columns to be
inserted without invalidating such references. The column named 'color'
may be easily changed to column 2, for example, and all references to
'color' will then refer to column 2 without further editing.
The Entry element has straddling attributes. The name of the first column
in a spanning entry is placed in the Name Start attribute, Namest. The name
of the last column in a spanning entry is placed in the Nameend attribute:
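<colspec colnum="1" colname="color">
<colspec colnum="2" colname="priority">
.......
<row>
<entry>Red<entry>Urgent
</row>
<row>
<entry namest="color" nameend="priority">
Blue (with no priority indicator)
</row>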
Note that the second row contains only one Entry element. Normally, all the
Entry elements required to fill a row will be present, even if they contain
no text, but in this case composing software is expected to take account of
the previous, straddling Entry. A second Entry element in the second row of
the example above would be invalid according to the CALS model (though an
SGML parser would not detect this error, as the structure still conforms to
the DTD rules).
A more convenient mechanism for referring to ranges of columns is provided
by the Span Specification element (Spanspec). This element defines a single
name for a range of columns. It is of benefit when the same horizontal span
is used several times, because repeated reference to the same start and end
column names is a laborious task. This element includes the Namest and
Nameend attributes to determine the range, and a new attribute called
Spanname to label this range. The example below produces identical output
to the previous example:
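<colspec colnum="1" colname="color">
<colspec colnum="2" colname="priority">
<spanspec namest="color" nameend="priority"
spanname="myspan">
.......
<row>
<entry spanname="myspan">
Blue (with no priority)</entry>
</row>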
In the example above, the single name 'myspan' represents a range starting
at 'color' and ending at 'priority'. These column names are defined in the
Column Specification elements as column 1 and column 2 respectively.
Therefore, 'myspan' ultimately indicates a span from column 1 to column 2.
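The two-step lookup described here (span name to column names, then column names to column numbers) can be sketched as follows. The dictionaries and the function name `resolve_span` are assumptions made for illustration; real software would build these tables from the Colspec and Spanspec elements it has parsed.

```python
def resolve_span(spanname, spanspecs, colspecs):
    """Return the (first, last) column numbers covered by a named span."""
    namest, nameend = spanspecs[spanname]
    return colspecs[namest], colspecs[nameend]


colspecs = {"color": 1, "priority": 2}          # colname -> colnum
spanspecs = {"myspan": ("color", "priority")}   # spanname -> (namest, nameend)

print(resolve_span("myspan", spanspecs, colspecs))  # (1, 2)
```

Because entries refer only to names, renumbering a column means changing a single value in `colspecs`.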
Vertical straddling is achieved using the Entry attribute Morerows. The
default value of this attribute is '0' (zero), indicating no vertical span.
To span the Entry into lower rows, the value is changed to reflect the
number of additional rows.
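<row><entry>Red<entry>Urgent<entry>1</row>
<row>
<entry>Blue</entry>
<entry morerows="1">no priority indicator for
these colors</entry>
<entry>2</entry>
</row>
<row>
<entry>brown</entry><!--NO ENTRY--><entry>3</entry>
</row>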
Again, note that there are only two Entry elements in the final row, as the
middle column is occupied by the Entry element above this position. The
second Entry element is therefore deemed to occupy the third column. As
before, it is incorrect to place three Entry elements here, but this rule
is beyond the scope of a parser to validate.
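Since a parser cannot check this rule, an application has to work out for itself which column each Entry falls into. The sketch below shows one possible approach, assuming each row is a list of (text, morerows) pairs and ignoring horizontal spans for simplicity; the function name `place_entries` is hypothetical.

```python
def place_entries(rows, cols):
    """Assign a column number to each Entry, skipping cells held by Morerows.

    rows is a list of rows; each row is a list of (text, morerows) pairs.
    Returns one list per row of (column, text) pairs.
    """
    remaining = [0] * (cols + 1)  # rows still blocked in each column (1-based)
    placed = []
    for row in rows:
        blocked = [n > 0 for n in remaining]
        remaining = [max(n - 1, 0) for n in remaining]
        column, result = 1, []
        for text, morerows in row:
            # Skip columns occupied by an Entry straddling down from above.
            while column <= cols and blocked[column]:
                column += 1
            if column > cols:
                raise ValueError("too many entries in row")
            result.append((column, text))
            remaining[column] = morerows
            column += 1
        placed.append(result)
    return placed


rows = [
    [("Red", 0), ("Urgent", 0), ("1", 0)],
    [("Blue", 0), ("no priority indicator", 1), ("2", 0)],
    [("brown", 0), ("3", 0)],
]
print(place_entries(rows, 3)[2])  # [(1, 'brown'), (3, '3')]
```

A third Entry in the final row would raise the `ValueError`, which is the kind of check that falls outside what the DTD can express.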
Border lines
Border lines may surround the table, or separate specified columns, rows or
entries.
There are two concepts embodied in CALS that relate to border lines. The
first concerns lines around the entire table, which may be described as
'external' border lines. The second concerns lines between adjacent columns
and adjacent rows, which may be described as 'internal' border lines.
The presence of external border lines is controlled by the Frame attribute
in the Table element. Its value may be 'none', 'all', 'topbot', 'top',
'bottom' or 'sides'. The default setting is 'all', indicating the presence
of a box around the table.
Internal border lines are defined using the Column Separator attribute
(Colsep), and the Row Separator attribute (Rowsep). These attributes appear
in various elements. They hold numeric values, where a value of '0' (zero)
indicates no border line, and a value of '1' (one) represents the presence
of a line. The value '1' is the default in both attributes, and in all
affected elements, indicating that every column and every row (and
therefore every cell) is surrounded by lines. Any value other than '0' may
be used to indicate the presence of a border line, and some pagination
systems infer further meaning from the value given (for example, '1' may
represent a single line, '2' may represent a double line, '3' may represent
a dashed line, and so on, though CALS does not presently make this
distinction, and there are no agreed number-to-style relationships).
When the presence of a vertical border line is indicated, its position is
assumed to be to the right of the column or entry concerned. When a
horizontal border line is indicated, its position is assumed to be below
the row or entry concerned.
It may seem that a conflict arises between the setting of the column
separator of the last column, and the use of the Frame attribute described
previously - both would seem to specify a vertical line to the right of the
table. This conflict is avoided by ignoring any settings for the last
column. The same principle is applied to the last row separator. In both
cases, the Frame attribute must be utilized to gain the desired effect, and
the default setting of 'all' places lines in these positions.
The elements that contain one or both of these attributes are Table, Table
Group, Column Specification, Span Specification, Row, Entry Table and
Entry. The principle adopted is that large objects indicate the majority
case, and small objects override these settings. For example, the Table
element may dictate that all rows have lines, but the Row element
representing row six may specify that it has no line, so every row except
row six will be followed by a horizontal line. This approach encourages
efficient use of attributes. The document author is expected to generalize
at each level of the table hierarchy, so as to avoid unnecessary work at
the lower levels.
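The override principle can be sketched as a simple "most specific value wins" lookup. This is a simplified illustration that assumes the caller passes the attribute values in order from the largest object to the smallest, with `None` where the attribute is absent; the exact precedence among all the elements involved is defined by the CALS documents, and the function name here is an assumption.

```python
def effective_setting(*levels, default="1"):
    """Return the most specific value that was set.

    levels are attribute values ordered from the largest object (Table) to
    the smallest (Entry); None means the attribute was not given there.
    """
    for value in reversed(levels):
        if value is not None:
            return value
    return default


# Table says rowsep="1"; row six overrides it with rowsep="0".
print(effective_setting("1", None, "0", None))   # 0  (row six: no line)
print(effective_setting("1", None, None, None))  # 1  (every other row)
```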
Border lines do not dissect straddled entries. This is illustrated in
various examples within this chapter.
Text alignment
The position of text or other objects within the boundary of an entry can
be set in various ways, both horizontally and vertically.
When the area of an entry is wider or higher than the text it encloses, the
text by default appears in the top-left corner of the entry. However, the
text may be re-aligned both horizontally and vertically. As with border
lines and straddling entries, control of text alignment is provided by
attributes, and managed by the judicious use of these attributes within the
hierarchy of elements.
Several elements contain an Align attribute, which takes one of the
following values: 'left', 'right', 'center', 'justify' or 'char'. This
attribute controls horizontal alignment of text within an Entry element,
and the first three settings listed above have an obvious effect. Note the
American spelling of 'center' - some composing software will not recognize
and apply this alignment if the attribute declaration uses the British
spelling of 'centre'.
The 'justify' option aligns text both left and right, using extra
inter-word spaces where necessary to achieve this effect. The 'char' option
aligns text on a nominated character.
When the 'char' option is selected, further information is required. Both
the character to be used for alignment, and the position of this character
within the entry are needed. The significant character is specified in the
Char attribute (typically, its value would be a full-point, '.', for
alignment of decimal numbers in columns of figures). By default, this
character will appear horizontally centered within the entry, but if this
position is inconvenient, due to a combination of lack of space and
unbalanced text, it can be changed using the Character Offset attribute,
Charoff. The Charoff attribute value must be a numeric token that
represents a percentage offset from the left edge of the entry. A value of
'25', for example, would place the left edge of the significant character a
quarter of the width of the column from the left edge of the cell.
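The offset arithmetic is a straightforward percentage of the entry width. The sketch below assumes positions and widths are measured in the same unit and uses 50 as the default offset to reflect the centered default described above; the function name `char_position` is hypothetical.

```python
def char_position(entry_left, entry_width, charoff=50):
    """Return the horizontal position of the left edge of the alignment character."""
    return entry_left + entry_width * charoff / 100


# A 40 mm wide entry starting at the left margin, with charoff="25".
print(char_position(0, 40, 25))  # 10.0
```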
In the following example, the first column is aligned on full-point, with a
character offset of '30' percent, and the second column is aligned on
comma, with a character offset of '60' percent:
Elements that contain the attributes described above are Table Group,
Column Specification, Span Specification and Entry. In all cases, the
default value is 'left'.
Vertical alignment is defined using the Vertical Alignment attribute
(Valign), which is allowed in the Table Header element, the Table Body
element, the Table Footer element, and the Row and Entry elements. The
Vertical Alignment attribute may take the following values: 'top', 'bottom'
and 'middle'. The default value is 'top'. As before, the Row and Entry
elements may override higher level settings.
Note the Vertical Alignment option of 'middle' rather than 'center'. This
is an example of the SGML restriction on name lists in attribute
declarations. As explained previously, two attributes cannot share a name
token value. In order to abbreviate the start-tag, as in the following
example, there must be no ambiguity as to which attribute a value belongs.
In this case, it is clear that Vertical Alignment is being defined:
<entry middle>
Entry content and Inner Tables
The entry element may contain further element structures, or be replaced by
an embedded table.
Previous examples demonstrated that an Entry element may contain normal
characters, but tables may be more complex than this. The CALS definition
does not restrict the content of an Entry element to simple text; the DTD
may define any required subelement structures. It may, for example, be
necessary to allow multiple Paragraphs:
The Entry element content may be re-defined for specific purposes (no other
element in a CALS table should be re-defined, because this would confuse
software developed to deal with the standard).
Some applications require the further ability to contain tables within
tables. A cell may need to contain a complete embedded table. The CALS
standard does not allow a Table element to be contained within an Entry
belonging to another Table, but instead defines a new element called
Entrytbl, which is used in place of the Entry element.
The Entry Table element has most of the same attributes as the Entry
element, but also has an additional Columns attribute, Cols, which works in
the same way as the Columns attribute in the Table Group element.
An Entry Table is similar to a Table, in that it may contain Column
Specification, Span Specification, Table Header and Table Body elements
(though not footers):
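<row><entry>Red</><entry>DANGER</><entry>1</>
<row>
<entry>Amber</>
<entrytbl cols="2">
<tbody>
<row><entry>flashing</>
<entry>URGENT</></row>
<row><entry>steady</>
<entry>IMPORTANT</></row>
</tbody>
</entrytbl>
<entry>2</entry>
</row>
<row><entry>Green</><entry>NORMAL</><entry>3</>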
An Entry Table cannot be used in header and footer segments.
Although CALS provides this facility, extreme caution is advised as many
applications do not support this feature. It is also possible to produce
the same effect artificially (without using Entry Table elements), by
defining the extra columns and rows required, and using the spanning
feature to hide these columns and rows in surrounding entries. Although
cumbersome, this method is guaranteed to be compatible with all CALS-aware
applications (and was actually used to produce the table shown above, which
therefore has four columns and four rows).
Glossary
Index