TABLE INTEROPERABILITY: Issues for the CALS Table Model

SGML Open Technical Research Paper 9501:1995

Eric Severson, Interleaf
Co-chair, Table Interchange Subcommittee
SGML Open
Harvey Bingham, Interleaf
Co-chair, Table Interchange Subcommittee
SGML Open
1995 November 21

© 1995 SGML Open

Permission to reproduce parts or all of this information in any form is granted to SGML Open members provided that this information by itself is not sold for profit and that SGML Open is credited as the author of this information.

Abstract

To help address the existing interoperability issues when using tabular material ("tables") in SGML implementations, SGML Open's Technical Committee formed a Table Interchange subcommittee to research these issues.

Because the CALS table model has proliferated widely, it was chosen as the starting point. Although it has evolved to the point of a de facto standard, the specification leaves a large number of semantics open to interpretation, which in turn has made interoperability difficult to achieve. As its first major task, the Committee therefore set out to identify and document ambiguities in the CALS table model specifications, identify and document related interoperability issues between SGML Open vendor products, and lay the groundwork for developing a proposed clarification of the standard that will minimize ambiguity and maximize interoperability.

This paper summarizes the results of this initial work, identifies the sources of current interoperability issues for the CALS model, and summarizes the most common set of practices currently followed by SGML Open vendors.

Technical Research Paper 9501:1995
Committee Draft: 1995 May 10
Committee Draft: 1995 August 5
Final Draft Technical Research Paper: 1995 September 15
Final Technical Research Paper: 1995 November 21

Background

For the last several years, SGML users have been pointing out that there are major interoperability issues when using tabular material ("tables") in SGML implementations. First, SGML itself does not prescribe any standard way of encoding tables, leaving that to individual applications which may have taken different, possibly incompatible approaches. Furthermore, even when an application standard has been defined (such as the U.S. Department of Defense CALS model), different vendors' products may handle the same table in different ways.

Recognizing the importance of this issue, SGML Open's Technical Committee formed a Table Interchange subcommittee in 1994 to research current interoperability issues and recommend changes that would resolve the bulk of these problems. The Committee's mission has involved two fundamental goals:

Because the CALS table model in particular has proliferated widely, appearing in a large number of other applications, it was chosen as the starting point. Although it has evolved to the point of a de facto standard, the CALS table model's design was never actually completed. The specification leaves a large number of semantics open to interpretation, which in turn has made interoperability difficult to achieve.

As its first major task, the Committee therefore set out to identify and document ambiguities in the CALS table model specifications, identify and document related interoperability issues between SGML Open vendor products, and lay the groundwork for developing a proposed clarification of the standard that will minimize ambiguity and maximize interoperability.

This paper summarizes the results of this initial work, identifies the sources of current interoperability issues for the CALS model, and summarizes the most common set of practices currently followed by SGML Open vendors.

A brief introduction to the CALS table model

The present "baseline" CALS table model was officially released on 26 June 1993 as part of the U.S. Department of Defense SGML standard MIL-M-28001B. First released in 1990 as part of the previous MIL-M-28001A specification, it has now been adopted, sometimes with small modifications, in a large variety of non-military industry applications. These include commercial aerospace (ATA and AECMA), computer documentation (DocBook), automotive (J2008), semiconductors (Pinnacles), telecommunications, and many site-specific uses. Even within the Department of Defense, several versions have evolved (e.g., 38784C vs. 38784D vs. Navy MID, etc.).

Designed to handle a variety of complex tables in military technical documents, the CALS table model focuses on encoding two-dimensional row and column geometry with basic formatting features such as cell alignment, borders, and rotation. It anticipates complex cell content, such as multiple paragraphs, lists, and graphics, and even provides for one level of nested tables within cells. Other than providing a facility for logically naming columns (e.g., "state" and "population" rather than simply "column 1" and "column 2"), it does not address semantic encoding of the content. Structurally, the CALS table model presumes that tables are made up of an optional title plus one or more TGROUPs, each of which has its own body (TBODY) and its own independent set of optional column headings (THEAD) and footings (TFOOT). The column definitions for each TGROUP are provided by COLSPEC elements, which can be inferred if desired, and special COLSPECs can be provided for THEAD and TFOOT.
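As an illustration of the structure just described, the following sketch reads a minimal table and reports, for each TGROUP, its declared COLS value, its column names, and the number of heading, body, and footing rows. It is not part of the CALS specification. The sample instance and its cell values are hypothetical, it is written as well-formed XML with lower-case element names so that Python's standard library can parse it (actual CALS instances are SGML and may use markup minimization), and the name describe_table is an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instance; cell values are placeholders, not real data.
SAMPLE = """
<table frame="all">
  <title>Sample table</title>
  <tgroup cols="2">
    <colspec colname="state" colwidth="2*"/>
    <colspec colname="population" colwidth="1*" align="right"/>
    <thead>
      <row><entry>State</entry><entry>Population</entry></row>
    </thead>
    <tbody>
      <row><entry>Alpha</entry><entry>100</entry></row>
      <row><entry>Beta</entry><entry>200</entry></row>
    </tbody>
  </tgroup>
</table>
"""


def describe_table(table_xml):
    """Summarize each TGROUP of a table: columns and row counts per section."""
    table = ET.fromstring(table_xml)
    summary = []
    for tgroup in table.findall("tgroup"):
        summary.append({
            "cols": int(tgroup.get("cols")),
            "colnames": [c.get("colname") for c in tgroup.findall("colspec")],
            "head_rows": len(tgroup.findall("thead/row")),
            "body_rows": len(tgroup.findall("tbody/row")),
            "foot_rows": len(tgroup.findall("tfoot/row")),
        })
    return summary


if __name__ == "__main__":
    for group in describe_table(SAMPLE):
        print(group)
```

Only COLSPECs that are direct children of the TGROUP are counted here; any special COLSPECs placed inside THEAD or TFOOT would need to be read separately.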

Each THEAD, TBODY, and TFOOT section within a TGROUP is made up of a series of ROWs. ROWs are formed as a left-to-right sequence of cells (ENTRYs), which can contain mixtures of text, graphics, and more complex structural objects such as list items. An ENTRY may span more than one column (horizontal spanning) and more than one row (vertical spanning), depending on how its attributes are set. As a shortcut for defining horizontal spans, optional SPANSPEC elements can be added at the TGROUP level, each referring to spans across column names defined in COLSPECs.
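The interaction between left-to-right ENTRY order, horizontal spans, and vertical spans can be sketched as a simple grid-filling procedure. The sketch below is one plausible reading, not a normative algorithm. It assumes that spans are given by NAMEST/NAMEEND and MOREROWS values, that an ENTRY without a column reference takes the next cell not already covered by a vertical span, and that a vertical span running past the last row is clamped. The function name layout_rows and the dictionary representation of entries are hypothetical.

```python
def layout_rows(colnames, rows):
    """Place entries on a row/column grid, honoring horizontal and vertical spans.

    colnames: column names in left-to-right order (as COLSPECs would define them).
    rows: list of rows; each row is a list of dicts with a "text" key and
          optional "namest", "nameend", and "morerows" keys.
    Returns a grid (list of lists) holding the text of the entry covering each cell.
    """
    ncols = len(colnames)
    position = {name: i for i, name in enumerate(colnames)}
    grid = [[None] * ncols for _ in rows]

    for r, row in enumerate(rows):
        col = 0
        for entry in row:
            if "namest" in entry:
                # Explicit horizontal span (or single named column).
                start = position[entry["namest"]]
                end = position[entry.get("nameend", entry["namest"])]
            else:
                # Skip cells already covered by a vertical span from an earlier row.
                while col < ncols and grid[r][col] is not None:
                    col += 1
                start = end = col
            if start >= ncols or end < start:
                raise ValueError(f"entry {entry['text']!r} does not fit in row {r}")
            # Assumption: a span that runs past the last row is clamped.
            last_row = min(r + entry.get("morerows", 0), len(rows) - 1)
            for rr in range(r, last_row + 1):
                for cc in range(start, end + 1):
                    grid[rr][cc] = entry["text"]
            col = end + 1
    return grid


if __name__ == "__main__":
    rows = [
        [{"text": "A", "morerows": 1}, {"text": "B", "namest": "c2", "nameend": "c3"}],
        [{"text": "C"}, {"text": "D"}],
    ]
    for line in layout_rows(["c1", "c2", "c3"], rows):
        print(line)
```

In this sketch an ENTRY with an explicit NAMEST silently overwrites a cell that is already covered by a span. Whether such a "covered" ENTRY should instead be reported as an error is one of the interpretation differences recorded in Appendix B.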

A table cell can contain either a simple ENTRY or an ENTRYTBL, which is essentially a table-within-a-table. ENTRYTBLs cannot contain TGROUPs (they are implicitly a TGROUP themselves), and cells within an ENTRYTBL cannot contain other ENTRYTBLs. Also, although ENTRYTBLs may span more than one column, they are constrained to one row.

Cell formatting attributes are limited to text alignment (both horizontal and vertical), cell borders (rulings), and rotation. For all but the last of these there is a complex inheritance scheme that allows values to be defaulted and overridden at multiple levels.
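The inheritance scheme can be pictured as a lookup that walks from the most specific level to the most general one and stops at the first level that sets a value. The sketch below does this for ALIGN, using the order listed in Appendix B as "entry<span<col<tgrp" and the default value "left". It assumes the nearest level wins, which the survey shows not all vendors implement. The names resolve_attribute and resolve_align are hypothetical.

```python
def resolve_attribute(name, levels, default):
    """Return the first value set for `name`, searching levels from most to least specific."""
    for attrs in levels:
        if attrs and attrs.get(name) is not None:
            return attrs[name]
    return default


def resolve_align(entry, spanspec=None, colspec=None, tgroup=None):
    """Resolve ALIGN in the assumed order ENTRY, SPANSPEC, COLSPEC, TGROUP, then "left"."""
    return resolve_attribute("align", [entry, spanspec, colspec, tgroup], "left")


if __name__ == "__main__":
    # The ENTRY sets no ALIGN, so the COLSPEC value is used before the TGROUP value.
    print(resolve_align({}, colspec={"align": "right"}, tgroup={"align": "center"}))
```

The same helper could be reused for VALIGN, ROWSEP, or COLSEP by passing a different list of levels, since each attribute has its own chain in the survey.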

The dimensions of tables interoperability

Interoperability potentially covers anything that would surprise a user when moving tables between different SGML systems. In essence, it has to do with answering the key question of the frustrated user: "If I spent time and energy getting this table right on one SGML system, why doesn't it automatically work on another?"

However, interoperability is not as straightforward as one might think. In fact, it is to some extent in the eyes of the beholder. Interoperability can be defined (and is defined by users) in at least two different ways:

In the first, data-centric view, detailed differences in format are tolerated when rendering across different systems. However, it is essential that original SPANSPEC definitions, for example, are faithfully carried through from input to output. It is also essential to maintain and allow edits to data (e.g., TFOOT) that is not supported for rendering purposes. Put another way, users are not surprised if their tables don't always look the same, but are upset if any of the detailed tags or attributes change when passed through multiple systems.

In the second, appearance-centric view, detailed differences in SGML syntax are tolerated when interchanging tables between different systems. It doesn't matter, for example, that original SPANSPECs are transformed into individual spanned cells, with a new set of SPANSPECs generated upon export. What is essential is that the table continues to look the same. Put another way, users are not surprised if detailed tags or attributes change when passed through multiple systems, but are upset if their tables don't always look the same.

Ideally we would achieve both kinds of interoperability simultaneously. However, in practice part of the interoperability problem comes from the difference in these two viewpoints, and from not understanding these differences. As vendors we must have a very clear definition that can be readily understood by users. But users must also understand the limits of any definition, realizing that interoperability is partly implementation and partly education.

Interoperability must also be carefully defined at a specific level. For example, one user might conclude format has been preserved if all cells maintain the same left, right, top, and bottom borders. Another might disagree because line widths are different between two systems. This becomes especially tricky when the difference between page-oriented and pageless systems is considered. "Looking the same" is not an intuitively obvious judgment.

Of course, the keys to interoperability must still be addressed through specific features and functional behavior of the products themselves. Thus another cut at interoperability, one which we have adopted here, breaks the issues down into three implementation-oriented categories:

In a perfect world, competent vendors acting in good faith would avoid any such differences between their products. However, the adage "God is in the details" applies here with a vengeance. Two products may easily seem to support the same feature set in the same manner; for example, each might support the use of COLSPECs and allow the number of COLSPECs to differ from the defined number of COLS. But upon deeper inquiry, it can turn out that one assumes the actual number of columns is defined by COLS, inferring COLSPECs if needed and ignoring any excess COLSPECs, while the other uses the number of COLSPECs to determine the actual number of columns, resetting COLS to match. Thus, even though things seem to match up at a high level, in fact a subtle but potentially quite serious interoperability problem exists.
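The COLS versus COLSPEC example can be made concrete with a short sketch of the two behaviors. Both functions are hypothetical and simplified: a COLSPEC is represented as a dictionary, and an inferred COLSPEC is assumed to carry a proportional width of "1*", as in the corresponding Appendix B row.

```python
def columns_from_cols(cols, colspecs):
    """Interpretation A: COLS is authoritative.

    Excess COLSPECs are ignored and missing ones are inferred.
    """
    specs = list(colspecs[:cols])
    while len(specs) < cols:
        specs.append({"colwidth": "1*"})  # inferred COLSPEC (assumed default width)
    return specs


def columns_from_colspecs(cols, colspecs):
    """Interpretation B: the number of COLSPECs is authoritative.

    The declared COLS value is effectively reset to match.
    """
    return list(colspecs)


if __name__ == "__main__":
    declared_cols = 2
    colspecs = [{"colname": "a"}, {"colname": "b"}, {"colname": "c"}]
    print(len(columns_from_cols(declared_cols, colspecs)))      # two columns
    print(len(columns_from_colspecs(declared_cols, colspecs)))  # three columns
```

Given the same instance, the two products would lay out a different number of columns, which is exactly the kind of difference that stays hidden at the feature-list level.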

Finally, in analyzing interoperability it is important to understand what the underlying model was actually intended to do. The CALS table model, for example, was purposely designed to deal only with the structural aspects of tabular information, together with basic presentation choices (e.g., rulings around cells, alignment, etc.). It was not meant to dictate precise rules for typesetting, composition or screen display, or the complex interactions between table and page layout. Nor does it specify how to handle errors. If we try to measure interoperability at a higher level of precision, we have already gone beyond the scope of the CALS table model itself.

Our definition of interoperability

We have chosen to define interoperability in terms of the ability to support an agreed-upon exchange feature set using an agreed-upon, unambiguous set of semantics. We assume that standards for identifying and processing error conditions must be included as part of the agreed-upon semantics.

We believe this provides a pragmatic working definition that vendors and users can both understand objectively. While not absolutely comprehensive, we think it will be sufficient to prevent any significant differences in the look and behavior of CALS tables when moving between SGML Open vendor products. Of course, because of the complex nature of the interoperability problem, it is also important to note that some differences will inevitably continue to exist between individual products. Success must be measured in terms of removing all significant issues, not in achieving absolute perfection.

Methodology for the study

After a brief review of the CALS standard and current vendor products, the Committee recognized that the "interoperability issue" was in fact made up of a large number of subtle ambiguities and small differences in interpretation. Taken individually, each problem was minor and could easily escape notice. When added together, however, they created an overall issue with major implications.

Because the essence of the problem was lurking in the details, the Committee felt it was imperative to take a very thorough, rigorous approach. Therefore we began by combing through the existing CALS table model and related tag/attribute definitions, attempting to identify every possible area of potential ambiguity or misunderstanding. From this baseline, we then created a highly detailed vendor questionnaire, consisting of over one hundred questions designed to pinpoint all possible areas of difference between products. These were in turn broken down into eight major categories:

Individual questions focused both on differences in the set of supported features across vendor products and on the way each vendor had interpreted the semantic details. The "general" questions were designed to elicit comments that might shed additional light on the issues from slightly different angles. Furthermore, to minimize the possibility of misunderstanding, we encouraged participants to attach comments of unlimited length to each response. A complete list of survey questions is contained in Appendix A.

All SGML Open vendors were invited to participate in filling out the questionnaire, with the goal of obtaining a large enough representative sample on which to base solid conclusions. In addition, we solicited input from other past and current members of the CALS technical committee that architected and now maintains the CALS table model. After this process was complete, we had obtained a wealth of information, including completed questionnaires for seven SGML Open vendors that provide authoring, publishing, and electronic viewing products that support CALS SGML tables. While a few vendors elected not to participate for various reasons, a vast majority of the largest and most experienced SGML vendors were included, and virtually all SGML Open vendors offering related products were members of our committee. Therefore we felt confident that our sample of seven vendors was a sufficient base for analysis.

After a few iterations in which vendors were allowed to obtain clarifications and refine their answers, results were formally tabulated in matrices and cross-analyzed to extract the key issues. See the detailed survey results in Appendix B. In the initial extraction process, we identified "significant" issues on the following basis:

A summary of the most commonly supported features was then constructed by excluding those features that had been identified as significant issues.

Similarly, a list of key ambiguities/differences was drawn up consisting of all specific semantics that had been identified as significant issues. A summary of most commonly supported semantic interpretations was formulated using the interpretation shared by the largest number of vendors in our sample.

Summary of findings

Vendors agreed on the great bulk of table features and semantic interpretations included in the survey. As expected, a number of detailed differences surfaced between the products surveyed. However, most of these were relatively subtle. Following our working definition of tables interoperability, we separated these into two fundamental categories, summarized below:

A summary of these results in matrix form can be found in Appendix B.

Unsupported Features

Our analysis shows the following features are generally unsupported:

Differences in Interpretation

Our analysis shows there are differences in interpretation in the following areas:

Summary of most common practices

As a means of laying the groundwork for establishing table interoperability between vendor products, this section summarizes the practices and commonly supported interpretations by SGML Open vendors. These include:

Summary of most commonly supported features

The following represents the set of features most commonly supported by SGML Open vendors at the time of our survey. Please note that not all vendors support each of these features in every case (see the detailed survey results in Appendix B).

Backbone Table Structure/Format
Row/Column Structure
Cell Formatting
Cell Content

Features not commonly supported

The following features were not found to be commonly supported at the time of our survey.

Backbone Table Structure/Format
Row/Column Structure
Cell Formatting
Cell Content

Summary of most common semantic interpretations

The following represents a consolidated list of semantic interpretations most commonly followed by SGML Open vendors. Please note that not all vendor products currently implement all of these interpretations in each case. (See the detailed survey results in Appendix B.)

Backbone Table Structure/Format
Row/Column Structure
Cell Formatting
Cell Content

Next steps

As a result of this study the Committee plans to propose an SGML Technical Resolution that will provide a common definition of tables interoperability using the CALS model. We are also sharing our recommendations with the CALS Electronic Publishing Committee (EPC) as input to improving the CALS table model and its documented semantics. When this phase of the Committee's work is complete, we will move on to the second goal in our mission statement: suggesting a standard framework and set of approaches for the next generation of SGML table markup. This work will explore where the current CALS table model falls short, going beyond format and layout issues to a model which captures the author's intent for underlying table data in an unambiguous and interchangeable way. We expect this may include a set of standard approaches and DTD fragments for different purposes.


Appendix A: Survey questions

The following questions were submitted to all SGML Open Member companies in order to find commonality and differences in the implementations of the CALS Table Model. Detailed answers identified more questions. Several rounds of clarification and augmentation of these questions occurred.

The results for the questions that bear on interoperability are grouped in the survey results in a slightly different order and are paraphrased along with the percentage of vendors indicating lack or difference in support.

Product questions
Detailed questions: backbone table structure / format issues
Detailed questions: row / column structure issues
Column Widths
Rotation and Alignment
Cell Borders
Inheritance
Detailed questions: cell content issues
General questions

Appendix B: Detailed analysis of survey results

The survey questions in Appendix A and the responses from seven vendors went through many iterations. The final results from the vendors in February 1995 are combined in two sets of tables.

From these two sets of tables, the significant issues were identified. The questions that lead to quantitative comparisons are paraphrased in the first column of the tables, the percentage scores appear in the second (larger scores are worse), and the (issue/non-issue) status appears in the third.

Unsupported features

The original survey distinguished among:

The scores below show the percentage of vendors that do not provide full support. "ISSUE" status is assumed if more than 1/3 of vendors fail to provide full support (score > 33%).

Backbone table structure / format

Backbone table structure / format                   Score   Status
Multiple TGROUPs (redefinition of columns)          29%
Separate THEAD section                              0%
Separate TFOOT section                              43%     ISSUE
ORIENT attribute (portrait vs. landscape)           57%     ISSUE
PGWIDE attribute (full page vs. column)             71%     ISSUE
TABSTYLE attribute (named styles)                   86%     ISSUE
TGROUPSTYLE attribute (named group styles)          86%     ISSUE
FRAME attribute (outer borders)                     14%

Row / column structure

Row / column structure                              Score   Status
Horizontal spans using SPANSPEC                     0%
Horizontal spans on cells using NAMEST/END          43%     ISSUE
Vertical spans using MOREROWS                       14%
Preservation of SPANSPEC import to export           57%     ISSUE
Preservation of COLNAMEs import to export           57%     ISSUE
SPANSPECs allowed in THEAD and TFOOT                0%
Separate COLSPECs at the THEAD level                57%     ISSUE
Separate COLSPECs at the TFOOT level                71%     ISSUE
Different number of columns in THEAD or TFOOT       86%     ISSUE

Cell formatting - column widths

Cell formatting - column widths                     Score   Status
Fixed COLWIDTHs                                     14%
Decimal values for fixed COLWIDTHs                  29%
Proportional COLWIDTHs                              0%
Decimal values for proportional COLWIDTHs           57%     ISSUE
Mixed COLWIDTHs (proportional/fixed in one column)  57%     ISSUE

Cell formatting - rotation and alignment

Cell formatting - rotation and alignment            Score   Status
ROTATE attribute (rotation of individual cells)     100%    ISSUE
ALIGN attribute - "left/right/center" values        14%
ALIGN attribute - "justify" value                   29%
ALIGN attribute - "char" value (with CHAR attr)     29%
CHAROFF attribute                                   57%     ISSUE
VALIGN attribute - "top/middle/bottom" values       29%

Cell formatting - cell borders

Cell formatting - cell borders                      Score   Status
COLSEP / ROWSEP attributes                          14%

Cell formatting - inheritance

Cell formatting - inheritance                       Score   Status
Preservation of inheritance import to export        57%     ISSUE
Inheritance of cell alignment from SPANSPEC         86%     ISSUE
Inheritance of cell borders from SPANSPEC           86%     ISSUE

Cell content

Cell content                                        Score   Status
Graphics within table ENTRYs                        0%
Mixture of text and graphics in table cells         29%
Other structural objects within table cells         14%
ENTRYTBL (tables within tables)                     57%     ISSUE
Multiple TGROUPs within ENTRYTBL                    71%     ISSUE

Differences in interpretation

The original survey results distinguish among implementations where semantic descriptions were inadequate:

"ISSUE" status is assumed if ANY vendor interprets things differently (score > 0%).
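The two status rules used in this appendix (more than one third of vendors for unsupported features, any vendor at all for differences in interpretation) can be expressed in a few lines. The sketch below assumes that each score is the rounded percentage of the seven responding vendors, which is consistent with the values that appear in the tables (0%, 14%, 29%, 43%, 57%, 71%, 86%, 100%) but is an inference rather than a documented procedure. The function names are hypothetical.

```python
TOTAL_VENDORS = 7  # size of the survey sample


def score(vendor_count, total=TOTAL_VENDORS):
    """Percentage of vendors lacking support or interpreting differently, rounded."""
    return round(100 * vendor_count / total)


def unsupported_status(pct):
    """Unsupported-feature rule: an issue if more than 1/3 of vendors (score > 33%)."""
    return "ISSUE" if pct > 33 else ""


def interpretation_status(pct):
    """Interpretation rule: an issue if any vendor differs (score > 0%)."""
    return "ISSUE" if pct > 0 else ""


if __name__ == "__main__":
    for n in range(TOTAL_VENDORS + 1):
        pct = score(n)
        print(n, f"{pct}%", unsupported_status(pct), interpretation_status(pct))
```

Under these rules a feature missing from two of seven products (29%) is not flagged as unsupported, whereas a single dissenting interpretation (14%) is enough to flag a semantic issue.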

Backbone table structure / format

Backbone table structure / format                   Score   Status
ORIENT "landscape" as 90 degrees counter            0%
Default for ORIENT as "no relative rotation"        0%
PGWIDE "yes" as full page, "no" as column           0%
Default for PGWIDE as "full page"                   0%
TABSTYLE as style name for entire table             0%
TGROUPSTYLE as style names for TGROUPs              0%
FRAME as override for outer borders                 14%     ISSUE
Default for FRAME as "all borders on"               14%     ISSUE

Row / column structure

Row / column structure                              Score   Status
Number of cols determined absolutely by COLS        43%     ISSUE
Create "1*" cols if less COLSPECs than COLS         43%     ISSUE
Ignore excess if more COLSPECs than COLS            57%     ISSUE
COLSPECs allowed to be non-sequential               57%     ISSUE
Fit unnumbered COLSPECs next in sequence            57%     ISSUE
ENTRYs allowed to be non-sequential                 29%     ISSUE
Fit unnumbered ENTRYs next in sequence              43%     ISSUE
SPANSPECs in head/foot refer to local COLSPECs      43%     ISSUE
COLSPEC and SPANSPEC names allowed to overlap       29%     ISSUE
Precedence COLNAME->NAMEST/END->SPANSPEC            100%    ISSUE
Error if "covered" ENTRY present with content       14%     ISSUE

Cell formatting - column widths

Cell formatting - column widths                     Score   Status
Allowed fixed units exactly: IN, CM, MM, PI, PT     100%    ISSUE
Default fixed unit as "PT"                          86%     ISSUE
Default for COLWIDTH as unit proportional (1*)      14%     ISSUE
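To show how the column-width interpretations listed above fit together, the following sketch parses COLWIDTH values and distributes a given width among the columns. It assumes the fixed units IN, CM, MM, PI, and PT, a default fixed unit of PT, and a default COLWIDTH of "1*", as in the rows above. The conversion factors (72 points per inch, 12 points per pica) are assumptions, since the survey does not state them, and mixed values that combine a proportional and a fixed part in one column are not handled. The names parse_colwidth and resolve_widths are hypothetical.

```python
import re

# Assumed conversion factors to points; the survey does not specify them.
POINTS_PER_UNIT = {"pt": 1.0, "pi": 12.0, "in": 72.0, "cm": 72.0 / 2.54, "mm": 72.0 / 25.4}

NUMBER = r"(\d+\.?\d*|\.\d+)"


def parse_colwidth(value):
    """Return ("prop", factor) for proportional widths or ("fixed", points) for fixed ones."""
    text = (value or "1*").strip().lower()  # missing COLWIDTH treated as "1*"
    match = re.fullmatch(NUMBER + r"?\*", text)
    if match:
        return ("prop", float(match.group(1)) if match.group(1) else 1.0)
    match = re.fullmatch(NUMBER + r"\s*(pt|pi|in|cm|mm)?", text)
    if not match:
        raise ValueError(f"unsupported COLWIDTH: {value!r}")
    unit = match.group(2) or "pt"  # default fixed unit assumed to be points
    return ("fixed", float(match.group(1)) * POINTS_PER_UNIT[unit])


def resolve_widths(colwidths, available_pt):
    """Give fixed columns their width, then share the remainder proportionally."""
    parsed = [parse_colwidth(w) for w in colwidths]
    fixed_total = sum(v for kind, v in parsed if kind == "fixed")
    prop_total = sum(v for kind, v in parsed if kind == "prop")
    remaining = max(available_pt - fixed_total, 0.0)
    return [v if kind == "fixed" else remaining * v / prop_total for kind, v in parsed]


if __name__ == "__main__":
    print(resolve_widths(["1in", "1*", "2*"], 288.0))
```

Because every vendor in the sample differed on the exact set of allowed units, a real implementation would need to agree on that set before any such arithmetic is comparable across products.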

Cell formatting - rotation and alignment

Cell formatting - rotation and alignment            Score   Status
Default for ALIGN as "left"                         29%     ISSUE
CHAR can be any ASCII character, but no SDATA       0%
CHAR can be a single character only                 0%
Align on leftmost occurrence of CHAR                0%
Align to left side of char bounding box             14%     ISSUE
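Alignment on the leftmost occurrence of CHAR can be illustrated in a fixed-pitch setting, where padding with spaces stands in for positioning the left edge of the character's bounding box. The sketch below is a simplification: it ignores CHAROFF, assumes a single-character CHAR, and treats a value that does not contain CHAR as lying entirely to the left of the alignment position, which is an assumption rather than a survey finding. The name align_on_char is hypothetical.

```python
def align_on_char(values, char="."):
    """Pad values on the left so the leftmost occurrence of `char` lines up."""
    def split(value):
        pos = value.find(char)  # leftmost occurrence, or -1 if absent
        return (value, "") if pos < 0 else (value[:pos], value[pos:])

    parts = [split(v) for v in values]
    left_width = max(len(left) for left, _ in parts)
    return [left.rjust(left_width) + right for left, right in parts]


if __name__ == "__main__":
    for line in align_on_char(["1.5", "12.25", "100", "1.2.3"]):
        print(line)
```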

Cell formatting - cell borders

Cell formatting - cell borders                      Score   Status
COLSEP as right border / ROWSEP as bottom           0%
COLSEP / ROWSEP "yes" as single light ruling        0%
Default for COLSEP / ROWSEP as "yes"                14%     ISSUE
FRAME attribute determines left/top border          0%
FRAME attribute overrides right/bottom border       14%     ISSUE

Cell formatting - inheritance

Cell formatting - inheritance                       Score   Status
ALIGN precedence as entry<span<col<tgrp             57%     ISSUE
VALIGN precedence as entry<row<thead/body/foot      43%     ISSUE
ROWSEP precedence as entry<row<span<col<tgrp<tbl    86%     ISSUE
COLSEP precedence as entry<span<col<tgrp<tbl        71%     ISSUE
Local COLSPECs override for thead / tfoot           71%     ISSUE

Cell content

Cell content                                        Score   Status
Graphics that don't fit resized not clipped         14%     ISSUE
ENTRYTBL format attrs apply only within             0%
ENTRYTBLs that don't fit resized not clipped        14%     ISSUE
Inheritance rules for ENTRYTBL same as table        0%