[This local archive copy mirrored from the canonical site: http://www.gca.org/conf/sgml97/det_swan.htm; links may not have complete integrity, so use the canonical document at this URL if possible.]
[Abstract: Many companies are required to deliver documentation to customers electronically. As a significant step in solving Electronic Document Delivery (EDD) issues, the telecommunications industry has developed an interchange DTD and a packaging guideline that provide a common "language" for expressing document content and logical structure. Documents created on any system may be translated to this "language" by document producers, and from this "language" to any display or production system by document recipients. Although the interchange DTD and packaging guideline were designed by the telecommunications industry, they are general enough to be used directly or with slight modification to meet EDD requirements in other industries as well.]
Introduction
Why are Electronic Document Delivery (EDD) standards needed? Many companies are required to deliver their documentation to customers electronically. Traditionally this is done by prepackaging the documents with a browser on a CD-ROM or by delivering a set of HyperText Markup Language (HTML) files. But what if the customer wants the "source" documentation instead, so that they can re-use the information, incorporate it into their document management system, or choose their own browser(s) or other method of output? If a document producer and a document recipient both use the same applications and file formats for documents, such as a particular version of MS Word, FrameMaker, or HTML, they should feel free to exchange documents in those shared formats. Such a match will occur only occasionally, however; in fact, it is rare that everyone in the same company uses the same applications.
Document producers also have many customers with different electronic documentation requirements. As a result, they must produce and manage different production processes for different outputs to different customers. Document recipients, on the other hand, receive different pre-packaged browsers and formats from their suppliers, requiring end-users to learn multiple interfaces to documentation and making management of all these formats cumbersome.
As the first step in solving these document interchange issues, in 1990 the Telecommunications Industry Forum (TCIF) adopted SGML as the standard for exchanging technical documentation because:
In 1996, after six years of document and requirements analysis, DTD development, and testing, TCIF released two important EDD guidelines:
TIM is an SGML interchange application that provides a common "language" for expressing document content and logical structure. Documents created on any system may be translated to this "language" by document producers, and from this "language" to any display or production system by document recipients. TEDD is a guideline for packaging document files in a standard way for delivery.
Although TIM and TEDD were designed by TCIF to capture the structure found in the kinds of technical documents that are interchanged among telecom companies, we feel that TIM and TEDD are general enough to be directly used or slightly modified to meet EDD requirements in other industries, as well.
Procedural vs. Structural Markup for Interchange
When documents are created with a word-processing or desktop-publishing program, such as Word or FrameMaker, special codes are inserted into the document that provide information about how the text is to be formatted (font, type size, line justification, etc.). This markup describes the way the document should look when printed or displayed on a screen, but does not provide any explicit information about the document itself (What type of document is it? What kinds of information does it contain?).
Procedural (appearance-oriented) markup is fine when the only product is paper. Readers can usually infer most of what they need to know about a document by the way it looks. We know, for example, that a line of bold text that begins with the word "Chapter" is a chapter title, and we can infer that a new chapter has begun. We also know that a distinct block of text is generally a paragraph; and that italicized text represents some form of emphasis. We recognize these structures easily without being told what they are because we've learned the conventions that relate a document's appearance to its structure.
Procedural markup does not work well, however, in electronic publishing, because documents generally have to be reformatted to be read on-screen, and computers are exceptionally poor at discerning the structure they must preserve from the procedural markup they must discard. A computer must be told explicitly what each structure element is.
Suppose the computer did simply reproduce a document's appearance without understanding its structure, leaving the interpretation to the reader as is done in paper or What-You-See-Is-What-You-Get (WYSIWYG) publishing. Besides reducing readability, that removes many of the offsetting advantages electronic publishing has over traditional paper publishing. Why, for instance, should the reader settle for "see page 5-17 for more"? A computer that could recognize a cross-reference in the text could turn it into a hyperlink: "click here for more."
Electronic documents in any form have advantages over paper in speed and ease of delivery. Going beyond those, one of the best reasons for electronic publishing is that it lets the producer reuse and repackage information with little or no added cost. A single electronic document can be used to print a paper copy, create a colorful hypertext CD-ROM product, and load information into a database for on-line search and retrieval using low-cost terminals. And it can be rendered in large type or through text-to-speech devices for the sight-impaired. This is possible because the computer can apply different formatting rules to the same document depending on the output medium, as long as it understands what kinds of information it is dealing with. For these things to happen, the structure of the document must be explicitly defined and understood by the computer.
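The idea of applying different formatting rules to one structured source can be sketched in a few lines of Python. This is only an illustration, not part of any TCIF guideline: the element names (chapter, title, para, xref) are hypothetical, and well-formed XML-style markup parsed with the standard library stands in for SGML.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical structured source; the element names are illustrative only.
SOURCE = """
<chapter id="c5">
  <title>Installing the Unit</title>
  <para>Mount the unit in the rack. See <xref target="c7">Power Connections</xref> for more.</para>
</chapter>
"""

def inline_text(elem, render_xref, esc=str):
    """Flatten mixed content, handing each cross-reference to render_xref."""
    parts = [esc(elem.text or "")]
    for child in elem:
        if child.tag == "xref":
            parts.append(render_xref(child))
        else:
            parts.append(inline_text(child, render_xref, esc))
        parts.append(esc(child.tail or ""))
    return "".join(parts)

def to_print(chapter):
    """Paper-style rules: upper-case title, cross-reference spelled out."""
    xref = lambda x: f'{x.text} (ref. {x.get("target")})'
    lines = [chapter.findtext("title").upper(), ""]
    for para in chapter.findall("para"):
        lines.append(inline_text(para, xref))
    return "\n".join(lines)

def to_html(chapter):
    """Hypertext rules: heading element, cross-reference becomes a link."""
    xref = lambda x: f'<a href="#{escape(x.get("target"))}">{escape(x.text)}</a>'
    out = [f'<h1 id="{escape(chapter.get("id"))}">{escape(chapter.findtext("title"))}</h1>']
    for para in chapter.findall("para"):
        out.append(f"<p>{inline_text(para, xref, escape)}</p>")
    return "\n".join(out)

doc = ET.fromstring(SOURCE.strip())
print(to_print(doc))
print(to_html(doc))
```

Both functions read the same source; only the rule set differs. Because the cross-reference is marked as such, the hypertext rules can turn it into a link while the print rules spell it out.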
A further advantage to structural markup is its vendor- and platform-independence. It is easy to write formatting rules like "make all chapter titles 18-point Helvetica bold, and have them start a new page." And it is easy to translate such rules into the procedural markup of any word processor, desktop-publishing system, or web browser.
SGML is used to define the structural elements that make up a document (such as chapters and paragraphs); the relationships that exist between those structural elements (e.g., a chapter contains one or more paragraphs); and the attributes that each structural element can possess (for instance, chapters and paragraphs can be numbered or unnumbered). All value-added information contained within an SGML file relates to the structure of the document. An SGML file does not specify anything about the appearance (or output medium) of the document. Therefore, a single SGML file can be rendered in a variety of ways according to formatting rules appropriate for a given output system and application.
The structure of any particular SGML document is defined explicitly in a Document Type Definition (DTD). A DTD is a series of declarations that define the structural elements of the document, their content models (what elements they can contain, in what order), and the attributes that can be assigned to the element. In practice, the DTD is a set of rules for describing a structured document.
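To make the notion of a content model concrete, the following Python sketch checks a small document against a hand-written set of rules. It is not an SGML parser and does not read real DTD syntax; the content models are approximated as regular expressions over child element names, and all element and attribute names are assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models, written as regular expressions over the
# sequence of child element names (each name followed by one space).
# "chapter" roughly mimics a declaration like: <!ELEMENT chapter (title, para+)>
CONTENT_MODELS = {
    "chapter": r"title (para )+",
    "title": r"",   # no child elements allowed
    "para": r"",    # no child elements allowed
}

# Hypothetical attribute lists: the attribute names each element may carry.
ATTRIBUTES = {
    "chapter": {"id", "numbered"},
    "para": {"numbered"},
}

def validate(elem, errors=None):
    """Return a list of messages describing where elem breaks the rules."""
    errors = [] if errors is None else errors
    if elem.tag not in CONTENT_MODELS:
        errors.append(f"undeclared element <{elem.tag}>")
        return errors
    children = "".join(child.tag + " " for child in elem)
    if not re.fullmatch(CONTENT_MODELS[elem.tag], children):
        errors.append(f"<{elem.tag}> has invalid content: {children.strip() or '(empty)'}")
    for name in elem.attrib:
        if name not in ATTRIBUTES.get(elem.tag, set()):
            errors.append(f"<{elem.tag}> has undeclared attribute '{name}'")
    for child in elem:
        validate(child, errors)
    return errors

good = ET.fromstring('<chapter id="c1"><title>T</title><para>P</para></chapter>')
bad = ET.fromstring('<chapter><para>P</para><title>T</title></chapter>')
print(validate(good))
print(validate(bad))
```

The second document contains the right elements in the wrong order, so it fails the content model for chapter even though every element in it is declared.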
Authoring DTDs vs. Interchange DTDs
Many DTDs that exist today can be classified as "authoring DTDs." They were specifically developed for authoring environments and provide writers with a clearly defined structure and a set of meaningful semantic elements for creating a particular document type in an SGML editor, such as ADEPT*Publisher or FrameMaker+SGML.
A rigid structure in an authoring DTD ensures consistency and enforces common standards for all documents written under it. For example, a DTD can require that documents have a title, document number, revision, copyright, disclaimer, and table of contents in the front matter. As a quality check, the writer can, at any time, use an SGML parser to validate their document against the DTD to ensure that it conforms to the correct structure and includes all the required elements.
An authoring DTD can also enforce good writing practices, such as the requirement that at least two items exist in a list, as well as enforce industry standards, such as the requirement that danger, caution, or warning admonishments precede the action statement in a step.
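In a DTD, rules like these would normally be expressed in the content models themselves. As a rough procedural equivalent, the Python sketch below checks the two example rules against a parsed document. The element names (list, item, step, action, danger, caution, warning) are assumed for illustration and are not taken from any particular authoring DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical admonishment element names.
ADMONISHMENTS = {"danger", "caution", "warning"}

def check_writing_rules(root):
    """Report lists with fewer than two items and admonishments that
    appear after the action statement in a step."""
    problems = []
    for lst in root.iter("list"):
        if len(lst.findall("item")) < 2:
            problems.append("list has fewer than two items")
    for step in root.iter("step"):
        names = [child.tag for child in step]
        if "action" in names:
            first_action = names.index("action")
            for pos, name in enumerate(names):
                if name in ADMONISHMENTS and pos > first_action:
                    problems.append(f"<{name}> follows <action> in a step")
    return problems

ok = ET.fromstring(
    "<proc><list><item>a</item><item>b</item></list>"
    "<step><warning>w</warning><action>x</action></step></proc>"
)
not_ok = ET.fromstring(
    "<proc><list><item>a</item></list>"
    "<step><action>x</action><caution>c</caution></step></proc>"
)
print(check_writing_rules(ok))
print(check_writing_rules(not_ok))
```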
Meaningful semantic elements in an authoring DTD help writers identify and relate the type of information that should be included in a particular element. For example, the writer can clearly understand that a part number is typed within <partnum> tags. Elements that contain other elements help writers see how information objects relate to each other. For example, an action statement, command to be entered, and resulting printout are contained within a step element.
Semantic elements also allow writing groups to easily find and identify reusable information objects that can be used across multiple documents. For example, document management systems or other tools that manage file entities can be set up so that writers can re-use common information elements, such as procedures or steps, across multiple documents. This improves the quality and consistency of documents and saves writers time by reducing the need to type redundant information. In general, semantic elements improve the quality of information that is written within them.
An interchange DTD, on the other hand, is simply used to pass information to a document recipient so that they can process it for output or store it in a document management system without human intervention. An interchange DTD does not impose "writing rules." Those rules are assumed to be enforced in the document producer's authoring environment, where they can be validated with quality checks and automated tools. Since documents marked up in an interchange DTD should be considered released, the focus of the DTD should be to handle any type of information that can exist in a production environment, regardless of the application used to author the document.
TCIF developed the TIM DTD specifically to meet the common needs of companies that deliver documentation for their products and the companies that receive that documentation. TIM is a pure interchange DTD and is not intended to be used for authoring: most originators will prefer more highly structured DTDs or semantic DTDs that constrain and therefore simplify creation of documents fitting their particular document designs and business processes. The TIM DTD identifies only the generalized structural components that occur in technical documents, yet it allows originators of the documents to pass on all useful meta-information about their specialized use of those components (structural and semantic labels, languages used, revision status, cross-references, keywords, etc.).
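The general pattern of mapping specialized authoring elements onto generic ones while carrying the original semantic label along can be sketched as follows. The generic element names and the "semantic" attribute are hypothetical and are not taken from the TIM DTD; the sketch only illustrates the principle, again using XML-style markup as a stand-in for SGML.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from authoring element names to generic element
# names. These are NOT actual TIM element names.
GENERIC = {
    "procedure": "division",
    "step": "item",
    "action": "para",
    "command": "phrase",
    "partnum": "phrase",
}

def to_interchange(elem):
    """Copy elem, renaming mapped elements to their generic equivalents
    and recording the original name in a 'semantic' attribute."""
    generic = ET.Element(GENERIC.get(elem.tag, elem.tag), dict(elem.attrib))
    if elem.tag in GENERIC:
        generic.set("semantic", elem.tag)  # hypothetical attribute name
    generic.text = elem.text
    generic.tail = elem.tail
    for child in elem:
        generic.append(to_interchange(child))
    return generic

src = ET.fromstring(
    "<step><action>Order part <partnum>AB-123</partnum> today.</action></step>"
)
out = to_interchange(src)
print(ET.tostring(out, encoding="unicode"))
```

A recipient that knows nothing about part numbers can still process the generic phrase element, while one that does care can select on the preserved label.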
Implementing TIM and TEDD
TIM and TEDD are blueprints for electronically interchanging documentation. They enable companies that produce and deliver documentation to build a single set of production processes and tools that convert information from their internal formats, such as FrameMaker, MS Word, or internal authoring DTDs, to an industry-wide accepted interchange format for all customers. Document recipients receive consistent, predictable information products from all of their suppliers. Whether a document recipient uses TIM files directly or converts them to something else, TIM will guarantee that technical documents from each supplier will always be readily understood and processed by electronic publishing systems.
For document producers, the ease of document conversion from their internal format to TIM depends on the consistency of their internal data. Ideally, document producers will author documents in SGML in a rigidly structured authoring DTD, making the conversion to TIM much easier.
Some of the benefits of using TIM as an industry-wide interchange format are:
1. Lower publishing costs. A document producer who is distributing electronic documents to several customers will have to distribute only one file format, thereby eliminating the need for extra conversion and cleanup steps.
2. Multiple output formats. A single TIM file can be output in a variety of formats (paper, hypertext CD-ROM, etc.) without additional coding and with minimal conversion costs, making it easy for companies to produce low-cost custom products.
3. Reuse of information. Because the value-added information within a TIM document relates to its content and not its appearance, TIM documents can be stored as a flexible database of information that can be repackaged and reused in a variety of ways. For example: TIM documents (or portions of them) can be embedded into other corporate documentation; portions of several TIM documents can be combined to create new documents; and TIM documents can be linked so that the information in several related documents is updated or revised simultaneously.
Renee Swank
2200 North Lamar Street, Suite 230
Dallas, Texas 75202
renee@isogen.com
Renee Swank is an Applications Engineer for Isogen International Corp. Before joining Isogen in 1996, Renee spent six and a half years as an SGML Specialist at Ericsson where she was involved in a corporate-wide effort to migrate all documentation to SGML. She has also been an active participant on the TCIF/IPI Committee for 5 years and has served as Vice Chair and Secretary.
Don Pratt
8 Corporate Place, PYA-3J128
Piscataway, NJ 08854
dfp@ims.bellcore.com
Don Pratt is an internal consultant at Bellcore in electronic documentation and the leader of the technical team of the Information Products Interchange Committee of the Telecommunications Industry Forum. He has a Ph.D. in human-factors psychology, 8 years' experience as a technical writer and technical-writing teacher, and 13 years' experience with desktop, corporate, and electronic publishing. He is the principal author of TCIF's guidelines on electronic document interchange: the TIM, TEDD, and graphics guidelines and the TCIF writers' guide.
Copyright © 1997, Graphic Communications Association
100 Daingerfield Road
Alexandria, VA 22314-2888
Ph: +1 703-519-8160
Fax: +1 703-548-2867