Cover Pages: IBM's Darwin Architecture Supports Enhancements for Domain Specialization, Content Reuse, and Linking Logic.

Communiqués from Don Day and Michael Priestley of IBM describe new features in the 2002-05 update of IBM's XML-Based Darwin Information Typing Architecture (DITA). The DITA XML-based architecture "provides a way for documentation authors and architects to create collections of typed topics that can be easily assembled into various delivery contexts. Topic specialization is the process by which authors and architects can define topic types, while maintaining compatibility with existing style sheets, transforms, and processes. The new topic types are defined as an extension, or delta, relative to an existing topic type, thereby reducing the work necessary to define and maintain the new type." Improving upon the original release of March 2001, DITA v1.0 features "a logical extension of specialization that has now been incorporated into DITA: the ability to extend existing content markup to represent domains of specialized markup that are common across particular sets of typed topics (hardware vs. software, for example)." The DITA design has a unified content reuse mechanism which enables one to combine several topics into a single document: "an element can replace itself with the content of a like element elsewhere, either in the current topic or in a separate topic that shares the same content models. The distinction between reusable content and reusing content, which is enshrined in the file entity scheme, disappears: any element with an ID, in any DITA topic, is reusable by 'conref' transclusion. The linking logic also now supports type checking and takes advantage of the short description element to provide progressive disclosure."

The DITA designers report that they are "trying to take full advantage of the semantic awareness built into the specialization model, while at the same time making that model more flexible and extensible. [This yields] more flexibility at design time, and more rigorous validation at authoring and build time."

From the new 'Domain Specialization' documentation:

"The Darwin Information Typing Architecture (DITA) is an XML architecture for extensible technical information. A domain extends DITA with a set of elements whose names and content models are unique to an organization or field of knowledge. Architects and authors can combine elements from any number of domains, leading to great flexibility and precision in capturing the semantics and structure of their information.

In DITA, the topic is the basic unit of processable content. The topic provides the title, metadata, and structure for the content. Some topic types provide very simple content structures. For example, the concept topic has a single concept body for all of the concept content. By contrast, a task topic articulates a structure that distinguishes pieces of the task content, such as the prerequisites, steps, and results.

In most cases, these topic structures contain content elements that are not specific to the topic type. For example, both the concept body and the task prerequisites permit common block elements such as p paragraphs and ul unordered lists.

Domain specialization lets you define new types of content elements independently of topic type. That is, you can derive new phrase or block elements from the existing phrase and block elements. You can use a specialized content element within any topic structure where its base element is allowed. For instance, because a p paragraph can appear within a concept body or task prerequisite, a specialized paragraph could appear there, too.
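
The substitution rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two lookup tables are simplified, hypothetical stand-ins for the DITA DTDs, and the specialized element names (`caution`, `menulist`) are invented for the example.

```python
# Hypothetical, simplified model of domain specialization (not the DITA DTDs).

# Each specialized element names the base element it is derived from.
BASE_OF = {
    "caution": "p",      # invented specialized paragraph
    "menulist": "ul",    # invented specialized unordered list
}

# Base elements that each container allows (illustrative subset only).
ALLOWED_IN = {
    "conbody": {"p", "ul"},
    "prereq": {"p", "ul"},
}


def ancestry(element):
    """Return the element followed by its chain of base elements."""
    chain = [element]
    while chain[-1] in BASE_OF:
        chain.append(BASE_OF[chain[-1]])
    return chain


def is_allowed(element, container):
    """A specialized element is accepted wherever one of its bases is accepted."""
    allowed = ALLOWED_IN.get(container, set())
    return any(name in allowed for name in ancestry(element))
```

Under these assumptions, `is_allowed("caution", "conbody")` holds because `caution` derives from `p`, while an element with no allowed ancestor is rejected.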

[Summary:] Through topic specialization and domains, DITA provides the following benefits: (1) Simpler topic design: The document designer can focus on the structure of the topic without having to foresee every variety of content used within the structure. (2) Simpler topic hierarchies: The document designer can add new types of content without having to add new types of topics. (3) Extensible content for existing topics: The document designer can reuse existing types of topics with new types of content. (4) Semantic precision: Content elements with more specific semantics can be derived from existing elements and used freely within documents. (5) Simpler element lists for authors: The document designer can select domains to minimize the element set. Authors can learn the elements that are appropriate for the document instead of learning to disregard unneeded elements. In short, the DITA domain feature provides for great flexibility in extending and reusing information types. The highlight, programming, and UI domains provided with the base DITA release are only the beginning of what can be accomplished..."

DITA content reuse is supported by the conref attribute: "you can throw a conref attribute on just about any element, to grab content from an equivalent element in another DITA topic..." This mechanism, said by Eliot Kimber to provide the equivalent of HyTime's #ELEMENT value reference facility, is also weakly expressed in ISO 8879 (SGML) by #CONREF attributes. Note that SGML's CONREF [content reference attribute] feature was highly controversial; it was dropped by XML.

DITA's conref "transclusion" mechanism is similar to the SGML conref mechanism, which uses an empty element as a reference to a complete element elsewhere. However, DITA requires that at least a minimal content model for the referencing element be present, and performs checks during processing to ensure that the replacement element is valid in its new context. This mechanism goes beyond standard XInclude, in that content can be incorporated only when it is equivalent: If there is a mismatch between the reusing and reused element types, the conref is not resolved. It also goes beyond standard entity reuse, in that it allows the reused content to be in a valid XML file with a DTD. The net result is that reused content gets validated at authoring time, rather than at reuse time, catching problems at their source.
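
The type check described above can be illustrated with a minimal Python sketch. It is not IBM's implementation: the sample topics, the IDs, the `#topicid/elementid` reference form, and the function name `resolve_conrefs` are assumptions made for the example, and "equivalent" is reduced here to "same element name".

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample topics; element names and IDs are illustrative only.
SOURCE = """
<topic id="shared">
  <body>
    <p id="warn">Back up your data first.</p>
    <ul id="tools"><li>Screwdriver</li></ul>
  </body>
</topic>
"""

REUSING = """
<topic id="install">
  <body>
    <p conref="#shared/warn"/>
    <p conref="#shared/tools"/>
  </body>
</topic>
"""


def resolve_conrefs(reusing_root, source_root):
    """Replace each element carrying a conref with a copy of its target,
    but only when the target has the same element type."""
    by_id = {el.get("id"): el for el in source_root.iter() if el.get("id")}
    # Snapshot the tree first so that replacing children is safe.
    for parent in list(reusing_root.iter()):
        for index, child in enumerate(list(parent)):
            ref = child.get("conref")
            if ref is None:
                continue
            # Simplification: use only the last ID in the reference.
            element_id = ref.split("#", 1)[-1].split("/")[-1]
            target = by_id.get(element_id)
            if target is None or target.tag != child.tag:
                continue  # mismatch or missing target: leave unresolved
            replacement = copy.deepcopy(target)
            replacement.tail = child.tail
            parent[index] = replacement
    return reusing_root
```

Calling `resolve_conrefs(ET.fromstring(REUSING), ET.fromstring(SOURCE))` fills in the first paragraph from the shared topic and leaves the second one untouched, because its target is a `ul` rather than a `p`.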

Content referencing can be used at any scope of elements in a DITA document, from a keyword phrase that contains only PCDATA to a whole topic with other nested topics. Conref can cross file boundaries, using the same syntax as that of the href attribute on the xref element. If your authoring DTD allows topic nesting, you can create a set of minimal child topics and then use their conref attributes to pull in content from fully populated topics in other files.
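
As a rough illustration of a cross-file reference, the sketch below splits a reference value into its parts. The `file#topicid/elementid` form, the file name `shared.xml`, and the names `ConrefTarget` and `parse_conref` are assumptions made for this example; the text above says only that conref uses the same syntax as the href attribute on xref.

```python
from typing import NamedTuple, Optional


class ConrefTarget(NamedTuple):
    file: Optional[str]        # None means "same file"
    topic_id: Optional[str]
    element_id: Optional[str]  # None means "the topic itself"


def parse_conref(value: str) -> ConrefTarget:
    """Split an assumed 'file#topicid/elementid' reference into its parts."""
    file_part, _, fragment = value.partition("#")
    topic_id, _, element_id = fragment.partition("/")
    return ConrefTarget(file_part or None, topic_id or None, element_id or None)
```

A reference with no file part, such as `#warnings/backup`, is treated as pointing into the current file, and a reference with no element ID is treated as pointing at a whole topic.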

From the FAQ document: "Darwin [is the name because] it uses the principles of specialization and inheritance. Information Typing capitalizes on the semantics of topics (concept, task, reference) and of content (messages, typed phrases, semantic tables). The architecture provides vertical headroom (new applications) and edgewise extension (specialization into new types) for information..."

Principal references:

"Introduction to the Darwin Information Typing Architecture. Toward portable technical information." Updated May 2002.
FAQ document: "Answers about the XML-based Darwin Information Typing Architecture (DITA) for documentation"
"Specializing Domains in DITA. Feature provides for great flexibility in extending and reusing information types." By Erik Hennum (Advisory Software Engineer, IBM Corporation). From IBM developerWorks, XML zone. May 2002.
"Specializing topic types in DITA. Creating new topic-based document types." By Michael Priestley. Updated May 2002.
DITA XML DTDs, style sheets, and sample documents. See the file listing.
DITA Forum. News and discussion forum.
Contact: Michael Priestley or Don R. Day (both of IBM)
"IBM's Darwin Information Typing Architecture (DITA)." News item 2001-03-16.
"Darwin Information Typing Architecture (DITA XML)" - Main reference page.

