Update 2005-02-17: The OASIS Darwin Information Typing Architecture (DITA) TC approved DITA 1.0 as a Committee Draft and advanced it for public review. DITA is an architecture for creating topic-oriented, information-typed content that can be reused and single-sourced in a variety of ways. It also supports specialization to create new topic types for new information domains. The review draft includes an Architectural Specification, Language Reference, XML Schemas, and DTDs. See details in the news story "OASIS Darwin Information Typing Architecture TC Approves DITA Version 1.0 as a Committee Draft."
[March 29, 2004] OASIS Sponsor Members Arbortext, IBM, Innodata Isogen, and Nokia have proposed a new OASIS DITA Technical Committee for development and maintenance of the Darwin Information Typing Architecture (DITA). Joined by other individual members (from Comtech Services, IXIASOFT, Mulberry Technologies, Syntext), the TC will further define and maintain DITA to promote the use of the architecture for creating standard information types and domain-specific markup vocabularies.
Pioneered by researchers at IBM (Don Day, Michael Priestley, David Schell, and others), DITA is an architecture for creating topic-oriented, information-typed content that can be reused and single-sourced in a variety of ways. It is also an architecture for creating new information types and describing new information domains based on existing types and domains. DITA supports content reuse through a unique transclusion mechanism that involves validation under DTD processing rules: an element "can replace itself with the content of a like element elsewhere, either in the current topic or in a separate topic that shares the same content models." DITA is "specializable, which allows for the introduction of specific semantics for specific purposes without increasing the size of other DTDs, and which allows the inheritance of shared design and behavior and interchangeability with unspecialized content. Providing specific semantics allows more automatable processes, more consistent authoring, better retrievability, and better applicability to specific groups."
The OASIS DITA Technical Committee will "articulate the principles of the DITA architecture through formal specifications, assess the relationship of DITA specialization to emerging XML standards, define appropriate enhancements of the architecture, and standardize information types in the DITA type hierarchy. The TC will also encourage cooperation within and between the various topical domains of potential DITA users, designing a generic methodology for specialized extensions of the base specification by user communities."
In connection with the new TC's formation, the DITA developers have prepared an updated version of the DITA Toolkit containing new DTDs and source for the DITA Language Reference. Ant build scripts are also provided: "in a properly set-up environment, they can generate output to HTML Help, JavaHelp, Eclipse Help, Web page models, and XSL-FO — all the way to PDF if you invoke FOP."
Don Day of IBM is the Proposed Chair of the OASIS DITA TC. The first meeting of the TC will be held as a teleconference on May 4, 2004.
News Story Contents
- Summary
- DITA: Key Architecture Features
- DITA Toolkit Version 1.3
- OASIS DITA TC Proposal
- Principal References
DITA: Key Architecture Features
DITA has several unifying features that serve to organize and integrate information:
Topic orientation. The highest standard structure in DITA is the topic. Any higher structure than a topic is usually part of the processing context for a topic, such as a print-organizing structure or the helpset-like navigation for a set of topics. Also, topics have no internal hierarchical nesting; for internal organization, they rely on sections that define or directly support the topic. A topic is a unit of information that describes a single task, concept, or reference item. The information category (concept, task, or reference) is its information type (or infotype). A new information type can be introduced by specialization.
Reuse. A principal goal for DITA has been to reduce the practice of copying content from one place to another as a way of reusing content. Reuse within DITA occurs on two levels:
- Topic reuse: Because of the non-nesting structure of topics, a topic can be reused in any topic-like context. Information designers know that when they reuse a topic in a new information model, the architecture will process it consistently in its new context.
- Content reuse: The SGML method of declaring reusable external entities is available for XML users, but this has several practical limitations in XML. DITA instead leans toward a different SGML reuse technique and provides each element with a conref attribute that can point to any other equivalent element in the same or any other topic.
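The conref idea described above can be illustrated with a minimal Python sketch. It is not the DITA Toolkit's implementation: the sample topic, the function name resolve_conrefs, and the simplified address form "#topicid/elementid" (same document only) are assumptions made for illustration, and "like element" is approximated by requiring identical element names.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample topic; the second <p> reuses the first by reference.
TOPIC = """
<topic id="install">
  <title>Installing</title>
  <body>
    <p id="warn">Back up your data before you begin.</p>
    <p conref="#install/warn"/>
  </body>
</topic>
"""

def resolve_conrefs(root):
    """Replace the content of each conref element with a copy of its target's content."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for el in list(root.iter()):
        ref = el.get("conref")
        if not ref:
            continue
        # Assumed address form: "#topicid/elementid"; only the element id is used here.
        target = by_id.get(ref.rsplit("/", 1)[-1].lstrip("#"))
        if target is None or target.tag != el.tag:
            continue  # leave unresolved or mismatched references untouched
        el.text = target.text
        for child in list(el):
            el.remove(child)
        el.extend([copy.deepcopy(child) for child in target])
        del el.attrib["conref"]
    return root

if __name__ == "__main__":
    resolved = resolve_conrefs(ET.fromstring(TOPIC))
    print(ET.tostring(resolved, encoding="unicode"))
```

Because the referencing element and its target must be the same kind of element, the pulled-in content stays valid in its new position, which is the point of the validation rule quoted earlier.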
Specialization. The class mechanism in CSS indicates a common formatting semantic for any element that has a matching class value. In the same way, any DITA element can be extended into a new element whose identifier gets added to the class attribute through its DTD. Therefore, a new element is always associated with its base, and with every element in its specialization sequence.
- Topic specialization: Applied to topic structures, specialization is a natural way to extend the generic topic into new information types (or infotypes), which in turn can be extended into more specific instantiations of information structures. For example, a recipe, a material safety data sheet, and an encyclopedia article are all potential derivations from a common reference topic.
- Domain specialization: Using the same specialization principle, the element vocabulary within a generic topic (or set of infotyped topics) can be extended by introducing elements that reflect a particular information domain served by those topics. A specialized domain, such as programming phrases, can be introduced by substitution anywhere that the root elements are allowed. This makes the entire vocabulary available throughout all the infotyped topics used within a discipline.
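A rough Python sketch of how a processor might use the class attribute to fall back from a specialized element to its base follows. The recipe markup, the handler table, and the function names are hypothetical. The class values are written inline here for readability, whereas in DITA they would normally be supplied as defaults by the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical "recipe" specialization of the reference topic type.
DOC = """
<recipe id="soup" class="- topic/topic reference/reference recipe/recipe ">
  <title class="- topic/title ">Soup</title>
  <ingredients class="- topic/body reference/refbody recipe/ingredients ">
    <p class="- topic/p ">Two onions</p>
  </ingredients>
</recipe>
"""

# A processor that only knows the base types (illustrative labels).
HANDLERS = {
    "topic/topic": "generic topic",
    "topic/title": "title",
    "topic/body": "body",
    "reference/reference": "reference topic",
}

def specialization_sequence(el):
    """Return the module/element tokens of the class attribute, base first."""
    return [tok for tok in el.get("class", "").split() if "/" in tok]

def pick_handler(el, handlers):
    """Use the most specific known ancestor in the specialization sequence."""
    for token in reversed(specialization_sequence(el)):
        if token in handlers:
            return handlers[token]
    return None

if __name__ == "__main__":
    for el in ET.fromstring(DOC).iter():
        print(el.tag, "->", pick_handler(el, HANDLERS))
```

The processor has no rule for recipe, yet it can still treat the element as a reference topic. This is the "interchangeability with unspecialized content" that the architecture aims for.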
Property-based processing. The DITA model provides metadata and attributes that can be used to associate or filter the content of DITA topics with applications such as content management systems, search engines, processing filters, and so on.
- Extensive metadata to make topics easier to find: The DITA model for metadata supports the standard categories of the Dublin Core Metadata Initiative. In addition, the DITA metadata enables many different content management approaches to be applied to its content.
- Universal properties: Most elements in the topic DTD carry a set of universal attributes that let the elements act as selectors and filters and that support content referencing and multiple languages.
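Property-based filtering of this kind can be sketched in a few lines of Python. The example assumes a space-separated audience attribute on elements. The sample values, the function name filter_by_properties, and the rule of keeping any element that has no such attribute are illustrative assumptions, not the toolkit's actual filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical topic with audience-specific paragraphs.
TOPIC = """
<topic id="setup">
  <title>Setup</title>
  <body>
    <p audience="administrator">Edit the server configuration file.</p>
    <p audience="novice expert">Start the application.</p>
    <p>Check the status page.</p>
  </body>
</topic>
"""

def filter_by_properties(root, wanted):
    """Drop elements whose property attribute lists none of the wanted values.

    `wanted` maps an attribute name to the set of values to keep;
    elements without the attribute are always kept.
    """
    for parent in list(root.iter()):
        for child in list(parent):
            for attr, allowed in wanted.items():
                values = set(child.get(attr, "").split())
                if values and not (values & allowed):
                    parent.remove(child)
                    break
    return root

if __name__ == "__main__":
    filtered = filter_by_properties(ET.fromstring(TOPIC), {"audience": {"novice"}})
    for p in filtered.iter("p"):
        print(p.text)
```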
Taking advantage of existing tags and tools. Rather than being a radical departure from the familiar, DITA builds on well-accepted sets of tags and can be used with standard XML tools.
- Leveraging popular language subsets: The core elements in DITA's topic DTD borrow from HTML and XHTML, using familiar element names like p, ol, ul, and dl within an HTML-like topic structure; DITA topics can be written, like HTML, for rendering directly in a browser. DITA also makes use of the popular OASIS (formerly CALS) table model.
- Leveraging popular and well-supported tools: The XML processing model is widely supported by a number of vendors. The class-based extension mechanism in DITA translates well to the design features of the XSLT and CSS stylesheet languages defined by the World Wide Web Consortium and supported in many transformation tools, editors, and browsers. [Adapted from the Introduction to the Darwin Information Typing Architecture.]
DITA Toolkit Version 1.3
The IBM DITA development team has prepared a new DITA package updating the DITA Toolkit issued in June 2003. This Version 1.3 distribution package is available for download from the IBM developerWorks web site. The version 1.3 release includes the latest set of bug fixes and some proposed new markup; these same updated XML DTDs/Schemas will be provided to the OASIS Darwin Information Typing Architecture Technical Committee.
The DITA Toolkit contains more than 520 source files; see the file listing for the version 1.3 distribution package. Included is the primary DITA reference, DITA Language Reference: Learning Your Way Around DITA Markup, in PDF and in HTML Help (.chm) format.
The DITA development team has included the DITA source for the DITA Language Reference in this distribution as well. A standard map produces alphabetical online help; a bookmap version produces the hierarchically arranged PDF of the same topics. The new toolkit also contains demos of map-based processing, which is where the real power of using topics is demonstrated. Ant build scripts are provided; in a properly set-up environment, they can generate output to HTML Help, JavaHelp, Eclipse Help, Web page models, and XSL-FO, all the way to PDF if you invoke FOP.
From the documentation ('Building DITA Output with Ant'): "You can use the Ant build tool to automate builds that use the DITA processes including output for the sample documents provided with DITA. DITA provides a set of XSLT scripts for producing help output in Eclipse, Java Help, HTML Help, for producing web HTML pages, or for producing PDF. To make it easier to invoke these scripts, the DITA distribution now provides an experimental Ant file that you can use to build the DITA documentation, demos, and samples. Ant is a Java-based, Open Source tool provided by the Apache Foundation to declare a sequence of build actions. As such, Ant is well suited for document builds as well as development builds..."
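The map-based processing mentioned above, where one set of topics yields both an alphabetical help listing and a hierarchical book, can be sketched in Python. This is only an illustration of the idea: the map content and file names are invented, and the two functions stand in for what the toolkit does with its XSLT scripts. The sketch assumes a map made of nested topicref elements with href and navtitle attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical map; file names and titles are invented for illustration.
MAP = """
<map title="Language Reference">
  <topicref href="intro.dita" navtitle="Introduction">
    <topicref href="topic.dita" navtitle="topic"/>
    <topicref href="body.dita" navtitle="body"/>
  </topicref>
  <topicref href="alt.dita" navtitle="alt"/>
</map>
"""

def hierarchical_toc(node, depth=0):
    """List topic titles in map order, indented by nesting depth (book-like view)."""
    lines = []
    for ref in node.findall("topicref"):
        lines.append("  " * depth + ref.get("navtitle", ref.get("href")))
        lines.extend(hierarchical_toc(ref, depth + 1))
    return lines

def alphabetical_toc(root):
    """List the same topics flat and sorted by title (online-help view)."""
    titles = [ref.get("navtitle", ref.get("href")) for ref in root.iter("topicref")]
    return sorted(titles, key=str.lower)

if __name__ == "__main__":
    root = ET.fromstring(MAP)
    print("\n".join(hierarchical_toc(root)))
    print("\n".join(alphabetical_toc(root)))
```

The topics themselves are untouched in either view; only the map-driven context changes.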
"The Revision 1.3 (dita13.zip) release provides a number of bug fixes for both the DTDs and Schemas. Moreover, it introduces several backwards-compatible upgrades that have been requested. For example, a new <prophead> element is supported for property tables in the reference infotype. This change allows you to create regular headings for property tables. By using conref, you can instance a common property heading throughout many reference topics that follow that heading pattern. There is a more complete definition of the map DTD, introducing markup that better supports navigation features in Eclipse. Demos are provided for the new markup in the map DTD and a specialization of map for book printing, the bookmap DTD. The occurrence rule for <metadata> is changed from 0-or-1 to any number, which allows you to create sets of metadata, increasing its utility for specialization. There are expanded contexts in which the <term> and <keyword> elements are valid, which allows conrefing these elements everywhere that general entities (like SGML text symbols) might be desired. A new <imagemap> element enables producing linked images that can be output as navigational structures in HTML. This element is being introduced in this release in a new utilities domain; however, it is felt that after the design has been proven, this markup is a good candidate for becoming a new core structure of topic.mod. A new <alt> element is provided for the <image> element, as it is ever more important to be able to easily author accessible information. By moving the authoring context for alternative text from an attribute into element context, authors can more easily address their business requirements for authoring accessible information..." [see full description in the README, "Changes for this release" ]
From the OASIS DITA TC Proposal
Name of the Technical Committee
OASIS Darwin Information Typing Architecture (DITA) Technical Committee
Statement of Purpose
The purpose of the OASIS DITA Technical Committee (TC) is to define and maintain the Darwin Information Typing Architecture (DITA) and to promote the use of the architecture for creating standard information types and domain-specific markup vocabularies.
DITA is specializable, which allows for the introduction of specific semantics for specific purposes without increasing the size of other DTDs, and which allows the inheritance of shared design and behavior and interchangeability with unspecialized content.
More specific semantics allow:
- more automatable processes
- more consistent authoring
- better retrievability
- better applicability to specific groups
The work of this TC will differ from similar efforts such as DocBook because of:
- broader scope, inasmuch as DITA applies to more areas than just technical manuals
- more specific scope, inasmuch as DITA applies to topic-oriented information rather than all technical manuals
Scope of Work
The TC will create specifications for the Darwin Information Typing Architecture suitable for submitting for balloting by OASIS membership for OASIS standard status.
DITA is an XML-based specification for modular and extensible topic-based information. DITA provides a model for defining and processing new information types as specializations of existing types.
DITA populates the model with an extensible hierarchy of standard types. DITA encourages reuse by reference either of topics or of fragments of topics. DITA topics:
- can be assembled in different combinations for many deliverables or output formats
- are optimized for navigation and search
- are well suited for concurrent authoring and content management
Through use of a common specification, DITA content owners can benefit from industry support, interoperability, and reuse of community contributions. At the same time, through specialization, content owners can address the specific requirements of their business or industry.
This committee builds upon the foundation established by the work of IBM on DITA.
The tasks of the TC include:
- To articulate the principles of the DITA architecture through formal specifications
- To assess the relationship of DITA specialization to emerging XML standards (such as the ontology initiatives associated with the Semantic Web)
- To define appropriate enhancements of the architecture
- To standardize the information types in the DITA type hierarchy
- To encourage cooperation within and between the various topical domains of potential DITA users. It is anticipated that, in addition to the common information elements provided in the base specification, specific communities of users may develop additional, specialized type hierarchies of particular relevance to their use cases. The TC may choose to recognize new information types or domain specializations where a new specialization provides a standard solution for a well-established need, has broad support, does not conflict with existing types, and serves as a useful base for additional specialization. For example, the concept, task, and reference information types do so for the user assistance community. The TC anticipates maintaining a set of core information types of general utility, implemented in schema languages (such as DTD or XML Schema) selected by the TC. Recognized types may also be maintained by other groups (including other OASIS TCs).
- To design a generic methodology for specialized extensions of the base specification by user communities. This methodology may address issues such as delivery of a reference implementation, operation of a public registry for specializations, suggested guidelines for development of a user community's information types, and so forth.
When the above tasks are completed, the TC may reconsider further work, which will be defined as allowed by the OASIS TC Process.
List of Deliverables
Within three months of the first meeting, the existing DITA specification will be contributed to OASIS by its author, will be further developed by the TC and approved as a Committee Draft, and then submitted to OASIS for consideration as an OASIS Standard. The specification consists of:
- a formal definition of the rules for creating new information types and domains through specialization
- the DTDs and XML Schemas for the initial DITA information type, domain, and map specializations
- a processing model description that defines standard usage of the DITA specifications
Within six months of the first meeting, the TC will seek to encourage specific specialized extensions of the DITA specification, as well as these deliverables:
- guidelines and methodologies for the development of DITA specializations by a user community
- a possible specification of a standards-based public registry or repository for such DITA specializations or a method for creating or federating such resources
The TC may consider the creation of subcommittees where there is an immediate interest in developing specialized extensions, but it is also anticipated that such extensions could be adopted locally and informally within specific information exchange communities.
One year after producing the first DITA Committee Draft, the TC will produce a new major revision of DITA including:
- evolution of the DITA architecture to address issues such as namespaces, type unification, extension by addition, and extensible enumerations
- formal specifications of all aspects of the DITA architecture with primers, use cases, and scenarios
- maintenance of the earlier DITA types
- addition to the base specification of those new DITA information types that appear from specialized uses to have general utility
- a continuing methodology for the harvesting and incorporation of additional, useful types into the base specification
Anticipated Audience
- Writers of other specifications that could benefit from DITA's specialization model or other aspects of its architecture
- Vendors offering XML authoring or development products
- XML architects and developers who design and write XML applications
- Information developers and information architects
TC Language
English
Identification of Similar or Applicable Work
DITA is an enabling technology that has potential relationships with many other activities. It is compatible with ISO topic maps, although DITA's use of the word "topic" is considerably more constrained than in that standard, and DITA maps use structuring principles specifically designed to support specialization. It supports semantic web initiatives, inasmuch as DITA both enables rich semantic markup and provides a taxonomy for semantics through its type and domain hierarchies. It is compatible with ontological efforts in general, inasmuch as DITA maps are a way of describing the relationships among topics, and can be used to describe multiple ontologies across the same topic sets.
Because DITA is a constrained architecture dealing specifically with topics and relationships among topics, it does not directly impact more general activities. However, DITA topics are ideal candidates for participation in semantic web relationships, and DITA maps can be excellent sources for the description of these relationships.
The work of the OASIS DocBook TC is similar or applicable.
The proposed work is different from DocBook in that DITA is topic-oriented, which lends itself to different uses than DocBook. Topic orientation allows the separation of content (specific topics) from context (including links to other topics, context-specific metadata, navigation, and print hierarchies).
The DITA TC will identify liaisons with other committees or groups doing related work to investigate points of common interest. Additionally, the TC may have some coordinated activities with the DocBook TC focusing on interoperability of content in the two formats.
List of Contributions of Existing Technical Work
The proposers anticipate that IBM will contribute a starter set of information types, formal definitions of four domains, five document types, two maps, and several common modules.
See the table below for a list of the current DITA DTDs, schemas, and related documentation. Additional information concerning these materials, along with some IBM proprietary materials that are not being contributed, can be found at:
http://www-106.ibm.com/developerworks/xml/library/x-dita1/index.html.
Other contributions within the scope of the TC will also be considered.
Date and Time of the First Meeting
4 May 2004, 11am ET, a teleconference to be hosted by IBM
Projected On-Going Meeting Schedule
11am ET each Tuesday for 1 hour, for the year following formation of the TC; hosted by IBM
Proposers
- Paul Grosso <pgrosso@arbortext.com>, Arbortext
- Indi Liepa <indi.liepa@nokia.com>, Nokia
- Eliot Kimber <ekimber@innodata-isogen.com>, Innodata Isogen
- Don Day <dond@us.ibm.com>, IBM
- Michael Priestley <mpriestl@ca.ibm.com>, IBM
- France Baril <France.Baril@ixiasoft.com>, Individual
- JoAnn Hackos <JoAnn.Hackos@Comtech-Serv.com>, Individual
- Debbie Aldyne Lapeyre <dalapeyre@mulberrytech.com>, Individual
- Dave Schell <dschell@us.ibm.com>, IBM
- Paul Antonov <apg@syntext.com>, Individual
TC Convener
Dave Schell
Proposed Chair
Don Day
Attachment
Table of current DITA DTDs, schemas, and related documentation
Unit                       DTDs                      Schemas
-------------------------  ------------------------  ------------------------
Information types
  topic                    topic.mod                 topic.mod
  concept                  concept.mod               concept.mod
  task                     task.mod                  task.mod
  reference                reference.mod             reference.mod
Domains
  highlighting             highlight-domain.mod      highlight-domain.mod
                           highlight-domain.ent
  programming              programming-domain.mod    programming-domain.mod
                           programming-domain.ent
  software                 software-domain.mod       software-domain.mod
                           software-domain.ent
  user interfaces          ui-domain.mod             ui-domain.mod
                           ui-domain.ent
Document types (integrate domains and information types)
  topics                   topics.dtd                topic.xsd
  concepts                 concept.dtd               concept.xsd
  tasks                    tasks.dtd                 tasks.xsd
  reference                reference.dtd             reference.xsd
  mixed                    ditabase.dtd              ditabase.xsd
Maps
  base                     map.dtd
  book-specialized         book.dtd
Common modules
  metadata                 meta_xml.mod              meta_xml.mod
  CALS tables              tbl_xml.mod               tbl_xml.mod
  standard XML attributes                            xml.xsd
Principal References
- Call for Participation: OASIS Darwin Information Typing Architecture (DITA) Technical Committee
- Announcement 2004-04-12: "OASIS DITA Technical Committee Forms to Advance XML Standard for Authoring Reusable Content in Documents."
- "Arbortext Announces DITA Support, Enables Information-Architected Topic-Based Authoring." Announcement 2004-04-13.
- OASIS Darwin Information Typing Architecture TC web site
- DITA TC Charter
- DITA TC mailing list archives
- DITA References from IBM developerWorks:
- DITA Introduction
- DITA FAQ document
- DITA: Specializing Information Types
- DITA: Specializing Domains
- DITA Forum
- DITA Language Reference. Learning Your Way Around DITA Markup. HTML Help (.chm) format. Copyright (c) International Business Machines 2001, 2004. March 30, 2004. Also available in PDF format. Extracted from the DITA Toolkit Version 1.3 distribution archive; see downloads.
- DITA Downloads. Download the latest DITA DTDs, style sheets, and sample documents. See the file listing for the version 1.3 distribution package. Note the complete statement of license terms in the IBM Darwin Information Typing Architecture Specification Agreement. Within this agreement it is said (in part): "... IBM hereby grants to you a worldwide, irrevocable, royalty free, non-exclusive license under IBM's copyrights in the Specifications, to copy, modify, publish and distribute the Specifications..."
- Earlier DITA News:
- "IBM Development Team Publishes Updated DITA Toolkit and Language Reference." News story 2003-06-24.
- "The Holy Grail of Content Reuse: IBM's DITA XML." News story 2003-04-25.
- "IBM's Darwin Information Typing Architecture (DITA)." News story 2001-03-16.
- Local References for "Darwin Information Typing Architecture (DITA XML)"