The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: April 05, 2007.

W3C Publishes Internationalization Tag Set (ITS) Version 1.0 as Recommendation.

W3C has released Internationalization Tag Set (ITS) Version 1.0 as a Recommendation. This latest Web Standard from W3C "makes it easy to create internationalized XML content. Such content can be adapted, at lower cost, to the language, cultural and other requirements of a specific target market, a process called localization."

According to the announcement, ITS is a technology designed to support creation of XML which is internationalized and can be localized effectively. On the one hand, the ITS specification identifies concepts (such as "directionality") which are important for internationalization and localization. On the other hand, the ITS specification defines implementations of these concepts (termed "ITS data categories") as a set of elements and attributes called the Internationalization Tag Set (ITS). The ITS Recommendation provides implementations for three schema languages: XML DTD, XML Schema, and RELAX NG.

ITS 1.0 addresses a number of internationalization requirements, including being able to identify the language of a piece of text, to specify the directionality of text (such as right-to-left Hebrew and Arabic or mixed directionality texts), to provide Ruby annotations (used in East Asian documents to indicate pronunciation or to provide a short annotation), and to indicate whether content should be translated — an important requirement for people building tools to help with localization.

Selection encompasses the mechanisms that specify which parts of an XML document an ITS data category and its values apply to. Selection can be applied globally and locally. Local selection in XML documents is realized with local ITS attributes, the ruby element, or the span element; span serves only as a carrier for the local ITS attributes and as a container for ruby. Global, rule-based selection is implemented using the rules element, which contains zero or more rule elements. Each rule element has a mandatory selector attribute. This attribute and all other possible attributes on rule elements are in the empty namespace and are used without a prefix. With global selection, ITS information can be added to the selected nodes, or it can point to existing information which is related to the selected nodes. Selection relies on the information that is given in the XML Information Set. ITS applications may implement inclusion mechanisms such as XInclude or DITA's conref.
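As a rough illustration of how local and global selection can interact, the following Python sketch resolves the Translate data category for a small sample document. The namespace URI http://www.w3.org/2005/11/its, the translateRule element, and the its:translate attribute are taken from the ITS 1.0 Recommendation; the sample document, the function name translate_flags, and the restriction to selectors of the form //name are assumptions made only for this sketch. A conforming processor would evaluate full XPath selectors and follow the precedence rules of the specification.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
LOCAL_TRANSLATE = f"{{{ITS_NS}}}translate"

# Hypothetical sample document: one global rule plus two local attributes.
DOC = """\
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <head>
    <its:rules version="1.0">
      <its:translateRule selector="//code" translate="no"/>
    </its:rules>
  </head>
  <body>
    <p>Press <code>Ctrl+S</code> to save.</p>
    <p its:translate="no">ACME-1234</p>
    <p>Type <code its:translate="yes">hello</code> here.</p>
  </body>
</doc>
"""


def translate_flags(root):
    """Map every element to "yes" or "no" (simplified Translate resolution)."""
    # Global selection: collect the nodes selected by each translateRule.
    global_flags = {}
    for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
        selector = rule.get("selector", "")
        if not selector.startswith("//"):
            continue  # assumption: this sketch only understands "//name"
        for node in root.iter(selector[2:]):
            global_flags[node] = rule.get("translate")

    flags = {}

    def walk(elem, inherited):
        # Local attribute first, then a matching global rule, then the
        # value inherited from the parent element.
        value = elem.get(LOCAL_TRANSLATE) or global_flags.get(elem) or inherited
        flags[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, "yes")  # elements are translatable unless stated otherwise
    return flags


root = ET.fromstring(DOC)
flags = translate_flags(root)
for elem in root.iter("code"):
    print(elem.text, flags[elem])
```

In this sample the first code element is excluded from translation by the global rule, while the second one is translatable again because its local attribute takes priority in the sketch.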

An internationalized XML schema takes into consideration these requirements and others, ideally early in the design process. With ITS 1.0, XML schema designers can build localization-ready schemas at lower cost by reusing the predefined ITS 1.0 constructs, such as the its:dir attribute to specify text directionality.
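To make the its:dir example concrete, here is a minimal Python sketch of how a tool might read directionality from a host vocabulary that has adopted the attribute. The its:dir attribute and its values ltr and rtl come from ITS 1.0; the element names text, quote, term, and note, the function name effective_direction, and the simple inherit-from-parent logic are assumptions for illustration, not the processing model of the specification.

```python
import xml.etree.ElementTree as ET

ITS_DIR = "{http://www.w3.org/2005/11/its}dir"

# Hypothetical host vocabulary; the Hebrew word is written with character
# references so that the sample stays ASCII-only.
SAMPLE = """\
<text xmlns:its="http://www.w3.org/2005/11/its">
  <quote its:dir="rtl">&#x5E9;&#x5DC;&#x5D5;&#x5DD; <term its:dir="ltr">W3C</term></quote>
  <note>Left-to-right by default.</note>
</text>
"""


def effective_direction(elem, inherited="ltr"):
    """Yield (tag, direction) pairs, passing the direction down to children."""
    direction = elem.get(ITS_DIR, inherited)
    yield elem.tag, direction
    for child in elem:
        yield from effective_direction(child, direction)


directions = dict(effective_direction(ET.fromstring(SAMPLE)))
print(directions)
```

Because the attribute name and values are predefined, the schema designer only has to allow its:dir on the relevant elements; the lookup logic above does not depend on the host vocabulary.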

ITS 1.0 also enables people to improve the internationalization of existing XML documents without modifying them. To do so, one describes how the features of the existing format relate to the corresponding ITS 1.0 features. By creating this association with the powerful features of ITS 1.0, localization tools that support ITS 1.0 can be expanded at low cost to handle legacy content, including content in formats such as XHTML, DocBook, and DITA. ITS 1.0 also makes it easier and less expensive to build localization tools by offering a standard for localization concepts.
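The idea of describing an existing format without touching its documents can be sketched as follows. The rules live in a separate document and are applied to unmodified legacy content. The its:rules and its:translateRule elements come from ITS 1.0; the DocBook-like sample, the choice of programlisting and filename as non-translatable elements, the function name untranslatable_nodes, and the restriction to //name selectors are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"

# Hypothetical external rules document, kept apart from the content.
RULES = """\
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
  <its:translateRule selector="//programlisting" translate="no"/>
  <its:translateRule selector="//filename" translate="no"/>
</its:rules>
"""

# Legacy content that contains no ITS markup at all.
LEGACY = """\
<article>
  <para>Edit <filename>settings.ini</filename> first.</para>
  <programlisting>make install</programlisting>
</article>
"""


def untranslatable_nodes(document, rules):
    """Return the elements of `document` that the rules mark translate="no"."""
    protected = []
    for rule in rules.iter(ITS + "translateRule"):
        selector = rule.get("selector", "")
        # Assumption: only selectors of the form "//name" are handled here.
        if rule.get("translate") != "no" or not selector.startswith("//"):
            continue
        protected.extend(document.iter(selector[2:]))
    return protected


nodes = untranslatable_nodes(ET.fromstring(LEGACY), ET.fromstring(RULES))
print([node.text for node in nodes])
```

The legacy document is only read, never changed; all knowledge about which elements to protect sits in the rules document, which could be shared across every document of that format.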

The Internationalization Tag Set (ITS) Version 1.0 Recommendation is accompanied by an Implementation Report that covers implementations of all conformance types defined in the specification. Its sections include: (1) Summary of CR Exit Criteria; (2) Conformance Type 1: ITS Markup Declarations; (3) Conformance Type 2: The Processing Expectations for ITS Markup (Implementations and Test Suite).

Internationalization Tag Set Specification

Internationalization Tag Set (ITS) Version 1.0. W3C Recommendation. 03-April-2007. This version URI: http://www.w3.org/TR/2007/REC-its-20070403/. Latest version URI: http://www.w3.org/TR/its/. Previous version URI: http://www.w3.org/TR/2007/PR-its-20070226/. Edited by Christian Lieske (SAP AG) and Felix Sasaki (W3C). This document is also available in these non-normative formats: ODD/XML document, self-contained zipped archive, and XHTML Diff markup to publication from 26-February-2007.

Motivation for ITS: Content or software that is authored in one language (so-called source language) is often made available in additional languages or adapted with regard to other cultural aspects. This is done through a process called localization, where the original material is translated and adapted to the target audience.

In addition, document formats expressed by schemas may be used by people in different parts of the world, and these people may need special markup to support the local language or script. For example, people authoring in languages such as Arabic, Hebrew, Persian, or Urdu need special markup to specify directionality in mixed direction text.

From the viewpoints of feasibility, cost, and efficiency, it is important that the original material should be suitable for localization (l10n). This is achieved by appropriate design and development, and the corresponding process is referred to as internationalization (i18n).

The increasing usage of XML as a medium for documentation-related content (e.g. DocBook and DITA as formats for writing structured documentation, well suited to computer hardware and software manuals) and software-related content (e.g. the Extensible User Interface Language - XUL) creates challenges and opportunities in the domain of XML internationalization and localization.

Users and Usages of ITS: The ITS specification aims to provide different types of users with information about what markup should be supported to enable worldwide use and effective internationalization and localization of content.

  • Schema developers who start a schema from ground up — This type of user will find proposals for attribute and element names to be included in their new schema, also called "host vocabulary". Using the attribute and element names proposed in the ITS specification may be helpful because it leads to easier recognition of the concepts represented by both schema users and processors. It is perfectly possible, however, for a schema developer to develop his own set of attribute and element names. The specification sets out, first and foremost, to ensure that the required markup is available, and that the behavior of that markup meets established needs.

  • Schema developers who work with an existing schema — This type of user will be working with schemas such as DocBook, DITA, or perhaps a proprietary schema. The ITS Working Group has sought input from experts developing widely used formats such as the ones mentioned. Developers working on existing schemas should check whether their schemas support the markup proposed in this specification, and, where appropriate, add the markup proposed here to their schema.

  • Vendors of content-related tools — This type of user includes companies which provide tools for authoring, translation or other flavors of content-related software solutions. It is important to ensure that such tools enable worldwide use and effective localization of content. For example, translation tools should prevent content marked up as not for translation from being changed or translated.

  • Content producers — This type of user comprises authors, translators and other types of content author. The markup proposed in this specification may be used by them to mark up specific bits of content. The work of associating ITS information with content globally, through rules, may however fall to information architects rather than to the content producers themselves.

In order to support all of these users, the information about what markup should be supported to enable worldwide use and effective localization of content is provided in this specification in two ways: (1) abstractly in the data category descriptions; (2) concretely in the ITS schemas, as presented in Appendix D: 'Schemas for ITS'.

From the W3C Announcement

From the announcement "W3C Sets New Standard for Internationalized Web Content. ITS 1.0 Shows the Way to Making Interoperable Markup Languages":

The latest Web Standard from W3C, Internationalization Tag Set (ITS) 1.0, makes it easy to create internationalized XML content. This content can be adapted, at lower cost, to the language, cultural and other requirements of a specific target market, a process called localization. Whether ITS 1.0 is used to build an internationalized XML schema from scratch, to add support to an existing schema, or to improve the internationalization of existing content, ITS 1.0 gives users the power to create XML for worldwide use.

"It's all too common for international users and localizers to struggle with document formats due to a lack of internationalization during schema design," explained Richard Ishida, W3C Internationalization Activity Lead. "Developers may not know what's needed, or may only provide part of what's needed, and then do so inconsistently from schema to schema. ITS is there to help with this, whether you are creating a new schema or working with an established one."

ITS 1.0 Designed with International Cooperation, Requirements

In designing ITS 1.0, the Internationalization Activity took into account the diverse internationalization and localization requirements of schema developers (with new or existing schemas), vendors of content-related tools, and content providers. ITS 1.0 was developed in liaison with some of the leading standardization efforts in the localization industry such as the XLIFF TC in OASIS, and the OSCAR SIG at LISA.

About the World Wide Web Consortium (W3C)

The World Wide Web Consortium (W3C) is an international consortium where Member organizations, a full-time staff, and the public work together to develop Web standards. W3C primarily pursues its mission through the creation of Web standards and guidelines designed to ensure long-term growth for the Web. Over 400 organizations are Members of the Consortium. W3C is jointly run by the MIT Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) in the USA, the European Research Consortium for Informatics and Mathematics (ERCIM) headquartered in France, and Keio University in Japan, and has additional Offices worldwide. For more information see http://www.w3.org/.

XML Internationalization: Requirements and Best Practices

Companion documents produced by members of the Internationalization Tag Set (ITS) Working Group:

  • Best Practices for XML Internationalization. W3C Working Draft. 18-May-2006 (or later). Edited by Yves Savourel (ENLASO Corporation) and Diane Stoick (Boeing Corporation). "This [version 18-May-2006] is a First Public Working Draft of Best Practices for XML Internationalization (XML i18n BP). The document provides best practices and techniques related to the internationalization of XML that developers of XML applications as well as content authors can use to ensure that their XML documents and schemas are easily adaptable for an international audience. These are practices and techniques that are best addressed from the start of content development if unnecessary costs and resource issues are to be avoided later on. The document is a complement to ITS. Not all internationalization-related issues can be solved with special markup described in ITS; there are a number of problems that can be avoided by designing correctly the XML format, and by applying a few guidelines when designing and authoring documents. This document and ITS implement requirements formulated in [the requirements document]. This document is divided into two main sections: The first one is intended to the designers and developers of XML applications; The second is for the XML content authors — this includes users modifying the original content such as the translators..."

  • Internationalization and Localization Markup Requirements. W3C Working Draft. 18-May-2006 (or later). Edited by Yves Savourel (ENLASO Corporation). "Content or software that is authored in one language (i.e. source language) is often made available in additional languages. This is done through a process called localization, where the original material is translated and adapted to the target audience. From the viewpoints of feasibility, cost, and efficiency, it is important that the original material should be suitable for localization. This is achieved by proper design and development, and the corresponding process is referred to as internationalization. The increasing usage of XML as a medium for documentation-related content (e.g., DocBook, being a format for writing structured documentation, well suited to computer hardware and software manuals) and software-related content (e.g., the eXtensible User Interface Language (XUL)) provides growing challenges and opportunities in the domain of XML internationalization and localization.

    When creating schemas (XML Schema, DTD, etc.), it is important to include constructs that meet the needs of content authors dealing with international audiences, and address the needs of the localization community. This document defines requirements for a set of solutions that would address the main challenges and issues of internationalizing and localizing XML documents. The solutions are expected to include several aspects: a specialized vocabulary that XML users can include in their own documents, a set of guidelines to apply when using existing XML technologies, and a range of possible mechanisms for applying those..."

Specification Development: Use of ODD and NVDL

Two technologies used in the development of the ITS specification are of note.

TEI 'ODD' Literate Programming. The Recommendation is provided in the non-normative XML format 'ODD'. "The specification has been developed using the ODD ('One Document Does It All') language of the Text Encoding Initiative (TEI). This is a literate programming language for writing XML schemas, with three characteristics:

  • The element and attribute set is specified using an XML vocabulary which includes support for macros (like DTD entities, or schema patterns), a hierarchical class system for attributes and elements, and creation of modules.
  • The content models for elements and attributes are written using embedded RELAX NG XML notation.
  • Documentation for elements, attributes, value lists etc. is written inline, along with examples and other supporting material.

XSLT transformations are provided by the TEI to create documentation into HTML, XSL FO or LaTeX forms, and to generate RELAX NG documents and DTD. From the RELAX NG documents, James Clark's trang can be used to create XML Schema documents."

See the discussion in "Internationalization and Localization of XML: Introducing 'ITS'": "The ODD language has been applied for the creation of the ITS working draft, and the automatic generation of ITS markup declarations in the formats XML DTD, XML Schema and RELAX NG... Some W3C working groups use the traditional model of writing DTDs or Schemas by hand, with associated documentation in a variant of HTML. This has a number of drawbacks. Maintaining consistency in documentation and specification of a large or complex schema, particularly one which is multiply authored, is a non-trivial problem, analogous to the problem of maintaining large software development projects. The ODD language was found to have several advantages. It allows for the automated production of outputs in different schema languages, as noted above. Formalising the process of documentation ensures and enforces good practice, and allows for automatic production of completely documented schemas and consistently detailed support documentation. The process is familiar, of course, to adherents of Don Knuth's Literate Programming ideas... The ODD language is defined as one of the 22 modules which make up the TEI Guidelines (TEI P5). These Guidelines are themselves written using this language, and generated as an output from that source. The module concerned adds a number of specialist elements to the existing range of TEI markup, which supports a broad range of documentation-style elements comparable to those provided by other XML schemas such as DocBook or XHTML..."

Support for Checking ITS Markup with NVDL. ISO DSDL Part 4 (ISO/IEC 19757-4) Namespace-based validation dispatching language — NVDL was prepared by Joint Technical Committee ISO/IEC JTC 1, Information technology, Subcommittee SC 34, Document description and processing languages. This part of ISO/IEC 19757 "specifies a Namespace-based Validation Dispatching Language (NVDL). An NVDL script controls the dispatching of elements or attributes in a given XML document to different validators, depending on the namespaces of the elements or attributes. An NVDL script also specifies which schemas are used by these validators. These schemas may be written in any schema languages, including those specified by ISO/IEC 19757."

Informative Appendix F. Checking ITS Markup with NVDL "provides a document which allows validation of ITS markup which has been added to a host vocabulary. Only ITS elements and attributes are checked. Elements and attributes of host language are ignored during validation against this NVDL document/schema. The NVDL schema depends on the following two RELAX NG schemas: (1) RELAX NG schema for ITS elements; (2) RELAX NG schema for ITS attributes..."
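The dispatching idea behind this appendix, checking only ITS markup and ignoring the host language, can be pictured with a short Python sketch. This is not NVDL and performs no validation; it only separates names in the ITS namespace from host-vocabulary names, which is the partitioning step that an NVDL script would hand on to the RELAX NG schemas. The sample document and the function names namespace_of, local_name, and split_its_markup are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical host document with some ITS markup mixed in.
MIXED = """\
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:rules version="1.0">
    <its:translateRule selector="//code" translate="no"/>
  </its:rules>
  <p its:translate="no">ID-42</p>
  <p>Plain host markup.</p>
</doc>
"""


def namespace_of(qname):
    """Return the namespace of an ElementTree '{ns}local' name, or ''."""
    return qname[1:].split("}", 1)[0] if qname.startswith("{") else ""


def local_name(qname):
    """Return the local part of an ElementTree '{ns}local' name."""
    return qname.split("}", 1)[-1]


def split_its_markup(root):
    """Separate ITS elements and attributes from host elements."""
    its_names, host_names = [], []
    for elem in root.iter():
        if namespace_of(elem.tag) == ITS_NS:
            its_names.append(local_name(elem.tag))
        else:
            host_names.append(local_name(elem.tag))
        # ITS attributes on host elements are marked with a leading "@".
        for attr in elem.attrib:
            if namespace_of(attr) == ITS_NS:
                its_names.append("@" + local_name(attr))
    return its_names, host_names


its_names, host_names = split_its_markup(ET.fromstring(MIXED))
print(its_names, host_names)
```

Only the names in the first list would be passed to the ITS schemas; the second list corresponds to the host-language markup that the NVDL document ignores.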

About the W3C Internationalization Tag Set (ITS) Working Group

The mission of the Internationalization Tag Set Working Group (ITS WG) "is to develop a set of elements and attributes that can be used with new DTDs/Schemas to support the internationalization and localization of documents, and to provide techniques for developers of DTDs/Schemas dealing with approaches that best support internationalization of their documents.

The World Wide Web is by its name and by its actual extent world-wide. Enabling people from all parts of the world to make full use of Web technologies requires support for their languages, writing systems and cultures. The W3C is firmly committed to making sure that its specifications and other outputs are adequately internationalized. Overall W3C work has moved on at a high pace, and is expected to continue to do so, but internationalization is a continuing concern for W3C, as part of the W3C goal of Universal Access.

The W3C Internationalization Activity was created in October 1995. In February 1998, the Internationalization (I18N) WG was created, and has been rechartered regularly. In 2002 the WG divided its work across three task forces. This charter proposes a new Working Group.

Developers who create formats based on XML cannot be expected to foresee all the potential requirements and issues associated with internationalization and localization of content created using those formats. Nor is it sensible to expect each developer to reinvent the wheel each time. It would be much better if there was a set of 'ready-made' elements and attributes, that had been designed using state of the art knowledge about internationalization and localization needs, that developers could include in the format they are developing. Furthermore, standardization of internationalization and localization related tags would also benefit the localization industry, particularly where automated translation tool technology is deployed. Not all best practice can be implemented simply by adding a standard set of tags, so it would be useful to also provide a set of complementary guidelines.

The W3C is a logical place to develop a tag set and guidelines of this kind. As the developer and maintainer of XML, the Consortium has a tremendous amount of knowledge and experience to bring to bear on the topic. That experience is strengthened through its work over the years internationalizing XML-based formats such as XHTML, SVG, SMIL, MathML, etc. In addition, the Consortium provides a highly visible and credible forum for exposing best practice to the developers that should consider their use.

The time is also right for this development. Industry has expressed renewed interest in pursuing this topic during 2004, and the Working Group should include participants who also represent the interests of the Localization Industry Standards Association (LISA) and the OASIS Localization Technical Committee (which produces XLIFF)..."

About the W3C Internationalization (I18N) Activity

The mission of the W3C Internationalization Activity is to ensure that W3C's formats and protocols are usable worldwide in all languages and in all writing systems. The work of the Internationalization Activity is done by three Working Groups and by participants in the Internationalization Interest Group:

  • The W3C Internationalization Core Working Group enables universal access to the World Wide Web by reviewing specifications produced by other W3C Working Groups and producing its own specifications. The mission of the Internationalization Core Working Group, part of the Internationalization Activity, is to enable universal access to the World Wide Web by proposing and coordinating the adoption by the W3C of techniques, conventions, technologies, and designs that enable and enhance the use of W3C technology and the Web worldwide, with and between the various different languages, scripts, regions, and cultures. See the Charter.

  • The W3C I18N Architecture Working Group was chartered to move the Character Model for the World Wide Web 1.0: Resource Identifiers and Character Model for the World Wide Web 1.0: Normalization to Recommendation status. The former is currently in Candidate Recommendation, and the latter is a Working Draft. Based on Character Model for the World-Wide Web 1.0: Fundamentals, these Architectural Specifications provide authors of specifications, software developers, and content developers with a common reference on the use of normalization of text and string identity matching, and the use of resource identifiers building on the Universal Character Set on the Web. The goal of these specifications is to improve interoperable text manipulation on the World Wide Web. See the Charter.

  • The ITS (Internationalization Tag Set) Working Group was chartered to develop a set of elements and attributes that can be used with new DTDs/Schemas to support the internationalization and localization of documents. It will also provide best practice techniques for developers of DTDs/Schemas that show how to enable internationalization of their documents. See the Charter.

  • Internationalization Interest Group. Anyone can participate in the Interest Group by joining the mailing list. The purpose of this IG is to help the Working Groups within the Internationalization Activity with advice and opinions from a larger group of people with knowledge in different languages and cultures as well as different parts of the Web architecture. The I18N IG also provides a forum to discuss issues related to the internationalization of the Web. See the IG Charter and public discussion list.


Document URI: http://xml.coverpages.org/ni2007-04-05-a.html
Robin Cover, Editor: robin@oasis-open.org