The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: February 11, 2008.

New OASIS Standard: XML Localization Interchange File Format (XLIFF) v1.2.

OASIS has announced the approval of the XML Localization Interchange File Format (XLIFF) specification Version 1.2 as an OASIS Standard. The specification was produced by members of the OASIS XML Localisation Interchange File Format (XLIFF) Technical Committee.

The purpose of the XLIFF vocabulary is to store localizable data and carry it from one step of the localization process to the other, while allowing interoperability between tools. The specification is tool-neutral, supports the entire localization process, and supports common software, document data formats, and markup languages. The specification provides an extensibility mechanism to allow the development of tools compatible with an implementer's data formats and workflow requirements. The extensibility mechanism provides controlled inclusion of information not defined in the specification.

The XLIFF file format serves as a container for externalized data to be interchanged between software publishers, documentation writers (including, but not limited to, authors of documents written in DITA, DocBook, HTML, and other XML document formats), localization tools, and software services providers in order to facilitate all the phases of the localization process.

XLIFF is an XML-based vocabulary. XLIFF is represented in the work of the DITA Translation Subcommittee, and will be featured in a translation best practices document. XLIFF has a working relationship with LISA/OSCAR standards related to Translation and Localization, and is a requirement for several LISA standards: (1) TMX, which supports XLIFF inline markup, and whose Version 2.0 was in progress [public review draft] as of March 28, 2007; (2) Global Information Management Metrics eXchange (GMX), where XLIFF is a requirement; (3) xml:tm, where XLIFF is a requirement.

Statements of "successful use" with XLIFF Version 1.2 have been provided by Lionbridge Inc., SDL International, OSCAR, LISA, Idiom Technologies Inc., and Localisation Research Centre (LRC), University of Limerick.

Bibliographic Information

  • XLIFF Version 1.2. OASIS Standard. 01-February-2008. Also available in PDF format [source]. Edited by Yves Savourel, John Reid, Tony Jewtushenko, and Rodolfo M. Raya. Namespace: urn:oasis:names:tc:xliff:document:1.2. See the specification drafts for XLIFF Version 1.2, and URIs for the OASIS Standard.

    Final Version 1.2 is specified in two "flavors", each with its own XML schema: strict and transitional. Indicate which variant you are using by selecting the appropriate schema, which may be specified in the XLIFF document itself or in an OASIS catalog. The namespace is the same for both variants, so the schema reference is what tells a validating tool which variant is in use. Each variant's schema defines which elements and attributes are allowed in certain circumstances... With 'Transitional', applications that produce older versions of XLIFF may still use deprecated items; with 'Strict', deprecated elements and attributes are not allowed. Obsolete items from previous versions of XLIFF are deprecated and should not be used when writing new XLIFF documents.

  • Representation Guide for HTML. One of the "XLIFF-Related Documents" referenced in the text of the XLIFF TC's v1.2 submission for OASIS approval. Edited by Yves Savourel, Bryan Schnabel, Tony Jewtushenko, and Doug Domeny. Source HTML and PDF.

    "This document describes how HTML (in its different flavors), should be coded when extracted to an XLIFF document... As different tools may provide different filters to extract the content of HTML documents it is important for interoperability that they represent the extracted data in identical manner in the XLIFF document. The intent of this document is to provide a set of guidelines to represent HTML data in XLIFF. It offers a collection of recommended mapping of the HTML elements and attributes developers of XLIFF filters can implement, and users of XLIFF utilities can rely on to insure a better interoperability between tools.

    Many HTML documents are generated dynamically, in some cases using server-side script files which are often made of a mixture of HTML constructs and server-side instructions written in one of the server-side languages such as PHP, JSP, ASP, or many others. While such source documents are generally outside of the scope of this document, an effort is made to try to address some of the issues you may run into when extracting such source documents...

    There are many ways to process a source HTML document and create its corresponding XLIFF output. One interesting method is to make use of XML standards, such as XSLT, XPath, or XSL-FO. Of these, XSLT is a particularly good tool for transforming HTML to XLIFF, and XLIFF back to HTML. Another method to extract HTML files to XLIFF is to use custom filter applications. Such tools can be written in a variety of programming and scripting languages such as Perl, Python, C, C++, C#, Java, and so forth. This document makes no assumption on the type of language used to process the HTML input documents. It also makes no assumptions whether or not the tool creates a Skeleton file along with the XLIFF document generated, or if it creates one, how data are represented in the Skeleton."

  • Representation Guide for Java Resource Bundles. One of the "XLIFF-Related Documents" referenced in the text of the XLIFF TC's v1.2 submission for OASIS approval. Edited by Tony Jewtushenko and Rodolfo M. Raya. Source HTML and PDF.

    "This document describes how Java Resource Bundles should be coded when extracted to an XLIFF document... As different tools may provide different filters to extract the content of Java Resource Bundles, it is important for interoperability that they represent the extracted data in identical manner in the XLIFF document. The intent of this document is to provide a set of guidelines to represent data contained in Java Resource Bundles as XLIFF content. It offers a collection of recommended mapping of Java Resource Bundles that developers of XLIFF filters can implement, and users of XLIFF utilities can rely on to insure a better interoperability between tools."

  • Representation Guide for Gettext PO. One of the "XLIFF-Related Documents" referenced in the text of the XLIFF TC's v1.2 submission for OASIS approval. Edited by Asgeir Frimannsson, Tony Jewtushenko, and Rodolfo M. Raya. Source HTML and PDF.

    "This document defines a guide for mapping the GNU Gettext PO (Portable Object) file format to XLIFF (XML Localisation Interchange File Format). There are two types of PO files: PO Template files (POTs) and Language specific PO files (POs). POTs contain a skeleton header, followed by the extracted translation units. POTs are generated by the xgettext extraction tool and are not meant to be edited by humans. POTs are converted into Language Specific POs by the msginit tool, and these files are then edited by translators.

    When source code is updated, a new POT is generated for the project, and the changes from previous versions are incorporated into the existing translations by using the msgmerge tool. This tool inserts new translation units into the existing PO files, marks translation units no longer in use as obsolete, and updates any references and extracted comments.

    Translated PO files are converted to binary resource files, known as MO (Machine Object) files, by the msgfmt tool. The Gettext library uses MO files at run time; hence PO files are only used in the development and localisation process..."
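
As a rough illustration of what a PO-to-XLIFF filter does, the following Python sketch turns simple PO entries into XLIFF 1.2 translation units. It is not the mapping recommended by the Representation Guide: the choice to put msgid in <source> and msgstr in <target>, the sequential ids, the datatype value, and the function names parse_po_entries and po_to_xliff are all assumptions made for this example. It handles only single-line entries without escapes, plural forms, or comments.

```python
import re
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def parse_po_entries(po_text):
    """Return (msgid, msgstr) pairs for single-line PO entries.

    The PO header entry (empty msgid) is skipped.
    """
    pattern = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)
    return [(m.group(1), m.group(2))
            for m in pattern.finditer(po_text) if m.group(1)]


def po_to_xliff(po_text, original, source_lang, target_lang):
    """Build a minimal XLIFF 1.2 document from PO text (illustrative only)."""
    ET.register_namespace("", XLIFF_NS)

    def q(tag):
        return f"{{{XLIFF_NS}}}{tag}"

    root = ET.Element(q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "po",  # assumed datatype value for this sketch
    })
    body = ET.SubElement(file_el, q("body"))
    for index, (msgid, msgstr) in enumerate(parse_po_entries(po_text), start=1):
        unit = ET.SubElement(body, q("trans-unit"), {"id": str(index)})
        ET.SubElement(unit, q("source")).text = msgid
        if msgstr:  # untranslated entries get no <target>
            ET.SubElement(unit, q("target")).text = msgstr
    return ET.tostring(root, encoding="unicode")
```

A real filter would also need to carry PO comments, source references, and plural forms, which this sketch deliberately leaves out.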

From the OASIS Announcement

OASIS, the international open standards consortium, today announced that its members have approved the XML Localisation Interchange File Format (XLIFF) version 1.2 as an OASIS Standard, a status that signifies the highest level of ratification. Developed through an open process by the OASIS XLIFF Technical Committee, the new standard defines a vocabulary for storing localizable data and carrying it from one step of the localization process to another.

"XLIFF enables interoperability between tools throughout the digital content localization lifecycle. It provides publishers with a standard data interchange container that can be understood by any localization provider," noted Tony Jewtushenko, who co-chairs the OASIS XLIFF Technical Committee with Bryan Schnabel.

"XLIFF is a powerful and concise format for content that needs to be translated," said Schnabel. "Until now, we've had to develop custom mechanisms for data providers and translators to accomplish localization. With XLIFF, we have an open standard that is efficient, predictable, and transferable for tool makers and localization service providers, as well as for content owners."

The new standard defines how to mark up and capture localizable data that can interoperate with different processes or phases without loss of information. Tool-neutral, XLIFF supports the entire localization process, including common software, document data formats and markup languages. It provides an extensibility mechanism to allow the development of tools compatible with an implementer's data formats and workflow requirements. The extensibility mechanism supports controlled inclusion of information not expressly defined in the specification.

"OASIS members have succeeded in reducing the complexity of localizing software by providing a standard, XML-based, end-to-end resource container," explained James Bryce Clark, director of standards development at OASIS. "Software and documentation publishers can extract content into XLIFF and localize it — over and over again — using shrink-wrapped solutions, customized tools or even automated enterprise workflow systems. XLIFF's built-in support for Computer Aided Translation technologies such as translation memory and machine translation add even greater process efficiency."

Successful use of XLIFF 1.2 was verified by Lionbridge, the Localization Industry Standards Association (LISA), the University of Limerick Localisation Research Centre (LRC), and SDL, in accordance with eligibility requirements for all OASIS Standards. XLIFF was developed under the Royalty-Free on RAND Terms Mode of the OASIS Intellectual Property Rights Policy, which generally requires committee participants to license their Essential Claims using royalty-free elements.

To encourage widespread adoption, the OASIS XLIFF Technical Committee continues its work defining implementation guides for some of the most commonly used resource formats (HTML, Java Resource Bundles, and Gettext PO Files). Participation in the Committee remains open to all government agencies, companies, non-profit groups, academic institutions, and individuals. Archives of the work are publicly accessible, and OASIS offers a mechanism for public comment.

Support for the XLIFF OASIS Standard

Comtech

"The XLIFF standard is a vital part of the translation tool kit, supporting our efforts to move XML-based structured content seamlessly through the translation process," said JoAnn T. Hackos, PhD, president, Comtech Services and chair of the OASIS DITA Translation Subcommittee.

Lionbridge

"As an early adopter and promoter of XLIFF, Lionbridge continues to strongly embrace XLIFF as an open standard for the localization industry. As usage increases, our customers appreciate the flexibility of the native XLIFF support provided within Logoport today. After many years of domination by proprietary formats, the XLIFF standard facilitates interoperability in a fragmented localization tools industry, providing customers and practitioners with the freedom to choose their language tools," said Eric Blassin, vice president, Language Technology, Lionbridge.

SAP

"One important factor for SAP's success in the marketplace is our broad experience with globalization, internationalization, localization and translation of our solutions. We congratulate the working group for another important milestone," said Michael Bechauf, Vice President, Industry Standards at the Global Ecosystem and Partner Group, SAP.

From the White Paper

Members of the XLIFF TC have prepared a White Paper covering XLIFF Version 1.2. This white paper is provided as a high level guide to anyone who seeks to better understand XLIFF in general terms, with particular emphasis on XLIFF 1.2's features. Originally authored in October 2007 (Revision 1.0), the latest version of this 34-page document was submitted to the XLIFF TC document repository by Bryan S. Schnabel (XLIFF TC co-chair) on January 15, 2008. Excerpts:

When XLIFF was initially envisioned many years ago, it was in response to the growing complexity of the software localization process. At the time, the Internet 'revolution' was forcing software publishers to converge upon new technologies based on software standards such as HTML, XML and Java. Although these standards were designed with the international market in mind and were intended to simplify the development of globalized applications, they had the opposite effect on localization. The proliferation of so many disparate software resource formats meant that the process of localising Internet based applications was complex, expensive and opaque. Software publishers seeking to localize these products had to choose between two equally expensive options: develop complex localization tools internally, or outsource the entire localization process as a "black box".

In addition to responding to the complexity of the software localization process, XLIFF also began to be considered as a solution to the equally complex documentation and technical communication localization process. Challenges in translating proprietary word processor and desktop publishing files, and even challenges having to do with the age-old task of localizing text strings in graphics, proved to be a potentially good fit for XLIFF.

XLIFF reduces this complexity of localising software by providing a standard, XML-based, end-to-end, tool-neutral resource container. Software and documentation publishers can extract their localizable content into XLIFF and localize it using shrink-wrapped tool solutions, customized tools or automated enterprise workflow systems. Additional process efficiency is achieved by XLIFF's built-in support for Computer Aided Translation technologies such as translation memory and machine translation.

Architecture: XLIFF is based on the concept of extracting the source localization-related data from the original format, and merging it back in place after the localization has been done. Depending on the extract/merge method, the parts that are not related to localization can be preserved temporarily into the Skeleton. Or, usually when the source is already XML, the non localizable parts can be preserved within the XLIFF hierarchy using <group> elements to preserve the hierarchy.

There are no rules to date on how to represent the data in the Skeleton itself; this is left to the discretion of the filters. XLIFF 1.2 focuses on how to store and organize the extracted parts. Skeletons can be either embedded directly in the XLIFF document with the <internal-file> element or simply referred to with the <external-file> element.
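
The two ways of attaching a Skeleton can be sketched in Python as follows. The helper name add_skeleton and the file name used in the usage note are hypothetical; the sketch assumes the skeleton reference sits in a <skl> element inside the <header> of a <file>, and it treats the skeleton content as opaque text.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def add_skeleton(file_el, *, href=None, content=None):
    """Attach a skeleton to a <file> element, by reference or embedded.

    Exactly one of href (external skeleton) or content (embedded
    skeleton text) must be given.
    """
    if (href is None) == (content is None):
        raise ValueError("pass exactly one of href or content")
    header = file_el.find(q("header"))
    if header is None:
        # Keep <header> ahead of <body> by inserting it first.
        header = ET.Element(q("header"))
        file_el.insert(0, header)
    skl = ET.SubElement(header, q("skl"))
    if href is not None:
        ET.SubElement(skl, q("external-file"), {"href": href})
    else:
        ET.SubElement(skl, q("internal-file")).text = content
    return skl
```

For example, add_skeleton(file_el, href="page.html.skl") records a pointer to a separately stored skeleton, while add_skeleton(file_el, content=skeleton_text) embeds it in the XLIFF document itself.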

The text extracted from the original source material is stored in translation units. Each <trans-unit> element contains a <source> element where the original text is copied. The translation goes into a corresponding <target> element. The content of the <target> element depends on the stage at which the document is. Often tools set the initial translation text to the source text.
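
A minimal Python sketch of the extract and merge cycle, assuming the localizable strings are already available as a dictionary keyed by identifier. The function names extract and merge, the use of the dictionary key as the trans-unit id, and the "plaintext" datatype are assumptions of this example, not part of the white paper. As described above, the target is initially set to a copy of the source text.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}


def _q(tag):
    return f"{{{XLIFF_NS}}}{tag}"


def extract(strings, original, source_lang, target_lang):
    """Write each string into a <trans-unit> with <source> and <target>."""
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(_q("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, _q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, _q("body"))
    for key, text in strings.items():
        unit = ET.SubElement(body, _q("trans-unit"), {"id": key})
        ET.SubElement(unit, _q("source")).text = text
        # Initial target is a copy of the source, to be overwritten later.
        ET.SubElement(unit, _q("target")).text = text
    return ET.tostring(root, encoding="unicode")


def merge(xliff_text):
    """Read the (possibly translated) targets back, keyed by unit id."""
    root = ET.fromstring(xliff_text)
    return {
        unit.get("id"): unit.findtext("x:target", default="", namespaces=NS)
        for unit in root.iterfind(".//x:trans-unit", NS)
    }
```

Between the two calls, a translation tool would edit only the <target> elements; merge then hands the results back to whatever writes the original format.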

Abstracted Inline Codes: Inline codes (e.g. markers for bold or italics, links information, or image references) can be represented using either an encapsulation mechanism or a placeholder method. Those are derived respectively from TMX (LISA's Translation Memory Exchange Standard), and OpenTag, a localization data container...
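
To make the two approaches concrete, the Python sketch below encodes the HTML fragment "Click <b>here</b> to continue." both ways. It assumes the placeholder method uses a <g> element that wraps the text and hides the native code, and the encapsulation method uses paired <bpt> and <ept> elements that carry the native code as escaped text. The function names are hypothetical, the namespace is omitted for brevity, and <sub> content is ignored.

```python
import xml.etree.ElementTree as ET


def placeholder_form():
    """Placeholder method: the bold markup is replaced by a <g> wrapper."""
    source = ET.Element("source")
    source.text = "Click "
    g = ET.SubElement(source, "g", {"id": "1", "ctype": "bold"})
    g.text = "here"
    g.tail = " to continue."
    return source


def encapsulation_form():
    """Encapsulation method: native codes are kept inside <bpt>/<ept>."""
    source = ET.Element("source")
    source.text = "Click "
    bpt = ET.SubElement(source, "bpt", {"id": "1", "ctype": "bold"})
    bpt.text = "<b>"    # serialized as &lt;b&gt;
    bpt.tail = "here"
    ept = ET.SubElement(source, "ept", {"id": "1"})
    ept.text = "</b>"   # serialized as &lt;/b&gt;
    ept.tail = " to continue."
    return source


def plain_text(element):
    """Return the translatable text with inline codes stripped."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == "g":
            # <g> wraps translatable text, so descend into it.
            parts.append(plain_text(child))
        # Other inline elements hold native code only; skip their content.
        parts.append(child.tail or "")
    return "".join(parts)
```

Both forms yield the same translatable text through plain_text; they differ in whether the original markup travels inside the XLIFF document or stays behind in the skeleton.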

Since XLIFF is a standard interchange format for localization, it offers benefits over native file formats when moving between tools. Tools that interpret the XLIFF format understand not only the content, but also the meta data, e.g. status flags, memo information, source and target text, etc. This allows for seamless transfer of information between tools and achieves the desire in the mission statement that XLIFF be tool neutral.
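
As an example of a tool reading metadata as well as content, the Python sketch below groups translation units by the state attribute of their <target>. The sample document, the function name units_by_state, and the fallback labels "unspecified" and "no-target" are inventions of this example.

```python
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.txt" source-language="en" target-language="fr"
        datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Save</source>
        <target state="translated">Enregistrer</target>
        <note>Button label</note>
      </trans-unit>
      <trans-unit id="2">
        <source>Cancel</source>
        <target state="needs-translation">Cancel</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def units_by_state(xliff_text):
    """Map each target state to the list of trans-unit ids in that state."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iterfind(".//x:trans-unit", NS):
        target = unit.find("x:target", NS)
        if target is None:
            state = "no-target"      # label invented for this sketch
        else:
            state = target.get("state", "unspecified")
        result.setdefault(state, []).append(unit.get("id"))
    return result


def notes_for(xliff_text, unit_id):
    """Return the text of every <note> attached to one trans-unit."""
    root = ET.fromstring(xliff_text)
    return [
        note.text or ""
        for unit in root.iterfind(".//x:trans-unit", NS)
        if unit.get("id") == unit_id
        for note in unit.iterfind("x:note", NS)
    ]
```

A workflow tool could use such a grouping to route only the units still marked "needs-translation" to a translator, leaving the rest untouched.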

There are now commercial tools [see listing] on the market supporting the XLIFF standard. These fall into two categories. There are those that support the standard itself and interpret the XLIFF statements in a document. Since XLIFF is based on XML, there are also some tools that support XLIFF via a standard xml layer...

Future Work on XLIFF

A document, Requirements and Goals for XLIFF 2.0, has been published by members of the XLIFF TC. It was uploaded to the XLIFF TC document repository on January 15, 2008 by Bryan Schnabel. Excerpts:

A list of goals was evaluated by the XLIFF TC, and categorized into one of three ratings:

  • Must have: Has to be part of the next release
  • Should have: Will probably be part of the next release
  • Nice to have: Good ideas that we want to record but are not required to be in the next release (could be goals we accomplish, but do not tie to the approval of XLIFF 2.0)

There were a number of items discarded from consideration. They are not listed in this document, but are recorded in TC meeting minutes.

1. Must Have Requirements

Each of these is rated as Must have - Has to be part of the next release.

  • Create an XLIFF 2.0 Requirements Document similar to XSLT 2.0 Requirements Document
  • Create a bare-bones XLIFF core, then provide extension modules that support additional features: "Simplify, Extend and Clarify". (Core XLIFF functionality would be in the primary XLIFF namespace. Extensions would provide support for more complicated features such as Segmentation, Pre-Translation, Generic Inline Markup, etc.)
  • Reference implementations or toolkit(s) similar to DITA
  • XLIFF validator tool to be included in a toolkit

2. Should Have Requirements

Each of these is rated as Should have - Will probably be part of the next release.

  • Submitting the specification to ISO once XLIFF becomes an OASIS standard
  • Additional Representation Guides (DITA, .NET / Win32, DocBook)
  • Subsetting / restricting XSD
  • Version control Metadata - need to track revision information - V2.0

3. Nice to Have Requirements and Goals

Each of these is rated as Nice to have - Good ideas that we want to record but are not required to be in the next release (could be goals we accomplish, but do not tie to the approval of XLIFF 2.0).

  • Ability to store Multilingual Content. Note: will be promoted to Must or Should if a champion is assigned
  • XLIFF Round Trip tool could be modified to support the following (Note: a goal that will be done, but not a requirement for XLIFF 2.0):
    • Feature updates: Update to XLIFF 1.2; Add option to produce minimalist or maximalist XLIFF round trip
    • DITA/ITS -> XLIFF -> Translated DITA model
    • Change the xliffRoundTrip Tool license from GPL to Apache

XLIFF Technical Committee Charter

As clarified in June 2006, the XLIFF TC is chartered with the following goals:

The purpose of the OASIS XLIFF TC is to define, through extensible XML vocabularies, and promote the adoption of, a specification for the interchange of localisable software and document based objects and related metadata. To date, the committee has published two specifications - XLIFF 1.0 and XLIFF 1.1 - that define how to mark up and capture localisable data that will interoperate with different processes or phases without loss of information. The specifications are tool-neutral, support the entire localization process, and support common software and document data formats and mark-up languages. The specifications provide an extensibility mechanism to allow the development of tools compatible with an implementer's data formats and workflow requirements. The extensibility mechanism provides controlled inclusion of information not defined in the specification.

The state of software and documentation localisation before XLIFF was that a software or documentation provider delivered their localisable resources to a localisation service provider in a number of disparate file formats. Once software providers and technical communicators commenced implementing XLIFF, the task of interchanging localisation data was simplified. Using proprietary and/or non-standard resource formats forces either the source provider or the localisation service provider to implement costly and inefficient pre-processing of localisable content. For publishers with many proprietary or non-standard formats, this requirement becomes a major hurdle when attempting to localise their software. For software developers and technical communicators employing enterprise localisation tools and processes, XLIFF defines a standard but extensible vocabulary that captures relevant metadata for any point in the lifecycle, which can be exchanged between a variety of commercial and open-source tools.

The first phase, completed 31 October 2003, created a 1.1 version committee specification that concentrated on software UI resource file localisable data requirements. The next phase consists of promoting the adoption of XLIFF throughout the industry through additional collateral and specifications, continuing to advance the committee specification towards an official OASIS standard, and revising the XLIFF spec to 1.2 version to support document based content segmentation and alignment requirements. To encourage adoption of XLIFF, the TC will define and publish implementation guides for some of the most commonly used resource formats (HTML, Java Resource Bundles, and gettext PO Files).

Document URI: http://xml.coverpages.org/ni2008-02-11-a.html
Robin Cover, Editor: robin@oasis-open.org