New OASIS Standard: XML Localization Interchange File Format (XLIFF) v1.2.
OASIS has announced the approval of the XML Localization Interchange File Format (XLIFF) specification Version 1.2 as an OASIS Standard. The specification was produced by members of the OASIS XML Localisation Interchange File Format (XLIFF) Technical Committee.
The purpose of the XLIFF vocabulary is to store localizable data and carry it from one step of the localization process to the other, while allowing interoperability between tools. The specification is tool-neutral, supports the entire localization process, and supports common software, document data formats, and markup languages. The specification provides an extensibility mechanism to allow the development of tools compatible with an implementer's data formats and workflow requirements. The extensibility mechanism provides controlled inclusion of information not defined in the specification.
The XLIFF file format serves as a container for externalized data to be interchanged between software
publishers, documentation writers (including, but not limited to, documents written in DITA,
DocBook, HTML, and other XML document formats), localization tools, and software services
providers in order to facilitate all the phases of the localization process.
XLIFF is an XML-based vocabulary. Use of XLIFF is represented in the DITA Translation Subcommittee, and will be featured in a translation best practices document. XLIFF has a working relationship with LISA/OSCAR standards related to Translation and Localization, and is a requirement for several LISA standards: (1) TMX supports XLIFF inline markup, where TMX Version 2.0 was in progress [public review draft] as of March 28, 2007; (2) Global Information Management Metrics eXchange (GMX), where XLIFF is a requirement; (3) xml:tm, where XLIFF is a requirement.
Statements of "successful use" with XLIFF Version 1.2 have been provided by Lionbridge Inc., SDL International, OSCAR, LISA, Idiom Technologies Inc., and Localisation Research Centre (LRC), University of Limerick.
XLIFF Version 1.2. OASIS Standard. 01-February-2008. Also available in PDF format [source]. Edited by Yves Savourel, John Reid, Tony Jewtushenko, and Rodolfo M. Raya. Namespace: urn:oasis:names:tc:xliff:document:1.2. See the specification drafts for XLIFF Version 1.2, and URIs for the OASIS Standard.
Final Version 1.2 includes two XML schemas: strict and transitional. XLIFF is specified in these two "flavors". Indicate which variant you are using by selecting the appropriate schema, which may be specified in the XLIFF document itself or in an OASIS catalog. The namespace is the same for both variants, so it is the schema reference, not the namespace, that tells a validating tool which variant you are using. Each variant has its own schema that defines which elements and attributes are allowed in certain circumstances... For 'Transitional', deprecated items remain valid, so applications that produce older versions of XLIFF may still use them; for 'Strict', deprecated elements and attributes are not allowed. Obsolete items from previous versions of XLIFF are deprecated and should not be used when writing new XLIFF documents.
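As a rough illustration of declaring the variant inside the document itself, the sketch below sets an xsi:schemaLocation attribute on the root element and reads it back. The schema file names (xliff-core-1.2-strict.xsd, xliff-core-1.2-transitional.xsd) and the helper names are assumptions made for this example; check them against the published specification before relying on them.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

# Assumed schema file names, one per variant (not taken from this article).
SCHEMAS = {
    "strict": "xliff-core-1.2-strict.xsd",
    "transitional": "xliff-core-1.2-transitional.xsd",
}


def new_xliff_root(variant):
    """Create an <xliff> root whose schemaLocation names the chosen variant."""
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", {"version": "1.2"})
    # The namespace is identical for both variants; only the schema differs.
    root.set(f"{{{XSI_NS}}}schemaLocation", f"{XLIFF_NS} {SCHEMAS[variant]}")
    return root


def declared_variant(root):
    """Return 'strict', 'transitional', or None if no known schema is named."""
    parts = root.get(f"{{{XSI_NS}}}schemaLocation", "").split()
    # schemaLocation is a list of (namespace, location) pairs.
    for namespace, location in zip(parts[0::2], parts[1::2]):
        if namespace == XLIFF_NS:
            for name, filename in SCHEMAS.items():
                if location.endswith(filename):
                    return name
    return None
```

A document that declares no schema location would return None here, in which case a tool would have to fall back on an OASIS catalog or some other configuration.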
Representation Guide for HTML. One of the "XLIFF-Related Documents" referenced in the text of the XLIFF TC's v1.2 submission for OASIS approval. Edited by Yves Savourel, Bryan Schnabel, Tony Jewtushenko, and Doug Domeny. Source HTML and PDF.
"This document describes how HTML (in its different flavors), should be coded when extracted to an XLIFF document... As different tools may provide different filters to extract the content of HTML documents it is important for interoperability that they represent the extracted data in identical manner in the XLIFF document. The intent of this document is to provide a set of guidelines to represent HTML data in XLIFF. It offers a collection of recommended mapping of the HTML elements and attributes developers of XLIFF filters can implement, and users of XLIFF utilities can rely on to insure a better interoperability between tools.
Many HTML documents are generated dynamically, in some cases using server-side script files which are often made of a mixture of HTML constructs and server-side instructions written in one of the server-side languages such as PHP, JSP, ASP, or many others. While such source documents are generally outside of the scope of this document, an effort is made to try to address some of the issues you may run into when extracting such source documents...
There are many ways to process a source HTML document and create its corresponding XLIFF output. One interesting method is to make use of XML standards, such as XSLT, XPath, or XSL-FO. Of these, XSLT is a particularly good tool for transforming HTML to XLIFF, and XLIFF back to HTML. Another method to extract HTML files to XLIFF is to use custom filter applications. Such tools can be written in a variety of programming and scripting languages such as Perl, Python, C, C++, C#, Java, and so forth. This document makes no assumption on the type of language used to process the HTML input documents. It also makes no assumptions whether or not the tool creates a Skeleton file along with the XLIFF document generated, or if it creates one, how data are represented in the Skeleton."
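A custom filter of the kind mentioned above could look roughly like the following Python sketch. It is deliberately minimal and is not the mapping recommended by the guide: it only collects the text of p elements, drops inline markup instead of representing it as XLIFF inline codes, and creates no Skeleton. The class and function names, the default file name, and the choice of datatype value are assumptions for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ParagraphExtractor(HTMLParser):
    """Collect the plain text of each <p> element (inline markup is dropped)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None means "not currently inside a <p>"

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Normalize runs of whitespace to single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def html_to_xliff(html, original="page.html", source_lang="en"):
    """Build a minimal XLIFF tree with one trans-unit per paragraph."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    xliff = ET.Element(
        "xliff",
        {"version": "1.2", "xmlns": "urn:oasis:names:tc:xliff:document:1.2"},
    )
    file_el = ET.SubElement(
        xliff,
        "file",
        {"original": original, "source-language": source_lang, "datatype": "html"},
    )
    body = ET.SubElement(file_el, "body")
    for number, text in enumerate(parser.paragraphs, start=1):
        unit = ET.SubElement(body, "trans-unit", {"id": str(number)})
        ET.SubElement(unit, "source").text = text
    return xliff
```

Calling ET.tostring(html_to_xliff(page), encoding="unicode") would serialize the result. A real filter would also need to handle attributes such as alt and title, and the inline elements that this sketch discards.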
Representation Guide for Java Resource Bundles. One of the "XLIFF-Related Documents" referenced in the text of the XLIFF TC's v1.2 submission for OASIS approval. Edited by Tony Jewtushenko and Rodolfo M. Raya. Source HTML and PDF.
"This document describes how Java Resource Bundles, should be coded when extracted to an XLIFF document... As different tools may provide different filters to extract the content of Java Resource Bundles, it is important for interoperability that they represent the extracted data in identical manner in the XLIFF document. The intent of this document is to provide a set of guidelines to represent data contained in Java Resource Bundles as XLIFF content. It offers a collection of recommended mapping of Java Resource Bundles that developers of XLIFF filters can implement, and users of XLIFF utilities can rely on to insure a better interoperability between tools."
Representation Guide for Gettext PO. One of the "XLIFF-Related Documents" referenced in the text of the XLIFF TC's v1.2 submission for OASIS approval. Edited by Asgeir Frimannsson, Tony Jewtushenko, and Rodolfo M. Raya. Source HTML and PDF.
"This document defines a guide for mapping the GNU Gettext PO (Portable Object) file format to XLIFF (XML Localisation Interchange File Format). There are two types of PO files: PO Template files (POTs) and Language specific PO files (POs). POTs contain a skeleton header, followed by the extracted translation units. POTs are generated by the xgettext extraction tool and are not meant to be edited by humans. POTs are converted into Language Specific POs by the msginit tool, and these files are then edited by translators.
When source code is updated, a new POT is generated for the project, and the changes from previous versions are incorporated into the existing translations by using the msgmerge tool. This tool inserts new translation units into the existing PO files, marks translation units no longer in use as obsolete, and updates any references and extracted comments.
Translated PO files are converted to binary resource files, known as MO (Machine Object) files, by the msgfmt tool. The Gettext library uses MO files at run time; hence PO files are only used in the development and localisation process..."
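To make the PO-to-XLIFF idea concrete, here is a small Python sketch that reads msgid/msgstr pairs and emits trans-unit elements. It is an illustration under strong simplifying assumptions, not the mapping defined by the guide: it handles only single-line entries, ignores comments, contexts, and plurals, skips the header entry (the one with an empty msgid), and uses ast.literal_eval as a shortcut for unquoting PO strings. The function names are hypothetical.

```python
import ast
import xml.etree.ElementTree as ET


def parse_po(text):
    """Return (msgid, msgstr) pairs from single-line PO entries."""
    pairs = []
    msgid = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            # Shortcut: treat the quoted PO string as a Python string literal.
            msgid = ast.literal_eval(line[len("msgid "):])
        elif line.startswith("msgstr ") and msgid is not None:
            msgstr = ast.literal_eval(line[len("msgstr "):])
            if msgid:  # an empty msgid marks the PO header entry
                pairs.append((msgid, msgstr))
            msgid = None
    return pairs


def po_to_body(text):
    """Build an XLIFF <body> with one trans-unit per PO entry."""
    body = ET.Element("body")
    for number, (msgid, msgstr) in enumerate(parse_po(text), start=1):
        unit = ET.SubElement(body, "trans-unit", {"id": str(number)})
        ET.SubElement(unit, "source").text = msgid
        if msgstr:  # untranslated entries get no <target>
            ET.SubElement(unit, "target").text = msgstr
    return body
```

Whether an untranslated entry should carry an empty target or none at all is a design choice; this sketch omits it.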
OASIS, the international open standards consortium, today announced that its members have approved the XML Localisation Interchange File Format (XLIFF) version 1.2 as an OASIS Standard, a status that signifies the highest level of ratification. Developed through an open process by the OASIS XLIFF Technical Committee, the new standard defines a vocabulary for storing localizable data and carrying it from one step of the localization process to another.
"XLIFF enables interoperability between tools throughout the digital content localization lifecycle.
It provides publishers with a standard data interchange container that can be understood by any
localization provider," noted Tony Jewtushenko, who co-chairs the OASIS XLIFF Technical
Committee with Bryan Schnabel.
"XLIFF is a powerful and concise format for content that needs to be translated," said Schnabel.
"Until now, we've had to develop custom mechanisms for data providers and translators to
accomplish localization. With XLIFF, we have an open standard that is efficient, predictable, and
transferable for tool makers and localization service providers, as well as for content owners."
The new standard defines how to mark up and capture localizable data that can interoperate with
different processes or phases without loss of information. Tool-neutral, XLIFF supports the entire
localization process, including common software, document data formats and markup languages. It
provides an extensibility mechanism to allow the development of tools compatible with an
implementer's data formats and workflow requirements. The extensibility mechanism supports
controlled inclusion of information not expressly defined in the specification.
"OASIS members have succeeded in reducing the complexity of localizing software by providing a
standard, XML-based, end-to-end resource container," explained James Bryce Clark, director of
standards development at OASIS. "Software and documentation publishers can extract content into
XLIFF and localize it — over and over again — using shrink-wrapped solutions, customized tools or
even automated enterprise workflow systems. XLIFF's built-in support for Computer Aided
Translation technologies such as translation memory and machine translation add even greater
Successful use of XLIFF 1.2 was verified by Lionbridge, the Localization Industry Standards
Association (LISA), the University of Limerick Localisation Research Centre (LRC), and SDL, in
accordance with eligibility requirements for all OASIS Standards. XLIFF was developed under the
Royalty-Free on RAND Terms Mode of the OASIS Intellectual Property Rights Policy, which
generally requires committee participants to license their Essential Claims using royalty-free
To encourage widespread adoption, the OASIS XLIFF Technical Committee continues its work
defining implementation guides for some of the most commonly used resource formats (HTML,
Java Resource Bundles, and Gettext PO Files). Participation in the Committee remains open to all
government agencies, companies, non-profit groups, academic institutions, and individuals. Archives of the work are publicly accessible, and OASIS offers a mechanism for public comment.
Support for the XLIFF OASIS Standard
"The XLIFF standard is a vital part of the translation tool kit, supporting our efforts to move XML-based
structured content seamlessly through the translation process," said JoAnn T. Hackos, PhD,
president, Comtech Services and chair of the OASIS DITA Translation Subcommittee.
"As an early adopter and promoter of XLIFF, Lionbridge continues to strongly embrace XLIFF as
an open standard for the localization industry. As usage increases, our customers appreciate the
flexibility of the native XLIFF support provided within Logoport today. After many years of
domination by proprietary formats, the XLIFF standard facilitates interoperability in a fragmented localization tools industry, providing customers and practitioners with the freedom to choose their
language tools," said Eric Blassin, vice president, Language Technology, Lionbridge.
"One important factor for SAP's success in the marketplace is our broad experience with
globalization, internationalization, localization and translation of our solutions. We congratulate the
working group for another important milestone," said Michael Bechauf, Vice President, Industry
Standards at the Global Ecosystem and Partner Group, SAP.
Members of the XLIFF TC have prepared a White Paper covering XLIFF Version 1.2. This white paper is provided as a high level guide to anyone who seeks to better understand XLIFF in
general terms, with particular emphasis on XLIFF 1.2's features. Originally authored in October 2007 (Revision 1.0), the latest version of this 34-page document was submitted to the XLIFF TC document repository by Bryan S. Schnabel (XLIFF TC co-chair) on January 15, 2008. Excerpts:
When XLIFF was initially envisioned many years ago, it was in response to the growing complexity of the
software localization process. At the time, the Internet 'revolution' was forcing software publishers to
converge upon new technologies based on software standards such as HTML, XML and Java. Although
these standards were designed with the international market in mind and were intended to simplify the
development of globalized applications, they had the opposite effect on localization. The proliferation of
so many disparate software resource formats meant that the process of localising Internet based
applications was complex, expensive and opaque. Software publishers seeking to localize these products
had to choose between two equally expensive options: develop complex localization tools internally, or
outsource the entire localization process as a "black box".
In addition to responding to the complexity of the software localization process, XLIFF also began to be
considered as a solution to the equally complex documentation and technical communication localization
process. Challenges in translating proprietary word processor and desktop publishing files, and even
challenges having to do with the age-old task of localizing text strings in graphics proved to be a
potentially good fit for XLIFF.
XLIFF reduces this complexity of localising software by providing a standard, XML-based, end-to-end,
tool neutral resource container. Software and documentation publishers can extract their localizable
content into XLIFF and localize them using shrink-wrapped tools solutions, customized tools or
automated enterprise workflow systems. Additional process efficiency is achieved by XLIFF's built-in
support for Computer Aided Translation technologies such as translation memory and machine
Architecture: XLIFF is based on the concept of extracting the source localization-related data from the original format, and merging it back in place after the localization has been done. Depending on the extract/merge method, the parts that are not related to localization can be preserved
temporarily into the Skeleton. Or, usually when the source is already XML, the non localizable parts can
be preserved within the XLIFF hierarchy using <group> elements to preserve the hierarchy.
There are no rules to date on how to represent the data in the Skeleton itself, this is left to the discretion
of the filters. XLIFF 1.2 focuses on how to store and organize the extracted parts. Skeletons can be either
embedded directly in the XLIFF document with the <internal-file> element or simply referred to with
the <external-file> element.
The text extracted from the original source material is stored in translation units. Each <trans-unit>
element contains a <source> element where the original text is copied. The translation goes into a
corresponding <target> element. The content of the <target> element depends on the stage at which
the document is. Often tools set the initial translation text to the source text.
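The extract/merge cycle described above can be sketched in a few lines of Python. Everything specific here is an assumption made for illustration: the input is a toy "key=text" format, the Skeleton is a plain string with [[id]] placeholders (the white paper notes there are no rules for Skeleton content), and the function names are hypothetical. The sketch also follows the common practice of seeding each target with the source text.

```python
import re
import xml.etree.ElementTree as ET


def extract(original):
    """Split 'key=text' lines into a skeleton string and an XLIFF <body>."""
    body = ET.Element("body")
    skeleton_lines = []
    for number, line in enumerate(original.splitlines(), start=1):
        key, separator, text = line.partition("=")
        if not separator:
            # Not localizable: keep the line verbatim in the skeleton.
            skeleton_lines.append(line)
            continue
        unit = ET.SubElement(body, "trans-unit", {"id": str(number)})
        ET.SubElement(unit, "source").text = text
        ET.SubElement(unit, "target").text = text  # initial target = source
        skeleton_lines.append(f"{key}=[[{number}]]")
    return "\n".join(skeleton_lines), body


def merge(skeleton, body):
    """Put each trans-unit's target text back into its placeholder."""
    targets = {
        unit.get("id"): unit.findtext("target")
        for unit in body.iter("trans-unit")
    }
    return re.sub(r"\[\[(\d+)\]\]", lambda m: targets[m.group(1)], skeleton)
```

Merging immediately after extraction reproduces the original file, which is a useful sanity check for any filter; after a translator edits the target elements, the same merge yields the localized file.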
Abstracted Inline Codes: Inline codes (e.g. markers for bold or italics, links information, or image references) can be represented
using either an encapsulation mechanism or a placeholder method. Those are derived respectively from
TMX (LISA's Translation Memory Exchange Standard), and OpenTag, a localization data container...
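The difference between the two inline-code styles can be seen by representing the same bold phrase both ways. In the sketch below, the sample strings and the helper name are assumptions for illustration: the encapsulation form wraps the native codes in bpt/ept elements (the TMX-derived style), while the placeholder form uses a g element that merely stands in for codes kept elsewhere (the OpenTag-derived style). The helper shows that both yield the same translatable text.

```python
import xml.etree.ElementTree as ET

# Encapsulation: the native HTML codes travel inside <bpt>/<ept>.
ENCAPSULATED = (
    '<source>Click <bpt id="1">&lt;b&gt;</bpt>here'
    '<ept id="1">&lt;/b&gt;</ept> now</source>'
)

# Placeholder: <g> marks the span; the native codes live elsewhere.
PLACEHOLDER = '<source>Click <g id="1">here</g> now</source>'

# Elements whose content is native code rather than translatable text.
CODE_CONTAINERS = {"bpt", "ept", "ph", "it"}


def translatable_text(element):
    """Return the text a translator sees, skipping encapsulated codes."""
    parts = [element.text or ""]
    for child in element:
        if child.tag not in CODE_CONTAINERS:
            parts.append(translatable_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def encapsulated_codes(element):
    """Return the native codes carried inside code-container elements."""
    return [c.text or "" for c in element.iter() if c.tag in CODE_CONTAINERS]
```

With encapsulation, the original markup can be rebuilt from the XLIFF document alone; with placeholders, a tool needs the Skeleton or another store to restore it.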
Since XLIFF is a standard interchange format for localization, it offers benefits over native file formats
when moving between tools. Tools that interpret the XLIFF format understand not only the content, but
also the meta data, e.g. status flags, memo information, source and target text, etc. This allows for
seamless transfer of information between tools and achieves the desire in the mission statement that
XLIFF be tool neutral.
There are now commercial tools [see listing] on the market supporting the XLIFF standard. These fall into two
categories. There are those that support the standard itself and interpret the XLIFF statements in a
document. Since XLIFF is based on XML, there are also some tools that support XLIFF via a standard xml
A document, Requirements and Goals for XLIFF 2.0, has been published by members of the XLIFF TC. It was uploaded to the XLIFF TC document repository on January 15, 2008 by Bryan Schnabel. Excerpts:
A list of goals was evaluated by the XLIFF TC, and categorized into one of three ratings:
- Must have: Has to be part of the next release
- Should have: Will probably be part of the next release
- Nice to have: Good ideas that we want to record but are not required to be in the next release (could be goals we accomplish, but do not tie to the approval of XLIFF 2.0)
There were a number of items discarded from consideration. They are not listed in this document,
but are recorded in TC meeting minutes.
1. Must Have Requirements
Each of these is rated as Must have - Has to be part of the next release.
- Create an XLIFF 2.0 Requirements Document similar to XSLT 2.0 Requirements Document
- Create a bare-bones XLIFF core, then provide extension modules that support additional features: "Simplify, Extend and Clarify". Core XLIFF functionality would be in the primary XLIFF namespace; extensions would provide support for more complicated features like Segmentation, Pre-Translation, Generic Inline Markup, etc.
- Reference implementations or toolkit(s) similar to DITA
- XLIFF validator tool to be included in a toolkit
2. Should Have Requirements
Each of these is rated as Should have - Will probably be part of the next release.
- Submitting the specification to ISO once XLIFF becomes an OASIS standard
- Additional Representation Guides (DITA, .NET / Win32, DocBook)
- Subsetting / restricting XSD
- Version control Metadata - need to track revision information - V2.0
3. Nice to Have Requirements and Goals
Each of these is rated as Nice to have - Good ideas that we want to record but are not required to be in the next release (could be goals we accomplish, but do not tie to the approval of XLIFF 2.0).
- Ability to store Multilingual Content. Note: will be promoted to Must or Should if a champion is assigned
- XLIFF Round Trip tool could be modified to support (Note: A goal that will be done, but not a requirement for XLIFF 2.0).
- Feature updates: Update to XLIFF 1.2; Add option to produce minimalist or maximalist XLIFF round trip
- DITA/ITS -> XLIFF -> Translated DITA model
- Change the xliffRoundTrip Tool license from GPL to Apache
As clarified in June 2006, the XLIFF TC is chartered with the following goals:
The purpose of the OASIS XLIFF TC is to define, through extensible XML vocabularies, and promote the adoption of, a specification for the interchange of localisable software and document based objects and related metadata. To date, the committee has published two specifications - XLIFF 1.0 and XLIFF 1.1 - that define how to mark up and capture localisable data that will interoperate with different processes or phases without loss of information. The specifications are tool-neutral, support the entire localization process, and support common software and document data formats and mark-up languages. The specifications provide an extensibility mechanism to allow the development of tools compatible with an implementer's data formats and workflow requirements. The extensibility mechanism provides controlled inclusion of information not defined in the specification.
The state of software and documentation localisation before XLIFF was that a software or documentation provider delivered their localisable resources to a localisation service provider in a number of disparate file formats. Once software providers and technical communicators commenced implementing XLIFF, the task of interchanging localisation data was simplified. Using proprietary and/or non-standard resource formats forces either the source provider or the localisation service provider to implement costly and inefficient pre-processing of localisable content. For publishers with many proprietary or non-standard formats, this requirement becomes a major hurdle when attempting to localise their software. For software developers and technical communicators employing enterprise localisation tools and processes, XLIFF defines a standard but extensible vocabulary that captures relevant metadata for any point in the lifecycle which can be exchanged between a variety of commercial and open-source tools.
The first phase, completed 31 October 2003, created a 1.1 version committee specification that concentrated on software UI resource file localisable data requirements. The next phase consists of promoting the adoption of XLIFF throughout the industry through additional collateral and specifications, continuing to advance the committee specification towards an official OASIS standard, and revising the XLIFF spec to 1.2 version to support document based content segmentation and alignment requirements. To encourage adoption of XLIFF, the TC will define and publish implementation guides for some of the most commonly used resource formats (HTML, Java Resource Bundles, and gettext PO Files).