XLIFF Description from XML Internationalization and Localization
XLIFF description extracted/excerpted from pages 383ff. in XML Internationalization and Localization, by Yves Savourel (used with permission):
XML Localisation Interchange File Format
At the end of 2000, a group driven by companies including Oracle, Novell, Sun, and IBM/Lotus started to define an exchange format for translatable data: XLIFF (XML Localisation Interchange File Format). The format is based on the principles defined by OpenTag and borrows some of its tags. It also adopts some of the ideas developed later in TMX and adds a few innovations of its own: project information, pretranslation and history, versioning, binary objects, and so forth. The first draft of XLIFF was released in May 2001. The information provided here is subject to change as the draft is finalized. You can find the latest specifications and more information on XLIFF at http://www.xliff.org. There is also a discussion group at http://groups.yahoo.com/group/DataDefinition.
How It Works
XLIFF is close to OpenTag in many respects, but it is a more strictly defined format, allowing fewer ways to express the same content, and it therefore offers better interoperability. The format also, for now, specializes in storing text extracted from software-type files and tagged documents. This more specialized aim eliminates the need for some compromises that OpenTag made to accommodate documentation-type data.
The base element of XLIFF is <trans-unit>. It corresponds to a unique item extracted from the original file (label, caption, paragraph, string, and so forth). The content of the item is stored in its <source> element for the source language, and, optionally, its <target> element for the target language. Both <source> and <target> elements contain the text and any inline elements included with the text. Note that currently no mechanism is dedicated to breaking the item into smaller segments, for instance, sentences inside a paragraph.
As the use of <source> and <target> implies, a <file> element can contain only one source language and one target language. However, an XLIFF document can contain several <file> elements, and the source and target locales of each <file> element can be different. The xml:lang attribute can be used to indicate the language of the content at any level where there is text. The source-language and target-language attributes in the <file> element indicate the corresponding languages for <source> and <target>. To allow XML tools that are not XLIFF-aware to process XLIFF documents correctly, it is recommended that you still use xml:lang with <source> and <target>, even if some tools could guess the correct locales implicitly.
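The structure described above can be sketched with a small hand-written document and a few lines of Python. This is only an illustration: the sample file names, the datatype values, and the list_units helper are assumptions made for the example, and the sample has not been validated against any XLIFF DTD.

```python
import xml.etree.ElementTree as ET

# Illustrative XLIFF-style document: two <file> elements, each with its own
# source/target language pair. File names and datatype values are made up.
SAMPLE = """<xliff version="1.0">
  <file original="menu.rc" datatype="winres"
        source-language="en" target-language="fr">
    <body>
      <trans-unit id="1">
        <source xml:lang="en">Open</source>
        <target xml:lang="fr">Ouvrir</target>
      </trans-unit>
      <trans-unit id="2">
        <source xml:lang="en">Close</source>
      </trans-unit>
    </body>
  </file>
  <file original="help.htm" datatype="html"
        source-language="en" target-language="de">
    <body>
      <trans-unit id="1">
        <source xml:lang="en">Welcome</source>
        <target xml:lang="de">Willkommen</target>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def list_units(xliff_text):
    """Return one tuple per <trans-unit>:
    (original file, unit id, source language, source text,
     target language, target text or None when <target> is absent)."""
    root = ET.fromstring(xliff_text)
    rows = []
    for file_el in root.iter("file"):
        src_lang = file_el.get("source-language")
        tgt_lang = file_el.get("target-language")
        for unit in file_el.iter("trans-unit"):
            target = unit.find("target")  # <target> is optional
            rows.append((
                file_el.get("original"),
                unit.get("id"),
                src_lang,
                unit.findtext("source"),
                tgt_lang,
                target.text if target is not None else None,
            ))
    return rows


rows = list_units(SAMPLE)
for row in rows:
    print(row)
```

Each <file> carries its own language pair, so the unit ids only need to be unique within their <file>; the second unit of menu.rc has no <target> yet and is reported with None.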
In XLIFF, the skeleton file can be stored inside the XLIFF document or in a separate file. When the skeleton file is stored inside the document, you can use a simple CDATA section to encapsulate its body... If the skeleton file is binary it can be coded in Base64 and inserted in the document. To verify that the skeleton data have not changed during the localization process, tools can use a CRC (Cyclic Redundancy Check) signature through the crc attribute. This mechanism is available throughout the XLIFF document for most of the elements that contain data.
An innovative aspect of XLIFF concerns binary objects. The format offers a way to transport any object and its associated localization metadata (project, phase, and so forth) as part of the document. The object itself, for example a bitmap from a resource file, is either embedded directly in the XLIFF document or referenced as an external file using the same methods as for the skeleton file. The XLIFF tools can make the appropriate calls to choose the relevant applications needed to edit the object. The object is included in the <bin-unit> element that contains a <bin-source> and <bin-target> element. The type of the object is specified in the mime-type attribute of the <bin-unit> (overridden in the <bin-target> element if the translated version of the object is in a different format). Each <bin-unit> element can also contain one or more <trans-unit> elements if you choose to offer some of the object's text in its extracted form as well. Note that the file has no skeleton. This is allowed in XLIFF because, as in our example, you could have an XLIFF document that is only used to transport project and metadata information.
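A minimal sketch of embedding a binary skeleton with a checksum follows. Several details are assumptions, not statements from the text above: the excerpt does not name the skeleton elements, so <skl> and <internal-file> with a form attribute are taken from the published XLIFF specifications; it also does not say which CRC algorithm or value format the crc attribute expects, so CRC-32 (via zlib) written as a decimal string is used purely for illustration. The helper names embed_skeleton and read_skeleton are hypothetical.

```python
import base64
import zlib
import xml.etree.ElementTree as ET


def embed_skeleton(data: bytes) -> ET.Element:
    """Build a <skl> element holding the skeleton bytes as Base64.

    Assumption: CRC-32 as a decimal string; the excerpt does not define
    the algorithm or the format of the crc attribute.
    """
    skl = ET.Element("skl")
    internal = ET.SubElement(skl, "internal-file")
    internal.set("form", "base64")
    internal.set("crc", str(zlib.crc32(data)))
    internal.text = base64.b64encode(data).decode("ascii")
    return skl


def read_skeleton(skl: ET.Element) -> bytes:
    """Decode the embedded skeleton and verify it against its crc attribute."""
    internal = skl.find("internal-file")
    data = base64.b64decode(internal.text)
    if str(zlib.crc32(data)) != internal.get("crc"):
        raise ValueError("skeleton data changed during localization")
    return data


skeleton_bytes = b"\x00\x01 binary skeleton \xff"
skl_element = embed_skeleton(skeleton_bytes)
print(ET.tostring(skl_element, encoding="unicode"))
restored = read_skeleton(skl_element)
```

The same compute-on-write, verify-on-read pattern would apply to any other element that carries a crc attribute.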
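The mime-type override described above can be sketched as follows. The fragment is hand-written for illustration: the file names are made up, the <external-file href="..."/> reference is an assumption based on the published XLIFF specifications rather than on this excerpt, and describe_bin_unit is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative <bin-unit>: a bitmap whose translated version is a PNG,
# plus one piece of the object's text offered in extracted form.
BIN_UNIT = """<bin-unit id="b1" mime-type="image/bmp">
  <bin-source><external-file href="logo.bmp"/></bin-source>
  <bin-target mime-type="image/png"><external-file href="logo_fr.png"/></bin-target>
  <trans-unit id="b1-t1"><source>Welcome</source></trans-unit>
</bin-unit>"""


def describe_bin_unit(bin_unit):
    """Summarize a <bin-unit>: the object type, the effective type of the
    translated object, and any text extracted into <trans-unit> children."""
    unit_type = bin_unit.get("mime-type")
    target = bin_unit.find("bin-target")
    target_type = None
    if target is not None:
        # <bin-target> overrides the unit's type only when it declares one.
        target_type = target.get("mime-type") or unit_type
    return {
        "id": bin_unit.get("id"),
        "source_type": unit_type,
        "target_type": target_type,
        "extracted_text": [tu.findtext("source")
                           for tu in bin_unit.findall("trans-unit")],
    }


info = describe_bin_unit(ET.fromstring(BIN_UNIT))
print(info)
```

A tool could use the resolved types to decide which external application to launch for editing the source and target objects.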
Project Information and Versioning
Other advantages of XLIFF compared to OpenTag include the predefined project information, pretranslation candidates, and version tracking data that can be stored along with the extracted text. This information can be coded in OpenTag as well, but only through tool-specific <prop> elements rather than a standardized and dedicated structure. The metadata works as follows: The <header> element can contain a <phase-group> that lists the different steps the file went through. Each <phase> element is uniquely identified in its <file> by a phase-name attribute. Each <trans-unit> can contain a set of <alt-trans> elements that act as suggestions, or that record the previous versions of its <source> and <target>. The <target> element (for both <alt-trans> and <trans-unit>) can have a phase-name attribute pointing to the <phase> element [in view at the time] during which the change was made. The <phase> element has the information about tools, date, user, and so forth. A tool can make use of this mechanism to offer a very powerful pretranslation and versioning interface for the different users of the file during the process. [Listing 17.8 shows an example of the use of such tracking metadata.] ... This aspect of XLIFF is important because it goes in the same direction as a trend that translation customers have shown recently: the need to have more control early in the process over the preparation of the localizable files. For example, this enables you to provide exact and fuzzy matches already associated with the source text, bypassing the use of translation memories for this first leverage. It permits the document authors to use other types of leveraging methods (database driven, or ID-based, for example).
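The link between targets and phases can be sketched like this. The sample is not Listing 17.8 from the book; it is a hand-written fragment, and the tool and date attribute names on <phase>, the tool names, and the targets_with_phase helper are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative <file> with two phases, one current target, and one
# earlier suggestion kept in <alt-trans>.
SAMPLE = """<file original="app.rc" source-language="en" target-language="fr">
  <header>
    <phase-group>
      <phase phase-name="pretrans" tool="hypothetical-tm" date="2001-05-01"/>
      <phase phase-name="review" tool="hypothetical-editor" date="2001-05-10"/>
    </phase-group>
  </header>
  <body>
    <trans-unit id="1">
      <source>Save file</source>
      <target phase-name="review">Enregistrer le fichier</target>
      <alt-trans>
        <source>Save file</source>
        <target phase-name="pretrans">Sauver le fichier</target>
      </alt-trans>
    </trans-unit>
  </body>
</file>"""


def targets_with_phase(file_el):
    """For every <target> (current or inside <alt-trans>), resolve its
    phase-name against the <phase> elements declared in this <file>."""
    phases = {p.get("phase-name"): p for p in file_el.iter("phase")}
    results = []
    for unit in file_el.iter("trans-unit"):
        # find() only looks at direct children, so this is the current target.
        candidates = [("current", unit.find("target"))]
        candidates += [("alt", alt.find("target"))
                       for alt in unit.findall("alt-trans")]
        for kind, target in candidates:
            if target is None:
                continue
            phase = phases.get(target.get("phase-name"))
            tool = phase.get("tool") if phase is not None else None
            date = phase.get("date") if phase is not None else None
            results.append((unit.get("id"), kind, target.text, tool, date))
    return results


history = targets_with_phase(ET.fromstring(SAMPLE))
for entry in history:
    print(entry)
```

Because phase names are unique only within their <file>, the lookup table is built per <file> rather than once for the whole document.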
Inside the <source> and the <target> elements you can have inline codes. XLIFF offers support for the two main mechanisms:
- The substitution method consists of extracting each native code to the skeleton file and replacing it with a placeholder element. This also is how OpenTag deals with inline codes. The <g> replaces paired codes, while <x/> marks any standalone code. In addition, <bx/> and <ex/> offer a solution for paired codes that overlap and could not be marked up with a <g> element.
- The encapsulation method consists of bracketing the native codes between XLIFF metatags. This is how TMX deals with inline codes. The <bpt> and <ept> elements are used to encapsulate paired codes; the <it> element is used for any isolated part of paired codes; and the <ph> element is used for any other standalone code. If any text occurs inside a sequence of encapsulated native code (for example, the text of an alt attribute in an <img> element in XHTML), you can use the <sub> element to delimit it. You might consider going a step further and creating a specific <trans-unit> for this type of text as well.
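The two mechanisms can be compared on the same sentence. In the sketch below, the sample strings and the plain_text helper are illustrative assumptions; the helper simply drops placeholders and encapsulated native code and keeps the translatable text, including any text marked with <sub>.

```python
import xml.etree.ElementTree as ET

# The same sentence, once with placeholders and once with encapsulated codes.
SUBSTITUTION = '<source>Click <g id="1">Save</g> to continue<x id="2"/>.</source>'
ENCAPSULATION = (
    '<source>Click <bpt id="1">&lt;b&gt;</bpt>Save<ept id="1">&lt;/b&gt;</ept>'
    ' to continue<ph id="2">&lt;br/&gt;</ph>.</source>'
)
# Text inside a native code (an alt attribute) delimited with <sub>.
WITH_SUB = ('<source>See <ph id="1">&lt;img alt="<sub>Company logo</sub>"'
            '/&gt;</ph></source>')

# Elements whose own character content is native code, not translatable text.
ENCAPSULATING = {"bpt", "ept", "it", "ph"}


def plain_text(el, inside_code=False):
    """Collect translatable text, skipping encapsulated native code.

    <g> keeps its content, <x/>, <bx/> and <ex/> are empty placeholders,
    and <sub> switches back to translatable text inside a native code.
    """
    parts = []
    if not inside_code and el.text:
        parts.append(el.text)
    for child in el:
        if child.tag == "sub":
            parts.append(plain_text(child, inside_code=False))
        else:
            parts.append(plain_text(
                child, inside_code or child.tag in ENCAPSULATING))
        # A tail belongs to the parent: it is text only outside native code.
        if child.tail and not inside_code:
            parts.append(child.tail)
    return "".join(parts)


sub_text = plain_text(ET.fromstring(SUBSTITUTION))
enc_text = plain_text(ET.fromstring(ENCAPSULATION))
alt_text = plain_text(ET.fromstring(WITH_SUB))
print(sub_text)
print(enc_text)
print(alt_text)
```

Both representations reduce to the same translatable string; the difference is where the native codes live: in the skeleton for the substitution method, and inside the XLIFF elements themselves for the encapsulation method.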
XLIFF offers many more powerful features to handle various aspects of the localization process: word count, context, coordinates, font and style information, and so forth. In addition, as with OpenTag, tools can add their own private metadata using the <prop-group> and <prop> elements as well as the ts attribute.
Translating Extracted Text
After the text has been extracted to OpenTag or XLIFF it can be translated using the same process as for any other XML documents. Some of these methods are described in Chapter 16, "Using XML to Localize." Any XML-enabled tool will be capable of loading the extracted file. As long as you can specify which elements are translatable, the file can be dealt with like any other XML document. For example, as shown in Figure 17.3, you can load an OpenTag file in TagEditor. Some tools even use extracted text formats as standard input. For instance, SDLX uses OpenTag as its default format. You can work directly with any OpenTag document you generate with your own filters. Figure 17.4 shows our sample document when it is loaded into SDL Edit. If no XML-enabled translation environment is available, you can code the extracted text into a more generic format, such as RTF, that can be used in a word processor. Figure 17.5 displays an XLIFF document with an RTF layer opened in Word.
Prepared by Robin Cover for The XML Cover Pages archive. See: "XML Localization Interchange File Format (XLIFF)."