Translation Memory Exchange
[September 10, 2001] "TMX stands for Translation Memory eXchange. OSCAR (Open Standards for Container/Content Allowing Re-use) is the LISA Special Interest Group responsible for its definition. The purpose of TMX is to allow easier exchange of translation memory data between tools and/or translation vendors with little or no loss of critical data during the process."
The TMX 1.4a Specification (OSCAR Recommendation, 10-July-2002) "defines the Translation Memory eXchange format (TMX). The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process... TMX is defined in two parts: (1) A specification of the format of the container (the higher-level elements that provide information about the file as a whole and about entries). In TMX, an entry consisting of aligned segments of text in two or more languages is called a Translation Unit (the <tu> element); (2) A specification of a low-level meta-markup format for the content of a segment of translation-memory text. In TMX, an individual segment of translation-memory text in a particular language is denoted by a <seg> element. TMX is XML-compliant. It also uses various ISO standards for date/time, language codes, and country codes. TMX files are intended to be created automatically by export routines and processed automatically by import routines. TMX files are 'well-formed' XML documents that can be processed without explicit reference to the TMX DTD. However, a 'valid' TMX file must conform to the TMX DTD, and any suspicious TMX file should be verified against the TMX DTD using a validating XML parser..."
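TMX files are meant to be written by export routines. The container structure described above can be sketched as such a routine using only Python's standard library. This is a minimal illustration under stated assumptions, not a reference implementation. The export_tmx function, the placeholder header values and the fixed two-language layout are hypothetical. The <tuv> wrapper comes from the TMX specification rather than from the excerpt above: each <tu> holds one <tuv> per language, and that <tuv> carries the language's <seg>.

```python
import xml.etree.ElementTree as ET

# ElementTree spells xml:lang with the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def export_tmx(pairs, srclang="en", tgtlang="fr"):
    """Build a minimal TMX 1.4 document from (source, target) string pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes that, as we read it, the TMX 1.4 DTD marks as required.
    # The values here are placeholders for a hypothetical exporting tool.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-exporter",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "example",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one Translation Unit per aligned pair
        for lang, text in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # the segment text itself
    return ET.tostring(tmx, encoding="unicode")
```

For example, export_tmx([("Hello", "Bonjour")]) returns a single <tu> with an English and a French <tuv>, serialized with xml:lang attributes.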
[August 21, 1998] The specifications for TMX have recently been revised (Version 1.1, Aug-12-1998), including the TMX 1.1 Document Type Definition and the detailed TMX Implementation Notes. "The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process. TMX is XML-compliant. It also uses various ISO standards for date/time, language codes, and country codes." OSCAR (Open Standards for Container/Content Allowing Re-use) is the LISA Special Interest Group responsible for the definition of TMX.
[May 13, 1999] Developed under the Localisation Industry Standards Association (LISA), TMX is designed "to allow easier exchange of translation memory data between tools and/or translation vendors with little or no loss of critical data during the process. TMX is defined in two parts: 1) A specification of the format of the container (the higher-level elements that provide information about the file as a whole and about entries). In TMX, an entry consisting of aligned segments of text in two or more languages is called a Translation Unit (the <TU> element); 2) A low-level meta-markup format for the content of a segment of translation-memory text. In TMX, an individual segment of translation-memory text in a particular language is called a <SEG> element."
"TMX is XML-compliant (and therefore SGML-compliant as well). It also uses various ISO standards for date/time, language codes, and country codes. TMX files are intended to be created automatically by export routines and processed automatically by import routines. TMX files are 'well-formed' XML documents that can be processed without explicit reference to the TMX DTD. However, a 'valid' TMX file must conform to the TMX DTD, and any suspicious TMX file should be validated against the TMX DTD using a general-purpose XML parser. XML 'well-formed' documents may start with the XML processing statement, but it is not required."
Tony Graham wrote: "While you may find concepts from the TMX work that are useful to you, TMX stands for Translation Memory eXchange, and is concerned with importing and exporting portions of translation memory -- phrases that have been translated once and saved so they don't need to be translated again -- between translation tools. TMX has structures for parallel portions of text in multiple languages, but there is no concept that these chunks of text can, should, or will string together to make a coherent "document", in anybody's sense of the word. The only markup in a TMX document, which is in XML, is concerned with delimiting and identifying the parallel chunks of text for the purposes of the translation tool: other markup from the source document may be saved in the TMX document (with significant XML characters escaped with entities) but only as a translation aid for those tools that can use it." [XML-DEV mailing list, 1998-11-09]
Translation Memory eXchange - Main Page
"TMX 1.4b Specification. OSCAR Recommendation. October 07, 2004. Edited by Yves Savourel. Latest version URL: http://www.lisa.org/tmx/tmx.htm. Copyright (c) The Localisation Industry Standards Association [LISA] 1997-2004. All Rights Reserved.
TMX Specification page
TMX 1.4a Specification. OSCAR Recommendation, 10 July 2002. Edited by Yves Savourel. Copyright The Localisation Industry Standards Association (LISA).
Translation Memory Exchange Standards Mailing List. Yahoo Group. Send mail to firstname.lastname@example.org. The list apparently started in September 2001.
[September 30, 2002] "The Importance of TMX." By David Pooley (SDL International). In Globalization Insider: The LISA Newsletter Volume XI, Number 3.6 (September 26, 2002). ISSN: 1420-3693. TMX Special Issue. "TMX stands for Translation Memory eXchange. OSCAR (Open Standards for Container/Content Allowing Re-use) is the LISA Special Interest Group responsible for its definition. OSCAR members include translation tools developers, service providers and other interested parties (e.g., large translation clients). They came together over five years ago to specify a way in which translation memory data could be exchanged between tools and/or vendors with little or no loss of critical data in the process. OSCAR has recently voted TMX version 1.4 as an accepted standard... TMX is an XML format for the interchange of translation memory data. As such, it consists of elements (with attributes) that provide information about translation "segments". The size of a segment is not pre-defined and it will usually be a phrase, sentence or paragraph. For most tools using TMX, the default segment size is a sentence. Within each segment of TMX, there are optionally elements that provide information about the formatting contained in the segment (change of font, hyperlink etc.). TMX also provides for the definition of text "subflows" such as footnotes and index entries..."
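A segment carries both translatable text and inline elements that describe formatting. The following minimal sketch recovers the translatable text of a <seg>. It assumes, following TMX 1.4, that the inline elements <bpt>, <ept>, <it>, <ph> and <ut> hold native codes (for example an escaped HTML <b> tag) and that <hi> wraps translatable text. The name seg_plain_text is hypothetical. Subflows such as footnotes (<sub>) sit inside the code elements, so this sketch skips them; a real tool would extract them separately.

```python
import xml.etree.ElementTree as ET

# Inline elements whose content is native formatting code, not translatable text.
CODE_ELEMENTS = {"bpt", "ept", "it", "ph", "ut"}


def seg_plain_text(seg):
    """Concatenate the translatable text of a <seg>, skipping native codes."""
    parts = [seg.text or ""]
    for child in seg:
        if child.tag not in CODE_ELEMENTS:
            # e.g. <hi> wraps translatable text, so recurse into it.
            parts.append(seg_plain_text(child))
        # Text after a child's closing tag belongs to the segment either way.
        parts.append(child.tail or "")
    return "".join(parts)
```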
[September 30, 2002] "TMX 1.4a." By Yves Savourel (OSCAR). In Globalization Insider: The LISA Newsletter Volume XI, Number 3.6 (September 26, 2002). ISSN: 1420-3693. TMX Special Issue. "... a standard is only as good as its implementations. TMX follows that rule as well. A compliance kit is incorporated with the new version. This should help developers to implement solid and interoperable TMX functionalities. Tools vendors can develop import and export functions so their applications can read and write TMX documents. Those TMX files must be valid, that is: well-formed XML that can be validated against the TMX DTD. However some aspects of the implementation cannot be verified by the DTD (for example: what type of inline elements the document uses to enclose inline codes). One way to verify a tool does a good job is to provide test cases and check that the model TMX documents [in the compliance kit] are the same as the ones generated by the tool. TMX 1.4a has two [certification] levels: Level 1 is for TM with no inline codes (e.g., strings from a resource file), Level 2 is for formats that have inline codes (e.g., HTML content, where bold, italics, etc. are inline codes). Depending on what type of original format you are working with, you should get TMX Level 1 or Level 2. A tool that offers HTML support but doesn't generate TMX documents with inline codes is not TMX-compliant. Also keep in mind that tools may perhaps only import TMX or only export TMX (or do both). There are compliance tests for each of those aspects... In general standards such as TMX, OLIF, TBX, or XLIFF are good because they allow the users to have their assets - whatever they are - stored in a common and open format. This permits them to use various applications with the same data, and to migrate to newer and better tools without losing too much data..."
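The compliance check described above compares a tool's output with model TMX documents. It could be approximated as follows. This is an assumption about method, not the compliance kit's actual procedure. The hypothetical helper same_tmx normalizes both documents with Canonical XML 2.0 through xml.etree.ElementTree.canonicalize (Python 3.8 and later), so attribute order and indentation no longer matter.

```python
import xml.etree.ElementTree as ET


def same_tmx(model_xml, generated_xml):
    """Compare two TMX documents after Canonical XML 2.0 normalization."""
    def canon(text):
        # C14N sorts attributes; strip_text=True drops whitespace around text,
        # so pretty-printed and compact serializations compare equal.
        return ET.canonicalize(xml_data=text, strip_text=True)
    return canon(model_xml) == canon(generated_xml)
```

Because strip_text=True also trims leading and trailing spaces inside <seg>, this comparison can hide real differences in segment content. A stricter check would have to handle whitespace another way.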
[Next version:] "Some additional work to be done would be to provide an XML schema for TMX, in addition to the current DTD, so we can take advantage fully of XML features. A possible addition, linked to XML Schema, would be to allow for non-TMX constructs inside a TMX document, using XML namespaces. This would be more flexible than the <prop> element and the ts attribute currently used for extensibility purposes. And finally, there is the yet-to-be-resolved issue of segmentation. This is not a problem specific to TMX - it affects any TM repository and translation tool in general. Hopefully the Segmentation and Word Count Working Group newly created at OSCAR will be able to bring some solution to the problem. But this will take time..."
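Compare the namespace-based extension proposed above with the existing <prop> mechanism, which stores tool-specific metadata as typed text elements. The following is a minimal reading sketch. It assumes <prop> elements appear as direct children of <tu>. The type values x-project and x-class and the helper tu_properties are hypothetical. Values are collected in lists because a unit may carry several <prop> elements of the same type.

```python
import xml.etree.ElementTree as ET


def tu_properties(tu):
    """Collect <prop type="..."> values of a translation unit into a dict."""
    props = {}
    for prop in tu.findall("prop"):  # direct children only
        props.setdefault(prop.get("type"), []).append(prop.text or "")
    return props
```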
- TMX Format. Version 1.3. August 29, 2001. In Translation Memory Exchange (TMX) version 1.3, the lang attribute is replaced by xml:lang. The lang attribute is now deprecated. [cache version 1.3]
- TMX Format - Description. [earlier local archive copy, 97-12-12]
- TMX DTD version 1.3 [cache]
- TMX version 1.3 documentation package [cache]
- Version 1.2 DTD, [cache]
- TMX 1.1 Document Type Definition, [local archive copy]
- TMX specifications
- TMX compliance procedures
- TMX resource package
- TMX Format Implementation Notes
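The version 1.3 change from lang to xml:lang lends itself to a small migration step. The sketch below assumes older files carry a plain lang attribute, for example on <tuv>. The function migrate_lang is hypothetical and updates nothing else, such as the version attribute. ElementTree represents xml:lang with the XML namespace URI http://www.w3.org/XML/1998/namespace and writes it back with the reserved xml: prefix.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def migrate_lang(root):
    """Replace deprecated lang attributes with xml:lang, in place."""
    for elem in root.iter():
        if "lang" in elem.attrib:
            value = elem.attrib.pop("lang")
            # If an element already has xml:lang, keep that value.
            elem.attrib.setdefault(XML_LANG, value)
    return root
```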
[November 09, 2001] XSL Template Collection for XLIFF/TMX. November 09, 2001. 'A set of XSL templates to execute various tasks. It includes for example: XLIFF to Java properties file conversion, XLIFF to TMX, TMX to tab-delimited, leveraging of existing translation into an XLIFF document, conversion to UTF-8 encoding for any XML document, etc.' From the posting of Yves Savourel 2001-11-09 (and see the README): "...a note to let you know that there is now a small collection of XSL templates freely available that offers utilities for XLIFF and TMX. For now it includes 6 templates: (1) LeverageXLIFF.xsl - Leverages the existing translation of an XLIFF document into a newer XLIFF document. (2) XLIFFToPO.xsl - Converts the <target> elements of an XLIFF document into a PO (Portable Object) file. (3) XLIFFToProperties.xsl - Converts the <target> elements of an XLIFF document into a Java properties file. (4) XLIFFToTMX.xsl - Converts an XLIFF document into a TMX document. (5) TMXToTDF.xsl - Converts the entries of a TMX document into a tab-delimited file. (6) ToUTF8.xsl - Converts any XML file into UTF-8 encoding. More will come later (suggestions are welcome)..." See also "XML Localization Interchange File Format (XLIFF)."
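The TMXToTDF.xsl template itself is not reproduced here. As a rough Python equivalent under stated assumptions, the hypothetical tmx_to_tab_delimited below writes one row per <tu> and one column per requested language. A cell stays empty when a language is missing. Language codes are matched exactly, even though language tags are case-insensitive.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_tab_delimited(tmx_text, langs):
    """Return tab-delimited text: a header row of languages, then one row per <tu>."""
    root = ET.fromstring(tmx_text)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(langs)
    for tu in root.iter("tu"):
        row = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() concatenates all character data, inline code content included.
            row[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        writer.writerow([row.get(lang, "") for lang in langs])
    return out.getvalue()
```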
- Press release: "Draft translation memory exchange (TMX) standard released. LISA's OSCAR initiative produces first results." - ". . . maturity is also reflected in OSCAR's concern to build on existing and emerging standards such as Unicode and XML in order to shorten development time and avoid reinventing the wheel. . . The OSCAR group combines key technology vendors, corporate users and service providers worldwide, including such names as AlpNet, IBM, ILE, ITP, Logos, Microsoft, Multiling, Star, Systran, and Trados. Further support for its work has now come through the participation of an OTELO project representative. Co-sponsored by the European Commission, the OTELO project aims to design and develop a comprehensive automated translator's environment, uniting programs such as translation memory, terminology management systems and machine translation under a single interface."
- OSCAR - The primary task for the OSCAR (Open Standard for Container and Content for Reuse) project is to define a standard format to facilitate the exchange of translation memories between various translation tools.
- TMX and OpenTag
- DTD for TMX ("LISA OSCAR:1997//DTD for Translation Memory eXchange//EN"), November 25, 1997. [local archive copy]
- TMX resource package (November 1997); [local archive copy]
- Email for the supporting LISA (Localisation Industry Standards Association) special interest group: email@example.com
- Related information on OSCAR (Open Standards for Container/Content Allowing Re-Use) - name for a future cluster of data exchange standards for machine translation tools
- [October 27, 1998] "The Technical Content of TMX 1.1." By Sarah Carroll. In Multilingual Computing and Technology [#22] Volume 9, Issue 6 (October 1998), pages 48-49. [ISSN: 1065-7657.] Summary: "TMX 1.1. - A format for the exchange of data between competing translation database systems."
- [September 10, 2001] "Use XML as a Java Localization Solution. The reusability that XML affords TMX-formatted data benefits Java internationalization development." By Masaki Itagaki. From LISA web site. "Java has been one of the best programming languages for global market-oriented application development since JDK 1.1 covered basic components for internationalization. Java has many internationalization approaches supporting such aspects as Unicode 2.0, multilingual environment, and Locale objects, to name a few. However, you still have to consider the daunting, fundamental work that is required for a global market, which means translating all text items such as labels, messages, menu items, and so on. Even for these kinds of localization issues, Java offers a nice solution in the ResourceBundle class. You can extract all the text items from original source codes, isolating them into ResourceBundle components such as a ListResourceBundle class or a property file. Although such a scheme makes a developer's life much easier, it's rather clumsy from the translation point of view, especially in terms of reusability of translations. In the localization industry, Translation Memory eXchange (TMX) is a standardized data format that uses XML for software and document translation assets. Most of the commercial translation tools can use the TMX file to reuse translation data. Translators who want to use the TMX solution for Java must implement their own data conversion between TMX and ResourceBundle data... Since 1997 the localization industry has put a lot of effort into standardizing a translation data format. The Localization Industry Standards Association (LISA), a nonprofit internationalization and localization organization, formed a special interest group called Open Standards for Container/Content Allowing Reuse (OSCAR) to define a translation memory data format and publish the TMX standard. 
This is simply XML-formatted data defining elements and attributes that are necessary to organize translation data efficiently... Most benefits of the TMXResourceBundle class are on the development side. Since the number of words usually determines the cost of translation, requesting translation of the same items is not cost efficient. Using TMX's DTD, you can also embed such information as a package name, a class name, and a project name. This gives you an exact match in translation data, which enables you to extract only new items. Meanwhile, if you want to achieve consistency between software translation and document translation (such as guides, manuals, and even computer-based training programs), TMX proves to be a great solution. By importing your Java TMX file into any translation tool, you can reuse Java translations through a word book or glossary functions, which are included in most translation tools. Thus, TMX benefits not just the translation industry, but Java internationalization development, as well..." Article originally published in JavaPro Magazine.
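The article's TMXResourceBundle class is not reproduced here. The same TMX-to-ResourceBundle idea can be illustrated with the hypothetical Python converter below, which writes a Java .properties file for one language. It assumes each <tu> carries a tuid attribute that can serve as the resource key. Non-ASCII characters become \uXXXX escapes, so the file loads whether the Java runtime reads properties as ISO-8859-1 (before Java 9) or as UTF-8 (PropertyResourceBundle from Java 9 on).

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def escape_properties(text, is_key=False):
    """Escape a key or value for a Java .properties file."""
    out = []
    for i, ch in enumerate(text):
        if ch == "\\":
            out.append("\\\\")
        elif ch == "\n":
            out.append("\\n")
        elif is_key and ch in "=:#!":
            out.append("\\" + ch)
        elif ch == " " and (is_key or i == 0):
            # A space ends a key; a leading space in a value would be trimmed.
            out.append("\\ ")
        elif ord(ch) < 0x20 or ord(ch) > 0x7E:
            # Other control and non-ASCII characters: one \uXXXX per UTF-16 code unit.
            data = ch.encode("utf-16-be")
            for j in range(0, len(data), 2):
                out.append("\\u%04X" % int.from_bytes(data[j:j + 2], "big"))
        else:
            out.append(ch)
    return "".join(out)


def tmx_to_properties(tmx_text, lang):
    """Emit key=value lines for one language, keyed by each <tu>'s tuid."""
    root = ET.fromstring(tmx_text)
    lines = []
    for tu in root.iter("tu"):
        key = tu.get("tuid")
        if not key:
            continue  # no usable resource key for this unit
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if tuv.get(XML_LANG) == lang and seg is not None:
                value = "".join(seg.itertext())
                lines.append(escape_properties(key, is_key=True) + "=" + escape_properties(value))
    return "\n".join(lines) + "\n"
```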