The OASIS Cover Pages: The Online Resource for Markup Language Technologies
Created: March 08, 2002.
News: Cover Stories

ISO Working Group on Coding Systems Outlines New Language Encoding Initiatives.

A document prepared by Håvard Hjulstad (Convener of ISO/TC37/SC2/WG1 'Coding systems') outlines a number of important language encoding initiatives that are to be undertaken within the framework of ISO/TC37/SC2/WG1. The language identification codes of ISO 639-1 (alpha-2 code) and ISO 639-2 (alpha-3 code) were designed to meet the needs of terminology and library applications, but are judged inadequate as a basis for language-based text processing within Information and Communication Technology (ICT) industries. XML 1.0 Second Edition normatively references RFC 3066 ("Tags for the Identification of Languages"), which relies upon the ISO 639 language codes. The Convener notes a recognized need to "expand the current set of language identifiers and language identification mechanisms greatly; there may be a need for identifiers for 15-20 times as many linguistic units as the current [code] tables provide."

Eleven (11) candidate projects are identified in the document, including:

(1) a model for language identification [definitions for 'language', 'individual language', 'language variant', 'dialect'];
(2) language identification structure [geographical variation, variation as to script, writing system, and orthography, temporal variation, stylistic variation];
(3) linguistic unit description format;
(4) description of linguistic units and default values [script, orthography, geographical area];
(5) resolution of problems in current code tables;
(6) further development of ISO 639-1 and ISO 639-2;
(7) hierarchical language identifiers [language group identifiers];
(8) additional individual language identifiers [5000-7000 needed];
(9) geographical coordinate information;
(10) topic mapping project;
(11) mapping with other language identification code sets [e.g., Ethnologue and Linguasphere Register].

Bibliographic information: "Future Development of ISO 639." By Håvard Hjulstad (Convener of ISO/TC37/SC2/WG1 'Coding systems'). Document reference: ISO/TC37/SC2/WG1 N89. Date: 2002-03-04. 4 pages.

The XML connection: See Section 2.12 of the XML 1.0 Second Edition specification, as emended in the 'E29 Substantive erratum' (see "Errata as of 2002-02-20" in "XML 1.0 Second Edition Specification Errata"). With respect to the reserved xml:lang attribute, the emended text reads: "The values of the attribute are language identifiers as defined by [IETF RFC 3066], Tags for the Identification of Languages, or its successor." The RFC itself cites ISO 639 as the principal authority for the rules governing the 'Primary-subtag' in the language tag syntax: "All 2-letter subtags are interpreted according to assignments found in ISO standard 639, 'Code for the representation of names of languages' [ISO 639], or assignments subsequently made by the ISO 639 part 1 maintenance agency or governing standardization bodies. (Note: A revision is underway, and is expected to be released as ISO 639-1:2000). All 3-letter subtags are interpreted according to assignments found in ISO 639 part 2, 'Codes for the representation of names of languages -- Part 2: Alpha-3 code [ISO 639-2]', or assignments subsequently made by the ISO 639 part 2 maintenance agency or governing standardization bodies..." See also E11 for the "RFC 1766 / RFC 3066" update. Language-sensitive processing of SGML-encoded text [ISO 8879] also references ISO 639.
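
As an illustration of the primary-subtag rule quoted above, the following minimal sketch classifies an xml:lang value by the length of its first subtag. The function name classify_primary_subtag and the returned labels are hypothetical, chosen only for this example. The sketch checks the RFC 3066 tag syntax (a primary subtag of 1-8 letters, followed by optional subtags of 1-8 letters or digits) and the "i" and "x" prefixes that RFC 3066 reserves for IANA-registered and private-use tags. It does not verify that a given code is actually assigned in the ISO 639 tables.

```python
import re

# RFC 3066 syntax: Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)
_LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")


def classify_primary_subtag(tag: str) -> str:
    """Say which authority governs the primary subtag of a language tag.

    Hypothetical helper for illustration: it inspects only the shape of
    the tag and does not look the code up in any ISO 639 code table.
    """
    if not _LANGUAGE_TAG.fullmatch(tag):
        raise ValueError(f"not a syntactically valid RFC 3066 tag: {tag!r}")

    # Language tags are case-insensitive, so normalize before comparing.
    primary = tag.split("-", 1)[0].lower()

    if len(primary) == 2:
        return "ISO 639-1 alpha-2"
    if len(primary) == 3:
        return "ISO 639-2 alpha-3"
    if primary == "i":
        return "IANA-registered"
    if primary == "x":
        return "private use"
    # RFC 3066 leaves all other primary subtags unassigned.
    return "unassigned"


if __name__ == "__main__":
    for value in ("en-US", "fra", "i-klingon", "x-custom"):
        print(value, "->", classify_primary_subtag(value))
```

Under the proposals outlined in the ISO document, a lookup of this kind would presumably need to handle a far larger set of identifiers than the two code tables it distinguishes here.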

Document URI: http://xml.coverpages.org/ni2002-03-08-b.html
Robin Cover, Editor: robin@oasis-open.org