The Internet Engineering Steering Group (IESG) has announced a last call review for the Internet Draft Tags for Identifying Languages, edited by Addison Phillips (webMethods) and Mark Davis (IBM). The IESG intends to make a decision within the next few weeks on the request to approve this document as an IETF Best Current Practice (BCP) RFC.
Commonly referenced as "RFC 3066bis," this working draft of Tags for Identifying Languages is intended to replace Tags for the Identification of Languages (IETF RFC 3066, BCP 47, January 2001). RFC 3066bis describes the "structure, content, construction, and semantics of language tags for use in cases where it is desirable to indicate the language used in an information object. It also describes how to register values for use in language tags and a construct for matching such language tags, including user defined extensions for private interchange."
RFC 3066bis will represent a significant improvement in language identification facilities if it is approved as an IETF BCP that supersedes RFC 3066. Both XML 1.0 and XML 1.1 normatively reference RFC 3066 for purposes of language identification: "In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by IETF RFC 3066, Tags for the Identification of Languages, or its successor..."
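As an illustration of how xml:lang values reach an application, the following minimal sketch reads the attribute with Python's standard xml.etree.ElementTree module and lets child elements inherit the value of their parent unless they override it. The sample document, the function name effective_languages, and the tag values are assumptions made for this example; they do not come from the XML Recommendation or from the draft.

```python
import xml.etree.ElementTree as ET

# Clark-notation name under which ElementTree exposes the reserved xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None):
    """Yield (element name, language) pairs; children inherit xml:lang unless they override it."""
    lang = element.get(XML_LANG, inherited)
    yield element.tag, lang
    for child in element:
        yield from effective_languages(child, lang)


# Hypothetical sample document, used only for illustration.
SAMPLE = (
    '<doc xml:lang="en">'
    "<p>Hello</p>"
    '<p xml:lang="fr-CA">Bonjour</p>'
    "</doc>"
)

result = list(effective_languages(ET.fromstring(SAMPLE)))
```

Here the first paragraph picks up "en" from the enclosing element, while the second carries its own "fr-CA" value.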
The main goals in the revision of RFC 3066 are: (1) to maintain backward compatibility, so that all previous codes would remain valid; (2) to reduce the need for large numbers of registrations; (3) to provide a more formal structure to allow parsing into subtags even where software does not have the latest registrations; (4) to provide stability in the face of potential instability in ISO 639, 3166, and 15924 codes (demonstrated instability in the case of ISO 3166); and (5) to allow for external extension mechanisms.
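The third goal, parsing into subtags without consulting a registry, can be pictured with a small sketch. It is a deliberate simplification and not the draft's grammar: it assumes that a four-letter subtag directly after the language is a script code (ISO 15924), that a two-letter subtag is a region code (ISO 3166), and it collects everything else unexamined. The function name split_language_tag and the sample tags are hypothetical.

```python
def _is_ascii_letters(subtag, length):
    """True if the subtag consists of exactly `length` ASCII letters."""
    return len(subtag) == length and subtag.isascii() and subtag.isalpha()


def split_language_tag(tag):
    """Split a tag on hyphens and guess each subtag's role from its shape alone.

    Simplified assumption, not the draft's ABNF: 4 letters right after the
    language means script, 2 letters means region, anything else is kept as-is.
    """
    subtags = tag.split("-")
    parsed = {"language": subtags[0].lower(), "script": None, "region": None, "other": []}
    for index, subtag in enumerate(subtags[1:], start=1):
        if parsed["other"]:
            # Once an unrecognized subtag appears, stop guessing roles.
            parsed["other"].append(subtag)
        elif index == 1 and _is_ascii_letters(subtag, 4):
            parsed["script"] = subtag.title()
        elif parsed["region"] is None and _is_ascii_letters(subtag, 2):
            parsed["region"] = subtag.upper()
        else:
            parsed["other"].append(subtag)
    return parsed


example = split_language_tag("zh-Hant-TW")
```

Because the roles are inferred from position and length, a tag such as "zh-Hant-TW" can be taken apart even by software that has never seen those particular values registered.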
The revision of IETF RFC 3066 represents one of several standards efforts currently underway to enhance intelligent computer processing of machine-readable natural language through the use of language description in markup contexts.
The IESG solicits final comments on the proposal to approve "RFC 3066bis" as an IETF BCP. Feedback should be sent to the relevant IETF mailing lists by 2004-07-05.
Tags for Identifying Languages. By Addison Phillips (Editor, webMethods, Inc.) and Mark Davis (IBM). IETF Network Working Group. Internet Draft. Reference: 'draft-phillips-langtags-03'. June 02, 2004, expires December 1, 2004. 35 pages. Also in PDF format. IETF Source: http://www.ietf.org/internet-drafts/draft-phillips-langtags-03.txt.
From the Introduction to Tags for Identifying Languages
Human beings on our planet have, past and present, used a number of languages. There are many reasons why one would want to identify the language used when presenting or requesting information.
Information about a user's language preferences commonly needs to be identified so that appropriate processing can be applied. For example, the user's language preferences in a browser can be used to select web pages appropriately. A choice of language preference can also be used to select among tools (such as dictionaries) to assist in the processing or understanding of content in different languages.
In addition, knowledge about the particular language used by some piece of information content may be useful or even required by some types of information processing; for example spell-checking, computer-synthesized speech, Braille transcription, or high-quality print renderings.
One means of indicating the language used is by labeling the information content with a language identifier. These identifiers can also be used to specify user preferences when selecting information content, or for labeling additional attributes of content and associated resources.
These identifiers can also be used to indicate additional attributes of content that are closely related to the language. In particular, it is often necessary to indicate specific information about the dialect, writing system, or orthography used in a document or resource, as these attributes may be important for the user to obtain information in a form that they can understand, or important in selecting appropriate processing resources for the given content.
This document specifies an identifier mechanism, a registration function for values to be used with that identifier mechanism, and a construct for matching against those values. It also defines a mechanism for private use extension and how private use, registered values, and matching interact.
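To make the idea of matching concrete, here is a minimal sketch of the prefix rule that RFC 3066 describes for language ranges: a range matches a tag when the two are equal, or when the range is a prefix of the tag and the next character in the tag is a hyphen, with "*" matching any tag. The matching construct in the draft may differ in detail. The function names range_matches and select_content, and the sample preference and content lists, are assumptions made for illustration.

```python
def range_matches(language_range, tag):
    """Prefix matching in the style of RFC 3066 language ranges (case-insensitive)."""
    language_range = language_range.lower()
    tag = tag.lower()
    if language_range == "*":
        return True
    # Equal, or a prefix that ends exactly at a subtag boundary.
    return tag == language_range or tag.startswith(language_range + "-")


def select_content(preferences, available):
    """Return the first available tag matched by the user's ranges, in preference order."""
    for language_range in preferences:
        for tag in available:
            if range_matches(language_range, tag):
                return tag
    return None


# Hypothetical user preferences and available content, for illustration only.
chosen = select_content(["fr", "en"], ["de", "en-GB", "fr-CA"])
```

In this sketch a user who prefers French over English is given the "fr-CA" version, while the range "en" would not match an unrelated tag such as "eng" because the prefix must end at a hyphen.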
About IETF Best Current Practice (BCP) RFCs
"The BCP subseries of the RFC series is designed to be a way to standardize practices and the results of community deliberations. A BCP document is subject to the same basic set of procedures as standards track documents and thus is a vehicle by which the IETF community can define and ratify the community's best current thinking on a statement of principle or on what is believed to be the best way to perform some operations or IETF process function.
Historically Internet standards have generally been concerned with the technical specifications for hardware and software required for computer communication across interconnected networks. However, since the Internet itself is composed of networks operated by a great variety of organizations, with diverse goals and rules, good user service requires that the operators and administrators of the Internet follow some common guidelines for policies and operations. While these guidelines are generally different in scope and style from protocol standards, their establishment needs a similar process for consensus building.
While it is recognized that entities such as the IAB and IESG are composed of individuals who may participate, as individuals, in the technical work of the IETF, it is also recognized that the entities themselves have an existence as leaders in the community. As leaders in the Internet technical community, these entities should have an outlet to propose ideas to stimulate work in a particular area, to raise the community's sensitivity to a certain issue, to make a statement of architectural principle, or to communicate their thoughts on other matters. The BCP subseries creates a smoothly structured way for these management entities to insert proposals into the consensus-building machinery of the IETF while gauging the community's view of that issue.
Finally, the BCP series may be used to document the operation of the IETF itself. For example, this document defines the IETF Standards Process and is published as a BCP...
Because BCPs are meant to express community consensus but are arrived at more quickly than standards, BCPs require particular care. Specifically, BCPs should not be viewed simply as stronger Informational RFCs, but rather should be viewed as documents suitable for a content different from Informational RFCs.
A specification, or group of specifications, that has, or have been approved as a BCP is assigned a number in the BCP series while retaining its RFC number(s)..." [from Section 5, The Internet Standards Process -- Revision 3]
- Tags for Identifying Languages. Internet Draft 'draft-phillips-langtags-03'. June 02, 2004. Also in PDF format.
- "Tags for Identifying Languages." HTML version. Editors' draft: 'draft-phillips-langtags-04'. June 02, 2004 or later.
- IESG Announcement: "Last Call: 'Tags for Identifying Languages' to BCP."
- Announcement from the editors, noting changes
- Draft-Langtags Issues List
- Inter-Locale Home: Editor's personal website. Contains internationalization content and demos written by Addison Phillips, the Globalization Architect for webMethods and the chair of the W3C I18N Working Group.
- Tags for the Identification of Languages. IETF Request for Comments (RFC) 3066, BCP 47. January 2001. "... describes a language tag for use in cases where it is desired to indicate the language used in an information object, how to register values for use in this language tag, and a construct for matching such language tags..."
- Tags for the Identification of Languages. IETF Request for Comments 1766. March 1995. Superseded by RFC 3066.
- IANA (Internet Assigned Numbers Authority)
- The Internet Engineering Steering Group (IESG)
- IETF Announcements Archive
- Extensible Markup Language (XML) 1.0 (Third Edition). W3C Recommendation 04-February-2004. See Section 2.12 Language Identification
- Extensible Markup Language (XML) 1.1. W3C Recommendation 04-February-2004. Edited in place 15 April 2004. See Section 2.12 Language Identification
- TEI Character Set Working Group. The WG is preparing a new TEI Guidelines chapter for Language Identification. See a draft version for TEI P5. [source PDF]
- "IETF Draft on Language Tags Defines Mechanism for Private Use Extension." News story 2003-11-14.
- "Markup and Multilingualism" - General references.
- "Language Identifiers in the Markup Context" - General references.