ISO / TC 37 / SC 2 / WG 1 N 87

MS Word Version of E-mail:

 

From: Infoman Inc. [mailto:mpereira@istar.ca]

Sent: August 9, 2001 10:03 PM

To: John Clews

Cc: Helen Hutcheson

Subject: IT-enablement and Language Codes

Importance: High

 

Note: John & Helen, please feel free to pass on this e-mail and its attachments to anyone in TC37 (or others who might be interested in these issues and their solutions).

 

John,

 

I have had a chance to scan through the TC37/SC2/WG1 documentation on language coding systems, the lack of consistency in names of languages between 639-1 & 639-2, language group codings, the linkage of language codes and territorial mappings including "jurisdictions", etc. From an Open-edi, e-commerce, e-business, etc. perspective there are similar issues, albeit seen from a different angle.

 

 

I wish that we had known of this commonality of issues and possible solutions (see some of the SC2/WG1 documents of Håvard Hjulstad) earlier. I hope that in Toronto we may be able to achieve this. I conclude this memo with some recommendations. But first I want to introduce some documents approaching these issues from an ISO/IEC JTC1 "Information Technology" perspective and, within this, those of electronic data interchange (Open-edi), metadata, e-commerce, e-business, e-administration, etc. These have resulted in the launching of two new standardization activities, ISO/IEC 18022 and ISO/IEC 18038, which will need to interwork closely with ISO 639-2. I am the Project Editor for both.

 

Attached is a series of documents which may be of use to TC37/SC2 work in this area. It would be very useful and practical if common solutions could be developed between TC37/SC2 and JTC1/SC32 WG1 & WG2 for what are essentially similar problems seen from different perspectives.

 

The documents are presented in chronological order. The common point of departure is the "Report of the JTC1 Business Team on Electronic Commerce (BT-EC)" (ISO/IEC JTC1 N5296) and its recommendations for high priority standardization activities.

 

Note: All the documents listed below are public ISO/IEC JTC1 documents. They are accessible via http://www.jtc1.org as JTC1 documents or, for JTC1/SC32 documents, downloadable from the SC32 site, which can be reached from this site.

 

 

1. ISO/IEC JTC1/SC32 N0147 "Horizontal Issues and Encodable Value Domains in Electronic Commerce: Non-Technical Summary and Real World Examples to supplement BT-EC" (1998-08-05).

 

Of interest here are:

 

"Example #2 - Country Codes and Localization with Multilingualism"; and

 

"Example #3 - Language Codes and Concordance Among International Standards". This is a small example of "concordance" among ISO 639-1, the Library of Congress USMARC Code List for Languages and the Universal Decimal Classification (UDC) System.

 

 

2. ISO/IEC JTC1/SC32 "Making Standards Work in Electronic Commerce and Among Jurisdictions: IT-Enablement of Data Element-based Standards - Presentation at the Open Forum on Metadata Registries in Santa Fe" (2000-01-19)

 

This document repeats some of the examples found in SC32 N0147 but was prepared after two proposals for new standardization activities were accepted by the ISO/IEC (see pp. 8-12), namely,

 

 > a new ISO/IEC 18022 "Identification, Mapping and IT-Enablement of Existing Standards for Widely Used Encoded Value Domains" {See ISO/IEC JTC1 N5847}. Responsibility: ISO/IEC JTC1/SC32/WG2 - Metadata in close liaison with SC32/WG1 - Open-edi. [Note the title has changed to "Identification, Mapping and IT-Enablement of Widely used Coded Value Domains"].

 

 > a new ISO/IEC 18038 "Identification and Mapping of Various Categories of Jurisdictional Domains" {See ISO/IEC JTC1 N5846}

 

I also want to draw your attention to,

 

p.17 "4.2.3 Level-2 Canada & Nunavut - ISO 3166-2 Subdivisions". It is a very useful example of making a clear distinction between the IT-interface requirements of schema/table ID and Code ID, on the one hand, and, on the other, the possible multiple human interface linguistic equivalent terms (including those using non-Latin-1 alphabets); and,

 

pp.18-23 "4.3 Example 3: Simple Topology Based on ISO CD 19107" (Geomatics). It focuses on the use of UML (Unified Modeling Language) in a linguistically neutral way, from which one can then derive multiple natural language equivalents. The examples include English, French and Mandarin Chinese equivalents, from a human interface perspective, of the same set of requirements modeled through UML (and from UML one can generate XML-based equivalents). As far as I know, this is a first.
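The distinction drawn in the 4.2.3 example above can be sketched roughly as follows. This is only an illustration, not taken from the Santa Fe presentation: the table layout and the function name `label` are assumptions, and the language keys are ISO 639-2/T codes. The pair (table ID, code ID) is what computers exchange; the names are interchangeable human interface equivalents.

```python
# IT interface: (table ID, code ID). Human interface: names keyed by language.
# The dictionary layout and the helper below are hypothetical illustrations.
LABELS = {
    ("ISO 3166-2", "CA-NU"): {"eng": "Nunavut", "fra": "Nunavut", "iku": "ᓄᓇᕗᑦ"},
    ("ISO 3166-2", "CA-QC"): {"eng": "Quebec", "fra": "Québec"},
}


def label(table_id, code_id, lang, fallback="eng"):
    """Return the human-readable name for a coded value in the requested
    language, falling back to another language when none is recorded."""
    names = LABELS[(table_id, code_id)]
    return names.get(lang, names[fallback])


print(label("ISO 3166-2", "CA-NU", "iku"))  # name in Inuktitut syllabics
print(label("ISO 3166-2", "CA-QC", "fra"))
```

The stored or exchanged value never changes with the user's language; only the lookup at the human interface does.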

 

 

3. ISO/IEC JTC1/SC32 N0486 Progression on Development of the New Standard "Identification, Mapping and IT-enablement of Standards for Widely Used Coded Value Domains" (2000-06-02).

 

This is a short overview of the context and purpose of this new standard. It identifies existing terms and definitions to be utilized. [See further below 32N0534, which contains the actual definitions for the candidate terms, including their French language equivalents where available.]

 

 

4. ISO/IEC JTC1/SC32 N0534 "Status of the Work on the New ISO/IEC 18022 'Identification, Mapping and IT-Enablement of Standards for Widely Used Coded Value Domains'" (2000-10-04)

 

This document brings together, from different ISO and ISO/IEC standards, existing terms and definitions (and the French language version where available) pertaining to coding, identifiers, business transactions, character sets, etc. It should be useful to TC37/SC2 work on "Coding systems".

 

 

5. ISO/IEC JTC1/SC32 N0535 "Approach to Development of the New ISO/IEC 'Identification and Mapping of Various Categories of Jurisdictional Domains'" (2000-10-12).

 

This 91-page document is important. It demonstrates:

 

(a) that many of the entities listed in ISO 3166-1 are (i) not countries and (ii) really should have been moved to ISO 3166-2 when ISO 3166 became a two-part standard; and,

 

(b) that one should use the 3-digit numeric code as the "pivot code" and identifier as it is the most stable and does not change when the names of the entities identified (e.g., countries) change.

 

This standard will also cover jurisdictions in the form of "regions", i.e. several jurisdictions forming a "joint" jurisdiction (e.g. NAFTA, the European Union, etc.).
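Point (b) above can be illustrated with a small sketch. It assumes a hypothetical in-memory reference table and hypothetical stored orders; none of these names come from N0535. The real-world case used is the renaming of Burma to Myanmar, where the alpha-2 code changed from BU to MM while the numeric code 104 stayed the same.

```python
# Reference table keyed on the ISO 3166-1 3-digit numeric code (the pivot).
# Alpha codes and names are attributes that may change over time.
registry = {
    "104": {"alpha2": "BU", "name_eng": "Burma"},
    "124": {"alpha2": "CA", "name_eng": "Canada"},
}

# Hypothetical stored transactions that reference only the pivot code.
orders = [
    {"order_id": 1, "country": "104"},
    {"order_id": 2, "country": "124"},
]


def rename_entity(table, numeric, alpha2, name_eng):
    """Update the human-facing attributes; the numeric key is untouched."""
    table[numeric] = {"alpha2": alpha2, "name_eng": name_eng}


def country_name(table, order):
    """Resolve an order's country through the pivot code."""
    return table[order["country"]]["name_eng"]


before = country_name(registry, orders[0])
rename_entity(registry, "104", "MM", "Myanmar")
after = country_name(registry, orders[0])
print(before, "->", after)
```

Had the orders been keyed on the alpha-2 code or the name, every stored record would have needed to be migrated when the entity was renamed.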

 

---------

Recommendations

 

1. Use ISO 639-2/T as the core set of identifiers and pivot codes, especially in support of Open-edi and other computer-to-computer IT-interface requirements.

 

2. Integrate ISO 639-1 into 639-2 and make it a "partially equivalent sub-set", freezing its development.

 

3. Declare current ISO 639-2/B to be an alternative equivalent to the 639-2/T "pivot code set".

 

4. Eliminate/by-pass the current problem of various English and French language names/spellings with the creation of a single set of ISO English and ISO French language names as the "official/standard" ISO English ("ien") and French ("ifr") language human interface equivalents to the core set of identifiers and pivot codes. Other representations/spellings could be noted simply as "alternative representations" (e.g., like "synonyms" or "deprecated terms"). The names of the language in the language would be another set of human interface equivalents.

 

5. When referencing use of a natural language in a jurisdiction, specify the jurisdiction first, using the ISO 3166-1 3-digit numeric code, followed by the 639-2/T identifier (e.g. use of English in Canada would be "124:eng"). Again, this is from an IT-interface perspective; one would be free at the human interface to represent "124:eng" with any of its human linguistic equivalents.
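As a rough sketch of how recommendations 1, 2, 3 and 5 could fit together at the IT interface: the table below is keyed on the 639-2/T code, with the 639-2/B code as an alternative and the 639-1 code as a partial subset (Old English, "ang", has no 639-1 code). The function names and the table layout are assumptions made for illustration only; the "124:eng" form is the one proposed in recommendation 5.

```python
import re

# Pivot: ISO 639-2/T. "b" is the 639-2/B alternative; "alpha2" is the
# ISO 639-1 code, or None where 639-1 has no equivalent (partial subset).
LANGUAGES = {
    "eng": {"b": "eng", "alpha2": "en"},
    "fra": {"b": "fre", "alpha2": "fr"},
    "deu": {"b": "ger", "alpha2": "de"},
    "ang": {"b": "ang", "alpha2": None},
}

_IDENTIFIER = re.compile(r"([0-9]{3}):([a-z]{3})")


def to_pivot(code):
    """Normalize a 639-1, 639-2/B or 639-2/T code to the 639-2/T pivot."""
    for pivot, alt in LANGUAGES.items():
        if code in (pivot, alt["b"], alt["alpha2"]):
            return pivot
    raise KeyError(f"unknown language code: {code!r}")


def make_identifier(country_numeric, language_code):
    """Build a jurisdiction-first identifier such as '124:eng'."""
    if not re.fullmatch(r"[0-9]{3}", country_numeric):
        raise ValueError("expected a 3-digit ISO 3166-1 numeric code")
    return f"{country_numeric}:{to_pivot(language_code)}"


def parse_identifier(identifier):
    """Split '124:eng' into ('124', 'eng'); reject anything else."""
    match = _IDENTIFIER.fullmatch(identifier)
    if match is None:
        raise ValueError(f"malformed identifier: {identifier!r}")
    return match.group(1), match.group(2)


print(make_identifier("124", "fre"))   # French in Canada, given the /B code
print(parse_identifier("124:eng"))
```

Because the jurisdiction part is all digits and the language part is all letters, the two halves cannot be mistaken for one another, whatever the case handling of the software involved.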

 

The current problems in using upper and lower case alpha codes include that,

a) some software programs, parsers, etc. do not distinguish (or are set not to distinguish/differentiate) between upper and lower case; and,

b) the use of alpha-2 and alpha-3 country codes can be and is easily confused with use of alpha-2 and alpha-3 language codes.
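Both problems can be seen in a minimal sketch. The code values are real ISO 3166-1 alpha-2 and ISO 639-1 codes, but the function name is hypothetical and the "software that ignores case" is simulated simply by lower-casing everything.

```python
# A few real codes where the country and the language are unrelated.
COUNTRY_ALPHA2 = {"CA": "Canada", "SV": "El Salvador", "AR": "Argentina"}
LANGUAGE_ALPHA2 = {"ca": "Catalan", "sv": "Swedish", "ar": "Arabic"}


def ambiguous_after_case_folding(countries, languages):
    """Return the codes that collide once upper/lower case is ignored."""
    folded_countries = {code.lower() for code in countries}
    folded_languages = {code.lower() for code in languages}
    return sorted(folded_countries & folded_languages)


collisions = ambiguous_after_case_folding(COUNTRY_ALPHA2, LANGUAGE_ALPHA2)
for code in collisions:
    print(code, "=", COUNTRY_ALPHA2[code.upper()], "or", LANGUAGE_ALPHA2[code])
```

In a case-insensitive system "CA" and "ca" are the same string, so nothing in the value itself says whether Canada or Catalan is meant.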

 

Further, in computer-to-computer financial transactions, the banking/financial community uses the 3-digit numeric code as it is the most stable and unambiguous.

 

Enough said and done for one day. Looking forward to seeing you both next week.

 

au revoir - Jake Knoppers

 

P.S. I will be bringing a soft copy of this e-mail and the