ISO / TC 37 / SC 2 / WG 1 N 87
MS Word Version of E-mail:
From: Infoman Inc. [mailto:mpereira@istar.ca]
Sent: August 9, 2001 10:03 PM
To: John Clews
Cc: Helen Hutcheson
Subject: IT-enablement and Language Codes
Importance: High
Note: John & Helen, please feel free to
pass on this e-mail and attachments to anyone in TC37 (or others who might be
interested in these issues and their solutions).
John,
I have had a chance to scan through
TC37/SC2/WG1 documentation re language coding systems, lack of consistency in
names of languages among 639-1 & 639-2, language group codings, linkage of
language codes and territorial mappings including "jurisdictions",
etc. From an Open-edi, e-commerce, e-business, etc. perspective there are similar
issues, albeit seen from a different angle.
I wish that we had known of this
commonality of issues and possible solutions (see some of the SC2/WG1 documents
of Håvard Hjulstad) earlier. I hope
that in Toronto we may be able to achieve this. I conclude this memo with some
recommendations. But first I want to introduce some documents approaching these
issues from an ISO/IEC JTC1 "Information Technology" perspective and,
within this, those of electronic data interchange (Open-edi), metadata,
e-commerce, e-business, e-administration, etc. These have resulted in the
launching of two new standardization activities, ISO/IEC 18022 and ISO/IEC 18038,
which will need to interwork closely with ISO 639-2. I am the Project Editor
for both.
Attached is a series of documents
which may be of use to TC37/SC2 work in this area. It would be very useful and
practical if common solutions could be developed among TC37/SC2 and JTC1/SC32 WG1
& WG2 for what are essentially similar problems but from different
perspectives.
The documents are presented
in chronological order. The common point of departure is the "Report of
the JTC1 Business Team on Electronic Commerce (BT-EC)" (ISO/IEC JTC1 N5296)
and its recommendations for high priority standardization activities.
Note: All the documents listed below
are public ISO/IEC JTC1 documents and accessible via http://www.jtc1.org as
JTC1 documents or, as JTC1/SC32
documents, available and downloadable by going to the SC32 site from this
site.
1. ISO/IEC JTC1/SC32 N0147
"Horizontal Issues and Encodable Value Domains in Electronic Commerce:
Non-Technical Summary and Real World Examples to supplement
BT-EC" (1998-08-05).
Of interest here are:
"Example #2 - Country Codes and
Localization with Multilingualism"; and
"Example #3 - Language Codes and
Concordance Among International Standards". This is a small example of
"concordance" among ISO 639-1, the Library of Congress USMARC Code
List for Languages and the Universal Decimal Classification (UDC) System.
2. ISO/IEC JTC1/SC32 "Making Standards
Work in Electronic Commerce and Among Jurisdictions: IT-Enablement of Data
Element-based Standards - Presentation at the Open Forum on Metadata Registries
in Santa Fe" (2000-01-19)
This document repeats some of the
examples found in SC32 N0147 but was prepared after two proposals for new
standardization activities were accepted by the ISO/IEC (see pp. 8-12), namely,
> a new ISO/IEC 18022 "Identification, Mapping and
IT-Enablement of Existing Standards for Widely Used Encoded Value
Domains". {See ISO/IEC JTC1 N5847}. Responsibility: ISO/IEC JTC1/SC32/WG2
- Metadata in close liaison with SC32/WG1 - Open-edi. [Note the title has
changed to "Identification, Mapping and IT-Enablement of Widely Used Coded
Value Domains"].
> a new ISO/IEC 18038 "Identification and Mapping of
Various Categories of Jurisdictional Domains" {See ISO/IEC JTC1 N5846}
I also want to draw your attention to:
p.17 "4.2.3 Level-2 Canada &
Nunavut - ISO 3166-2 Subdivisions". It is a very useful example of making a
clear distinction between the IT-interface requirements of schema/table ID and Code
ID, on the one hand, and, on the other, the possible multiple human interface
linguistic equivalent terms (including those using non-Latin-1 alphabets);
and,
pp.18-23 "4.3 Example 3: Simple
Topology Based on ISO CD 19107" (Geomatics). It focuses on the use of UML
(Unified Modeling Language) in a linguistically neutral way and, from there, on
providing multiple natural-language equivalents. The examples include English, French
and Mandarin Chinese equivalents, from a human interface perspective,
of the same set of requirements modeled through UML (and from UML one can
generate XML-based equivalents). As far as I know, this is a first.
3. ISO/IEC JTC1/SC32 N0486 Progression
on Development of the New Standard "Identification, Mapping and
IT-enablement of Standards for Widely Used Coded Value Domains"
(2000-06-02).
This is a short overview of the context
and purpose of this new standard. It identifies existing terms and definitions
to be utilized. [See further below 32N0534, which contains the actual
definitions for the candidate terms, including their French language equivalents
where available.]
4. ISO/IEC JTC1/SC32 N0534 "Status
of the Work on the New ISO/IEC 18022 'Identification, Mapping and
IT-Enablement of Standards for Widely Used Coded Value Domains'"
(2000-10-04)
This document brings together from
different ISO and ISO/IEC standards existing terms and definitions (and the
French language version where available) pertaining to coding, identifiers,
business transactions, character sets, etc. It should be useful to TC37/SC2
work on "Coding systems".
5. ISO/IEC JTC1/SC32 N0535
"Approach to Development of the New ISO/IEC 'Identification and
Mapping of Various Categories of Jurisdictional Domains'" (2000-10-12).
This 91-page document is important. It
demonstrates:
(a) that many of the entities listed in
ISO 3166-1 are (i) not countries and (ii) really should have been moved to ISO
3166-2 when ISO 3166 became a two-part standard; and,
(b) that one should use the 3-digit
numeric code as the "pivot code" and identifier as it is the most
stable and does not change when the names of the entities identified (e.g.,
countries) change.
This standard will also cover
jurisdictions in the form of "regions", i.e. several jurisdictions
forming a "joint" jurisdiction (e.g. NAFTA, the European Union, etc.).
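The "pivot code" argument in (b) can be sketched in a few lines of Python. This is only an illustration, not anything taken from N0535: the table, the class and the function name are hypothetical, and the single data row (Burma becoming Myanmar, with the alpha-2 code going from BU to MM while numeric 104 stayed the same) is one well-known case chosen as an example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Labels:
    """Human-facing labels attached to a numeric pivot code (hypothetical type)."""
    alpha2: str
    name_en: str


# Successive label sets for one ISO 3166-1 numeric code (illustrative excerpt).
# The rename changed the alpha-2 code and the name, but not the numeric code.
LABELS_BY_PIVOT = {
    "104": [Labels("BU", "Burma"), Labels("MM", "Myanmar")],
}


def display_name(pivot: str) -> str:
    """Return the most recent English name for a numeric pivot code."""
    return LABELS_BY_PIVOT[pivot][-1].name_en


# A record stored against the pivot code needs no migration after the rename.
stored_record = {"jurisdiction": "104", "amount": 250}
print(display_name(stored_record["jurisdiction"]))
```

Records keyed on the alpha-2 code or on the name would have had to be rewritten at the time of the change; those keyed on the numeric code would not.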
---------
Recommendations
1. Use ISO 639-2/T as the core set of
identifiers and pivot codes, especially in support of Open-edi and other
computer-to-computer IT-interface requirements.
2. Integrate ISO 639-1 into 639-2 and
make it a "partially equivalent sub-set", freezing its development.
3. Declare current ISO 639-2/B to be an
alternative equivalent to the 639-2/T "pivot code set".
4. Eliminate/by-pass the current
problem of various English and French language names/spellings with the creation
of a single set of ISO English and ISO French language names as the
"official/standard" ISO English ("ien") and French
("ifr") language human interface equivalents to the core set of
identifiers and pivot codes. Other representations/spellings could be noted
simply as "alternative representations" (e.g. like "synonyms" or
"deprecated terms"). The name of each language in that language itself would
be another set of human interface equivalents.
5. When referencing use of a natural
language in a jurisdiction, specify the jurisdiction first, using the ISO 3166-1
3-digit numeric code, followed by the 639-2/T identifier (e.g. use of English
in Canada would be "124:eng"). Again, this is from an IT-interface
perspective; one would be free at the human interface to represent
"124:eng" with any of its human linguistic equivalents.
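As a rough illustration of recommendations 3 to 5, the following Python sketch parses an identifier such as "124:eng", treats a 639-2/B code as an alias of its 639-2/T pivot code, and renders the result in either of the proposed "ien" / "ifr" name sets. The tables are tiny hand-made excerpts, and the names `parse`, `render` and `LanguageUse` are hypothetical; none of this is defined by ISO 639-2 or ISO 3166-1.

```python
from typing import NamedTuple

# ISO 639-2/B codes treated as aliases of the 639-2/T pivot codes (excerpt).
B_TO_T = {"fre": "fra", "ger": "deu"}

# Human interface equivalents keyed by 639-2/T pivot code. "ien" and "ifr"
# are the tags proposed in recommendation 4 for ISO English / ISO French names.
LANGUAGE_NAMES = {
    "eng": {"ien": "English", "ifr": "anglais"},
    "fra": {"ien": "French", "ifr": "français"},
    "deu": {"ien": "German", "ifr": "allemand"},
}

# ISO 3166-1 3-digit numeric codes (excerpt).
JURISDICTION_NAMES = {"124": {"ien": "Canada", "ifr": "Canada"}}


class LanguageUse(NamedTuple):
    jurisdiction: str  # ISO 3166-1 numeric code, kept as a string
    language: str      # ISO 639-2/T pivot code


def parse(identifier: str) -> LanguageUse:
    """Parse 'NNN:lll' into a jurisdiction code and a 639-2/T pivot code."""
    numeric, sep, lang = identifier.partition(":")
    if not sep or len(numeric) != 3 or not (numeric.isascii() and numeric.isdigit()):
        raise ValueError(f"expected '<3 digits>:<language code>', got {identifier!r}")
    if numeric not in JURISDICTION_NAMES:
        raise ValueError(f"unknown jurisdiction code {numeric!r}")
    lang = B_TO_T.get(lang, lang)  # normalise a 639-2/B alias to its pivot code
    if lang not in LANGUAGE_NAMES:
        raise ValueError(f"unknown language code {lang!r}")
    return LanguageUse(numeric, lang)


def render(use: LanguageUse, ui: str) -> str:
    """Render the identifier for humans in the chosen name set ('ien' or 'ifr')."""
    return f"{LANGUAGE_NAMES[use.language][ui]} ({JURISDICTION_NAMES[use.jurisdiction][ui]})"


print(render(parse("124:eng"), "ien"))
print(render(parse("124:fre"), "ifr"))
```

The exchanged value stays "124:eng" (or "124:fra") whatever the user interface shows; only `render` depends on the human language chosen.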
The current problems in using upper and
lower case alpha codes include that,
a) some software programs, parsers,
etc. do not distinguish (or are set not to distinguish/differentiate) between
upper and lower case; and,
b) the use of alpha-2 and alpha-3
country codes can be and is easily confused with use of alpha-2 and alpha-3
language codes.
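A short Python sketch of how the two problems above combine: once case is ignored, a bare alpha code can be read either as a country or as a language, whereas a numeric code cannot. The tables are small excerpts picked for illustration (CA / ca and CHE / che happen to collide), and `readings` is a hypothetical helper.

```python
# Small excerpts, chosen because the codes collide once case is ignored.
COUNTRY_ALPHA = {"CA": "Canada", "CHE": "Switzerland"}    # ISO 3166-1 alpha-2 / alpha-3
LANGUAGE_ALPHA = {"ca": "Catalan", "che": "Chechen"}      # ISO 639-1 / ISO 639-2
COUNTRY_NUMERIC = {"124": "Canada", "756": "Switzerland"}  # ISO 3166-1 numeric


def readings(token: str) -> list[str]:
    """List every way a case-insensitive parser could interpret the token."""
    folded = token.casefold()
    found = []
    for code, name in COUNTRY_ALPHA.items():
        if code.casefold() == folded:
            found.append(f"country {name}")
    for code, name in LANGUAGE_ALPHA.items():
        if code.casefold() == folded:
            found.append(f"language {name}")
    if token in COUNTRY_NUMERIC:
        found.append(f"country {COUNTRY_NUMERIC[token]}")
    return found


print(readings("ca"))   # two readings: a country and a language
print(readings("124"))  # one reading only
```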
Further, in computer-to-computer
financial transactions, the banking/financial community uses the 3-digit numeric
code as it is the most stable and unambiguous.
Enough said and done for one day.
Looking forward to seeing you both next week.
au revoir - Jake Knoppers
P.S. I will be
bringing a soft copy of this e-mail and the