ISO / TC 37 / SC 2 / WG 1 N 83
ISO/IEC JTC 1/SC 32
Data Management and Interchange
ISO/IEC JTC 1/SC32 N 147
DATE: 1998-08-05
REPLACES
DOC TYPE:
National Body Contribution
TITLE:
Horizontal Issues and Encodable Value Domains in Electronic Commerce:
Non-technical Summary and Real World Examples to supplement BT-EC
Report
SOURCE:
Canadian National Body
PROJECT:
STATUS:
This document was reviewed at the SC 32 Plenary meetings, July 1998,
Brisbane, Australia.
ACTION ID: FYI
DUE DATE:
DISTRIBUTION: P & L Members
SC Chair
WG Conveners and Secretaries
MEDIUM:
DISKETTE NO.:
NO. OF PAGES: 20
Secretariat, ISO/IEC JTC 1/SC 32,
American National Standards Institute, 11 West 42nd Street, New York,
NY 10036; Telephone: 212-642-4976; Fax: 212-840-2298;
E-mail: mtopping@ansi.org
Title: Horizontal Issues and Encodable Value Domains in Electronic Commerce:
Non-technical Summary and Real World Examples to supplement BT-EC Report
Source: CAC/JTC1/SC32, Canada
Status: National Body Contribution
Action: FYI and Discussion at the SC32 HOD/C and Plenary in
Brisbane, July 98
Purpose: This document:
1. is circulated to JTC1/SC32 as a reference document prepared
to facilitate follow-up to the BT-EC Report (JTC1 N5296) by SC32/WG2 (and
SC30/WG1) as stated in Resolution 8 of the 12th Plenary of JTC1 (N5448) and the
JTC1 Request for National Body and Subcommittee Comments on JTC1 N5296,
Electronic Commerce Business Team Report (N5437);
2. serves as input to "Elaboration on the definition of
cultural and linguistic adaptability" for the JTC1 Ad-Hoc Meeting of the
new Technical Direction on
"Cultural and Linguistic Adaptability and User Interfaces" as per
Resolution 22 of the 12th JTC1 Plenary (N5448); and,
3. contributes to the work of JTC1/SC32/WG2 on ISO/IEC PDTR
15452 "Information Technology - Specification of data value domains".
Contents:
1. INTRODUCTORY NOTES
2. HORIZONTAL ISSUES
2.1 Overview
2.2 Information Technology (IT)-enablement
2.3 Localization and multilingualism
2.4 Cross-Sectorial issues
2.5 Cultural adaptability
3. REAL WORLD EXAMPLES OF ENCODED VALUE DOMAINS
3.1 Introduction
3.2 Example #1 - Currency Codes
3.3 Example #2 - Country Codes And Localization With Multilingualism
3.4 Example #3 - Language Codes And Concordance Among International Standards
3.5 Commodity Codes: IT-Enabled With Localization And Multilingualism
1. Two JTC1 activities, the Business Team on
Electronic Commerce (BT-EC) and the Cultural Adaptability Workshop (CAW), both
completed their work and reported to the 12th Plenary Meeting of
ISO/IEC JTC1, 2-5 June 1998, in Sendai, Japan.
JTC1 document N5448 contains the resolutions of this Plenary. Resolutions 8, 9 and 10 pertain to JTC1
follow-up on the BT-EC Report and recommendations. Resolution 22 pertains to JTC1 follow-up on CAW and its
recommendations.
Members of the BT-EC participated in
CAW. The BT-EC scheduled its final
meeting to be held after the Workshop on Cultural Adaptability so that the
BT-EC could benefit from the results of CAW.
In Resolution 8 (N5448), JTC1
instructs its secretariat to circulate the BT-EC Report to "National
Bodies" and all JTC1 Technical Directions for review and comment.
2. The purpose of this document is to
serve as a "non-technical" summary of work of the ISO/IEC JTC1
Business Team on Electronic Commerce (BT-EC) with respect to "Horizontal
Aspects" and "Encodable Value Domains". {See further below}. The
BT-EC Report (N5296) contains many recommendations pertaining to
"encodable value domains".
Here this document serves as a backgrounder.
3. This document consolidates in one
contribution contents from two existing JTC1 documents; namely:
(1) in Chapter 2, the text found in Clause 6.0 of the BT-EC "Report to JTC1: Work
on Electronic Commerce Standardization to be initiated" (JTC1 N5296); and,
(2) in Chapter 3, text based on a Canadian member body contribution
titled "Additional information in support of the BT-EC Report (JTC1 N5296)
- Examples of Encodable Value Domains with IT-Interface Needs, Localization and
Multilingualism" (JTC1 N5394).
4. While directed at JTC1/SC32/WG2 (and
WG1) and the JTC1 "Ad-Hoc on Cultural and Linguistic
Adaptability", this contribution
is also intended to be circulated outside of JTC1 to raise awareness and obtain
feedback on the topics covered here.
5. Horizontal Issues - Capsule Overview
In the BT-EC Report, cultural and
linguistic adaptability were deemed to be important to electronic
commerce. In addition to being noted as
part of consumer requirements {Section 5.2}, they were identified by the BT-EC
as key components of four horizontal issues which are of general relevance for
all scenarios involving Electronic Commerce.
These issues are:
- information technology (IT)-enablement;
- localization including multilingualism;
- cross-sectorial aspects; and,
- cultural adaptability.
The BT-EC ordered these horizontal
issues on the basis of:
(1) the need to go from
the simpler to more complex challenges;
(2) placing priority on
the "do-able" and immediately
most useful in the context of increasing resource constraints in
standardization work; and,
(3) promotion and
visibility of ISO/IEC JTC1 work within the ISO, IEC and ITU and especially
outside of these standardization communities.
From a user perspective, these four
horizontal issues need to be addressed in a harmonized manner.
From an Electronic Commerce
perspective, i.e., that of the JTC1/BT-EC, standardization work
addressing the first three horizontal issues associated with:
- "IT-enablement";
- "Localization and Multilingualism"; and,
- "Cross-Sectorial" aspects,
should resolve many of the
requirements for cultural adaptability.
It then remains to be seen what other "cultural adaptability"
requirements remain, i.e., in addition to those already identified as "cultural
elements" and/or those of a societal nature.
BT-EC
identified four horizontal issues as being of general relevance for all
scenarios involving Electronic Commerce and gave these horizontal issues some
prominent attention in its work. These issues are:
- information technology (IT)-enablement,
- localization including multilingualism,
- cross-sectorial aspects,
- cultural adaptability.
These
horizontal issues are ordered here on the basis of
1. the need to go from the simpler to more complex challenges,
2. placing priority on the "do-able" and immediately most useful in the context
of increasing resource constraints in standardization work; and,
3. promotion and visibility of ISO/IEC JTC1 work
within the ISO, IEC and ITU and especially outside of these standardization
communities.
From a user
perspective, these four horizontal issues need to be addressed in a harmonized
manner.
A key
characteristic of commerce world-wide, in particular in the
business-to-business and business-to-administration domains, is that it
consists of business transactions which:
1. are
rule-based, i.e., mutually understood and accepted sets of business conventions,
practices, procedures, etc.; and,
2. make
extensive use of "codes", often table-based, representing predefined
possible choices for common aspects of business transactions. Examples include
countries, currencies, languages, manufacturers and their products.
Many of these
sets of agreed-upon rules used in business world-wide, and their associated
lists of tables/codes, are "de jure" and "de facto"
standards. BT-EC noted that numerous international standards are already in
use in support of commerce world-wide. The problem is that most are paper-based
and lack a computer-processable version. Even if distributed in electronic
form, these standards used in commerce world-wide, including those of ISO,
consist of tens of printed pages. They cannot be "plugged-in" for use
in Electronic Commerce. Much of the intelligence in these international
standards is humanly understandable, explicitly or implicitly. They have not
been described formally using Formal Description Techniques (FDTs), i.e., in
their present form they do not support "computational integrity".
Consequently, each enterprise using these code sets has to spend
considerable time and effort to (1) determine their meaning and interpret them;
(2) build applications; and, (3) hope that they interoperate with other networks
or enterprises.
Human beings
like to name "objects". But the approach of using "names"
is not very IT friendly, cost-efficient or time-efficient.
Depending on
the interplay of multilingual and localization requirements, in Electronic Commerce,
a singular product or service being offered for sale will have multiple names
and differing names even in the "same" language. Thus, if we wish to
ensure rapid and widespread use of Electronic Commerce globally, we must on
the one hand identify "objects", i.e., products or services being
offered for sale, in an unambiguous, linguistically neutral, and IT-processable
and EC-facilitated manner, and, on the other hand, present the same via a range
of linguistic names (and associated character sets) from a point-of-sale
perspective, i.e., human-readable user interface, as required by the
"local" marketplace.
In order to
provide a focus for its work on horizontal issues, the BT-EC utilized four real
world examples; namely:
- Currency Codes,
- Country Codes,
- Language Codes,
- Commodity Codes.
(For details of
these examples see Chapter 3 below and JTC 1/BT-EC N 047).
These examples
represent standards used for commerce world-wide and are presently implemented
by enterprises and their information systems in a wide variety of different ways.
There are also no "standard" ways for the interworking among these
and similar standards. This does not promote global interoperability. The
recent widespread use of the Internet is exacerbating existing ambiguities.
From a BT-EC
perspective, these four examples underline the fact that with respect to electronic
commerce there may be less of a need for new standards. Rather the immediate
challenge may well be the development of a category of information technology
standards which will facilitate the development of information technology
enabled versions of existing standards used in commerce and do so in a manner
which also supports the interplay of localization and multilingual requirements,
i.e., "bridging standards".
BT-EC wishes to
pass on the following considerations for such standardization work in support
of Electronic Commerce; namely:
1. Standards
must focus on the interface (as opposed to implementation) as the best means of
arriving at globally harmonized solutions for interoperability from both a
business and information technology perspective.
2. Standard
interfaces among information systems must be technology neutral accommodating
advances in technology to the extent possible. Further, such standard
interfaces must be linguistically neutral to the furthest extent possible.
3. In
order to empower users and consumers, standards should be adaptable to local
and multilingual requirements at national and regional levels, while ensuring
full transparency of available market solutions to the consumer. Multilingualism
must be considered. The expansion of open, multilingual standards could
significantly increase the volume and value of world-wide Electronic Commerce.
"IT-enablement"
is the term used to identify the need to transform currently accepted standards
used in commerce world-wide from a manual to a computational perspective.
Electronic commerce, in particular of the Business-to-Business or
Business-to-Administration categories, introduces a requirement for standards
that are prepared, structured and made available for unambiguous usage within
and among information systems. This requirement can be expressed as
"computational integrity", in particular:
"the expression of standards in a form that
ensures precise description of behaviour and semantics in a manner that allows
for automated processing to occur, and the managed evolution of such standards
in a way that enables dynamic introduction by the next generation of
information systems".
The objective
of IT-enablement is to capture in a computer-processable manner, and one which
maximizes interoperability, the implicit rules and relations (i.e., those known
to "experts") of the code sets found in standards used in commerce
world-wide, i.e., capture and state from an entity relationship and/or object
technology perspective, using Formal Description Techniques. Also, issues
arising from change management in "code tables", i.e., synchronization,
backwards compatibility, migration, etc. need to be addressed.
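The change-management concerns named above (synchronization, backwards compatibility, migration) can be pictured with a small sketch. It is an illustration only, assuming that each code table entry carries a validity period and an optional successor; the class `CodeEntry`, the helper functions and the codes "A1" and "B7" are hypothetical and are not taken from any standard or from the BT-EC Report.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional


@dataclass(frozen=True)
class CodeEntry:
    """One entry of a code table, with explicit change-management data."""
    code: str
    valid_from: date
    valid_to: Optional[date] = None     # None means the code is still current
    replaced_by: Optional[str] = None   # successor code, if the code was withdrawn


def is_valid(entry: CodeEntry, on: date) -> bool:
    """Return True if the code was in force on the given date."""
    return entry.valid_from <= on and (entry.valid_to is None or on <= entry.valid_to)


def migrate(table: Dict[str, CodeEntry], code: str) -> str:
    """Follow successor links so that old data can be mapped to a current code."""
    seen = set()
    while table[code].replaced_by is not None:
        if code in seen:
            raise ValueError(f"cycle in replacement chain at {code}")
        seen.add(code)
        code = table[code].replaced_by
    return code


# Hypothetical table: code "A1" was withdrawn and replaced by "B7".
table = {
    "A1": CodeEntry("A1", date(1990, 1, 1), date(1997, 12, 31), replaced_by="B7"),
    "B7": CodeEntry("B7", date(1998, 1, 1)),
}

print(is_valid(table["A1"], date(1995, 6, 1)))   # True
print(is_valid(table["A1"], date(1998, 6, 1)))   # False
print(migrate(table, "A1"))                      # B7
```

Keeping withdrawn codes in the table, rather than deleting them, is what lets older transactions still be interpreted after the code set changes.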
IT-enablement
is based on the premise that a detailed and exhaustive identification of
standards and "conventions", etc., used in support of existing
commerce, will eliminate many barriers to Electronic Commerce.
IT-enablement
recognizes that within ISO, IEC and ITU, there are committees which have the
domain responsibility and expertise in areas of work, the primary purpose of
which is to manage and control the content. IT-enablement also recognizes that
outside of ISO/IEC/ITU, there are many other organizations which have domain
responsibility and expertise in subject areas relevant to global Electronic
Commerce. Their "content" and industry sector domain oriented
standards require an IT-enabled version for use in Electronic Commerce.
BT-EC suggests
that JTC 1 gives proper consideration to IT-enablement, initially focused on
currency, country, language and commodity codes. Members of BT-EC are of the
opinion that such work will serve as the necessary practical experience and
expertise needed to develop a generalized approach to
"IT-enablement". This should also help to support localization and
multilingual requirements.
(For further
information, see document ISO/IEC JTC 1/BT-EC N 46.)
IT-enablement
is based on the premise that to ensure rapid and widespread use of Electronic
Commerce globally, we must on the one hand identify "objects", i.e.,
products or services being offered for sale, in an unambiguous, linguistically
neutral, and IT-processable and EC-facilitated manner, and, on the other hand,
present the same via a range of linguistic names (and associated character
sets) from a point-of-sale perspective, i.e., human-readable, as required by
the "local" marketplace.
BT-EC reviewed
existing JTC 1 terms and definitions of "locale", (see ISO/IEC JTC
1/BT-EC N 46). Those aspects normally are related to the character sets
associated with a natural language, including collating/ordering, date/time
formats, monetary formatting, etc., a.k.a. "cultural elements".
From an
Electronic Commerce perspective, BT-EC identified four additional sets of
parameters of "localization" requirements which should be addressed,
namely:
1. jurisdictional
requirements, i.e., various combinations of "top-down" legal and
regulatory frameworks which place constraints on the global marketplace and in
doing so, often define/establish a "local" market;
2. consumer
requirements, i.e., combinations of "bottom-up" consumer demands and
behaviour;
3. supplier
requirements, i.e., combination of factors impacting on suppliers of goods and
services (as well as those involved in supporting logistics chains); and,
4. human
rights-related requirements, (e.g., disabled/handicapped, privacy, etc.).
BT-EC defines
"localization" as:
localization: pertaining
to or concerned with anything that is not global and is bound through specified
sets of parameters of:
(a) a linguistic nature including natural
and special languages and associated multilingual requirements;
(b) jurisdictional nature, i.e., legal,
regulatory, geopolitical, etc.;
(c) a sectorial nature, i.e., industry
sector, scientific, professional, etc.;
(d) a human rights nature, i.e., privacy,
disabled/handicapped persons, etc.; and/or
(e) consumer behaviour requirements.
Within and among "locales", interoperability
and harmonization objectives also apply.
From an
Electronic Commerce perspective, "jurisdiction", on the whole,
represents a set of local market entry and/or participation requirements which
may be of a general nature or product/service-specific.
From a legal
perspective, the basic entity is the country. Two or more countries among themselves
can form a common harmonized "jurisdiction" governing the
marketplace, through a bilateral or multilateral agreement. Where these
agreements are of a general nature, the harmonized "jurisdiction"
is known as a "region". Examples here include the European Union,
NAFTA, etc. Within countries, there may be various approaches to more
granular legal and regulatory frameworks, e.g., at the level of states,
provinces, etc.
In addition to
a jurisdiction with a geographic dimension, there are jurisdictions bounded by
a goods and services dimension. Examples here include airlines, banking, oil
companies, etc. Here jurisdiction is often expressed through treaties,
regulations, agreements, etc., which are harmonized through an entity
representing these communities (e.g., ICAO, WCO, or WTO).
Combinations of
laws and regulations can be viewed as frameworks. BT-EC can thus define jurisdiction
as:
"jurisdiction: a
distinct legal and regulatory framework which places constraints on the global
marketplace and in doing so often defines/establishes a local market".
Electronic
commerce is "borderless" in its nature - it transcends jurisdictions.
From a BT-EC
perspective, multilingual requirements comprise more than just the need to support
the character sets and sort/collate sequences of the various languages used by
customers world-wide. It also means that a single natural language is utilized
in different ways in various local markets.
In addition,
one should add the concept of special languages, i.e., those of a scientific or
technical nature, as well as those which pertain to a specific industry
sector. Many of these can be considered to be global in nature and use.
Thus from an
Electronic Commerce perspective, "multilingual" requirements embody
not only:
1. multiple
natural languages; but also,
2. multiple
and different uses of the "same" natural language;
3. multiple
source languages in any multilingual thesauri, database, referenceable
permitted value domains (PVDs), i.e., tables, code sets, etc.; and possibly
also,
4. the
use of special languages.
In this
context, one can define:
multilingualism: "the ability to support not only
character sets specific to a language (or family of languages) and associated
rules but also localization requirements, i.e., use of a language from
jurisdictional, sectorial and consumer marketplace perspectives".
From a BT-EC
perspective adding multilingual capabilities in Electronic Commerce can be viewed
as simply mirroring the existing physical world requirements. Prime examples
here are product labelling requirements and product usage instructions. Given
the increasing globalization in trade in goods, single language usage instructions
accompanying products are increasingly rare and multilingual usage instructions
increasingly common place.
Cross-sectorial
issues pertain to differing, at times conflicting, understandings of business
practices, object identification, etc., among economic sectors. The challenge
here is that of resolving two sets of issues:
1. Industry
sectors, scientific fields, and professional disciplines assign their own uses
or meanings to the terms of a natural language. Quite often natural
languages are used in the manner of what we earlier called "special
languages": the same word/term frequently has very different meanings in
other industry sectors. There is a trend in various sectors towards using
existing non-technical "common language" words as terms with new
technical meanings. This problem of polysemy needs to be taken into account in
cross-sectorial Electronic Commerce.
2. Multilingual
equivalency needs create an added layer of complexity, even more so for
unambiguous cross-sectorial interoperability in support of Electronic Commerce
(as well as world-wide "individual-to-business" Electronic Commerce
via the Internet).
A case study on
cross-sectorial issues (see JTC 1/BT-EC N 045) led, with respect to scientific languages,
to the conclusion that a scientific language can be considered a culturally
neutral exchange language which, in turn, has multiple natural language and
culturally dependent linguistic equivalent terms.
Technical
languages and their use in particular industry sectors, however, do present
particular challenges to cultural adaptability and cross-sectorial
interoperability since they do not have the attributes of scientific languages.
Technical languages as linguistic sub-systems are difficult enough to handle
even within their industry sector, in one natural language. To this are added
the challenges of localization, multiculturalism and cross-sectorial
interactions in Electronic Commerce.
Each industry
sector interacts with other sectors. A key characteristic of special languages
is an associated controlled vocabulary of terms, often also in a multilingual
manner.
In conclusion,
it should be noted that within industry sectors, established standards and
conventions exist for unambiguous identification and referencing of unique
objects, and for naming them (often multilingually), along with associated
rules. Although not originally designed to interoperate across and among
industry sectors, many of these sectorial standards have core constructs in common
which could be utilized to support cross-sectorial Electronic Commerce and in a
manner which accommodates localization and multilingual needs.
BT-EC viewed
"cultural adaptability" as a set of requirements affecting global Electronic
Commerce from a cultural perspective and noted that these can co-exist within
"localization" and "multilingualism" requirements. In
addition, there are societal aspects which often are not bounded by
jurisdiction or geographic area (e.g., Jewish and Muslim cultures transcend
jurisdictional boundaries).
The following
definition of "cultural adaptability" is found in JTC 1 N4627:
"The special characteristics of natural languages and
the commonly accepted rules for their use (especially in written form) which
are particular to a society or geographic area. Examples are: national
characters and associated elements (such as hyphens, dashes, and punctuation
marks), correct transformation of characters, dates and measures, sorting and
searching rules, coding of national entities (such as country and currency
codes), presentation of telephone numbers, and keyboard layouts".
This definition
of the concept/term "cultural adaptability" is the same as that for
"cultural elements" found in ISO/IEC JTC 1/CAW N 008. It has a focus
on special characteristics of natural languages and commonly accepted rules
for their use which are particular to a society or geographic area. The
emphasis here appears to be on character sets, scripts, glyphs, etc., their
ordering, sorting, search, etc.
However, in
commerce world-wide, it is not so much the natural language but the usage of
special languages (e.g., technical and scientific), which forms a significant
challenge to providing interoperability in Electronic Commerce. This is true
especially for "technical" uses of natural languages by different
industry sectors. Differences in uses of a natural language exist also in industry
sectors which represent sets of requirements other than those particular to a
society or geographic area.
BT-EC made an
effort to coordinate the work on this horizontal issue with the JTC 1/CAW (Cultural
Adaptability Workshop). BT-EC notes Resolution 3 of JTC 1/CAW which states
"that CAW did not have time to address the request of JTC 1 to elaborate
or amend the definition of cultural adaptability as contained in the document
JTC 1 N4627".
From an
Electronic Commerce perspective, standardization work addressing the three
horizontal issues associated with
- "IT-enablement",
- "Localization and Multilingualism", and
- "Cross-Sectorialization"
should resolve
some of the requirements for "cultural adaptability". It then remains
to be seen what other "cultural adaptability" requirements remain,
i.e., those of a societal nature (see also 5.2.2)"
-------------
[Note: Section 5.2.2 in the BT-EC Report pertains to
"Consumer requirements for Electronic Commerce"].
1. Chapter 3 is based on a Canadian
contribution to JTC1, i.e., N5394. This
contribution provided additional and more detailed information in support of
Clause 12.3 of the BT-EC Report titled "Examples of Encodable Value
Domains" (BT-EC Report, pages 61-66).
The Canadian contribution also provided three exhibits in support of the
examples.
2. The examples are currency codes,
country codes, language codes, and commodity codes. These four real world examples were developed to provide a focus
for the BT-EC work on four horizontal issues.
The three exhibits have proved useful in Canada in illustrating and
explaining the horizontal issues in a simple and non-technical manner to the
business community, policy makers, and various industry sectors.
The exhibits provided are intended
to demonstrate that the identification and referencing of real world objects,
i.e., as "instances" of an object class in an "encodable value
domain" can be done in a linguistically neutral and unambiguous manner.
This supports a global approach to
Electronic Commerce which is capable of meeting localization and associated
multilingual requirements. Linguistically neutral identification and
referencing of objects will also support computational integrity and more
efficient data interchange, with higher quality assurance and at lower costs
for all participants.
3. Those interested in standardization in
areas pertaining to Electronic Commerce may find these exhibits useful in
illustrating the horizontal aspects.
They can also use them and augment them by adding their own country and
language equivalent(s) terms for the linguistically neutral code(s) in the
exhibits.
The contributions from BT-EC members with respect to their
localization and accompanying linguistic requirements, as found in the three
exhibits, are appreciated.
4. Finally, it is useful to draw attention
to the BT-EC Report (in Clause 6 on pages 21 and 22) which states:
"Human
beings like to name "objects". But the approach of using
"names" is not very IT friendly, cost-efficient or time-efficient.
Depending
on the interplay of multilingual and localization requirements, in Electronic
Commerce, a singular product or service being offered for sale will have
multiple names and differing names even in the "same" language. Thus,
if we wish to ensure rapid and widespread use of Electronic Commerce globally,
we must on the one hand identify "objects", i.e., products or services
being offered for sale, in an unambiguous, linguistically neutral, and
IT-processable and EC-facilitated manner, and, on the other hand, present the
same via a range of linguistic names (and associated character sets) from a
point-of-sale perspective, i.e., human-readable user interface, as required by
the "local" marketplace."
In support of this BT-EC text,
Canada draws attention to ISO 1087 which defines "name: designation of an object by a linguistic
expression".
Consequently, any "object"
will have (1) multiple names; and, (2) in global Electronic Commerce, many of
the "names" used to designate the "object" being traded
will be in the form of linguistic expressions which use non-Latin-1 characters
(e.g., Arabic, Chinese, Thai, Hebrew, Japanese, etc.). This is one reason why ISO/IEC 10646 (a.k.a.
"Unicode") will be a key IT infrastructure standard needed to support
global electronic commerce.
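The separation between identifying an object and naming it can be sketched as follows. This is a minimal illustration, not a proposed format: the identifier "OBJ-0001", the class `TradedObject`, the locale tags and the names themselves are all hypothetical. The object is keyed by a linguistically neutral identifier, and names (including names in non-Latin scripts) are attached for presentation only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TradedObject:
    """An object identified by a neutral code; names are for display only."""
    object_id: str                                        # hypothetical neutral identifier
    names: Dict[str, str] = field(default_factory=dict)   # locale tag -> local name

    def display_name(self, locale: str, fallback: str = "en") -> str:
        """Try the full locale, then its language part, then a fallback language."""
        language = locale.split("-")[0]
        for key in (locale, language, fallback):
            if key in self.names:
                return self.names[key]
        return self.object_id   # no name known: show the neutral code itself


# Illustrative data: one object, several names, two of them in the "same" language.
item = TradedObject(
    object_id="OBJ-0001",
    names={
        "en": "truck",
        "en-GB": "lorry",
        "fr": "camion",
        "ja": "トラック",
        "zh": "卡车",
    },
)

print(item.display_name("en-GB"))   # lorry
print(item.display_name("en-CA"))   # truck
print(item.display_name("fr-CA"))   # camion
print(item.display_name("ja-JP"))   # トラック
```

Information systems would exchange only `object_id`; the choice of name happens at the user interface, according to the local marketplace.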
A key attribute
of electronic commerce is that it involves business transactions in which payment
must be made in a mutually acceptable currency. ISO 4217 is the standard for
codes representing currencies and funds. This standard and its contents
are the responsibility of ISO TC 68 Banking.
The principles for inclusion in the code lists of ISO 4217 are that (1)
they must be/represent currencies and funds used within the entities described
by ISO 3166 (Country Codes); and, (2) the codes listed are intended to reflect
current status, at the date of publication.
ISO 4217 has a
number of features and anomalies which although human understandable need to be
identified and explicitly captured in an IT-enabled manner. In short, ISO 4217 includes objects which
are not currencies (or funds). In ISO
4217, there are countries, i.e., as ISO 3166 entities, where:
- the three-digit ISO 3166 numeric country code is not the same as the
three-digit ISO 4217 numeric code (e.g., due to the
creation/utilization in ISO 4217 of ISO 3166 "User Extensions"). For example,
one can readily identify in ISO 4217 twenty-five (25) instances of ISO 3166
entries where the ISO 3166 3-digit numeric country code differs from the
ISO 4217 "Code Name" 3-digit numeric. Nor is there any relation
between the ISO 3166 and ISO 4217 alpha codes for many countries;
- a country (or dependency) has no currency
of its own and utilizes the currency of another country;
- a country has more than one currency,
i.e., its own and that of another country;
- a country has both a currency code and
a funds code;
- a set of countries collectively shares
and uses a currency which has no "issuing country" (e.g., SDR, XDR,
XOF, and XAF). Here one notes the need
to add the "euro" as a currency (in addition to the "ecu",
i.e., XEU);
- special fund types;
- a "currency" is not linked to any
country or organization (e.g., precious metals such as gold - 959, alpha =
XAU, special settlement currencies, etc.); and,
- "currencies" have no numeric
code but only a 3-alpha code (e.g., XFO = Gold Franc).
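Several of the cases listed above can be made explicit in a data structure instead of being left to expert knowledge. The sketch below is an assumption-laden illustration, not an IT-enabled version of ISO 4217: the class `CurrencyEntry`, the `kind` labels and the helper functions are hypothetical, and the table is a tiny subset. The XAU and XFO rows follow the text above; the other rows are supplied from general knowledge and should be checked against the current code list.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CurrencyEntry:
    alpha: str                       # ISO 4217 3-alpha code
    numeric: Optional[str]           # 3-digit numeric code; None if none is assigned
    kind: str                        # hypothetical label: "currency", "metal", "settlement"
    used_by: Tuple[str, ...] = ()    # ISO 3166-1 alpha-2 codes of the entities using it


# Illustrative subset only (see the caveats in the paragraph above).
ENTRIES = {
    e.alpha: e
    for e in (
        CurrencyEntry("CAD", "124", "currency", ("CA",)),
        CurrencyEntry("USD", "840", "currency", ("US", "PR", "PA")),
        CurrencyEntry("PAB", "590", "currency", ("PA",)),
        CurrencyEntry("XAU", "959", "metal"),         # gold: no country or organization
        CurrencyEntry("XFO", None, "settlement"),     # gold franc: alpha code only
    )
}


def currencies_used_by(country: str) -> List[str]:
    """All currency codes recorded for one ISO 3166-1 alpha-2 country code."""
    return sorted(e.alpha for e in ENTRIES.values() if country in e.used_by)


def without_numeric_code() -> List[str]:
    """Entries that have only a 3-alpha code."""
    return sorted(e.alpha for e in ENTRIES.values() if e.numeric is None)


def without_country() -> List[str]:
    """Entries not linked to any country."""
    return sorted(e.alpha for e in ENTRIES.values() if not e.used_by)


print(currencies_used_by("PA"))   # ['PAB', 'USD']  more than one currency
print(currencies_used_by("PR"))   # ['USD']         no currency of its own
print(without_numeric_code())     # ['XFO']
print(without_country())          # ['XAU', 'XFO']
```

The point of the sketch is that an optional numeric code and a many-to-many link between countries and currencies have to be modelled deliberately; a one-code-per-country table cannot represent these cases.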
Some of the
above noted rules and relationships are stated in ISO 4217, others are implicit
(and known by "experts"). An
IT-enabled version of ISO 4217 is required, especially now for electronic
commerce, and particularly that which is Internet-based. Many suppliers and
consumers entering the electronic commerce market or other Internet-based
activities, particularly those outside the financial community, are not aware of
the "peculiarities" of ISO 4217.
Experiences in
the financial services/banking sector indicate that those engaged on the
Internet in electronic commerce, as well as in general applications, need to be
made aware of the standard notation for currencies. For example, in actual
electronic commerce practice, the Canadian dollar is
being represented as "CDN", "CAN", "CA",
etc. Further, the 3-alpha codes of ISO
3166-1 for countries often are confused with the ISO 4217 3-alpha currency
codes.
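A simple check at the interface can catch the notations mentioned above. The following sketch assumes small, illustrative subsets of the ISO 4217 and ISO 3166-1 code lists; the function `check_currency_code` and its result strings are hypothetical.

```python
# Illustrative subsets only; a real system would load the full code lists.
ISO_4217_ALPHA = {"CAD", "USD", "XAU"}
ISO_3166_ALPHA3 = {"CAN", "USA"}
ISO_3166_ALPHA2 = {"CA", "US"}


def check_currency_code(value: str) -> str:
    """Classify a string that was offered as a currency code."""
    code = value.strip().upper()
    if code in ISO_4217_ALPHA:
        return "ok"
    if code in ISO_3166_ALPHA3 or code in ISO_3166_ALPHA2:
        return "country code, not a currency code"
    return "unknown"


for candidate in ("CAD", "CAN", "CA", "CDN"):
    print(candidate, "->", check_currency_code(candidate))
```

With these subsets, "CAD" is accepted, "CAN" and "CA" are reported as country codes, and "CDN" is reported as unknown.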
Several
standards are used internationally for codes representing
countries. The better known ones are
ISO 3166-1, the USMARC Code List for Countries as maintained by the Library of
Congress (LC), and the Universal Decimal Classification (UDC) auxiliary table
for countries. Of these the ISO 3166-1
is the most widely known. {On the LC
and UDC, see further Chapter 3.4 below}
This example focuses on ISO 3166-1. This standard and its contents are the responsibility of
ISO TC 46 - Information and documentation. The purpose here is to highlight the need for an
IT-enabled version of this standard, and also to bring to the fore related localization and
multilingual aspects. The title of ISO 3166 is "Codes for the representation of names of
countries and their subdivisions". Within the ISO 3166 standard, there are now three parts;
namely:
- Part 1: Country Codes;
- Part 2: Country Subdivision codes; and,
- Part 3: Codes for formerly used names of countries.
Here ISO 3166-1
"established codes that represent the names of countries, dependencies,
and other areas of particular geopolitical interest, on the basis of lists of
country names obtained from the United Nations". Currently, each entry (or "record" of a permitted
instance) contains:
(1) a three-digit numeric code
(2) a two-letter alpha code
(3) a three-letter alpha code
(4) a short name - English
(5) a long, i.e., formal name - English
(6) a short name - French
(7) a long, i.e., formal name - French
ISO 3166 also
has a note field in English and in French.
ISO 3166-1 thus has seven (7) "standardized" representations for each unique entity or object,
three (3) of which are codes. ISO 3166-1 allows any one of the seven to be utilized, although in
practice, and especially in IT systems, one usually utilizes one of the three codes.
For this ISO
3166 standard, we currently do not have a common international default
"standard" for the interface among applications/information systems
engaged in support of electronic commerce.
The 3-digit numeric code, the 2-alpha code and the 3-alpha code are all
used in interchanges.
Of these three codes, the three-digit numeric code is the most stable. On the whole, it changes
only when the actual physical boundaries of a country change, i.e., the entity being identified
and referenced is no longer the same. Names, short and long, do change, and the Alpha-2 and
Alpha-3 codes are not that stable: whenever a country changes its name, it often also changes
its alpha-2 and alpha-3 codes, (e.g., Burma to Myanmar, Zaire to the Democratic Republic of the
Congo, etc.). For example, the alphabetic (written language) equivalents to "3166:180" recently
underwent the following changes:
| | Former | New |
| Alpha-2 | ZR | CD |
| Alpha-3 | ZAR | COD |
| Short Name (en) | Zaire | Congo, Democratic Republic of the |
| Long Name (en) | Republic of Zaire | the Democratic Republic of Congo |
| Short Name (fr) | Zaïre | Congo, la République démocratique du |
| Long Name (fr) | République de Zaïre | La République démocratique du Congo |
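The practical consequence of this stability can be sketched as follows: if records are keyed on the 3-digit numeric code, a change of name and alpha codes only adds a new set of linguistic equivalents and leaves the key untouched. The Python fragment below is a hedged illustration using only the "180" data from the table above; the names `HISTORY`, `current` and `numeric_for_alpha2` are hypothetical.

```python
# Stable key: the 3-digit numeric code. Value: successive sets of
# linguistic equivalents, oldest first (data from the table above).
HISTORY = {
    "180": [
        {"alpha2": "ZR", "alpha3": "ZAR", "short_en": "Zaire"},
        {"alpha2": "CD", "alpha3": "COD",
         "short_en": "Congo, Democratic Republic of the"},
    ],
}


def current(numeric: str) -> dict:
    """Return the most recent set of equivalents for a numeric code."""
    return HISTORY[numeric][-1]


def numeric_for_alpha2(alpha2: str):
    """Resolve a current or former alpha-2 code to its numeric code."""
    wanted = alpha2.upper()
    for numeric, versions in HISTORY.items():
        if any(version["alpha2"] == wanted for version in versions):
            return numeric
    return None
```

Data recorded under "ZR" and data recorded under "CD" both resolve to the same key, so no stored reference needs to be rewritten when the name changes.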
The use of Alpha-3 code tags for ISO 3166 country names causes overlap and confusion with ISO
4217 currency and funds codes, which are also represented as Alpha-3 codes (upper case).
Further, the 3-digit numeric code is linguistically neutral and unambiguous. Each of the
3-digit numeric codes has in ISO 3166 associated with it six (6) alphabetic linguistic
expressions, two of which also serve as "human understandable" (and computer-processable) codes.
From an interoperability perspective, i.e., both that of commerce and IT, the "3166"
identifying the scheme and the rule set and the 3-digit numeric code identifying a country, in
the context of this domain, together form a unique and unambiguous global identifier for the
entity being referenced. The alpha codes and names should simply be considered linguistic
equivalent expressions, i.e., from an information systems perspective all that one may need to
standardize at the interface is something akin to "3166:246", "3166:056", "3166:792", etc.
In addition, and also very important, it should be noted that ISO 3166-1 contains many
instances and associated codes for entities which are not "countries", i.e., they are
dependencies of other entities, (e.g., France, Great Britain, USA, etc.). Human beings "filter"
and easily make these distinctions. Computers, being very dumb, cannot unless explicitly
instructed. To make things even worse, many of these 3166-1 "sub-entities" have one code in
3166-1 and a different code in 3166-2. [Note: One would have thought that when ISO 3166 was
split into its three parts, all the sub-entities in ISO 3166 would no longer be found in 3166-1
but be moved to 3166-2].
Further, we should note the fact that in their own locale and language, countries have their
own "local" short and long (or formal) name. For example, 528 Netherlands = "Nederland" and
"Koninkrijk der Nederlanden". Further, countries which are bilingual have two sets of local
short and long/formal names, (e.g., 056 Belgium = "Belgie" and "Koninkrijk van Belgie", and
"Belgique" and "Le Royaume de Belgique"). There are also multilingual countries, (e.g.,
Switzerland). Exhibit 3.3 provides an illustrative example.
Then, there is the fact that many countries use non-Latin alphabet-based character sets. This
means that one also has the original country language character script as "alphas" plus their
"latinized" equivalents.
To this is added the fact that from the perspective of each country and language, the "other"
countries have (are known by) their "own names". For example, a person in France uses
"Allemagne", not "Germany" or "Deutschland", as the linguistic equivalent for "3166:276". All
this is common, non-competitive information.
It suffices to
state that all the rules and intelligence implicit in ISO 3166-1 (as well as
3166-2 and 3166-3) have not yet been captured explicitly in an IT-enabled and
EC-facilitated manner, (e.g., as a "normalized" (callable) database).
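To give a sense of what a "callable" form might look like, the following Python sketch parses an identifier of the "scheme:code" kind suggested earlier (e.g., "3166:246") and checks it against a small registry. This is an assumption-laden illustration, not a proposed interface: the names `DomainValue`, `parse_identifier`, `REGISTERED` and `is_registered` are hypothetical, and the registry holds only the three codes used as examples in this section.

```python
from typing import NamedTuple


class DomainValue(NamedTuple):
    scheme: str  # identifies the standard and its rule set, e.g. "3166"
    code: str    # permitted value within that scheme, kept as text


# Sample registry: only the three codes used as examples in this section.
REGISTERED = {"3166": {"246", "056", "792"}}


def parse_identifier(text: str) -> DomainValue:
    """Split 'scheme:code' and apply a basic format rule for scheme 3166."""
    scheme, sep, code = text.strip().partition(":")
    if not sep or not scheme or not code:
        raise ValueError(f"expected 'scheme:code', got {text!r}")
    # Assumption of this sketch: a 3166 numeric code is exactly three ASCII digits.
    if scheme == "3166" and not (len(code) == 3 and code.isascii() and code.isdigit()):
        raise ValueError(f"not a 3-digit numeric code: {code!r}")
    return DomainValue(scheme, code)


def is_registered(text: str) -> bool:
    """True if the identifier is well formed and present in the sample registry."""
    value = parse_identifier(text)
    return value.code in REGISTERED.get(value.scheme, set())
```

The code is kept as text rather than converted to a number, so that the leading zero of "056" is not lost at the interface.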
This section concludes with Exhibit 3.3 where, in matrix form, we present in the left column,
titled "IT-Needs", the "interface" requirement for ISO 3166-1 among information systems. In the
right-side columns are presented some examples of the multiple linguistic equivalents required
to support "Localization and Multilingual" requirements for particular implementations and
supporting IT applications, which in turn may have human readable User Interfaces.
Notes on Exhibit 3.3
[1] Normally the eight (8) "fields" under "Localization and Multilingual Requirements" would be
separate (sets of) "columns" in a database schema, all forming part of the "row". It is noted
that the physical presentation here in Exhibit 3.3 does not reflect this.
[2] The 2-letter language codes, (e.g., en,
fi, fr, nl, sv, tr), are taken from ISO 639.
[3] The "->" entries are not
part of ISO 3166-1. Although only the
Latin alphabet character set is utilized in this and the other two examples, it
is understood that non-Latin alphabet-based character sets are also used or
will be used in electronic commerce. As
such this is an "illustrative" example.
EXHIBIT 3.3 - TOWARDS AN IT-ENABLED STANDARD FOR ISO 3166 - COUNTRY CODES
| IT-Needs (Interface) | Localization and Multilingual Requirements [1] | |
| 3166:246 | Alpha-2: FI | Alpha-3: FIN |
| | Short Name (en) [2]: Finland | Long Name (en): Republic of Finland |
| | Short Name (fr): Finlande | Long Name (fr): République de Finlande |
| [3] -> | Local Short Name (fi): Suomi | Local Long Name (fi): Suomen tasavalta |
| -> | Local Short Name (sv): Finland | Local Long Name (sv): Republiken av Finland |
| 3166:056 | Alpha-2: BE | Alpha-3: BEL |
| | Short Name (en): Belgium | Long Name (en): Kingdom of Belgium |
| | Short Name (fr): Belgique | Long Name (fr): Royaume de Belgique |
| -> | Local Short Name (nl): Belgie | Local Long Name (nl): Koninkrijk van Belgie |
| -> | Local Short Name (fr): Belgique | Local Long Name (fr): Royaume de Belgique |
| 3166:792 | Alpha-2: TR | Alpha-3: TUR |
| | Short Name (en): Turkey | Long Name (en): Republic of Turkey |
| | Short Name (fr): Turquie | Long Name (fr): République turque |
| -> | Local Short Name (tr): Turkiye | Local Long Name (tr): Turkiye Cumhuriyeti |
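The matrix of Exhibit 3.3 can be sketched, under stated assumptions, as a lookup in which the interface identifier is the key and the short names per ISO 639 language code are the linguistic equivalents. The Python fragment below uses only short names shown in the exhibit. The structure `SHORT_NAMES`, the function `short_name` and the fallback to English are hypothetical design choices, not requirements of ISO 3166-1.

```python
# Key: interface identifier. Value: short names keyed by ISO 639 language code
# (values copied from Exhibit 3.3, including its Latin-only spellings).
SHORT_NAMES = {
    "3166:246": {"en": "Finland", "fr": "Finlande", "fi": "Suomi", "sv": "Finland"},
    "3166:056": {"en": "Belgium", "fr": "Belgique", "nl": "Belgie"},
    "3166:792": {"en": "Turkey", "fr": "Turquie", "tr": "Turkiye"},
}


def short_name(identifier: str, language: str, fallback: str = "en") -> str:
    """Return the short name in the requested language, else in the fallback."""
    names = SHORT_NAMES[identifier]
    return names.get(language, names[fallback])
```

For instance, `short_name("3166:246", "fi")` yields the local name, while a language with no entry for that country falls back to the English short name.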
At times several different standards are used internationally for the same domain. One such
domain is that of codes representing languages. Sets of codes representing "languages" (and
"countries") provide examples where the ISO is not the only organization to issue and maintain
standards used world-wide, even though its standard, i.e., ISO 639, "Codes representing the
names of languages", is the most widely used and known. This standard and its contents are the
responsibility of ISO TC37 - Terminology (principles and coordination).
Another
international standard providing a coding schema for country codes and language
codes is that of the US Library of Congress.
Its primary application is in the bibliographic/information sciences
domain. It should be noted that these
coding schemas pre-date those of the ISO.
For country codes the Library of Congress uses two or three character
lower case alphabetic codes. These
represent existing national entities, provinces and territories of Canada,
states of the United States, divisions of the United Kingdom, and internationally
recognized dependencies. It is known as
the USMARC[2] Code List for
Countries and is maintained by the Library of Congress. Similarly the Library of Congress maintains a USMARC Code List
for Languages. This code list
consists of three letter mnemonics representing only written languages of the
modern and ancient world. "Where one spoken language is written in two
different sets of characters, each set of characters is assigned a specific
code. For example, Serbian and Croatian
are the same spoken language but the former is written in the Cyrillic alphabet
and the latter in the Roman alphabet" ("Roman" known within
ISO as the "Latin" character set).
A third
international standard, the Universal Decimal Classification (UDC) scheme also
has language codes. The UDC is used in
the bibliographic/information science work (primarily in Europe), as well as
increasingly used there for classifying documents on the Internet.
Human beings can recognize and filter these differences; computers cannot unless explicitly
instructed. Keeping in mind that the scope and definitions of these different coding schemes
also differ for what are generally the same business needs, one can bridge such differences
through construction of concordance tables. This allows one to maximize, insofar as possible,
interoperability across differing sectorial perspectives, as well as to identify
"non-interoperability" instances.
In ISO 639 each
entry (or permitted instance) consists of:
- a language symbol, in the form of a two-letter code;
- the language name - English;
- the language name - French; and,
- the original language name (as written in the Latin-1 alphabet).
ISO 639 also
has a note field in English and in French.
With respect to ISO 639, two initial observations must be made. The first is that Canada (and
the United States) has not adopted ISO 639 as a "national standard" due primarily to its
current lack of inclusion of North American aboriginal and native languages. Secondly, the LANG
attribute is important in SGML (ISO/IEC 8879). For example, in the proposed New Work Item
(ISO/IEC JTC1 N4742) for "Standard HTML", the LANG attribute
"identifies
a natural language spoken, sung, written or otherwise used by human beings for
communication between people. Computer
languages are explicitly excluded. The
value of the LANG attribute is referred to as the "language
tag"... The name space of language
tags is administered by IANA. Example
tags include: en, en-US, en-cockney, i-cherokee and x-pig-latin.
Two
letter primary tags are reserved for ISO 639 language abbreviations. This Committee Draft does not specify
three-letter primary tags, however their description may be found in the
"Ethnologue" {Gri92}. Any
two-letter initial sub-tag is an ISO 3166 country name..."
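The tag structure described in the quoted text can be made concrete with a short sketch that splits a tag into its primary tag and sub-tags and classifies them. This Python fragment is illustrative only. The function name `describe_tag` is hypothetical, and the reading of "i" as IANA-registered and "x" as private use follows RFC 1766; it is an assumption here, since the quoted text does not define them.

```python
def describe_tag(tag: str) -> dict:
    """Split a language tag on '-' and classify its parts."""
    parts = tag.split("-")
    primary = parts[0].lower()
    subtags = parts[1:]

    if len(primary) == 2 and primary.isalpha():
        kind = "ISO 639"          # two-letter primary tags are reserved for ISO 639
    elif primary == "i":
        kind = "IANA-registered"  # assumption taken from RFC 1766
    elif primary == "x":
        kind = "private use"      # assumption taken from RFC 1766
    else:
        kind = "unspecified"      # e.g. three-letter primary tags

    # Per the quoted text, a two-letter initial sub-tag is an ISO 3166 country code.
    country = None
    if subtags and len(subtags[0]) == 2 and subtags[0].isalpha():
        country = subtags[0].upper()

    return {"primary": primary, "kind": kind, "country": country, "subtags": subtags}
```

Applied to the five example tags, only "en-US" produces a country, and "i-cherokee" and "x-pig-latin" are not classified as ISO 639 at all.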
Serious reflection and more systematic thinking are required here with respect to "tags",
especially if one wishes to use SGML -> HTML -> XML generally, and in electronic commerce
specifically, as well as to ensure interoperability not only with the use of other syntaxes
but also among various consumer markets, industry sectors, etc.
First of all, "i" and "x" are single characters, and they do not exist in ISO 639. Secondly,
"cherokee" and "pig-latin" are not ISO 639 languages. Thirdly, for "en-US", it is not at all
clear, given the other examples, whether this represents the English language as used in the
United States or something else.
Fourthly, use of Alpha-2 code tags for ISO 3166 country names is confusing vis-à-vis ISO 639
language codes. They overlap and are not mutually exclusive. This at times is confusing for
humans (and even more so for "dumb" computers). Fifthly, in many sort algorithms and
search/retrieval engines, upper and lower case letters are treated the same. This causes even
more confusion in IT-enabled processing of these code sets if two-letter alphas are used as
codes for both countries and languages.
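The overlap can be demonstrated with a few codes already used in this document. In the sketch below, the country set holds Alpha-2 codes from the examples above (FI, BE, TR, CD) and the language set holds the ISO 639 codes en, fi, fr, nl, sv and tr. Both are small samples rather than the full code lists, and the function name is hypothetical.

```python
# Sample ISO 3166-1 Alpha-2 country codes (upper case by convention).
COUNTRY_ALPHA2 = {"FI", "BE", "TR", "CD"}

# Sample ISO 639 two-letter language codes (lower case by convention).
LANGUAGE_CODES = {"en", "fi", "fr", "nl", "sv", "tr"}


def case_insensitive_collisions(countries, languages):
    """Return the codes that coincide once letter case is ignored."""
    folded_countries = {code.casefold() for code in countries}
    folded_languages = {code.casefold() for code in languages}
    return sorted(folded_countries & folded_languages)
```

With case preserved, the two sample sets have no member in common; once case is ignored, "fi" and "tr" each denote both a country and a language.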
Finally, there is an urgent need to update ISO 639 to include North American aboriginal and
native languages, as well as to provide a systematic means for handling and registering user
extensions, (e.g., "cockney", "pig latin", "klingon", etc.). Alternatively, one could consider
developing an "ISO 639 Level 2" standard for codes representing user extensions of the nature
noted above, as well as "historical languages", i.e., as is being developed for ISO 3166-3.
Even more important is the need to develop a systematic and unambiguous interworking, in an
IT-enabled manner, among language codes (ISO 639), currency and fund codes (ISO 4217) and
country codes (ISO 3166-1).
One should also develop mechanisms for the interchange of the "same" data content from a
cross-industry sector perspective but using different code sets in the same domain. To assist
in progressing work in this area, this chapter concludes with Exhibit 3.4, which consists of a
sample concordance table (English language version only) for the ISO 3166 (Country Codes) + ISO
639 code set on the one hand, and on the other, the equivalent Library of Congress (LC) country
and language code set and the UDC language code set.
Notes on
Exhibit 3.4
[1] As provided by the Fédération internationale d'information et de documentation (FID) based
on documentation prepared in 1994 (and verified with them in early 1998). The UDC also has
"country codes", i.e., "place codes", as a "common" auxiliary table, but this has not been
included in Exhibit 3.4.
[2] For human representation, we have
included the "Short Name - English" as the linguistic equivalent for
the ISO 3166-1 3-digit numeric code.
[3] For
human representation, we have also included the ISO 639 English name of the
language. There is also the French name
and of course the actual "name" of the language in the language
itself. ISO 639 captures this
"Original" name in its Latin alphabet equivalent version.
[4] One
notes that the LC alpha country codes are not the same as ISO 3166-1 alpha
codes for the same entities.
[5] Added
here to indicate that in Canada under the Nunavut Act, a new
"territory" will be established 1 April, 1999 from the existing
Northwest Territories, i.e., "Nunavut". In Nunavut, in addition to English and French, Inuktitut will
become a recognized "official" language. The language code "ik" is the one that has been
reserved for Inuktitut.
[6] In
ISO 639 the "ik" represents "Inupiak" which is
grouped/classified as an "Eskimo language". In UDC, the language code 562 refers to "Inuit". There is no code for "Inuktitut" per se.
[7] The
LC codes place "Inuktitut" under the Eskimo family of languages.
[8] One notes that the LC language codes are not the same as the ISO 639 codes. At times even
the first letter is not the same, (e.g., "nl" versus "dut").
EXHIBIT 3.4 - SAMPLE CONCORDANCE OF STANDARDS FOR COUNTRY AND LANGUAGE CODES: ISO, LC AND UDC
| ISO 3166-1 | ISO 3166-1 | ISO 639 | ISO 639 | Library of Congress | Library of Congress | UDC [1] |
| Numeric Code | Short Name (E) [2] | Applicable Languages (E) [3] | Applicable Language Codes | Country Codes | Language Codes | Language Codes |
| 3166-1:124 | Canada | English | en | xxc [4] | eng | = 111 |
| | | French | fr | | fre | = 133.1 |
| | | Inuktitut [5] | ik [6] | | esk [7] | = 562 |
| 3166-1:056 | Belgium | French | fr | be | fre | = 133.1 |
| | | Dutch | nl [8] | | dut | = 112.5 |
| 3166-1:246 | Finland | Finnish | fi | fi | fin | = 511.111 |
| | | Swedish | sv | sw | swe | = 113.6 |
| 3166-1:792 | Turkey | Turkish | tr | tu | tur | = 512.164 |
| 3166-1:840 | United States | English | en | xxu | eng | = 111 |
| 3166-1:826 | United Kingdom | English | en | xxk | eng | = 111 |
| | | Scots Gaelic | gd | | gae | = 152 |
| | | Welsh | cy | | wel | = 153.1 |
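A concordance of this kind lends itself to a simple two-way lookup. The Python sketch below is a hedged illustration built from language rows of Exhibit 3.4; the Inuktitut row is omitted because notes [5] to [7] indicate that the three schemes do not designate the same thing, and the Turkish row is left out to keep the sample short. The names `LANGUAGE_CONCORDANCE`, `iso_to_lc`, `lc_to_iso` and `iso_to_udc` are hypothetical.

```python
# (ISO 639 code, LC language code, UDC language code), from Exhibit 3.4.
LANGUAGE_CONCORDANCE = [
    ("en", "eng", "111"),
    ("fr", "fre", "133.1"),
    ("nl", "dut", "112.5"),
    ("fi", "fin", "511.111"),
    ("sv", "swe", "113.6"),
    ("gd", "gae", "152"),
    ("cy", "wel", "153.1"),
]

_ISO_TO_LC = {iso: lc for iso, lc, _ in LANGUAGE_CONCORDANCE}
_LC_TO_ISO = {lc: iso for iso, lc, _ in LANGUAGE_CONCORDANCE}
_ISO_TO_UDC = {iso: udc for iso, _, udc in LANGUAGE_CONCORDANCE}


def iso_to_lc(code: str):
    """ISO 639 code -> LC code, or None when no concordance entry exists."""
    return _ISO_TO_LC.get(code.lower())


def lc_to_iso(code: str):
    """LC code -> ISO 639 code, or None when no concordance entry exists."""
    return _LC_TO_ISO.get(code.lower())


def iso_to_udc(code: str):
    """ISO 639 code -> UDC language code, or None when no entry exists."""
    return _ISO_TO_UDC.get(code.lower())
```

Returning `None` rather than a guessed code is one way of surfacing the "non-interoperability" instances mentioned above, so that they can be handled explicitly.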
In combining multilingualism and localization requirements, one must recognize the fact that
associated with use of a language, (e.g., English, French, German, Spanish, Portuguese, etc.),
there are various "local" uses of the same natural language. The same object may well be, and
often is, used and known by different terms in the same language in different local usage
conventions. The Universal Product Code (UPC) and European Article Numbers (EAN) systems
recognize this, as they have multilingual terms associated with each code for "local"
packaging/labelling purposes. This has implications for Electronic Commerce and particularly
that via the Internet. For example, English as a language is in use in many countries or
"locales", (e.g., Australia, Britain, Canada, India, Ireland, Jamaica, New Zealand, USA, etc.),
but has different local uses in each. Similar examples exist for other languages.
In this context, the BT-EC took the example of an enterprise wishing to sell potatoes
world-wide. This is a simple example, yet representative of the interplay of the four
horizontal issues. Selling world-wide means that these goods have to pass through customs for
export/import into various countries. The customs authorities world-wide have an organization
that sets common rules and procedures, i.e., the World Customs Organization (WCO), formerly the
Customs Co-operation Council (CCC)[3]. The WCO has established a classification scheme for
goods traded called the Harmonized System (HS). It was formerly known as the Brussels Tariff
Nomenclature (BTN). As such, the HS for "commodity codes" is an internationally recognized
standard, although of a non-ISO/IEC/ITU origin.
Within the
Harmonized System (HS) of the WCO, the general code for potato (fresh or
chilled) is "0701". This
linguistically neutral code "0701" is a data item or data element
instance in the HS permitted value domain.
Here the German equivalent name of "potato" as used in Germany is "Kartoffel", but the
Austrian German equivalent is "Erdapfel". Similarly, the Spanish equivalent name as used in
Spain is "patata" while the Mexican Spanish equivalent name is "papa", and the Dutch equivalent
is "aardappel", etc. In French, the dictionary term is "pomme de terre", with "patate" as a
"local" specific, i.e., Canada/Quebec term (and one which is not slang). The equivalent names
noted above are thus culturally adapted equivalent linguistic expressions associated with
"0701". Depending on the "locale", the appropriate human oriented names or linguistic
expressions can be systematically/automatically generated from the linguistically neutral
numeric code for human understanding, product labelling, reporting, filing, etc., and, where
required, in multiple languages.
From a more
detailed analysis, one can conclude two key aspects of the interworking of
"localization", "cultural", and "multilingual"
requirements; namely:
(1) that within a jurisdiction, (e.g., a
country, a province, canton, etc.), there can be more than one natural language
of use; and,
(2) that localization needs can result in a
product, i.e., entity or object, having more than one equivalent
"name" within a particular natural language.
In Exhibit 3.5, we present the "potato" from an IT-enabled and EC-facilitated perspective.
Again on the left-hand side of the matrix, under "IT-Needs (Interface)", we identify the schema
ID, i.e., "HS", along with the permitted value, i.e., in this case 0701 for potato. In the
middle column, we present examples of countries that import potatoes, i.e., using their ISO
3166 country code and short name (English), while on the right-hand side, we present linguistic
equivalents required to support local and multilingual requirements from both a jurisdictional
and consumer perspective.
Notes on
Exhibit 3.5
[1] Exhibit 3.5 focuses on human understandable representation of what should be an IT-enabled
global standard for trade in goods based on the existing Harmonized System (HS) of the World
Customs Organization (WCO). The example here is "potato", i.e., fresh or chilled potatoes,
where, under the HS, "0701" is the primary code, and ".10" is for seed potato, while ".90" is
for "other potatoes". For the purpose of this example we use "0701". There are additional codes
for potatoes which are "frozen", i.e., 0710.10, "cut/sliced/broken or powder", i.e., 0712.10,
etc. Each of these will have their own local/linguistic equivalents. Further, there are "sweet
potatoes" which could provide an even richer example.
Finally, it is understood that there are other sets of codes, i.e., value domains, where
"potato" as an instance is identified and referenced with a different code, for example, in the
domains of agriculture, pesticides, retail/food stores, etc. In classification and coding
schemas utilized in these other sectors, "potato" has a different code. This is understandable
since the goal of the business context in which they are used is quite different from that of
customs authorities.
[2] The country code and short name are taken from ISO 3166-1.
[3] The
2-letter language codes, (e.g., de, en, es, fi, fr, ik, nl, sv), are taken from
ISO 639.
[4] In 1999, Nunavut will become a new territory with Inuktitut as an added "official" language
to English and French. In Inuktitut, "potato" is "patiti", i.e., the transliterated Latin
character set equivalent of the Inuktitut-language character string used to designate "potato".
EXHIBIT 3.5 - COMMODITY CODE EXAMPLE "POTATO" [1]
| IT-Needs (Interface) | Country Code - Short Name (en) [2] | Localization and Multilingual Needs [3] |
| HS: 0701 | 124 CANADA | (en): potato (fr): pomme de terre (ik): patiti [4] |
| | 484 MEXICO | (es): papa |
| | 724 SPAIN | (es): patata |
| | 040 AUSTRIA | (de): Erdapfel |
| | 276 GERMANY | (de): Kartoffel |
| | 056 BELGIUM | (fr): pomme de terre (nl): aardappel |
| | 246 FINLAND | (fi): peruna (sv): potatis |
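How labels might be generated from the linguistically neutral code can be sketched from rows of Exhibit 3.5. The Python fragment below is a minimal illustration, assuming a table keyed by HS code and ISO 3166 numeric country code. The names `HS_NAMES` and `labels` are hypothetical, and only a selection of the exhibit's rows is reproduced.

```python
# Key: (HS code, ISO 3166 numeric country code).
# Value: names keyed by ISO 639 language code (selected rows of Exhibit 3.5).
HS_NAMES = {
    ("0701", "124"): {"en": "potato", "fr": "pomme de terre", "ik": "patiti"},
    ("0701", "724"): {"es": "patata"},
    ("0701", "040"): {"de": "Erdapfel"},
    ("0701", "276"): {"de": "Kartoffel"},
    ("0701", "056"): {"fr": "pomme de terre", "nl": "aardappel"},
    ("0701", "246"): {"fi": "peruna", "sv": "potatis"},
}


def labels(hs_code: str, country: str, languages=None) -> dict:
    """Return the names for a commodity in a jurisdiction.

    With languages=None every language recorded for the jurisdiction is
    returned; otherwise only the requested languages that are available.
    """
    names = HS_NAMES.get((hs_code, country), {})
    if languages is None:
        return dict(names)
    return {lang: names[lang] for lang in languages if lang in names}
```

The Austria and Germany rows show the second aspect noted above: the language code is the same ("de"), yet the jurisdiction selects a different name.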
[1]Apart from some minor editing changes, (e.g., renumbering, spelling, typos, etc.), Chapter 2 is a verbatim extract of Clause 6 of the BT-EC Report to JTC1, i.e., JTC1 N5296, pages 22-27.
[2]The acronym "MARC" stands for "Machine Readable Cataloguing". The preceding characters represent the country which utilizes the "MARC" format, has amended it for its specific cataloguing needs, and has an infrastructure at the national level for addressing these national needs. There are primarily three such countries, namely the US, Canada, and the UK; hence the designations USMARC, CANMARC, and UKMARC.