1. INTRODUCTORY NOTES

1. The ISO/IEC JTC1 Business Team on Electronic Commerce (BT-EC) and the Cultural Adaptability Workshop (CAW) both completed their work and reported to the 12th Plenary Meeting of ISO/IEC JTC1, 2-5 June 1998, in Sendai, Japan. JTC1 document N5448 contains the resolutions of this Plenary. Resolutions 8, 9 and 10 pertain to JTC1 follow-up on the BT-EC Report and recommendations. Resolution 22 pertains to JTC1 follow-up on CAW and its recommendations.

Members of the BT-EC participated in CAW. The BT-EC scheduled its final meeting to be held after the Workshop on Cultural Adaptability so that the BT-EC could benefit from the results of CAW.

In Resolution 8 (N5448), JTC1 instructs its secretariat to circulate the BT-EC Report to "National Bodies" and all JTC1 Technical Directions for review and comment.

2. The purpose of this document is to serve as a "non-technical" summary of work of the ISO/IEC JTC1 Business Team on Electronic Commerce (BT-EC) with respect to "Horizontal Aspects" and "Encodable Value Domains". {See further below}. The BT-EC Report (N5296) contains many recommendations pertaining to "encodable value domains". Here this document serves as a backgrounder.

3. This document consolidates in one contribution contents from two existing JTC1 documents; namely:

(1) in Chapter 2, the text found in Clause 6.0 of the BT-EC "Report to JTC1: Work on Electronic Commerce Standardization to be initiated" (JTC1 N5296); and,

(2) in Chapter 3, text based on a Canadian member body contribution titled "Additional information in support of the BT-EC Report (JTC1 N5296) - Examples of Encodable Value Domains with IT-Interface Needs, Localization and Multilingualism" (JTC1 N5394).

4. While directed at JTC1/SC32/WG2 (and WG1) and the JTC1 "Ad-Hoc on Cultural and Linguistic Adaptability", this contribution is also intended to be circulated outside of JTC1 to raise awareness and obtain feedback on the topics covered here.

5. Horizontal Issues - Capsule Overview

In the BT-EC Report, cultural and linguistic adaptability were deemed to be important to electronic commerce. In addition to being noted as part of consumer requirements {Section 5.2}, they were identified by the BT-EC as key components of four horizontal issues which are of general relevance for all scenarios involving Electronic Commerce. These issues are:

· information technology (IT)-enablement;

· localization including multilingualism;

· cross-sectorial aspects; and,

· cultural adaptability.

The BT-EC ordered these horizontal issues on the basis of:

(1) the need to go from the simpler to more complex challenges;

(2) placing priority on the "do-able" and immediately most useful in the context of increasing resource constraints in standardization work; and,

(3) promotion and visibility of ISO/IEC JTC1 work within the ISO, IEC and ITU and especially outside of these standardization communities.

From a user perspective, these four horizontal issues need to be addressed in a harmonized manner.

From an Electronic Commerce perspective, i.e., that of the JTC1/BT-EC, standardization work addressing the first three horizontal issues, namely:

· "IT-enablement";

· "Localization and Multilingualism"; and,

· "Cross-Sectorial" aspects,

should resolve many of the requirements for cultural adaptability. It then remains to be seen what other "cultural adaptability" requirements remain, i.e., in addition to those already identified as "cultural elements" and/or those of a societal nature.

2. HORIZONTAL ISSUES [1]

2.1 Overview

BT-EC identified four horizontal issues as being of general relevance for all scenarios involving Electronic Commerce and gave these horizontal issues some prominent attention in its work. These issues are:

· information technology (IT)-enablement,

· localization including multilingualism,

· cross-sectorial aspects,

· cultural adaptability.

These horizontal issues are ordered here on the basis of

1. the need to go from the simpler to more complex challenges,

2. placing priority on the "do-able" and immediately most useful in the context of increasing resource constraints in standardization work; and,

3. promotion and visibility of ISO/IEC JTC1 work within the ISO, IEC and ITU and especially outside of these standardization communities.

From a user perspective, these four horizontal issues need to be addressed in a harmonized manner.

A key characteristic of commerce world-wide, in particular in the business-to-business and business-to-administration domains, is that it consists of business transactions which:

1. are rule-based, i.e., mutually understood and accepted sets of business conventions, practices, procedures, etc.; and,

2. make extensive use of "codes", often table-based, representing predefined possible choices for common aspects of business transactions. Examples include countries, currencies, languages, manufacturers and their products.

Many of these sets of agreed-upon rules used in business world-wide, and their associated lists of tables/codes, are "de jure" and "de facto" standards. BT-EC noted that numerous international standards are already in use in support of commerce world-wide. The problem is that most are paper-based and lack a computer-processable version. Even when distributed in electronic form, these standards, including those of ISO, consist of tens of printed pages; they cannot be "plugged-in" for use in Electronic Commerce. Much of the intelligence in these international standards is understandable, explicitly or implicitly, only to humans. They have not been described formally using Formal Description Techniques (FDTs), i.e., in their present form they do not support "computational integrity". Consequently, each enterprise using these code sets has to spend considerable time and effort to (1) determine their meaning and interpret them; (2) build applications; and, (3) hope that they interoperate with those of other networks or enterprises.

Human beings like to name "objects". But the approach of using "names" is not very IT friendly, cost-efficient or time-efficient.

Depending on the interplay of multilingual and localization requirements in Electronic Commerce, a single product or service being offered for sale will have multiple names, and differing names even in the "same" language. Thus, if we wish to ensure rapid and widespread use of Electronic Commerce globally, we must on the one hand identify "objects", i.e., products or services being offered for sale, in an unambiguous, linguistically neutral, and IT-processable and EC-facilitated manner, and, on the other hand, present the same via a range of linguistic names (and associated character sets) from a point-of-sale perspective, i.e., human-readable user interface, as required by the "local" marketplace.

In order to provide a focus for its work on horizontal issues, the BT-EC utilized four real world examples; namely:

· Currency Codes,

· Country Codes,

· Language Codes,

· Commodity Codes.

(For details of these examples see Chapter 3 below and JTC 1/BT-EC N 047).

These examples represent standards used for commerce world-wide and are presently implemented by enterprises and their information systems in a wide variety of different ways. There are also no "standard" ways for the interworking among these and similar standards. This does not promote global interoperability. The recent widespread use of the Internet is exacerbating existing ambiguities.

From a BT-EC perspective, these four examples underline the fact that with respect to electronic commerce there may be less of a need for new standards. Rather the immediate challenge may well be the development of a category of information technology standards which will facilitate the development of information technology enabled versions of existing standards used in commerce and do so in a manner which also supports the interplay of localization and multilingual requirements, i.e., "bridging standards".

BT-EC wishes to pass on the following considerations for such standardization work in support of Electronic Commerce; namely:

1. Standards must focus on the interface (as opposed to implementation) as the best means of arriving at globally harmonized solutions for interoperability from both a business and information technology perspective.

2. Standard interfaces among information systems must be technology neutral accommodating advances in technology to the extent possible. Further, such standard interfaces must be linguistically neutral to the furthest extent possible.

3. In order to empower users and consumers, standards should be adaptable to local and multilingual requirements at national and regional levels, while ensuring full transparency of available market solutions to the consumer. Multilingualism must be considered. The expansion of open, multilingual standards could significantly increase the volume and value of world-wide Electronic Commerce.

2.2 Information Technology (IT)-enablement

"IT-enablement" is the term used to identify the need to transform currently accepted standards used in commerce world-wide from a manual to a computational perspective. Electronic commerce, in particular of the Business-to-Business or Business-to-Administration categories, introduces a requirement for standards that are prepared, structured and made available for unambiguous usage within and among information systems. This requirement can be expressed as "computational integrity", in particular:

"the expression of standards in a form that ensures precise description of behaviour and semantics in a manner that allows for automated processing to occur, and the managed evolution of such standards in a way that enables dynamic introduction by the next generation of information systems".

The objective of IT-enablement is to capture in a computer-processable manner, and one which maximizes interoperability, the implicit rules and relations (i.e., those known to "experts") of the code sets found in standards used in commerce world-wide, i.e., capture and state from an entity relationship and/or object technology perspective, using Formal Description Techniques. Also, issues arising from change management in "code tables", i.e., synchronization, backwards compatibility, migration, etc. need to be addressed.

IT-enablement is based on the premise that a detailed and exhaustive identification of standards and "conventions", etc., used in support of existing commerce, will eliminate many barriers to Electronic Commerce.

IT-enablement recognizes that within ISO, IEC and ITU, there are committees which have the domain responsibility and expertise in areas of work, the primary purpose of which is to manage and control the content. IT-enablement also recognizes that outside of ISO/IEC/ITU, there are many other organizations which have domain responsibility and expertise in subject areas relevant to global Electronic Commerce. Their "content" and industry sector domain oriented standards require an IT-enabled version for use in Electronic Commerce.

BT-EC suggests that JTC 1 gives proper consideration to IT-enablement, initially focused on currency, country, language and commodity codes. Members of BT-EC are of the opinion that such work will serve as the necessary practical experience and expertise needed to develop a generalized approach to "IT-enablement". This should also help to support localization and multilingual requirements.

(For further information, see document ISO/IEC JTC 1/BT-EC N 46.)

2.3 Localization and multilingualism

IT-enablement is based on the premise that to ensure rapid and widespread use of Electronic Commerce globally, we must on the one hand identify "objects", i.e., products or services being offered for sale, in an unambiguous, linguistically neutral, and IT-processable and EC-facilitated manner, and, on the other hand, present the same via a range of linguistic names (and associated character sets) from a point-of-sale perspective, i.e., human-readable, as required by the "local" marketplace.

BT-EC reviewed existing JTC 1 terms and definitions of "locale" (see ISO/IEC JTC 1/BT-EC N 46). Those aspects normally are related to the character sets associated with a natural language, including collating/ordering, date/time formats, monetary formatting, etc., a.k.a. "cultural elements".

From an Electronic Commerce perspective, BT-EC identified four additional sets of parameters of "localization" requirements which should be addressed, namely:

1. jurisdictional requirements, i.e., various combinations of "top-down" legal and regulatory frameworks which place constraints on the global marketplace and in doing so, often define/establish a "local" market;

2. consumer requirements, i.e., combinations of "bottom-up" consumer demands and behaviour;

3. supplier requirements, i.e., combination of factors impacting on suppliers of goods and services (as well as those involved in supporting logistics chains); and,

4. human rights-related requirements, (e.g., disabled/handicapped, privacy, etc.).

BT-EC defines "localization" as:

localization: pertaining to or concerned with anything that is not global and is bound through specified sets of parameters of:

(a) a linguistic nature including natural and special languages and associated multilingual requirements;

(b) jurisdictional nature, i.e., legal, regulatory, geopolitical, etc.;

(c) a sectorial nature, i.e., industry sector, scientific, professional, etc.;

(d) a human rights nature, i.e., privacy, disabled/handicapped persons, etc.; and/or

(e) consumer behaviour requirements.

Within and among "locales", interoperability and harmonization objectives also apply.

From an Electronic Commerce perspective, "jurisdiction", on the whole, represents a set of local market entry and/or participation requirements which may be of a general nature or product/service-specific.

From a legal perspective, the basic entity is the country. Two or more countries among themselves can form a common harmonized "jurisdiction" governing the marketplace, through a bilateral or multilateral agreement. Where these agreements are of a general nature, the harmonized "jurisdiction" is known as a "region". Examples here include the European Union, NAFTA, etc. Within countries, there may be various approaches to more granular legal and regulatory frameworks, e.g., at the level of states, provinces, etc.

In addition to a jurisdiction with a geographic dimension, there are jurisdictions bounded by a goods and services dimension. Examples here include airlines, banking, oil companies, etc. Here jurisdiction is often expressed through treaties, regulations, agreements, etc., which are harmonized through an entity representing these communities (e.g., ICAO, WCO, or WTO).

Combinations of laws and regulations can be viewed as frameworks. BT-EC can thus define jurisdiction as:

"jurisdiction: a distinct legal and regulatory framework which places constraints on the global marketplace and in doing so often defines/establishes a local market".

Electronic commerce is "borderless" in its nature - it transcends jurisdictions.

From a BT-EC perspective, multilingual requirements comprise more than just the need to support the character sets and sort/collate sequences of the various languages used by customers world-wide. They also reflect the fact that a single natural language is utilized in different ways in various local markets.

In addition, one should add the concept of special languages, i.e., those of a scientific or technical nature, as well as those which pertain to a specific industry sector. Many of these can be considered to be global in nature and use.

Thus from an Electronic Commerce perspective, "multilingual" requirements embody not only:

1. multiple natural languages; but also,

2. multiple and different uses of the "same" natural language;

3. multiple source languages in any multilingual thesauri, database, referenceable permitted value domains (PVDs), i.e., tables, code sets, etc.; and possibly also,

4. the use of special languages.

In this context, one can define:

multilingualism: "the ability to support not only character sets specific to a language (or family of languages) and associated rules but also localization requirements, i.e., use of a language from jurisdictional, sectorial and consumer marketplace perspectives".

From a BT-EC perspective, adding multilingual capabilities in Electronic Commerce can be viewed as simply mirroring the existing physical world requirements. Prime examples here are product labelling requirements and product usage instructions. Given the increasing globalization of trade in goods, single language usage instructions accompanying products are increasingly rare and multilingual usage instructions increasingly commonplace.

2.4 Cross-Sectorial issues

Cross-sectorial issues pertain to differing, at times conflicting, understandings of business practices, object identification, etc., among economic sectors. The challenge here is that of resolving two sets of issues:

1. Industry sectors, scientific fields, and professional disciplines assign their own uses or meanings to the terms of a natural language. Quite often natural languages are used in the manner of what we earlier called "special languages": the same word/term frequently has very different meanings in other industry sectors. There is a trend in various sectors towards using existing non-technical "common language" words as terms with new technical meanings. This problem of polysemy needs to be taken into account in cross-sectorial Electronic Commerce.

2. Multilingual equivalency needs create an added layer of complexity, even more so for unambiguous cross-sectorial interoperability in support of Electronic Commerce (as well as world-wide "individual-to-business" Electronic Commerce via the Internet).

A case study on cross-sectorial issues (see JTC 1/BT-EC N 045) led, with respect to scientific languages, to the conclusion that a scientific language can be considered a culturally neutral exchange language which, in turn, has multiple natural language and culturally dependent linguistic equivalent terms.

Technical languages and their use in particular industry sectors, however, do present particular challenges to cultural adaptability and cross-sectorial interoperability since they do not have the attributes of scientific languages. Technical languages as linguistic sub-systems are difficult enough to handle even within their industry sector, in one natural language. To this are added the challenges of localization, multiculturalism and cross-sectorial interactions in Electronic Commerce.

Each industry sector interacts with other sectors. A key characteristic of special languages is an associated controlled vocabulary of terms, often also in a multilingual manner.

In conclusion, it should be noted that within industry sectors, established standards and conventions exist for unambiguous identification and referencing of unique objects, and for naming them (often multilingually), along with associated rules. Although not originally designed to interoperate across and among industry sectors, many of these sectorial standards have core constructs in common which could be utilized to support cross-sectorial Electronic Commerce and in a manner which accommodates localization and multilingual needs.

2.5 Cultural adaptability

BT-EC viewed "cultural adaptability" as a set of requirements affecting global Electronic Commerce from a cultural perspective and noted that these can co-exist within "localization" and "multilingualism" requirements. In addition, there are societal aspects which often are not bounded by jurisdiction or geographic area (e.g., Jewish and Muslim cultures transcend jurisdictional boundaries).

The following definition of "cultural adaptability" is found in JTC 1 N4627:

"The special characteristics of natural languages and the commonly accepted rules for their use (especially in written form) which are particular to a society or geographic area. Examples are: national characters and associated elements (such as hyphens, dashes, and punctuation marks), correct transformation of characters, dates and measures, sorting and searching rules, coding of national entities (such as country and currency codes), presentation of telephone numbers, and keyboard layouts".

This definition of the concept/term "cultural adaptability" is the same as that for "cultural elements" found in ISO/IEC JTC 1/CAW N 008. It has a focus on special characteristics of natural languages and commonly accepted rules for their use which are particular to a society or geographic area. The emphasis here appears to be on character sets, scripts, glyphs, etc., their ordering, sorting, search, etc.

However, in commerce world-wide, it is not so much the natural language but the usage of special languages (e.g., technical and scientific), which forms a significant challenge to providing interoperability in Electronic Commerce. This is true especially for "technical" uses of natural languages by different industry sectors. Differences in uses of a natural language exist also in industry sectors which represent sets of requirements other than those particular to a society or geographic area.

BT-EC made an effort to coordinate the work on this horizontal issue with the JTC 1/CAW (Cultural Adaptability Workshop). BT-EC notes Resolution 3 of JTC 1/CAW which states "that CAW did not have time to address the request of JTC 1 to elaborate or amend the definition of cultural adaptability as contained in the document JTC 1 N4627".

From an Electronic Commerce perspective, standardization work addressing the three horizontal issues associated with

· "IT-enablement",

· "Localization and Multilingualism", and

· "Cross-Sectorialization"

should resolve some of the requirements for "cultural adaptability". It then remains to be seen what other "cultural adaptability" requirements remain, i.e., those of a societal nature (see also 5.2.2).

-------------

[Note: Section 5.2.2 in the BT-EC Report pertains to "Consumer requirements for Electronic Commerce"].

3. REAL WORLD EXAMPLES OF ENCODED VALUE DOMAINS

3.1 Introduction

1. Chapter 3 is based on a Canadian contribution to JTC1, i.e., N5394. This contribution provided additional and more detailed information in support of Clause 12.3 of the BT-EC Report titled "Examples of Encodable Value Domains" (BT-EC Report, pages 61-66). The Canadian contribution also provided three exhibits in support of the examples.

2. The examples are currency codes, country codes, language codes, and commodity codes. These four real world examples were developed to provide a focus for the BT-EC work on four horizontal issues. The three exhibits have proved useful in Canada in illustrating and explaining the horizontal issues in a simple and non-technical manner to the business community, policy makers, and various industry sectors.

The exhibits provided are intended to demonstrate that the identification and referencing of real world objects, i.e., as "instances" of an object class in an "encodable value domain" can be done in a linguistically neutral and unambiguous manner.

This supports a global approach to Electronic Commerce which is capable of meeting localization and associated multilingual requirements. Linguistically neutral identification and referencing of objects will also support computational integrity and more efficient data interchange, with higher quality assurance and at lower costs for all participants.

3. Those interested in standardization in areas pertaining to Electronic Commerce may find these exhibits useful in illustrating the horizontal aspects. They can also use them and augment them by adding their own country and language equivalent(s) terms for the linguistically neutral code(s) in the exhibits.

The contributions from BT-EC members with respect to their localization and accompanying linguistic requirements, as found in the three exhibits, are appreciated.

4. Finally, it is useful to draw attention to the BT-EC Report (in Clause 6 on pages 21 and 22) which states:

"Human beings like to name "objects". But the approach of using "names" is not very IT friendly, cost-efficient or time-efficient."

In support of this BT-EC text, Canada draws attention to ISO 1087 which defines "name: designation of an object by a linguistic expression".

Consequently, any "object" will have (1) multiple names; and, (2) in global Electronic Commerce, many of the "names" used to designate the "object" being traded will be in the form of linguistic expressions which use non-Latin-1 characters, (e.g., Arabic, Chinese, Thai, Hebrew, Japanese, etc.). This is one reason why ISO/IEC 10646 (a.k.a. "Unicode") will be a key IT infrastructure standard needed to support global electronic commerce.

3.2 Example #1 - Currency Codes

A key attribute of electronic commerce is that it involves business transactions where payment must be made in a mutually acceptable currency. ISO 4217 is the standard for codes representing currencies and funds. This standard and its contents are the responsibility of ISO TC 68 Banking. The principles for inclusion in the code lists of ISO 4217 are that (1) the codes must represent currencies and funds used within the entities described by ISO 3166 (Country Codes); and, (2) the codes listed are intended to reflect current status at the date of publication.

ISO 4217 has a number of features and anomalies which although human understandable need to be identified and explicitly captured in an IT-enabled manner. In short, ISO 4217 includes objects which are not currencies (or funds). In ISO 4217, there are countries, i.e., as ISO 3166 entities, where:

· the ISO 3166 3-digit numeric country code is not the same as the ISO 4217 3-digit numeric code, (e.g., due to the creation/utilization in ISO 4217 of ISO 3166 "User Extensions"). For example, one can readily identify in ISO 4217 twenty-five (25) entries where the ISO 3166 3-digit numeric country code differs from the ISO 4217 "Code Name" 3-digit numeric code. Nor is there any relation between the ISO 3166 and ISO 4217 alpha codes for many countries;

· a country (or dependency) has no currency of its own and utilizes the currency of another country;

· a country has more than one currency, i.e., its own and that of another country;

· a country has both a currency code and a funds code;

· a set of countries collectively shares and uses a currency which has no "issuing country", (e.g., SDR, XDR, XOF, and XAF). Here one notes the need to add the "euro" as a currency (in addition to the "ecu", i.e., XEU);

· special fund types;

· "currency" not linked to any country or organization, (e.g., precious metals such as gold - 959, alpha = XAU, special settlement currencies, etc.); and,

· "currencies" having no numeric code but only a 3-alpha code, (e.g., XFO = Gold Franc).

Some of the above noted rules and relationships are stated in ISO 4217; others are implicit (and known by "experts"). An IT-enabled version of ISO 4217 is required, especially now for electronic commerce and particularly that which is Internet-based. Many suppliers and consumers entering the electronic commerce market or other Internet-based activities, particularly those outside the financial community, are not aware of the "peculiarities" of ISO 4217.
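The relationships listed above can be made explicit in a small data model. The following Python sketch is illustrative only and is not part of ISO 4217 or the BT-EC Report: the names `CurrencyEntry`, `ENTRIES`, `COUNTRY_CURRENCIES` and the helper functions are hypothetical, and the sample values are assumptions that should be verified against the current editions of ISO 4217 and ISO 3166-1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CurrencyEntry:
    """One ISO 4217 entry; the field names are assumptions for this sketch."""
    alpha: str               # 3-alpha code
    numeric: Optional[str]   # 3-digit numeric code, or None when the entry has none
    name: str
    kind: str = "currency"   # e.g. "currency" or "precious metal"


# Illustrative subset only, not an authoritative code list.
ENTRIES = {
    "CAD": CurrencyEntry("CAD", "124", "Canadian Dollar"),
    "CHF": CurrencyEntry("CHF", "756", "Swiss Franc"),
    "USD": CurrencyEntry("USD", "840", "US Dollar"),
    "PAB": CurrencyEntry("PAB", "590", "Balboa"),
    "XAU": CurrencyEntry("XAU", "959", "Gold", kind="precious metal"),
    "XFO": CurrencyEntry("XFO", None, "Gold Franc"),  # alpha code only
}

# ISO 3166-1 numeric country code -> ISO 4217 alpha codes used there.
COUNTRY_CURRENCIES = {
    "124": ["CAD"],          # Canada
    "756": ["CHF"],          # Switzerland
    "438": ["CHF"],          # Liechtenstein: uses the currency of another country
    "591": ["PAB", "USD"],   # Panama: its own currency and that of another country
}


def currencies_for(country_numeric: str) -> list:
    """Return the currency codes recorded for a country (possibly several)."""
    return list(COUNTRY_CURRENCIES.get(country_numeric, []))


def countries_using(alpha: str) -> list:
    """Return the countries recorded as using a currency (possibly none)."""
    return sorted(c for c, codes in COUNTRY_CURRENCIES.items() if alpha in codes)


def numeric_matches_country(country_numeric: str, alpha: str) -> bool:
    """True when the currency's numeric code equals the country's numeric code."""
    return ENTRIES[alpha].numeric == country_numeric
```

In this sketch `countries_using("XAU")` returns an empty list (a "currency" not linked to any country), and `numeric_matches_country("591", "PAB")` is false, which is the kind of implicit rule that a purely paper-based standard leaves to "experts".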

Experience in the financial services/banking sector indicates that on the Internet those engaged in electronic commerce, as well as in general applications, need to be made aware of the standard notation for currencies. For example, in actual electronic commerce practice, the Canadian dollar is being represented as "CDN", "CAN", "CA", etc., rather than by its ISO 4217 code "CAD". Further, the 3-alpha codes of ISO 3166-1 for countries are often confused with the ISO 4217 3-alpha currency codes.
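A simple check at the interface can catch such notations before they enter an interchange. The sketch below is a hypothetical illustration, not a prescribed method: `check_currency_code` and the two small code sets are assumptions, and a real implementation would load the complete, current ISO 4217 and ISO 3166-1 code lists.

```python
# Illustrative subsets only; real systems would load the full code lists.
ISO_4217_ALPHA = {"CAD", "USD", "JPY"}    # currency codes
ISO_3166_ALPHA3 = {"CAN", "USA", "JPN"}   # country codes


def check_currency_code(code: str) -> str:
    """Classify a candidate currency notation.

    Returns "ok" for a known ISO 4217 alpha code, "country-code" when an
    ISO 3166-1 alpha-3 country code was supplied by mistake, and "unknown"
    for anything else (e.g. informal notations).
    """
    candidate = code.strip().upper()
    if candidate in ISO_4217_ALPHA:
        return "ok"
    if candidate in ISO_3166_ALPHA3:
        return "country-code"
    return "unknown"


for notation in ("CAD", "CAN", "CDN", "CA"):
    print(notation, "->", check_currency_code(notation))
```

Distinguishing the "country-code" case from "unknown" lets an application give a more useful message when a country code has been used in place of a currency code.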

3.3 Example #2 - Country Codes And Localization With Multilingualism

Several international standards are used internationally for codes representing countries. The better known ones are ISO 3166-1, the USMARC Code List for Countries as maintained by the Library of Congress (LC), and the Universal Decimal Classification (UDC) auxiliary table for countries. Of these the ISO 3166-1 is the most widely known. {On the LC and UDC, see further Chapter 3.4 below}

This example focuses on ISO 3166-1. This standard and its contents are the responsibility of ISO TC 46 - Information and documentation. The purpose here is to highlight the need for an IT-enabled version of this standard, and also to bring to the fore related localization and multilingual aspects. The title of ISO 3166 is "Codes for the representation of names of countries and their subdivisions". Within the ISO 3166 standard, there are now three parts; namely:

· Part 1: Country Codes;

· Part 2: Country Subdivision codes; and,

· Part 3: Codes for formerly used names of countries.

Here ISO 3166-1 "established codes that represent the names of countries, dependencies, and other areas of particular geopolitical interest, on the basis of lists of country names obtained from the United Nations". Currently, each entry (or "record" of a permitted instance) contains:

(1) a three-digit numeric code

(2) a two letter alpha code

(3) a three letter alpha code

(4) a short name - English

(5) a long, i.e., formal name - English

(6) a short name - French

(7) a long, i.e., formal name - French

ISO 3166 also has a note field in English and in French.

ISO 3166-1 thus has seven (7) "standardized" representations for each unique entity or object, three (3) of which are codes. ISO 3166-1 allows any one of the seven to be utilized although in practice, and especially in IT systems one usually utilizes one of the three codes.
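The seven representations can be pictured as one record per entity, with any of the three codes resolving to the same record. The following Python sketch is an assumption-laden illustration rather than a normative schema: `CountryEntry`, `build_index` and `resolve` are hypothetical names, and the sample values are those given in this chapter for code 180.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountryEntry:
    """One ISO 3166-1 entry with its seven representations (field names assumed)."""
    numeric: str
    alpha2: str
    alpha3: str
    short_name_en: str
    long_name_en: str
    short_name_fr: str
    long_name_fr: str

    def codes(self):
        """The three representations that are codes rather than names."""
        return (self.numeric, self.alpha2, self.alpha3)


ENTRY_180 = CountryEntry(
    numeric="180",
    alpha2="CD",
    alpha3="COD",
    short_name_en="Congo, Democratic Republic of the",
    long_name_en="the Democratic Republic of Congo",
    short_name_fr="Congo, la République démocratique du",
    long_name_fr="La République démocratique du Congo",
)


def build_index(entries):
    """Map every code of every entry to that entry."""
    index = {}
    for entry in entries:
        for code in entry.codes():
            index[code] = entry
    return index


INDEX = build_index([ENTRY_180])


def resolve(code: str):
    """Return the entry for a numeric, alpha-2 or alpha-3 code, or None."""
    return INDEX.get(code.strip().upper())
```

Because the index accepts all three codes, two systems that exchange different codes for the same entity can still be reconciled, although this does not remove the need for an agreed default at the interface.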

For this ISO 3166 standard, we currently do not have a common international default "standard" for the interface among applications/information systems engaged in support of electronic commerce. The 3-digit numeric code, the 2-alpha code and the 3-alpha code are all used in interchanges.

Of these three codes, the three-digit numeric code is the most stable and tends to change only when the physical boundaries change. Names, short and long, do change, and at times the accompanying two- and three-letter alpha codes change as well.

The ISO 3166 Alpha-2 and Alpha-3 codes are not that stable, i.e., whenever a country changes its name, it often also changes its alpha-2 and alpha-3 codes, (e.g., Burma to Myanmar, Zaire to the Democratic Republic of the Congo, etc.). The 3-digit numeric code is much more stable. On the whole, it changes only when the actual physical boundaries of the countries change, i.e., the entity being identified and referenced is no longer the same. For example, the alphabetic (written language) equivalents to "3166:180" recently underwent the following changes:

                  Former                 New
Alpha-2           ZR                     CD
Alpha-3           ZAR                    COD
Short Name (en)   Zaire                  Congo, Democratic Republic of the
Long Name (en)    Republic of Zaire      the Democratic Republic of Congo
Short Name (fr)   Zaïre                  Congo, la République démocratique du
Long Name (fr)    République de Zaïre    La République démocratique du Congo
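One way to handle such changes in an information system is to key the record on the stable numeric code and keep the alpha codes and names as a dated or ordered history. The sketch below is a minimal illustration under that assumption; `AlphaVersion`, `HISTORY`, `current` and `numeric_for_alpha` are hypothetical names, and only the values from the table above are used.

```python
from typing import NamedTuple, Optional


class AlphaVersion(NamedTuple):
    """One set of alphabetic equivalents for a numeric code (names assumed)."""
    alpha2: str
    alpha3: str
    short_name_en: str


# Numeric code -> versions, oldest first. Values come from the table above.
HISTORY = {
    "180": [
        AlphaVersion("ZR", "ZAR", "Zaire"),
        AlphaVersion("CD", "COD", "Congo, Democratic Republic of the"),
    ],
}


def current(numeric: str) -> AlphaVersion:
    """Return the most recent alphabetic equivalents for a numeric code."""
    return HISTORY[numeric][-1]


def numeric_for_alpha(alpha: str) -> Optional[str]:
    """Resolve a current or former alpha-2/alpha-3 code to its numeric code."""
    wanted = alpha.strip().upper()
    for numeric, versions in HISTORY.items():
        for version in versions:
            if wanted in (version.alpha2, version.alpha3):
                return numeric
    return None
```

With this arrangement, data recorded under "ZR" and data recorded under "CD" both resolve to "180". The sketch assumes that a withdrawn alpha code is never reassigned to a different entity; where that assumption does not hold, the lookup would also need the date of the data being interpreted.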

The use of Alpha-3 code tags for ISO 3166 country names causes confusion with ISO 4217 currency and funds codes, which are also represented as Alpha-3 codes (upper case).

Further, the 3-digit numeric code is linguistically neutral and unambiguous. Each of the 3-digit numeric codes has associated with it in ISO 3166 six (6) alphabetic linguistic expressions, two of which also serve as "human understandable" (and computer-processable) codes.

From an interoperability perspective, i.e., both that of commerce and IT, the "3166" identifying the scheme and the rule set, and the 3-digit numeric code identifying a country in the context of this domain, together form a unique and unambiguous global identifier for the entity being referenced. The alpha codes and names should simply be considered linguistic equivalent expressions, i.e., from an information systems perspective all that one may need to standardize at the interface is something akin to "3166:246", "3166:056", "3166:792", etc.
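An identifier of this form is easy to validate and split mechanically. The following Python sketch shows one possible reading of the "scheme:code" notation used above; the `CodeRef` type and `parse_ref` function are hypothetical, and the rule that the code part is exactly three digits is an assumption that applies to the ISO 3166-1 numeric code only.

```python
from typing import NamedTuple


class CodeRef(NamedTuple):
    """A linguistically neutral reference: scheme number plus code value."""
    scheme: str
    code: str

    def __str__(self) -> str:
        return f"{self.scheme}:{self.code}"


def parse_ref(text: str) -> CodeRef:
    """Parse a reference such as "3166:246" into its scheme and code parts."""
    scheme, sep, code = text.strip().partition(":")
    if sep != ":" or not (scheme.isascii() and scheme.isdigit()):
        raise ValueError(f"expected '<scheme>:<code>', got {text!r}")
    # Assumption for this sketch: the code is a 3-digit numeric, with leading zeros kept.
    if not (len(code) == 3 and code.isascii() and code.isdigit()):
        raise ValueError(f"expected a 3-digit numeric code, got {code!r}")
    return CodeRef(scheme, code)


ref = parse_ref("3166:056")
print(ref.scheme, ref.code, str(ref))
```

Keeping the code as a string rather than an integer preserves leading zeros, so "056" is not silently turned into 56.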

In addition, and also very important, it should be noted that ISO 3166-1 contains many instances and associated codes for entities which are not "countries", i.e., they are dependencies of other entities (e.g., of France, Great Britain, USA, etc.). Human beings "filter" and easily make these distinctions. Computers, being very dumb, cannot unless explicitly instructed. To make things even worse, many of these 3166-1 "sub-entities" have one code in 3166-1 and a different code in 3166-2. [Note: One would have thought that when ISO 3166 was split into its three parts, all the sub-entities in ISO 3166 would no longer be found in 3166-1 but be moved to 3166-2].

Further, we should note the fact that in their own locale and language, countries have their own "local" short and long (or formal) name. For example, 528 Netherlands = "Nederland" and "Koninkrijk der Nederlanden". Further, countries which are bilingual have two sets of local short and long/formal names (e.g., 056 Belgium = "Belgie" and "Koninkrijk van Belgie", and "Belgique" and "Le Royaume de Belgique"). There are also multilingual countries (e.g., Switzerland). Exhibit 3.3 provides an illustrative example.

Then, there is the fact that many countries use non Latin Alphabet-based character sets. This means that one also has the original country language character script as "alphas" plus their "latinized" equivalents.

To this is added the fact that from the perspective of each country and language, the "other" countries have (are known by) their "own names". For example, a person in France uses "Allemagne", not "Germany" or "Deutschland", as the linguistic equivalent for "3166:276". All this is common, non-competitive information.

It suffices to state that all the rules and intelligence implicit in ISO 3166-1 (as well as 3166-2 and 3166-3) have not yet been captured explicitly in an IT-enabled and EC-facilitated manner, (e.g., as a "normalized" (callable) database).
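
To make the idea of a "normalized" (callable) database more concrete, the sketch below keys each linguistic equivalent on the numeric code, the ISO 639 language code and the kind of name. It is only an illustration under stated assumptions: the table layout, the name `country_name` and the fallback to English are hypothetical choices, and the rows are a subset of the values shown in Exhibit 3.3.

```python
# Rows keyed by (numeric code, language code, kind of name).
# Values are a subset of those shown in Exhibit 3.3.
NAMES = {
    ("246", "en", "short"): "Finland",
    ("246", "fr", "short"): "Finlande",
    ("246", "fi", "short"): "Suomi",
    ("056", "en", "short"): "Belgium",
    ("056", "fr", "short"): "Belgique",
    ("056", "nl", "short"): "Belgie",
    ("792", "en", "short"): "Turkey",
    ("792", "fr", "short"): "Turquie",
    ("792", "tr", "short"): "Turkiye",
}


def country_name(numeric, lang, kind="short", fallback_lang="en"):
    """Return the name for a numeric code in the requested language.

    Falls back to `fallback_lang` when no row exists for `lang`, and
    returns None when the numeric code is unknown altogether.
    """
    name = NAMES.get((numeric, lang, kind))
    if name is None:
        name = NAMES.get((numeric, fallback_lang, kind))
    return name


print(country_name("246", "fi"))
print(country_name("792", "nl"))
```

Systems exchange only the numeric code; each application resolves it to whatever language its own user interface requires.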

This section concludes with Exhibit 3.3, which is presented in matrix form. The left column, titled "IT-Needs", presents the "interface" requirement for ISO 3166-1 among information systems. The right-side columns present some examples of the multiple linguistic equivalents required to support "Localization and Multilingual" requirements for particular implementations and supporting IT applications, which in turn may have human-readable User Interfaces.

Notes on Exhibit 3.3

[1] Normally the eight (8) "fields" under "Localization and Multilingual Requirements" would be separate (sets of) "columns" in a database schema, all forming part of the "row". It is noted that the physical presentation here in Exhibit 3.3 does not reflect this.

[2] The 2-letter language codes, (e.g., en, fi, fr, nl, sv, tr), are taken from ISO 639.

[3] The "->" entries are not part of ISO 3166-1. Although only the Latin alphabet character set is utilized in this and the other two examples, it is understood that non-Latin alphabet-based character sets are also used or will be used in electronic commerce. As such this is an "illustrative" example.

3.4 Example #3 - Language Codes And Concordance Among International Standards

At times several different standards are used internationally for the same domain. One such domain is that of codes representing languages. With respect to sets of codes representing "languages" (and "countries"), these provide examples where the ISO is not the only organization to issue and maintain standards used world-wide, even though its standard, i.e., ISO 639, "Codes representing the names of languages", is the most widely used and known. This standard and its contents are the responsibility of ISO TC37 - Terminology (principles and coordination).

Another international standard providing a coding schema for country codes and language codes is that of the US Library of Congress. Its primary application is in the bibliographic/information sciences domain. It should be noted that these coding schemas pre-date those of the ISO. For country codes the Library of Congress uses two or three character lower case alphabetic codes. These represent existing national entities, provinces and territories of Canada, states of the United States, divisions of the United Kingdom, and internationally recognized dependencies. It is known as the USMARC[2] Code List for Countries and is maintained by the Library of Congress. Similarly the Library of Congress maintains a USMARC Code List for Languages. This code list consists of three letter mnemonics representing only written languages of the modern and ancient world. "Where one spoken language is written in two different sets of characters, each set of characters is assigned a specific code. For example, Serbian and Croatian are the same spoken language but the former is written in the Cyrillic alphabet and the latter in the Roman alphabet" ("Roman" known within ISO as the "Latin" character set).

A third international standard, the Universal Decimal Classification (UDC) scheme, also has language codes. The UDC is used in bibliographic/information science work (primarily in Europe), and is increasingly used there for classifying documents on the Internet.

Human beings can recognize and filter these differences; computers cannot unless explicitly instructed. Keeping in mind that the scope and definitions of these different coding schemes also differ for what are generally the same business needs, one can bridge such differences through construction of concordance tables. This allows one to maximize, insofar as possible, interoperability across differing sectorial perspectives as well as to identify "non-interoperability" instances.
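
A concordance table of this kind can be sketched as a small lookup structure. The fragment below is a minimal illustration in Python, not a proposed implementation: the rows are language codes taken from Exhibit 3.4, while the names `CONCORDANCE` and `translate` and the column constants are hypothetical.

```python
# Column positions in each concordance row.
ISO_639, LC, UDC = 0, 1, 2

# Language code rows taken from Exhibit 3.4: (ISO 639, LC, UDC).
CONCORDANCE = [
    ("en", "eng", "111"),
    ("fr", "fre", "133.1"),
    ("nl", "dut", "112.5"),
    ("fi", "fin", "511.111"),
    ("sv", "swe", "113.6"),
    ("gd", "gae", "152"),
    ("cy", "wel", "153.1"),
]


def translate(code, source, target):
    """Map a code from one scheme to another via the concordance rows.

    Returns None when the source code has no row, which flags a
    "non-interoperability" instance instead of guessing.
    """
    for row in CONCORDANCE:
        if row[source] == code:
            return row[target]
    return None


print(translate("nl", ISO_639, LC))
print(translate("dut", LC, UDC))
print(translate("xx", ISO_639, LC))
```

Returning an explicit "no match" keeps the gaps between schemes visible, so that they can be reviewed by the organizations maintaining the code sets.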

In ISO 639 each entry (or permitted instance) consists of:

- a language symbol, in the form of a two-letter code;

- the language name in English;

- the language name in French;

- the original language name (as written in the Latin-1 alphabet).

ISO 639 also has a note field in English and in French.

With respect to ISO 639, two initial observations must be made. The first is that Canada (and the United States) has not adopted ISO 639 as a "national standard" due primarily to its current lack of inclusion of North American aboriginal and native languages. Secondly, the LANG attribute is important in SGML (ISO/IEC 8879). For example, in the proposed New Work Item (ISO/IEC JTC1 N4742) for "Standard HTML", the LANG attribute

"identifies a natural language spoken, sung, written or otherwise used by human beings for communication between people. Computer languages are explicitly excluded. The value of the LANG attribute is referred to as the "language tag"... The name space of language tags is administered by IANA. Example tags include: en, en-US, en-cockney, i-cherokee and x-pig-latin.

Two letter primary tags are reserved for ISO 639 language abbreviations. This Committee Draft does not specify three-letter primary tags, however their description may be found in the "Ethnologue" {Gri92}. Any two-letter initial sub-tag is an ISO 3166 country name..."
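
The length-based rules in the passage quoted above can be sketched as follows. This is a hedged illustration only: the function name `describe_tag` and its labels are hypothetical, and the code inspects nothing but the shape of the tag, without consulting any ISO 639 or ISO 3166 code list.

```python
def describe_tag(tag):
    """Classify a language tag using only the length rules quoted above.

    A two-letter primary tag is labelled as reserved for ISO 639, and a
    two-letter first sub-tag is labelled as an ISO 3166 country code.
    Validity is not checked: "zz-ZZ" is classified just like "en-US".
    """
    primary, *subtags = tag.split("-")
    primary_kind = "ISO 639" if len(primary) == 2 else "other"
    if not subtags:
        subtag_kind = None
    elif len(subtags[0]) == 2:
        subtag_kind = "ISO 3166"
    else:
        subtag_kind = "other"
    return primary_kind, subtag_kind


for example in ["en", "en-US", "en-cockney", "i-cherokee", "x-pig-latin"]:
    print(example, describe_tag(example))
```

Run over the example tags from the quotation, the sketch shows that only "en" and "en-US" are covered by references to ISO standards; every other component falls into the "other" category.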

Serious reflection and more systematic thinking are required here with respect to "tags", especially if one wishes to use SGML -> HTML -> XML generally and in electronic commerce specifically, as well as to ensure interoperability not only with the use of other syntaxes but also among various consumer markets, industry sectors, etc.

First of all, "i" and "x" are single characters, and they do not exist in ISO 639. Secondly, "cherokee" and "pig-latin" are not ISO 639 languages. Thirdly, for "en-US", it is not clear at all, given the other examples, whether this represents the English language as used in the United States or something else.

Fourthly, use of Alpha-2 code tags for ISO 3166 country name is confusing vis-à-vis ISO 639 language codes. They overlap and are not mutually exclusive. This at times is confusing for humans (and even more so for "dumb" computers). Fifthly, in many sort algorithms and search/retrieval engines, upper and lower case letters are treated the same. This causes even more confusion in IT-enabled processing of these code sets if two letter alphas are used as codes for both countries and languages.

Finally, there is an urgent need to update ISO 639 to include North American aboriginal and native languages as well as to provide a systematic means for handling and registering user extensions (e.g., "cockney", "pig latin", "klingon", etc.). Alternatively, one could consider developing an "ISO 639 Level 2" standard for codes representing user extensions of the nature noted above, as well as "historical languages", i.e., as is being developed for ISO 3166-3.

Even more important is the need to develop a systematic and unambiguous interworking, in an IT-enabled manner, among language codes (ISO 639), currency and funds codes (ISO 4217) and country codes (ISO 3166-1).
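
The overlap between two-letter country and language codes under case-insensitive processing, noted earlier in this section, can be illustrated with a short Python sketch. The code lists are limited to values that appear in Exhibit 3.3 and its notes; the function names and the scheme-prefixed form used for the alpha codes (e.g., "3166:FI") are assumptions made for this illustration only.

```python
# Alpha-2 country codes shown in Exhibit 3.3.
COUNTRY_ALPHA2 = ["FI", "BE", "TR"]

# ISO 639 language codes listed in note [2] on Exhibit 3.3.
LANGUAGE_CODES = ["en", "fi", "fr", "nl", "sv", "tr"]


def case_insensitive_overlap(countries, languages):
    """Return language codes that collide with a country code once case is ignored."""
    folded = {code.casefold() for code in countries}
    return sorted(code for code in languages if code.casefold() in folded)


def qualify(scheme, code):
    """Prefix a code with its scheme (hypothetical 'scheme:code' form)."""
    return f"{scheme}:{code}"


overlap = case_insensitive_overlap(COUNTRY_ALPHA2, LANGUAGE_CODES)
print(overlap)

# Once qualified by scheme, the two codes stay distinct even after case folding.
print(qualify("3166", "FI").casefold() == qualify("639", "fi").casefold())
```

Even in this small sample, two of the six language codes collide with a country code when case is ignored, whereas the scheme-qualified forms remain distinct.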

One should also develop mechanisms for the interchange of the "same" data content from a cross-industry sector perspective but using different code sets in the same domain. To assist in progressing work in this area, this chapter concludes with Exhibit 3.4, which consists of a sample concordance table (English language version only) for the ISO 3166 (Country Codes) + ISO 639 code set on the one hand, and on the other, the equivalent Library of Congress (LC) country and language code set and the UDC language code set.

Notes on Exhibit 3.4

[1] As provided by the Fédération internationale d'information et de documentation (FID), based on documentation prepared in 1994 (and verified with them in early 1998). The UDC also has "country codes", i.e., "place codes", as a "common" auxiliary table, but this has not been included in Exhibit 3.4.

[2] For human representation, we have included the "Short Name - English" as the linguistic equivalent for the ISO 3166-1 3-digit numeric code.

[3] For human representation, we have also included the ISO 639 English name of the language. There is also the French name and of course the actual "name" of the language in the language itself. ISO 639 captures this "Original" name in its Latin alphabet equivalent version.

[4] One notes that the LC alpha country codes are not the same as ISO 3166-1 alpha codes for the same entities.

[5] Added here to indicate that in Canada under the Nunavut Act, a new "territory" will be established 1 April, 1999 from the existing Northwest Territories, i.e., "Nunavut". In Nunavut, in addition to English and French, Inuktitut will become a recognized "official" language. The language code "ik" is the one that has been reserved for Inuktitut.

[6] In ISO 639 the "ik" represents "Inupiak" which is grouped/classified as an "Eskimo language". In UDC, the language code 562 refers to "Inuit". There is no code for "Inuktitut" per se.

[7] The LC codes place "Inuktitut" under the Eskimo family of languages.

[8] One notes that the LC language codes are not the same as the ISO 639 codes. At times even the first letter is not the same (e.g., "nl" versus "dut").

[1] Apart from some minor editing changes (e.g., renumbering, spelling, typos, etc.), Chapter 2 is a verbatim extract of Clause 6 of the BT-EC Report to JTC1, i.e., JTC1 N5296, pages 22-27.

[2] The acronym "MARC" stands for "Machine Readable Cataloguing". The preceding characters represent the countries that utilize the "MARC" format, have amended it for their specific cataloguing needs, and have an infrastructure at the national level for addressing these national needs. There are primarily three such countries, namely the US, Canada, and the UK, hence the designations USMARC, CANMARC and UKMARC.

[3] The WCO is but one example of "coordinated autonomy" among autonomous organizations. The degree to which autonomous organizations achieve interoperability from a business operational perspective sets the limit to the extent of interoperability of supporting IT-based functional services.

Exhibit 3.3

IT-Needs (Interface)	Localization and Multilingual Requirements [1]
3166:246	Alpha-2: FI	Alpha-3: FIN
	Short Name (en) [2]: Finland	Long Name (en): Republic of Finland
	Short Name (fr): Finlande	Long Name (fr): République de Finlande
[3] ->	Local Short Name (fi): Suomi	Local Long Name (fi): Suomen tasavalta
->	Local Short Name (sv): Finland	Local Long Name (sv): Republiken av Finland
3166:056	Alpha-2: BE	Alpha-3: BEL
	Short Name (en): Belgium	Long Name (en): Kingdom of Belgium
	Short Name (fr): Belgique	Long Name (fr): Royaume de Belgique
->	Local Short Name (nl): Belgie	Local Long Name (nl): Koninkrijk van Belgie
->	Local Short Name (fr): Belgique	Local Long Name (fr): Royaume de Belgique
3166:792	Alpha-2: TR	Alpha-3: TUR
	Short Name (en): Turkey	Long Name (en): Republic of Turkey
	Short Name (fr): Turquie	Long Name (fr): République turque
->	Local Short Name (tr): Turkiye	Local Long Name (tr): Turkiye Cumhuriyeti

Exhibit 3.4

ISO				Library of Congress		UDC [1]
3166-1		639
Numeric Code	Short Name (E) [2]	Applicable Languages (E) [3]	Applicable Language Codes	Country Codes	Language Codes	Language Codes
3166-1:124	Canada	English	en	xxc [4]	eng	= 111
		French	fr		fre	= 133.1
		Inuktitut [5]	ik [6]		esk [7]	= 562
3166-1:056	Belgium	French	fr	be	fre	= 133.1
		Dutch	nl [8]		dut	= 112.5
3166-1:246	Finland	Finnish	fi	fi	fin	= 511.111
		Swedish	sv	sw	swe	= 113.6
3166-1:792	Turkey	Turkish	tr	tu	tur	= 512.164
3166-1:840	United States	English	en	xxu	eng	= 111
3166-1:826	United Kingdom	English	en	xxk	eng	= 111
		Scots Gaelic	gd		gae	= 152
		Welsh	cy		wel	= 153.1

IT-Needs (Interface)	Country Code - Short Name (en) [2]	Localization and Multilingual Needs [3]
HS: 0701	124 CANADA	(en): potato (fr): pomme de terre (ik): patiti [4]
	464 MEXICO	(es): papa
	724 SPAIN	(es): patata
	040 AUSTRIA	(de): erdapfel
	276 GERMANY	(de): kartoffel
	056 BELGIUM	(fr): pomme de terre (nl): aardappel
	246 FINLAND	(fi): peruna (sv): potatis